Compressive ghost imaging through scattering media with deep learning.

Imaging through scattering media is challenging since the signal to noise ratio (SNR) of the reflection can be heavily reduced by scatterers. Single-pixel detectors (SPD) with high sensitivities offer compelling advantages for sensing such weak signals. In this paper, we focus on the use of ghost imaging to resolve 2D spatial information using just an SPD. We prototype a polarimetric ghost imaging system that suppresses backscattering from volumetric media and leverages deep learning for fast reconstructions. In this work, we implement ghost imaging by projecting Hadamard patterns that are optimized for imaging through scattering media. We demonstrate good quality reconstructions in highly scattering conditions using a 1.6% sampling rate. © 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement


Introduction
Imaging through scattering media has many applications such as in underwater imaging and navigation in foggy environments. However, absorption and scattering [1,2] by scatterers can heavily reduce the signal to noise ratio (SNR) of target signals, making it challenging to use focal plane arrays based on CCD/CMOS technologies. Compared to CCD/CMOS sensors, the single-pixel detector (SPD) such as a photomultiplier tube and a single photon counter has much better sensitivity and SNR performance, making it a preferable detector for imaging through scattering media. However, SPD collects all photons into a single bucket without providing 2D information of objects.
Ghost imaging is an established method for reconstructing 2D images using only a single SPD [3][4][5]. In projection-based ghost imaging, the illumination is spatially encoded by a spatial light modulator (SLM) with a pre-know pattern displayed on the SLM. Then, the encoded beam illuminates the object and the reflection is directed into the SPD. During the measurement, the modulation patterns (M) are updated on the SLM and corresponding measurements (y) are made with the SPD. The problem of ghost imaging is to reconstruct the object (x) by solving the inverse problem of y = Mx. Compressed sensing can be utilized to reconstruct objects with fewer measurements than dictated by Nyquist's sample theorem [6].
Han et al. [7] first demonstrated that ghost imaging in the presence of scattering media where the scatters greatly reduce the SNR. Bina et al. [8] proposed to reconstruct objects immersed in a turbid media using differential ghost imaging with a backscattering configuration. Tajahuerce et al. [9] demonstrated the computational ghost imaging through a dynamic scattering media. Zhao et al. [10] proposed a passive ghost imaging scheme, which uses caustics illumination patterns to detect large objects in low-light underwater conditions. Le et al. [11] investigated underwater ghost imaging with different turbidity and illumination angles. Although ghost imaging has been demonstrated to image through scattering media, direct reflections from scatterers can severely degrade reconstruction quality.
To suppress direct reflection from scatterering media, polarization sensing has been exploited to improve the ghost imaging reconstructions in underwater imaging [12][13][14][15][16]. Polarimetric ghost imaging has also been used to separate reflections from different objects since the polarization of reflections from different surface roughness and materials can be different [17][18][19][20]. In polarimetric ghost imaging, the illumination is linearly polarized, and there is a linear polarizer in front of the detector. By rotating the linear polarizer, two measurements recording different polarization components are used to minimize scatterers' direct reflections [16] (more details in Sec. 2.1). Two SPDs with different polarization status can also be used. We refer to this process as a polarization correction. Although polarization correction improves reconstruction quality, it also blocks a portion of light incident on the detector, reducing SNR, and further complicating the problem of imaging through scattering media. Therefore, a ghost imaging reconstruction algorithm that is robust to noise is needed for polarimetric ghost imaging through scattering media.
Deep Learning [21] has been recently used to solve inverse problems [22,23] such as denoising [24] and phase retrieval [25]. Higham et al. [26] also use a three layer convolutional neural network for single pixel imaging, which has comparable reconstructions compared to iterative optimization methods. He et al. [27] and Lyu et al. [28] use deep learning to further improve the quality of reconstructions from iterative optimization in ghost imaging. Compared to previous algorithms, deep learning based methods demonstrate faster and more robust reconstructions.
In this paper, we leverage polarimetric ghost imaging and deep learning to improve the reconstructions for imaging through scattering media. An experimental prototype is built up to verify the proposed method. We summarize our contributions as below: • We propose an end-to-end learning method for imaging through scattering media with polarimetric ghost imaging. The proposed method can reconstruct the object in less than 1ms and produce more robust reconstructions compared to iterative methods, especially in high scattering conditions.
• We analyze the effect of spatially encoded illumination patterns in the scattering media, and suggest an optimal sequence of projected Hadamard patterns for imaging through scattering media which demonstrates robust reconstructions in high scattering conditions.
• We demonstrate a reconstruction with a very low sampling rate of 1.6%.

Methods
In this section, we firstly review the reconstruction with polarimetric ghost imaging through scattering media. We then present an optimal Hadamard sequence for imaging through scattering media. Lastly, we explain the deep learning based reconstruction algorithm.

Imaging through scattering media with polarimetric ghost imaging
For the ghost imaging scenario considered in this paper, a beam is spatially encoded with a pattern M displayed on an SLM, which then illuminates an object after first propagating through the scattering media. A portion of photons may be directly scattered back to the detector from the scattering media. Transmitted light continues to propagate and illuminate the object. Broadly, we can classify the reflections to the detector I m in two parts: direct reflections from the scatters S, and reflections from the object O, which can be represented as below: where ∆t is the exposure time of the detector. N x and N y are the number of pixels in x and y axis. M m is the measurement matrix for the system with encoding the spatial modulation pattern. We relaxedly assume the absorption of the scattering media is uniform. n m is the measurement noise. The scatterers and the object may have different polarization properties, therefore, the degree of polarization for scatterer's and object's reflections can be different [13,14,17,29]. We use this observation in the reconstruction scheme. During the measurement, the illumination beam is linear polarized. We place a linear polarizer in front of the detector, and two measurements (I ,⊥ ) are acquired by rotating the linear polarizer to align ( ) or be perpendicular (⊥) to the polarization status of the illumination beam [16].
where I ,⊥ S and I ,⊥ O represent reflections from scatters and the object to the detector respectively. The goal is to suppress the scatter's reflections and achieve the reflection from the object to the detector ( . We refer this step as the polarization correction. If we look closely at the noise variance after this polarization correction in Eq. (6), it actually accumulates the noise which complicates the reconstruction [13].
Assume the measurement matrix of the system as A, the detector measurement can be simplified as: If we remove scatterers' reflections from detector readout, the removal is also applied in the reconstruction. We then reconstruct the object using the measurement I O (Eq. (5)) containing only the object's reflection. where . J is the number of measurements with different spatially encoding patterns on the SLM, and K is the size of the vector form of the reconstruction. The reconstruction is an ill-posed problem, we therefore formulate it as an optimization problem with a regularizer for the penalty.
where a TV regularizer Φ(O) is applied and λ is the weight. The TwIST algorithm [30] is used for the reconstruction.

Spatial encoded patterns for the illumination
In this work, we used Hadamard patterns to encode the illumination. The sequence of Hadamard pattern is generated using the Walsh Hadamard transform. The dimension of the reconstruction is 128 × 128. If we define the sample rate as J/(128 × 128), a sequence of 16384 Hadamard patterns is needed for the sample rate of 1 (one Hadamard pattern corresponds to one SPD measurement). Dense scattering can cause blur in the projected illumination patterns. Blurring artifacts in the illumination restrict the spatial frequencies reaching the object surface and therefore the maximum spatial frequencies in the object that can be recovered. The point spread function (PSF) of the illumination can be approximated as consisting of the ballistic PSF and the Gaussian scattered PSF. In weak scattering conditions, the ballistic PSF provides a significant contribution. Previous work has established that high frequency components of the illumination patterns still reach to the object [12,13,31,32], and this work follows in this regime.
As discussed above, one of the goals of this paper is to reconstruct the object reflectance using very low sampling rates. Since these low sampling rates exploit only a fraction of possible Hadamard patterns and the polarization process further reduces the SNR of the reflection, it is desirable to determine an 'optimal sequence' of Hadamard patterns that is robust to noise in high scattering conditions and delivers the best possible result.
We simulate SPD measurements with different Hadamard patterns for averaging 200 natural images from the STL-10 dataset [33]. We reorder the Hadamard patterns with delivering energy (display with absolute values) from high to low marked with the red line in Fig. 1(a), which has a similar strategy with previous work [26]. We display the pattern containing relatively high energies in Fig. 1(b), which demonstrates lower spatial frequencies compared to the patterns carrying low energies ( Fig. 1(c)). It becomes clear that patterns displaying relatively low spatial frequencies are a good choice since they preserve information in the presence of strong scattering and are relatively robust against decreased SNRs.  1. (a). We simulate SPD measurements with the first 3000 Hadamard frames (delivering energy from high to low) for averaging of 200 natural images. The red line marks the average intensity and the black line represents the standard deviation. X-axis marks different Hadamard patterns and y-axis represents amplitude. We display an example of Hadamard patterns with high energies in (b) and low energies in (c).
In both simulation and experiments, we pick the first 255 and 450 Hadamard patterns shown in Fig. 1(a) with sampling rates of 1.6% and 2.7%, and compare their results with those from the first 1500 and 3000 Hadamard patterns (sampling rates of 9.5% and 18.3%) in Fig. 1(a).

Reconstruction with deep learning
We developed a convolutional neural network to provide high quality 2D reconstructions from small numbers of noisy ghost imaging measurements. The training is performed with STL-10 dataset [33], where 90000 images are used for the training. Since the size of the reconstruction is 128×128 in both simulations and experiments, we resize all the images in the dataset to be 128×128. For each training image (or scene), we simulate its SPD measurement with different Hadamard patterns (more details in Section 3). For the training, no extra noise is added into the simulated SPD measurements.
As shown in Fig. 2, the input to the neural network is SPD measurements. If we train for the sampling rate of 1.6% (corresponding to 255 different Hadamard patterns), the input to the neural network is then a vector of 1×255. In training, the batch size is 30, and the input is then 30×255 for each batch. Firstly, a fully connected (FC) layer is used and followed by a rectified linear unit (ReLU). As discussed in Section 2.1, every element in X contributes to the single pixel detector measurement (y). The FC layer directly maps the measurement to the scene, which can be treated as an inverse transform for the measurement (y) to estimate the unknown object (X) following the linear measurement matrix (A). It is noticed that the weights of FC layer can be fixed during training which reduces the number of trainable parameters with the best linear completion scheme [34]. For simplicity, we use the native FC layer and all the weights are optimized during training. The output of the FC layer is 30×1×128×128, and is directed into a convolutional layer containing a batch normalization and an activation ReLU function. The kernel of this convolution is 9 × 9 with a padding of 4. The output of this layer has 64 channels (30×64×128×128). Then, an eighteen-layer residual network [35] is used for further processing with preserving the size (30×64×128×128). The kernel size used in the residual block is 3 × 3. The output of the residual block goes through another convolutional layer (kernel size of 3×3), and then is concatenate with the input to the residual block. The concatenation then goes through the last convolutional layer (kernel size of 9×9 with a padding of 4) with output feature of 1, and this is the final reconstruction with a size of 128×128. In training, ADAM optimization [36] is used with a learning rate of 0.001, and coefficients for computing running averages of gradient and its square as 0.9 and 0.999. Twenty epochs are used for the training process. The neural network is installed with the PyTorch framework, and the training process is performed on a computer with NVIDIA TITAN X GPU [37]. We compare the mean squared error between the output and the ground truth as the lost function: where f (·) represents the activation functions. Θ are the weights for the neural network. I O is the single pixel detector measurements, and O is the object for reconstruction. We retrain the neural network for other sampling rates of 18.3%, 9.5% and 2.7%, and the training time can vary from 9 to 12 hours depending on the sampling rate.

Simulation
In simulations, we assume the polarization correction completely removes back reflections from the scattering medium and simply reduces the SNR of measurements. Increasing the density of the scatterers results in stronger absorption and scattering which further reduces SNR. Therefore, we apply different SNRs to simulate measurements under different scattering conditions.
For each Hadamard pattern displayed on the SLM, we simulate its SPD measurement by multiplying the scene with the Hadamard pattern and summarizing all the pixels into one readout.  7% (b, d).
In a high scattering condition (SNR=10dB), we also compare the reconstructions with DL and CS for the sampling rate of 18.3% (e, g) and 2.7% (f, h). A quantitative comparison is shown in the table.
Different white Gaussian noises of 10, 15, 20, 25dB are added to simulate different scattering conditions.
We simulate the SPD measurements for 'Lena' with different Hadamard patterns in low and high scattering conditions (or equivalently SNRs). For each sampling rate, we reconstruct 'Lena' with independently trained neural networks and compare against rule based compressed sensing reconstruction [30]. The results are shown in Fig. 3, and reconstructions from deep learning (DL) in Figs. 3(a) and (e) demonstrate better visual results compared to those from compressed sensing (CS) in Figs. 3(c) and (g) especially for low sampling rates as shown in in Figs. 3(b) and (d) and (f) and (g). From the quantitative comparison shown in the table, DL can reconstruct the object in less than 1ms. In high scattering conditions (SNR=10dB), DL performs more robust reconstructions than CS in terms of peak signal to noise ratio (PSNR) and structural similarity index (SSIM).
We further reconstruct for 'Cameraman' using DL with different sequences of Hadamard patterns (or sampling rates) we select. We compare the performance in different scattering conditions as shown in Fig. 4(a). As we can see, high sampling rate (18.3%) help visualize more details of the object in low scattering conditions. If we compare the reconstructions with sampling rates of 9.5% and 2.7% in high scattering conditions (SNR=10 or 15dB), higher sampling rate does not help improve the reconstruction quality as shown in Figs. 4(b) and (c). Moreover, reconstructions with sampling rates of 2.7% and 1.6% demonstrate robustness to different scattering conditions, which suggests that these Hadamard sequences are preferable for imaging through strongly scattering media.

Experimental setup
The experimental setup is shown in Fig. 5(a). The light source is a customized LED, and the LED's emission is collimated and projected on to the digital micromirror device (DMD, Texas Instrument Discovery 4100). The beam is modulated with the Hadamard pattern displayed on the DMD. It then goes through a linear polarizer (Daheng Optics, GCL-05) to be polarized via a relay lens. The polarized beam then goes through a lens (Nikon AF-S 24-120mm f/4G ED VR) and illuminates the object through the scattering media. The scattering media in the tank is a mix of water and milk. Different scattering conditions (7FTU, 20FTU, 32FTU and 40FTU. FTU: formazine turbidity unit [16,38]) are created by adding different amounts of milk drops into the water. A toy skull with a rough surface is placed behind the water tank as the target. The distance between the light source and the object is about one meter. The reflection from the object goes through the scattering media, and is then collected by an SPD (Hamamatsu H10493-012). We use another linear polarizer in front of the SPD, and measure twice by rotating its polarization to be the same or perpendicular to that in the illumination path. In practice, we can also use two SPDs with different linear polarizers to capture simultaneously.
The optical system in the illumination path is optimized so that the Hadamard pattern with high spatial frequency is not cut off. We demonstrate the CS reconstruction using the sampling rate of 1 with clear water as shown in Fig. 5(c), where details of the object can be visualized.

Reconstructions with deep learning and compressed sensing
We preprocess the SPD measurements by applying the polarization correction as described by Eq. (5). We then perform DL and CS reconstructions with the sampling rate of 18.3% (Fig. 6(a)) and 2.7% (Fig. 6(b)) under different scattering conditions.
The reconstruction with DL is orders of magnitude faster than iterative retrieval algorithms. Moreover, DL reconstructions (Figs. 6(a) and (b): upper rows) demonstrate better visual results compared to CS reconstructions (Figs. 6(a) and (b): bottom rows) with the closeup as an example. When the density of the scatterers increases, we observe that the CS reconstruction degrades dramatically. For example, it is very hard to separate the reconstruction of the skull from background noise when the density of scatterers is high (right columns in Figs. 6(a) and (b)). On the other hand, deep learning can still reconstruct the object with a reasonable quality such as the reconstruction with the sampling rate of 2.7% under 40 FTU. This demonstrates the robustness of our deep learning algorithm in imaging through scattering media.

Hadamard sequences for deep learning reconstructions
In low scattering (7 FTU), high sampling rates result in higher quality reconstructions, as expected. For example, we can observe the nose and eyes of the skull with a high level of detail in Fig. 7(a) while these details are not visualized in low sample rate measurement as shown in Fig. 7(d). This suggests that the Hadamard sequences with higher spatial frequencies result in better reconstruction quality in low scattering. Does this expectation still hold in high scattering?
In high scattering (40 FTU), reconstructions with the sampling rates of 18.3% and 9.5% degrade significantly compared to Figs. 7(a) and (b). On the other hand, reconstructions with a low sample rate of 2.7% 7(g) or 1.6% 7(h) can preserve the quality in high scattering conditions.
We further analyze the SPD measurements with different Hadamard frames in the high scattering condition in Fig. 8. We can observe the same phenomena with that shown in the simulation ( Fig. 1(a)), where certain Hadamard patterns provide higher energies (display with absolute values). As mentioned earlier, the input to the neural network is the SPD 'output' (I) after the polarization correction from I ( Fig. 8(a)) and I ⊥ (Fig. 8(b)). The noise is accumulated after applying the polarization correction as shown in Fig. 8(c), which further reduces the SNR. Therefore, it is preferable to pick up the Hadamard sequences providing high energies for sensing and reconstruction, which is more robust to the noise and still provides valuable information. In other words, this suggests that the sequence of Hadamard patterns providing high energies can be chose for high scattering conditions. At the same time, the selected sequence of Hadamard patterns has a lower sampling rate (2.7% or 1.6%) which reduces the acquisition time. . We perform the polarization correction from the two measurements and achieve the final SPD measurement (c) for the post processing. In each plot, x-axis is the index of the Hadamard pattern, and y-axis represents the amplitude.

Discussion
In high scattering conditions, reconstructions from ghost imaging with high sampling rates (e.g., 18.3%) degrade significantly. However, high sampling rates are needed to reconstruct more details of the object. Can we introduce some prior information about scattering media such as the SNR or scattering properties into the training procedure? If so, the neural network may learn the optimized reconstruction for high sampling rates in high scattering conditions.
To evaluate our hypothesis, we apply an extra white Gaussian noise of 25dB and 10dB to the previous 90000 STL-10 images, and there is a total of 270000 images for the training. We retrain the neural network for the sampling rate of 18.3% using the same parameters, and test for 'Lena' and the experimental measurement.
For 'Lena', compared to previous reconstructions in the upper row of Fig. 9(a), reconstructions using a network trained with noisy images are improved greatly in high scattering conditions (low SNRs) as shown in the bottom row of Fig. 9(a). Moreover, SSIM (Fig. 9(c)) and PSNR ( Fig. 9(d)) in reconstructions using a network trained with noisy images have been greatly improved (red lines) compared to those in previous reconstructions (dark lines). We also test for the experimental measurements. In the sampling rate of 18.3%, training with noisy images helps improve the performance in high scattering conditions as shown in Fig. 9(b) (bottom row) compared to previous results ( Fig. 9(b), upper row).
In both simulation and experiments, training with noisy data enables increases reconstruction quality for high scattering conditions, while also decreasing reconstruction image quality for lower scattering conditions. This indicates that a good characterization of measurement noise can lead to significant improvements in performance for deep learning based reconstructions [41]. In this work, we simulate SPD measurements under different scattering conditions by adding different Gaussian noise, which may not represent the scatter properties very well. A physical rendering might be used to generate a dataset with considering the scatter's properties and power reductions, which can better represent the detector response when imaging through scattering media. With this new dataset, we can potentially further improve the reconstruction quality with high sampling rates in high scattering conditions. This dataset may be profitable for imaging through scattering media especially when using deep learning for reconstructions.

Conclusion
In this paper, we demonstrate a deep neural network can help reconstructions for imaging through scattering media with polarimetric ghost imaging in both simulated and real experiments. Deep learning demonstrates more efficient and robust reconstructions in high scattering conditions compared to iterative reconstructions. We believe the proposed method can be used as a fast and robust imaging tool for imaging through scattering media such as underwater imaging and imaging through tissues.

Appendix 1: Derivation for Eq. (5)
As mentioned previously, we have the degree of polarization for scatterers (β S ) and the object (β O ) and the two measurements (I ⊥ and I ) as: We take these into Eq. (13) as: By combining Eq. (12) and Eq. (16), we can have: By combining Eq. (14) and Eq. (17), we can have:

Funding
Defense Advanced Research Projects Agency (REVEAL Program (HR0011-16-C-0028)); National Science Foundation CAREER Award (IIS-1453192); National Natural Science Foundation of China (61501077); Fundamental Research Funds for Central Universities of the Central South University (3132018186, 3132020202).