Scattering imaging as noise removal in digital holography using deep learning

Imaging through scattering media is one of the main challenges in optics, and the deep learning (DL) technique is widely regarded as a promising way to handle it. However, most of the existing DL approaches for imaging through scattering media adopt an end-to-end strategy, which significantly limits their generalization capability for varying or dynamic scattering media. In this work, we propose an alternative DL-based method to achieve imaging through different scattering media under the framework of off-axis digital holography. As a result, the severely ill-posed inverse problem in scattering imaging is simplified into a relatively simple denoising problem for a deteriorated hologram. Experimental results show that the proposed method generalizes well not only to different scattering media but also to different types of objects.


Introduction
Imaging through scattering media (e.g., biological tissue and atmospheric turbulence) is a long-standing issue in fields such as biomedical imaging, astronomical imaging, and remote sensing [1][2][3][4][5][6]. It is well known that the interactions between light and microparticles in scattering media destroy the traditional imaging mechanism and usually result in a speckle pattern [7]. Many efforts have been made to reconstruct objects hidden behind a scattering medium. The transmission matrix (TM) [8][9][10] directly gives the mathematical relationship between the input and output of a scattering layer. However, this measured mapping matrix is highly sensitive to the state of the medium, not to mention time-consuming to acquire. Owing to the memory effect [11,12], it is possible to use deconvolution methods to retrieve the hidden object from intensity patterns produced by the object and a point light source placed near the object [13][14][15][16], although the field of view (FOV) is limited. Speckle-autocorrelation techniques are also based on the memory effect [17][18][19][20][21] and can reconstruct the hidden object noninvasively with only one exposure, but the FOV is still as small as for the deconvolution methods. More recently, the deep learning (DL) strategy has been employed to model scattering imaging [22][23][24][25][26]. However, most of the existing works use an end-to-end strategy to train a neural network on object images and speckle patterns, which limits generalization to different scattering media, as shown in figure 1(a).
On the other hand, back in the 1960s, Leith et al and Goodman et al proposed a similar idea for seeing through a scattering medium [27,28], in which a hologram is assumed to freeze the scattered light field and then generate its complex conjugate counterpart while reversing the light paths to reconstruct the hidden object. In 2014, Singh et al presented a holographic method to retrieve a hidden object by introducing the off-axis digital holography (DH) strategy [29]. Under the framework of off-axis DH, one advantage must be mentioned: the coherent-light-based holography technique makes phase-information recovery possible. However, due to the coherent illumination, the interference between lights of varying phase emerging from the scattering medium causes 'speckle noise' on the interference pattern (hologram). It should be noted that these speckles influence the object reconstruction and usually differ completely when the scattering medium changes. Therefore, a time-averaging process was introduced to reduce the coherent noise of the captured speckle pattern [29]; in that case, the scattering medium (ground glass) is assumed to keep rotating rapidly. More recently, Kodama et al introduced phase-shifting-based in-line DH to enlarge the FOV and improve the noise robustness, at the cost of increased recording time [30]. This approach still works if the scattering medium changes only slightly.

Figure 1. Flowchart of learning-based methods for imaging through scattering media: (a) the traditional methods, in which the scattering medium must be fixed; (b) the proposed method, which takes advantage of off-axis DH so that the scattering medium can be changed.
We found that in DH-based scattering imaging [29], if the glass-rotation-based averaging process is dropped, the captured speckle pattern is basically a speckle-contaminated hologram, namely a speckle hologram, rather than a noise pattern in the traditional sense. This fact prompted us to consider how to remove the speckle noise from the hologram and then use the denoised hologram to reconstruct the object. Normally, the speckle noise in traditional DH is produced by the coherent light, and various methods have been proposed to reduce it, such as median and mean filtering [31], discrete Fourier filtering [32], wavelet filtering [33], bidimensional empirical-mode decomposition [34], and partial differential equation-based homomorphic filtering [35]. Recently, a nonlocal means algorithm was presented to maintain sharp image edges while denoising [36], followed by an improved version called block-matching and 3D filtering [37]. However, these methods fail when faced with the speckle noise produced by a scattering medium, since the degradation of the hologram is much more severe.
In this regard, we propose an alternative DL-based method that makes the off-axis hologram immune to the speckle noise introduced by the scattering medium. The hidden object can then be reconstructed from a computed speckle-free hologram, which is the output of a designed neural network. As a result, the severely ill-posed inverse problem is transformed into a relatively simple one: removing speckle noise from a captured speckle hologram. This scheme remains effective even when the scattering medium is rotated by an arbitrary angle or replaced by another one, as shown in figure 1(b).

Scattering imaging model based on off-axis digital holography
To obtain high-quality holograms, we employ a special off-axis holography technique, i.e., Fourier transform holography (FTH) [38][39][40]. A sketch of the setup for the FTH-based scattering imaging model is shown in figure 2. An object hidden behind the scattering diffuser is illuminated by coherent light. The reference wave in FTH is created by diffraction from a point-like pinhole [38]. Assuming that (x, y) are the coordinates in the object plane and the position of the pinhole is (x₀, y₀), the total transmission function in the object plane can be written as:

t(x, y) = o(x, y) + δ(x − x₀, y − y₀),    (1)

where o(x, y) is the transmission function of the object and δ(·) denotes the point-like pinhole. Since the FTH technique records not the object wave but its Fourier spectrum, the propagation distance d must satisfy the far-field condition. The formed hologram is therefore in fact a Fraunhofer diffraction pattern of the transmission function in the object plane:

H(u, v) = |FT{t(x, y)}|²,    (2)

where (u, v) are the coordinates in the diffuser plane and FT{·} denotes the Fourier transform operation. The object can be reconstructed from its FTH hologram by simply calculating the Fourier transform of the hologram [40]. According to the Wiener-Khinchin theorem, the Fourier transform of the power spectrum of a function is equivalent to its autocorrelation:

FT{H(u, v)} = t(x, y) ⊗ t(x, y),    (3)

where the symbol ⊗ denotes the correlation operation. According to equation (1), the autocorrelation of the total transmission function is:

t ⊗ t = o(x, y) ⊗ o(x, y) + δ(x, y) + o(x − x₀, y − y₀) + o*(x₀ − x, y₀ − y).    (4)

In equation (4), the first two terms are the autocorrelation of the object and the sharp intensity peak appearing in the center, and the latter two terms are two symmetrically shifted images of the object.
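The chain from equation (1) to equation (4) can be sketched numerically: a toy object plus a pinhole is propagated to the far field, and an inverse Fourier transform of the recorded intensity yields the autocorrelation, whose cross term contains a shifted copy of the object. All sizes and positions below are illustrative, not from the experiment.

```python
import numpy as np

N = 128
obj = np.zeros((N, N))
obj[60:68, 60:68] = 1.0          # toy object o(x, y)

t = obj.copy()
t[10, 10] = 1.0                  # point-like pinhole reference at (x0, y0) = (10, 10)

# Equation (2): the far-field (Fraunhofer) hologram is the power spectrum of t
H = np.abs(np.fft.fft2(t)) ** 2

# Equation (3): the (inverse) Fourier transform of the power spectrum is the
# circular autocorrelation of t (Wiener-Khinchin theorem)
auto = np.fft.ifft2(H).real

# Equation (4): the zero-lag value carries the object autocorrelation peak plus
# the pinhole's delta, while the cross term at lag (x - x0, y - y0) = (50, 50)
# is a shifted copy of the object
assert np.isclose(auto[0, 0], (t ** 2).sum())      # central peak
assert np.isclose(auto[50, 50], obj[60, 60])       # shifted object copy
```

The shifted object copies in the autocorrelation are exactly why a simple Fourier transform of the FTH hologram reconstructs the object, provided the pinhole offset separates them from the central autocorrelation term.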
Then a scattering diffuser is placed between the object and the camera. The Fourier transform hologram formed on the front surface of the diffuser, after passing through the diffuser, is imaged onto the camera plane by an imaging lens. The camera therefore captures the intensity distribution on the rear surface of the diffuser. Suppose that the random phase introduced by the diffuser is exp[iφᵣ(u, v)]; the complex field distribution on the rear surface of the diffuser is then

U′(u, v) = U(u, v) exp[iφᵣ(u, v)],    (5)

where U(u, v) is the field of the hologram formed on the front surface of the diffuser. Note that the relative phase between the object beam and the reference beam is not changed, because both beams experience the same random phase. The diffuser, imaging lens, and camera can be considered as an optical imaging system, and the object and image distances u and v should satisfy the Gaussian imaging formula. The point spread function (PSF) of the imaging system is the Fourier transform of the pupil function of the lens. According to diffraction theory, the complex amplitude of the scattered field on the camera is obtained by convolving the field on the rear surface of the diffuser with the PSF. Accordingly, the intensity distribution on the camera plane can be written as:

I(ξ, η) = |U′(u, v) ∗ PSF(ξ, η)|²,    (6)

where (ξ, η) are the coordinates in the camera plane and ∗ denotes the convolution operation. We assume that the imaging lens is diffraction-limited with numerical aperture NA. If the NA of the lens is large enough to resolve the maximum spatial frequency of the holographic fringes, we can neglect the loss of optical-transfer-function gain of the diffraction-limited imaging lens within the range of the maximum spatial frequency. In this case |U′(ξ, η)|² = |U(ξ, η)|², which means that the hologram captured by the camera is the same as the hologram formed on the front surface of the diffuser, and the ideal image of the object can be reconstructed directly from the hologram by an inverse Fourier transform. In a real imaging setup, however, it is difficult to satisfy the condition that the NA of the lens is large enough.
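A minimal numerical sketch of the diffuser and NA-limited imaging steps described above: the far-field hologram is multiplied by a random diffuser phase, and the diffraction-limited lens is modeled as a circular low-pass pupil applied in the lens Fourier plane. The cut-off frequency here is an arbitrary stand-in for the true NA, and the object is a toy pattern.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 128
t = np.zeros((N, N))
t[60:68, 60:68] = 1.0            # toy object
t[10, 10] = 1.0                  # pinhole reference

U = np.fft.fft2(t)               # hologram field on the diffuser front surface

# Random phase phi_r(u, v) imprinted by the diffuser on the rear-surface field
phi = rng.uniform(0.0, 2.0 * np.pi, (N, N))
U_rear = U * np.exp(1j * phi)

# NA-limited imaging: convolution with the PSF = low-pass pupil in Fourier space
fx = np.fft.fftfreq(N)[:, None]
fy = np.fft.fftfreq(N)[None, :]
pupil = (fx ** 2 + fy ** 2) < 0.25 ** 2   # hypothetical cut-off standing in for NA
U_cam = np.fft.ifft2(np.fft.fft2(U_rear) * pupil)

# The camera records a speckled hologram
I_cam = np.abs(U_cam) ** 2

# With an ideal (all-pass) pupil the random phase drops out and I = |U|^2
I_ideal = np.abs(np.fft.ifft2(np.fft.fft2(U_rear))) ** 2
assert np.allclose(I_ideal, np.abs(U) ** 2)
```

The final assertion illustrates the point made in the text: with a sufficiently large NA the diffuser phase cancels in the recorded intensity; it is the band limit of the lens that mixes the random phases and produces speckle on the hologram.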
Under coherent illumination, the random phase introduced by the scattering medium cannot be ignored, and it manifests as speckles on the recorded hologram. These speckles usually differ with different scattering diffusers and can be regarded as speckle noise. Therefore, we treat this kind of scattering imaging as a noise removal process in DH using DL. As shown in figure 1(b), we first record a set of speckle holograms and then employ a deep neural network to learn how to remove the speckle noise. After training, a speckled hologram captured by the camera can be fed into the trained neural network, which outputs a speckle-free hologram.

Neural network framework
To remove the speckle noise from the recorded speckled hologram, we employ a denoising convolutional neural network (DCNN) [41], which has been demonstrated not only to speed up training but also to boost the denoising performance. The framework of the DCNN is shown in figure 3. It contains 7 convolution blocks and 1 output layer. Each convolution block consists of three consecutive operations: convolution (Conv), rectified linear unit (ReLU), and batch normalization (BN). Different blocks use convolution kernels of different sizes. The channel numbers of the 7 convolution blocks are set to 8, 16, 32, 36, 36, 36, and 64, while the sizes of the convolution kernels are 3 × 3, 5 × 5, 7 × 7, 7 × 7, 5 × 5, 3 × 3, and 1 × 1. The output layer only contains Conv and ReLU operations, with a 1 × 1 convolution kernel.
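The paper gives no code, so the following PyTorch sketch only mirrors the block structure described above (Conv → ReLU → BN with the listed channel counts and kernel sizes, plus a 1 × 1 Conv + ReLU output layer). The single input/output channel and the 'same' padding keeping the 128 × 128 map size fixed are assumptions consistent with the text.

```python
import torch
import torch.nn as nn

# Channel counts and kernel sizes of the 7 convolution blocks, as described above
CHANNELS = [8, 16, 32, 36, 36, 36, 64]
KERNELS = [3, 5, 7, 7, 5, 3, 1]


def build_dcnn(in_ch: int = 1) -> nn.Sequential:
    layers, prev = [], in_ch
    for ch, k in zip(CHANNELS, KERNELS):
        # Each block: Conv -> ReLU -> BN; padding k // 2 keeps the spatial size
        # unchanged (no pooling, no upsampling anywhere in the network)
        layers += [
            nn.Conv2d(prev, ch, k, padding=k // 2),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(ch),
        ]
        prev = ch
    # Output layer: 1 x 1 convolution + ReLU mapping features to one hologram
    layers += [nn.Conv2d(prev, in_ch, 1), nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)


model = build_dcnn()
y = model(torch.randn(2, 1, 128, 128))   # a batch of speckled holograms
assert tuple(y.shape) == (2, 1, 128, 128)
```

Because every layer preserves the spatial size, each output pixel stays aligned with its input pixel, which is the per-pixel mapping property the next paragraph relies on.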
For the training process, the input and output of the DCNN are a set of speckled holograms and speckle-free holograms, respectively. Both the input and output images contain 128 × 128 pixels. The DCNN extracts feature maps of the input images through 7 cascaded convolution blocks, and each feature map has the same size as the input image. The image size therefore does not change during feature extraction. In this way, an independent mapping relationship can be established between each pixel of the input image and the corresponding pixel of the output image, so that the features of the hologram can be retained effectively. Compared with the standard CNN form and other popular neural networks commonly used in image processing, e.g., U-net, the DCNN used in our method contains neither pooling (downsampling) nor upsampling layers. The main functions of pooling layers are to reduce the number of parameters, speed up the training process, and maintain the invariance of the input features, but pooling also discards some high-frequency information in the image. In our method the holograms are composed of dense fringes, and it is necessary to retain the high-frequency information in the holograms during training. Therefore, we extract the feature maps of the holograms by gradually changing the size of the convolution kernels and the number of channels in the DCNN model. Convolution kernels of different sizes extract features at different scales, which retains the main features of the holograms well and improves the generalization ability of the model. Note that in the last convolution block, we designed a 1 × 1 convolution with ReLU to provide better nonlinear activation, leading to higher accuracy.
Similarly, the output layer uses a 1 × 1 convolution instead of traditional fully connected layers and combines the feature maps extracted from the last convolution block to generate a speckle-free hologram. Replacing the fully connected layer with a 1 × 1 convolution greatly reduces the number of parameters, prevents overfitting, and accelerates convergence. The output of each convolution operation is activated by the ReLU function, which increases the nonlinear mapping capability of the model. The BN operation adjusts the distribution of the data, accelerates the training process, and alleviates the vanishing-gradient problem during training.
The mean absolute error (MAE) is selected as the loss function for training the DCNN model. We define:

MAE = (1/N) Σᵢ |Yᵢ − Ŷᵢ|,

where N represents the number of pixels in the image, and Yᵢ and Ŷᵢ are the pixel values of the ground-truth and network-output images, respectively. The Adam optimization algorithm is selected to update the weights and biases of the DCNN.

Experiments and results
To demonstrate the validity of the proposed method, we implemented optical experiments for imaging through a diffuser. A schematic diagram of the experimental setup is shown in figure 4. The light emitted by the laser (λ = 632.8 nm) passes through a spatial filter and a collimating lens (L1, f1 = 15 cm) and is then split into a beam illuminating the object and a reference beam. The object is displayed on a spatial light modulator (SLM) (Holoeye, LC2002). A focusing lens (L2, f2 = 15 cm) is placed in the reference beam to produce a spherical wave; L2 should be positioned so that its virtual focal point lies in the object plane. Both the object beam and the reference beam pass through a scattering diffuser (Thorlabs, DG20-600 grit-MD). A scientific CMOS camera (PCO edge 4.2, 2048 pixels × 2048 pixels with a pixel size of 6.5 μm × 6.5 μm) is placed at the conjugate plane of the diffuser via an imaging lens (L3, f3 = 10 cm) to record the speckled hologram. In our experiments, the distance between the sample and the diffuser is d = 25 cm, the distance from the diffuser to the lens is u = 15 cm, and the distance from the lens to the camera is v = 30 cm, which satisfies the imaging formula of the lens.
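The quoted distances can be checked against the thin-lens (Gaussian) imaging formula 1/u + 1/v = 1/f3, which also fixes the magnification at which the diffuser is imaged onto the camera:

```python
# Distances quoted in the setup above, in cm
u, v, f3 = 15.0, 30.0, 10.0

# Thin-lens imaging formula: 1/u + 1/v = 1/f
assert abs(1.0 / u + 1.0 / v - 1.0 / f3) < 1e-12

# Lateral magnification of the diffuser-to-camera imaging
magnification = v / u
assert magnification == 2.0
```

So the rear surface of the diffuser is imaged onto the camera at 2× magnification, consistent with the camera being placed at the conjugate plane of the diffuser.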

The reconstruction results of the testing data set
First, we verify the feasibility of the proposed approach. We use the MNIST handwritten digit database [42] to train the DCNN. The holograms with speckle noise captured by the camera are taken as the input of the neural network, and the holograms without speckle noise are taken as the output. We loaded 5000 digit images from the MNIST data set onto the SLM and collected 5000 holograms in total, of which 4000 were selected as training data and the other 1000 as testing data. Both the input and output images are cropped to 128 × 128 pixels. The whole training process took a total of 6 h. The training and testing curves of the DCNN during training are shown in figure 5. After 10 epochs, the loss (MAE) values of both the training and testing curves decreased to approximately 0.04; the gap between the two curves is small, and there are no large fluctuations. The trained DCNN is then used to recover the holograms in the testing data. In this demonstration, the training and testing data are captured using the same diffuser. The experimental results are shown in figure 6. The holograms recorded without and with the diffuser are presented in figures 6(a) and (b), respectively. Figure 6(c) shows the hologram captured while rotating the diffuser. We input the speckle hologram (figure 6(b)) into the trained DCNN; the output is shown in figure 6(d). After performing an inverse Fourier transform on the images in figures 6(a)-(d), the corresponding reconstructed images are shown in figures 6(e)-(h), respectively. Clearly, when the diffuser is present, the hologram contains speckle noise, the reconstructed image is very noisy, and the contrast is poor. With the help of the trained DCNN, however, the fringes in the hologram regain high visibility and the quality of the reconstructed image is significantly improved.
The speckle noise on the hologram can also be removed by time averaging while rotating the diffuser, after which the object image can be reconstructed by an inverse Fourier transform. As shown in figure 6, both the time-averaging method and our DL-based method can remove the speckle noise from the speckle holograms, which means that the proposed DL scheme successfully solves the speckle-denoising problem in holographic imaging. We calculated the MAE and peak signal-to-noise ratio (PSNR) between the speckle-free holograms output by the neural network and the ground-truth holograms as evaluation metrics, and compared them with the MAE and PSNR of the time-averaging results.
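The two evaluation metrics can be computed as below. The MAE is the mean absolute pixel difference (the same quantity used as the training loss), and the PSNR follows its usual definition for images normalized to a peak value of 1; the normalization is an assumption, not stated in the text.

```python
import numpy as np


def mae(x: np.ndarray, y: np.ndarray) -> float:
    """Mean absolute error between two equally sized images."""
    return float(np.mean(np.abs(x - y)))


def psnr(x: np.ndarray, y: np.ndarray, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images with the given peak value."""
    mse = np.mean((x - y) ** 2)
    return float(10.0 * np.log10(peak ** 2 / mse))


# Tiny sanity check on a constant offset of 0.1
a = np.zeros((4, 4))
b = np.full((4, 4), 0.1)
assert np.isclose(mae(a, b), 0.1)
assert np.isclose(psnr(a, b), 20.0)   # mse = 0.01 -> 10 * log10(1 / 0.01) = 20 dB
```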

Generalization for different scattering media
Most of the existing DL-based scattering imaging technologies adopt end-to-end strategies, using object images and the corresponding speckle patterns as training data, so that the trained neural network model depends strongly on a fixed scattering relationship. Moving or replacing the scattering diffuser leads to a sharp drop in the quality of the image reconstruction. In the second demonstration, we verify the generalization capability of the proposed scheme for different scattering media. Here, we use one diffuser to obtain the training data and another diffuser to obtain the testing data.
Unlike the end-to-end methods, our approach does not train a direct mapping between objects and speckle patterns but uses the holograms as the training features, so that the network learns to eliminate speckle noise from the holograms. The holographic technique encodes the information of the object into the holographic fringe pattern, which eliminates (to a certain extent) the interference of the random phase introduced by the scattering media. Typical speckle holograms recorded through another diffuser are displayed in figure 7(a). These speckled holograms are input into the trained DCNN; the output is shown in figure 7(b), and the reconstructed results are shown in figure 7(c). The quality of the reconstructed images decreases slightly when the diffuser is changed, but the handwritten digits in the images can still be recognized.

Generalization for different types of objects
To further verify the generalization capability of this method, we test the trained DCNN model on different types of objects. We still use holograms of MNIST handwritten digits to train the DCNN, but the testing data are holograms of handwritten letters. The holograms corresponding to the input and output of the DCNN are shown in figures 8(a) and (b). The reconstructed images of the handwritten letters obtained from the trained DCNN model are shown in figure 8(c). The DCNN trained with handwritten digits can still reconstruct images of handwritten letters. Although some detailed information of the objects is lost, most of the contour information is reconstructed, which proves that the DCNN model has good generalization ability for different types of data.

Neural network performance analysis
From the structure of the neural network model, the DCNN is a simple CNN, since it uses neither pooling layers nor upsampling layers. On the one hand, image denoising is a relatively benign image-processing task compared with other severely ill-posed inverse problems. On the other hand, a complex CNN is vulnerable to overfitting, which decreases the generalization ability of the neural network. To quantitatively analyze the speckle-denoising performance of the neural networks, we compare the experimental results on the testing data sets between a common CNN form (U-net) and the DCNN. As mentioned above, we designed three testing data sets by varying the diffuser or the type of objects: Group 1: same diffusers and same types of objects; Group 2: different diffusers and same types of objects; Group 3: different types of objects. Here, the MAE and PSNR between the ground-truth image and the output image of the neural network are selected as evaluation metrics. The comparison results are presented in table 1, which shows that the DCNN has higher accuracy.
In addition, we also test several different loss functions for the neural network and compare their performance on the testing data sets, including the MAE, mean squared error (MSE), root mean squared error (RMSE), mean squared logarithmic error (MSLE), and root mean squared logarithmic error (RMSLE). The comparison results are shown in table 2.
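For reference, the five compared loss functions can be written compactly as follows (y denotes the ground-truth image, p the network output); the log1p form of the logarithmic losses is the standard convention and is assumed here, since the text does not give the formulas.

```python
import numpy as np


def mae(y, p):
    return np.mean(np.abs(y - p))


def mse(y, p):
    return np.mean((y - p) ** 2)


def rmse(y, p):
    return np.sqrt(mse(y, p))


def msle(y, p):
    # Mean squared logarithmic error, using log(1 + x) for non-negative images
    return np.mean((np.log1p(y) - np.log1p(p)) ** 2)


def rmsle(y, p):
    return np.sqrt(msle(y, p))


# Tiny worked example
y = np.array([0.0, 1.0, 1.0, 0.0])
p = np.array([0.5, 0.5, 1.0, 0.0])
assert np.isclose(mae(y, p), 0.25)       # (0.5 + 0.5 + 0 + 0) / 4
assert np.isclose(mse(y, p), 0.125)      # (0.25 + 0.25 + 0 + 0) / 4
assert np.isclose(rmse(y, p), np.sqrt(0.125))
```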

Discussion
The need for a spherical reference wave placed behind the scattering medium is a basic limitation of our approach. In some applications, the reference wave can be generated from a point source [43], which is incoherent with the object illumination and plays the role of a 'guide star' in biomedical imaging [3]. On the other hand, reference-less methods such as speckle autocorrelation are generally limited by the effective range of the optical memory effect, so the FOV is small, and the phase retrieval algorithm needs hundreds of iterations, making real-time imaging difficult. Other methods for imaging through scattering media usually require prior information, such as the PSF or the TM. In contrast, the holographic imaging method is free from time-consuming iterative phase-retrieval algorithms and poses no limitation on the FOV. In addition, a trained neural network can achieve fast image reconstruction. It is worth noting that a hologram records the phase information of objects, which means that the holographic imaging method has the potential to image phase objects through scattering media. Furthermore, as shown in figure 6(b), some interference fringes remain after propagation through the diffuser, which depends on the grain size of the diffusers used in the experiments. Therefore, another restriction of our approach is that, to recover clear speckle-free holograms with the proposed DL method, the grain size of the diffuser must be small enough that the speckles formed in the hologram are not larger than the interference fringes. It should also be noted that both the inputs and outputs of the DCNN in the proposed DL method are holograms. Generally, the complexity of the reconstructed objects (i.e., sparsity, structure, and grayscale) corresponds to the density and grayscale levels of the hologram fringes.
Therefore, it remains challenging to reconstruct objects with complex shapes and many grayscale levels, since errors in the holograms recovered by the DCNN further influence the reconstruction of the objects.

Conclusion
In conclusion, we developed a new holographic DL strategy that can retrieve a hidden object even when the scattering medium itself is rotated or replaced by another one. Neither the medium-rotation-based averaging [29] nor the phase-shifting-based denoising process [30] is needed. Instead, we introduced the DCNN to learn how to remove the noise from speckled holograms deteriorated by a scattering diffuser. Compared with the existing end-to-end DL-based scattering imaging methods, one obvious advantage is that our scheme has better generalization ability for not only different scattering media but also different types of objects. In addition, since a hologram records the phase information of an object, the proposed approach has the potential to retrieve phase objects through scattering media.