Single Pixel Imaging Based on Multiple Prior Deep Unfolding Network

Single-pixel imaging (SPI), an imaging technique based on compressed sensing, is limited in real-time and high-resolution applications by its relatively slow imaging speed. In recent years, compressed sensing reconstruction algorithms based on deep unfolding networks have proven to be an effective route to faster, higher-quality image reconstruction. However, existing deep unfolding networks mainly rely on a single kind of prior information and may ignore other intrinsic structures of the image. In this paper, we therefore propose a deep unfolding network (MPDU-Net) that incorporates multiple priors. To fuse the different priors effectively, we propose three fusion strategies in the deep reconstruction sub-network. An unbiased convolutional layer simulates the sampling process so that the whole image can be reconstructed jointly, effectively removing block artifacts, and the sampling matrix is fed into the deep reconstruction sub-network as a learnable parameter, enabling joint optimization of sampling and reconstruction. Simulation and practical experimental results show that the proposed network outperforms existing deep-unfolding-based compressed sensing reconstruction algorithms.


Quan Zou, Qiurong Yan, Member, IEEE, Qianling Dai, Ao Wang, Bo Yang, Yi Li, and Jinwei Yan

Index Terms—Single pixel imaging, deep unfolding network, multiple prior information, joint optimization.

I. INTRODUCTION
Single Pixel Imaging (SPI) is an imaging method based on compressed sensing. It uses a digital micromirror device (DMD) loaded with a mask to modulate the scene and a point detector to receive the modulated light intensity. The two-dimensional image is then reconstructed from the received light intensities and the masks loaded on the DMD. SPI has two main advantages. First, it achieves two-dimensional imaging with a single-pixel detector, without relying on a spatially resolved sensor, thereby greatly reducing cost; this advantage is especially pronounced in wavelength ranges such as the infrared and terahertz. Second, the detector in a single-pixel system collects light from many pixels simultaneously, significantly improving the signal-to-noise ratio and helping to improve the clarity and accuracy of the image. SPI plays an important role in many fields, including but not limited to biomedicine [1], [2], radar imaging [3], [4], [5], spectral imaging [6], [7], and optical applications [8], [9], [10], [11].
However, the application of SPI to real-time and high-resolution imaging is limited because reconstruction takes a long time. SPI reconstruction methods fall into two classes: iterative optimization and deep learning. Iterative optimization methods include the total variation augmented Lagrangian alternating direction algorithm (TVAL3) [12], the iterative shrinkage-thresholding algorithm (ISTA) [13], approximate message passing (AMP) [14], and the alternating direction method of multipliers (ADMM) [15], [16]. These algorithms do not require large amounts of additional image data; reconstruction can be completed using only the measurement matrix and the corresponding measurements. However, they usually require hundreds of iterations, so the reconstruction process is complex and slow, which greatly limits their use. Deep learning reconstruction methods can be divided into two categories. The first is reconstruction with deep non-unfolding networks. In 2016, Kulkarni et al. proposed ReconNet, a model based on image super-resolution reconstruction [17], whose reconstruction quality exceeds that of traditional compressed sensing algorithms. In 2017, Xie et al. studied an adaptive measurement matrix based on ReconNet [18], enabling the network to perform adaptive measurements without hand-crafted measurement matrices; this adaptive network reconstructs images better than ReconNet. In 2019, Yao et al. proposed DR2-Net based on residual learning [19], which introduced a residual optimization network to improve the quality of reconstructed images. These deep non-unfolding networks learn a direct mapping from measurements to reconstructed images, avoiding iterative operations and achieving better reconstruction quality than traditional algorithms such as TVAL3, but they require large amounts of data to fit and their structure is uninterpretable. The second category is reconstruction with deep unfolding networks. In 2016, Yang et al. proposed ADMM-Net, which recasts ADMM for compressed sensing magnetic resonance imaging as a deep network [20]; it is widely used in fields such as biomedical imaging. The disadvantage of such networks is that most of them merely unfold one specific iterative recovery algorithm into network form and use only a single image prior. This single-prior design greatly hinders the transmission of information through the network and loses image details. Therefore, we propose a deep unfolding network (MPDU-Net) that integrates multiple priors. All parameters involved in MPDU-Net (such as the nonlinear transformations, shrinkage thresholds, and step sizes)
are learned through backpropagation. Our contributions can be summarized as follows:
1) We propose a new deep unfolding network, MPDU-Net, which draws on the prior information of different traditional iterative algorithms and fuses modules carrying different priors. In this way, the different priors capture complementary features more effectively, restoring more details and textures.
2) We use unbiased convolutional layers to simulate the sampling process, making full use of inter-block relationships to achieve joint reconstruction. The sampling matrix is fed into the subsequent reconstruction process as a learnable parameter, enabling joint optimization of sampling and reconstruction.
3) We simultaneously introduce binary and orthogonality constraints on the sampling matrix and sparsity constraints on the image into MPDU-Net, making the network suitable for hardware while improving performance.
4) Multiple sets of simulation experiments verify that MPDU-Net is superior to existing reconstruction algorithms. By training the measurement matrix as a binary matrix, the network can be used directly in SPI systems, which we verified through practical experiments.

A. Compressed Sensing and Traditional Iterative Algorithms
In SPI based on compressed sensing, the imaging process can be expressed as

y = Φx + w, (1)
where x ∈ R^m is the target image, y ∈ R^n is the vector of measurements, Φ ∈ R^{n×m} is the measurement matrix, and w is noise. Compressed sensing recovers x from the random measurements y. This inverse problem is a typical ill-posed problem, which iterative compressed sensing reconstruction algorithms solve by incorporating prior knowledge. The least absolute shrinkage and selection operator (LASSO) formulation uses a sparsity prior [24]: with the l1 norm as the regularization term, λ as the regularization parameter, and x sparse under a transformation matrix ψ, x can be solved as

x̂ = arg min_x (1/2)‖Φx − y‖₂² + λ‖ψx‖₁. (2)

ISTA is well suited to such problems. It solves the reconstruction problem in (2) by alternating between the following two steps [13]:

r^(k) = x^(k−1) − ρΦᵀ(Φx^(k−1) − y), (3)
x^(k) = arg min_x (1/2)‖x − r^(k)‖₂² + λ‖ψx‖₁, (4)

where k is the iteration index, ρ is the step size, and Φᵀ is the transpose of the sampling matrix. (4) is a special case of a proximal mapping and, for an orthogonal transform, admits the closed-form solution

x^(k) = Wᵀ soft(W r^(k), λ), (5)

where W is the transformation matrix and soft(·, ·) is the soft-thresholding function. ISTA requires many iterations and computations to obtain good results, and because the transformation matrix W and all parameters are predefined and fixed across iterations, they are difficult to tune. Approximate message passing (AMP) can also be used to solve (2) [14], iterating

x^(k+1) = T_k(x^(k) + Aᵀz^(k)), (6)
z^(k) = y − Ax^(k) + (1/n) z^(k−1)⟨T′_{k−1}(x^(k−1) + Aᵀz^(k−1))⟩, (7)

where Aᵀ is the transpose of the sampling matrix A, T_k is a nonlinear (denoising) function, and the last term in (7) is the Onsager correction.
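The two ISTA steps can be sketched in a few lines of NumPy. This is a toy illustration that uses the identity as the sparsifying transform (in the paper W is learned), not the implementation used in MPDU-Net:

```python
import numpy as np

def soft(v, tau):
    """Soft-thresholding: the proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(Phi, y, lam=0.01, iters=300):
    """ISTA for min_x 0.5||Phi x - y||^2 + lam*||x||_1
    (identity sparsifying transform, for illustration only)."""
    rho = 1.0 / np.linalg.norm(Phi, 2) ** 2   # step size <= 1/L
    x = np.zeros(Phi.shape[1])
    for _ in range(iters):
        r = x - rho * Phi.T @ (Phi @ x - y)   # gradient step, eq. (3)
        x = soft(r, rho * lam)                # proximal step, eqs. (4)-(5)
    return x

# Recover a 5-sparse signal from 64 random Gaussian measurements.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((64, 128)) / np.sqrt(64)
x_true = np.zeros(128)
x_true[rng.choice(128, 5, replace=False)] = rng.standard_normal(5)
x_hat = ista(Phi, Phi @ x_true)
```

The fixed step size and threshold illustrate exactly the rigidity the paper criticizes: a deep unfolding network instead learns ρ and the transform per stage.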
Denote the initialization input by x^(0) and the original data by x. Extending to the kth iteration (and dropping the Onsager term for clarity), one gets

x^(k) + Aᵀz^(k) = x + (I − AᵀA)(x^(k) − x) + Aᵀw, (8)

where I is the identity matrix. When the entries A_ij of the sampling matrix are independent and identically distributed, the term (I − AᵀA)(x^(k) − x) + Aᵀw behaves like noise, so (8) can be regarded as the sum of the original signal and a noise term.
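Once the Onsager term is dropped, the decomposition behind (8) is a purely algebraic identity, which a short NumPy check makes concrete (the sizes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 40, 100                        # measurements, signal length
A = rng.standard_normal((n, m)) / np.sqrt(n)
x = rng.standard_normal(m)            # ground truth
w = 0.01 * rng.standard_normal(n)     # measurement noise
y = A @ x + w

x_k = rng.standard_normal(m)          # an arbitrary current iterate
z_k = y - A @ x_k                     # residual (Onsager term omitted)

# x_k + A^T z_k equals the true signal plus an effective-noise term.
lhs = x_k + A.T @ z_k
rhs = x + (np.eye(m) - A.T @ A) @ (x_k - x) + A.T @ w
```

Viewing `lhs` as "signal plus noise" is what lets AMP-Net replace the hand-crafted nonlinearity T_k with a learned denoiser.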

B. Deep Unfolding Network
In recent years, using deep unfolding networks for image reconstruction has become a research hotspot. A deep unfolding network expands the iterative process of an interpretable optimization algorithm into a multi-layer neural network. Zhang et al. successfully unfolded ISTA into a multi-layer neural network and proposed ISTA-Net, replacing the updates (3)-(5) with (9)-(11) [21]:

r^(k) = x^(k−1) − ρ^(k)Φᵀ(Φx^(k−1) − y), (9)
x^(k) = arg min_x (1/2)‖x − r^(k)‖₂² + λ‖f(x)‖₁, (10)
x^(k) = F(soft(f(r^(k)), θ^(k))), (11)

Algorithm 1: The Forward Propagation of AMP-Net.
where f(•) is a sparse transformation operation, consisting of a convolution layer and a ReLU activation, and F(•) is the inverse process of f(•).
Zhang et al. also expanded the iterative AMP solution into a multi-layer neural network and, by further analyzing the noise term in image compressed sensing, introduced a denoising prior, yielding AMP-Net. The sampling matrix in AMP-Net is jointly optimized with the other parameters, which enhances reconstruction performance. The AMP-Net forward propagation is shown in Algorithm 1 [22], where X is the raw data, B is the initialization matrix, S_α is a control parameter, N(•) is a learnable nonlinear function whose trainable parameters are S_Θ, B(•) has the same structure as N(•) but different parameters S_Ω, vec(•) is the vectorization function that turns an image block into a vector, and K is the number of iterations.
Although ISTA-Net and AMP-Net combine the advantages of traditional iterative algorithms and deep networks, improving on the former while giving the network clear interpretability, each merely unfolds one specific iterative recovery algorithm into network form and uses only one image prior, which hinders the transmission of network information and loses image details. Unlike deep unfolding networks that exploit only one kind of prior, we propose a deep unfolding network that integrates multiple priors, and we use multiple loss terms to make the network converge faster and perform better.

A. Network Architecture
The MPDU-Net proposed in this article is shown in Fig. 1. It comprises a sampling subnetwork, an initial reconstruction subnetwork, and a deep reconstruction subnetwork. The sampling subnetwork uses unbiased convolutional layers to implement the sampling process, and the initial reconstruction subnetwork produces a low-quality initial reconstruction. To exploit both sparse and denoising priors when reconstructing high-quality images, and inspired by ISTA-Net and AMP-Net, we propose a Sparse Prior Information Module (SPIM) and a Denoising Prior Information Module (DPIM). To fuse these two modules, which carry different prior information, into the deep reconstruction subnetwork, we design three fusion methods, parallel, alternating cascade, and sequential cascade, explained in Section III-C. We also feed the measurement matrix into the deep reconstruction subnetwork to achieve joint optimization of sampling and reconstruction.

B. Sampling and Initial Reconstruction Subnetwork
Inspired by the sampling and initial reconstruction process of OPINE-Net [23], we cut the input image into √N × √N patches and reshape each patch into a vector. Each row of Φ is reshaped into a convolution kernel of size √N × √N × 1, so the sampling process can be described by a convolution with stride √N:

y = W_Φ ⊛ x, (12)

where W_Φ denotes the M filters obtained from the rows of Φ and ⊛ denotes convolution. Fig. 1 shows a concrete example of compressive sensing at a 0.25 sampling rate applied to a 33 × 33 image block: the sampling subnetwork reshapes the measurement matrix Φ into 272 filters of size 33 × 33 × 1, convolves them with the 33 × 33 image patch, and obtains measurements y of size 1 × 1 × 272. The advantage of convolutional sampling in the sampling subnetwork is that it scales seamlessly to multi-block reconstruction.
We use Φᵀ to complete the initial reconstruction. Each row of Φᵀ is reshaped into a convolution kernel of size M × 1 × 1, yielding N filters of size M × 1 × 1. Convolving the measurements with the reshaped Φᵀ gives a tensor Φᵀy of size 1 × 1 × N, and PixelShuffle then rearranges Φᵀy into an initial reconstructed image of size √N × √N × 1. Fig. 3 shows the implementation of the PixelShuffle layer, and Fig. 1 shows a 1 × 1 × 272 tensor being reconstructed into a 33 × 33 × 1 tensor by the initial reconstruction subnetwork.
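The block-wise sampling and the Φᵀ-plus-PixelShuffle initialization can be mimicked in NumPy. This is a scaled-down sketch (block size 4 and 4 measurements instead of 33 and 272), with a plain reshape playing the role of the PixelShuffle layer:

```python
import numpy as np

B = 4                      # block size (33 in the paper)
N = B * B                  # pixels per block
M = 4                      # measurements per block (rate M/N = 0.25)
rng = np.random.default_rng(0)
Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # measurement matrix

block = rng.standard_normal((B, B))              # one image patch
x_vec = block.reshape(-1)                        # vec(.), row-major

# "Convolutional" sampling with one B x B filter per row of Phi and
# stride B: on a single block this reduces to Phi @ vec(block).
y = Phi @ x_vec                                  # (M,) = a 1 x 1 x M tensor

# Initial reconstruction: Phi^T y gives an N-vector (a 1 x 1 x N tensor);
# the PixelShuffle rearrangement turns it back into a B x B image.
x0 = (Phi.T @ y).reshape(B, B)
```

The reshape of each row of Φ into a filter and the matrix-vector product are two views of the same linear map, which is what lets the sampling layer be implemented as a stride-B convolution.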
So the initial reconstruction process can be expressed as

x^(0) = PixelShuffle(W_{Φᵀ} ⊛ y), (13)

where W_{Φᵀ} denotes the rows of Φᵀ reshaped into M × 1 × 1 filters. Although x^(0) captures some basic characteristics of the ground truth, its quality and detail are significantly lower, so it must be further optimized and enhanced by the deep reconstruction subnetwork.
Existing deep-learning-based reconstruction methods usually introduce a large number of parameters during initialization [25], [26], [27], [28]. The initial reconstruction subnetwork in Fig. 1 uses only the measurement matrix and introduces no additional parameters, so MPDU-Net effectively reduces the parameter count.

C. Deep Reconstruction Subnetwork
As shown in Fig. 1, our deep reconstruction subnetwork is composed of SPIM and DPIM, two modules carrying different prior information. SPIM learns image sparsity priors, which help preserve the texture and structure of images; DPIM learns image denoising priors, which help reduce noise. Because the two modules embody different priors, fusing them helps the deep neural network learn image features from multiple aspects and obtain a richer, more comprehensive feature representation. Different fusion methods, however, affect network performance differently, so to better exploit the two modules' priors we propose three fusion methods: parallel, sequential cascade, and alternating cascade.
Sparse Prior Information Module (SPIM): Inspired by ISTA-Net, SPIM is defined according to (3) and (4). It consists of a fixed number of stages, each corresponding to one iteration, as shown in Fig. 4. The SPIM of the kth stage consists of two modules, r^(k) and x^(k), corresponding to the two updating steps (3) and (4), respectively.
The r^(k) module is directly defined according to (3), with the step size ρ made learnable:

r^(k) = x^(k−1) − ρ^(k)Φᵀ(Φx^(k−1) − y). (14)

The x^(k) module that solves (4) is expressed as

x^(k) = r^(k) + G(H̃(soft(H(D(r^(k))), δ))), (15)

where D denotes N_f filters, H represents the sparse transformation, consisting of four convolution layers and three ReLU activation functions, H̃ is the inverse operation of H, G consists of three convolution layers and two ReLU activation functions, and δ is the parameter of the soft-thresholding function, which is learned in the network.

Denoising Prior Information Module (DPIM): Inspired by the denoising perspective of the AMP algorithm, DPIM is established by unfolding the iterative denoising process. As shown in Fig. 5, DPIM consists of a denoising module and a deblocking module, obtained by mapping the iterative denoising process onto a deep network; each denoising-plus-deblocking pair represents one iteration.
The denoising module of the kth DPIM can be expressed as

R^(k) = X^(k−1) + vec⁻¹(Φᵀ(y − Φ vec(X^(k−1)))), (16)
X^(k) = R^(k) − N_k(R^(k)), (17)

where X^(k−1) is the input of the module, vec(•) vectorizes an image block (and vec⁻¹(•) is its inverse), and N_k(•) is a trainable nonlinear function that estimates the noise term. Because the denoising module reconstructs image blocks independently, blockiness may appear in the results, so we design a trainable deblocking module to eliminate it and further improve reconstruction performance. The deblocking process of the kth DPIM can be expressed as

X^(k)_out = X^(k)_in − B_k(X^(k)_in), (18)

where X^(k)_in is the output of the denoising module, X^(k)_out is the output of DPIM, and B_k(•) has the same structure as N_k(•) but different parameters.
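The two modules described above can be sketched as PyTorch modules. Layer counts follow the text (H: four convs with three ReLUs; G: three convs with two ReLUs; N_f = 32; the DPIM CNNs output 32, 32, 32, 1 channels), but kernel sizes, parameter initializations, and the exact placement of D are assumptions, not the authors' code:

```python
import torch
import torch.nn as nn

class SPIM(nn.Module):
    """One sparse-prior stage: learnable gradient step + soft
    thresholding in a learned transform domain (a sketch)."""
    def __init__(self, nf=32):
        super().__init__()
        self.rho = nn.Parameter(torch.tensor(0.1))     # learnable step size
        self.delta = nn.Parameter(torch.tensor(0.01))  # learnable threshold
        self.D = nn.Conv2d(1, nf, 3, padding=1)
        def transform():
            return nn.Sequential(
                nn.Conv2d(nf, nf, 3, padding=1), nn.ReLU(),
                nn.Conv2d(nf, nf, 3, padding=1), nn.ReLU(),
                nn.Conv2d(nf, nf, 3, padding=1), nn.ReLU(),
                nn.Conv2d(nf, nf, 3, padding=1))
        self.H, self.H_inv = transform(), transform()  # forward/inverse pair
        self.G = nn.Sequential(
            nn.Conv2d(nf, nf, 3, padding=1), nn.ReLU(),
            nn.Conv2d(nf, nf, 3, padding=1), nn.ReLU(),
            nn.Conv2d(nf, 1, 3, padding=1))

    def forward(self, x, y, Phi):
        xv = x.flatten(1)                       # (batch, N)
        grad = (xv @ Phi.t() - y) @ Phi         # Phi^T(Phi x - y), row-wise
        r = x - self.rho * grad.view_as(x)      # the r^(k) module
        z = self.H(self.D(r))
        z = torch.sign(z) * torch.relu(z.abs() - self.delta)  # soft threshold
        return r + self.G(self.H_inv(z))        # the x^(k) module

class DPIM(nn.Module):
    """One denoising-prior stage: back-projection, noise estimation,
    then residual deblocking (a sketch)."""
    def __init__(self, nf=32):
        super().__init__()
        def cnn():
            return nn.Sequential(
                nn.Conv2d(1, nf, 3, padding=1), nn.ReLU(),
                nn.Conv2d(nf, nf, 3, padding=1), nn.ReLU(),
                nn.Conv2d(nf, nf, 3, padding=1), nn.ReLU(),
                nn.Conv2d(nf, 1, 3, padding=1))
        self.N_k = cnn()   # noise estimator (denoising module)
        self.B_k = cnn()   # blockiness estimator (deblocking module)

    def forward(self, x, y, Phi):
        xv = x.flatten(1)
        r = x + ((y - xv @ Phi.t()) @ Phi).view_as(x)  # back-projection
        x = r - self.N_k(r)                            # denoise
        return x - self.B_k(x)                         # deblock
```

Each stage maps a batch of √N × √N blocks plus their measurements to refined blocks, so six of either module can be chained directly.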

D. Parallel Fusion Method
Fig. 6 shows a schematic diagram of the deep reconstruction subnetwork based on the parallel fusion of SPIM and DPIM. It contains two parallel stages. Stage 1 contains 6 cascaded SPIMs, responsible for learning the image's sparsity prior; stage 2 contains 6 cascaded DPIMs, responsible for learning the denoising prior. Both stages receive the same input, and their outputs are finally combined, so the parallel fusion method exploits the prior information of both stages simultaneously.
First, the initialization result x^(0) is fed into both stages simultaneously, and their outputs are added together to obtain the final result x_rec:

x_rec = F₆(x^(0)) + G₆(x^(0)), (19)

where F₆(•) denotes 6 cascaded SPIMs and G₆(•) denotes 6 cascaded DPIMs.

E. Sequential Cascade Fusion Method

In the sequential cascade method, the initialization result x^(0) is first fed into stage 1 (the SPIM cascade), and the output x^(1) of stage 1 is then fed into stage 2 (the DPIM cascade) to obtain the final result x_rec:

x_rec = G(F(x^(0))), (20)

where F(•) and G(•) denote the SPIM cascade of stage 1 and the DPIM cascade of stage 2, respectively.

F. Alternating Cascade Fusion Method
Fig. 1 shows a schematic diagram of the deep reconstruction subnetwork based on the alternating cascade fusion of SPIM and DPIM. It has three stages, each containing one SPIM and one DPIM, so sparsity and denoising priors are learned alternately: the output of each module serves as the input of the next, and the input passes through the three stages in sequence to produce the final output.
Specifically, the initial reconstruction x^(0) obtained from the initial reconstruction subnetwork is fed into stage 1, where it passes through a SPIM and then a DPIM; the result then passes through stages 2 and 3 to obtain the final result x_rec:

x_rec = G₃(F₃(G₂(F₂(G₁(F₁(x^(0))))))). (21)
where F_k(•) denotes the kth SPIM and G_k(•) the kth DPIM.
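The three fusion rules (19)-(21) differ only in how the module lists are composed, which can be made concrete with placeholder callables standing in for trained SPIMs and DPIMs:

```python
def parallel(spims, dpims, x0):
    """Eq. (19): both chains see the same input; outputs are added."""
    a = x0
    for f in spims:
        a = f(a)
    b = x0
    for g in dpims:
        b = g(b)
    return a + b

def sequential(spims, dpims, x0):
    """Eq. (20): the whole SPIM cascade, then the whole DPIM cascade."""
    x = x0
    for m in list(spims) + list(dpims):
        x = m(x)
    return x

def alternating(spims, dpims, x0):
    """Eq. (21): interleave one SPIM and one DPIM per stage."""
    x = x0
    for f, g in zip(spims, dpims):
        x = g(f(x))
    return x
```

With toy modules (e.g. `lambda x: x + 1` for SPIMs and `lambda x: x * 2` for DPIMs), the three compositions give visibly different outputs, illustrating why the fusion choice matters.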

G. Loss Function Design
To help MPDU-Net learn its parameters better, we design a loss function adapted to the network. The image sparsity constraint and the measurement matrix orthogonality constraint included in the loss further regularize training and help reconstruct the image accurately.
The loss function we designed consists of four parts. The final reconstruction error term loss_d is the mean square error between the final network output x_i^(K) and the original input x_i; it is the main term guiding the network to learn image reconstruction features. The symmetry constraint term loss_c is the mean square error between H̃^(k)(H^(k)(r^(k))) and r^(k); it ensures that the sparse transform and its inverse are mutually consistent, helping the network perform the forward and inverse sparse transforms correctly. The sparsity constraint term loss_s applies an l1 regularization loss to the sparsely transformed image H^(k)(r^(k)) in the proximal mapping module; it promotes sparse representations and helps recover image details.
For the measurement matrix, we design an orthogonality constraint error term loss_o that drives ΦΦᵀ toward I, ensuring the orthogonality of the measurement matrix and improving the stability and performance of reconstruction. The total loss is

loss = loss_d + α·loss_c + β·loss_s + γ·loss_o, (22)

with

loss_d = (1/(N_b N)) Σ_{i=1}^{N_b} ‖x_i^(K) − x_i‖₂²,
loss_c = (1/(N_b N K)) Σ_{i=1}^{N_b} Σ_{k=1}^{K} ‖H̃^(k)(H^(k)(r_i^(k))) − r_i^(k)‖₂²,
loss_s = (1/(N_b N K)) Σ_{i=1}^{N_b} Σ_{k=1}^{K} ‖H^(k)(r_i^(k))‖₁,
loss_o = (1/M²)‖ΦΦᵀ − I‖₂²,

where K is the number of network iterations, N_b the total number of image blocks, N the size of an image block, and α, β, γ regularization parameters; inspired by the loss function in OPINE-Net [23], we set them to 0.01, 0.001, and 0.01, respectively. r^(k) is the output of the gradient descent module in the kth SPIM, and Φ ∈ R^{M×N} is the measurement matrix.
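The four-term structure can be sketched in NumPy; the normalizations and the callable interface for the transform pairs are assumptions made for illustration, not the authors' training code:

```python
import numpy as np

def mpdu_loss(x_rec, x, H_pairs, r_list, Phi,
              alpha=0.01, beta=0.001, gamma=0.01):
    """Sketch of the four-term loss. H_pairs[k] = (H_k, H_inv_k) as
    callables; r_list[k] is the gradient-step output of stage k."""
    loss_d = np.mean((x_rec - x) ** 2)                       # reconstruction
    loss_c = np.mean([np.mean((Hi(H(r)) - r) ** 2)           # symmetry
                      for (H, Hi), r in zip(H_pairs, r_list)])
    loss_s = np.mean([np.mean(np.abs(H(r)))                  # l1 sparsity
                      for (H, _), r in zip(H_pairs, r_list)])
    M = Phi.shape[0]
    loss_o = np.mean((Phi @ Phi.T - np.eye(M)) ** 2)         # orthogonality
    return loss_d + alpha * loss_c + beta * loss_s + gamma * loss_o
```

With a perfect reconstruction, an identity transform pair, zero transform outputs, and row-orthogonal Φ, every term vanishes, which is a useful sanity check when wiring the loss into training.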

IV. RESULTS AND DISCUSSION
We trained and tested the network on an ASUS workstation with an Intel Core i7-11700K processor and an NVIDIA GeForce RTX 3060 GPU with 12 GB of video memory and 64 GB of RAM. We use the 91 natural images of Train91 to create the dataset [17], randomly cropping them with data augmentation into 88912 image patches of size 33 × 33, i.e., N_b = 88912 and N = 1089. For a given measurement rate, an orthogonal random Gaussian matrix is generated as the measurement matrix Φ ∈ R^{M×N}. The networks are trained in PyTorch 1.10 at measurement rates of 0.01, 0.04, 0.1, and 0.25; each network is trained for a specific measurement rate and optimized with Adam. Training uses 200 epochs, a learning rate of 0.0001, a batch size of 64, and an end-to-end joint sampling-reconstruction training strategy. In the testing phase, Set11 is used as the test set, and the common image reconstruction metrics, Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM), are used for evaluation. As for parameter settings, every convolutional layer in SPIM and DPIM uses 3 × 3 filters with padding 1, N_f in SPIM is 32, and the output channels of the four convolutional layers in the DPIM nonlinear function are 32, 32, 32, and 1, respectively.

A. Comparing Different Fusion Methods
The three fusion methods of SPIM and DPIM we designed yield deep reconstruction subnetworks based on parallel fusion, sequential cascade, and alternating cascade, respectively. We compared the MPDU-Net variants composed of these three deep reconstruction subnetworks.
In the testing phase, we crop each test image in Set11 into 33 × 33 blocks and use block compressed sensing for reconstruction; after reconstruction, the blocks are stitched back together, and the PSNR and SSIM of the stitched image against the original are computed. For fairness, the depths of the three networks are kept the same: the parallel-fusion MPDU-Net uses 6 SPIMs in stage 1 and 6 DPIMs in stage 2; the sequential-cascade MPDU-Net uses 3 SPIMs in stage 1 and 3 DPIMs in stage 2; and each of the three stages of the alternating-cascade MPDU-Net contains 1 SPIM and 1 DPIM. Apart from the fusion method in the deep reconstruction subnetwork, the three networks share identical sampling and initial reconstruction subnetworks and use the loss function proposed in Section III-D.
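The crop-and-stitch procedure used at test time is a pure reshaping exercise; a NumPy sketch (block size 4 instead of 33 for brevity):

```python
import numpy as np

def to_blocks(img, b):
    """Crop an image whose sides are multiples of b into b x b blocks,
    in row-major block order."""
    H, W = img.shape
    return (img.reshape(H // b, b, W // b, b)
               .transpose(0, 2, 1, 3)
               .reshape(-1, b, b))

def from_blocks(blocks, H, W):
    """Stitch b x b blocks back into an H x W image (inverse of to_blocks)."""
    b = blocks.shape[1]
    return (blocks.reshape(H // b, W // b, b, b)
                  .transpose(0, 2, 1, 3)
                  .reshape(H, W))
```

The round trip is lossless, so any PSNR/SSIM difference between the stitched result and the original comes from the per-block reconstruction itself, not from the cropping.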
Table I shows the average PSNR and SSIM of the networks with different fusion methods on the Set11 images; the highest values are marked in bold. As can be seen from Table I, among the three proposed fusion methods, parallel fusion achieves the best results at all four measurement rates, demonstrating that it exploits the prior information of the two modules better than the other two methods: by integrating sparsity and denoising priors simultaneously, it improves the extraction of image features and thus network performance. Given these advantages, subsequent simulation experiments use the MPDU-Net built on the parallel-fusion deep reconstruction subnetwork.
The numbers of SPIMs and DPIMs in parallel fusion mode directly affect network performance, so we designed experiments to evaluate how different numbers of SPIMs and DPIMs influence reconstruction quality and to find the optimal configuration. At a measurement rate of 25%, we set the numbers of SPIMs and DPIMs to 3 and to 9. With 3 of each, the average PSNR and SSIM of the reconstructed Set11 images are 34.02 dB and 0.9484; with 9 of each, they are 34.33 dB and 0.9504. The results show that, in parallel fusion mode, PSNR increases significantly when the number of SPIMs and DPIMs grows from 3 to 6, but the gain from 6 to 9 is smaller while the computational complexity grows further. Balancing performance and computational complexity, we set the number of SPIMs and DPIMs in parallel fusion mode to 6.

TABLE I: AVERAGE PSNR (dB)/SSIM OF RECONSTRUCTED SET11 IMAGES UNDER DIFFERENT MEASUREMENT RATES AND DIFFERENT FUSION METHODS IN THE DEEP RECONSTRUCTION SUBNETWORK

Fig. 8. Joint reconstruction method structure diagram.

B. Comparing Different Reconstruction Methods
In Section IV-A, the sampling and reconstruction method used in the test stage crops each large image into non-overlapping blocks, applies block compressed sensing, and finally stitches the results. Reconstructing each image block independently causes blockiness and degrades the result. To solve this problem, we extend MPDU-Net's independent block reconstruction at test time to joint reconstruction of the entire image. As shown in Fig. 8, for a 99 × 99 test image no cropping is needed; the complete image is used directly as input. Since the measurement matrix in MPDU-Net is realized as a convolution with stride 33, the image is effectively divided into 9 blocks for independent sampling in the sampling phase, while the whole image is reconstructed jointly in the reconstruction phase.
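That stride-√N convolutional sampling of the full image produces exactly the same measurements as cropping and sampling each block can be checked directly (illustrative sizes; in the paper b = 33 and M = 272):

```python
import numpy as np

b, M = 4, 4                 # block size and measurements per block
H = W = 3 * b               # a 3 x 3 grid of blocks (99 x 99 in the paper)
rng = np.random.default_rng(0)
Phi = rng.standard_normal((M, b * b))
img = rng.standard_normal((H, W))

# Per-block sampling: crop, vectorize, multiply.
y_blocks = []
for i in range(0, H, b):
    for j in range(0, W, b):
        y_blocks.append(Phi @ img[i:i + b, j:j + b].reshape(-1))
y_blocks = np.stack(y_blocks)          # (9, M)

# "Convolutional" sampling: slide the M filters over the image, stride b.
y_conv = np.empty((H // b, W // b, M))
for m in range(M):
    k = Phi[m].reshape(b, b)           # m-th row of Phi as a b x b filter
    for bi in range(H // b):
        for bj in range(W // b):
            patch = img[bi * b:(bi + 1) * b, bj * b:(bj + 1) * b]
            y_conv[bi, bj, m] = np.sum(k * patch)
```

Because the two formulations agree, the network can sample block-wise yet feed the full image through the deep reconstruction subnetwork, which is what removes the block artifacts.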
We conduct comparative experiments on these two reconstruction methods. The input of the independent reconstruction method is cropped image blocks; the input of the joint reconstruction method is the complete image. All other settings are identical. We evaluate network performance at measurement rates of 0.01, 0.04, 0.1, and 0.25.
Fig. 9 shows MPDU-Net reconstructions at the four measurement rates under the independent and joint reconstruction modes. At measurement rates of 0.25, 0.10, and 0.04, the visual results of the two methods differ little. At 0.01, however, the independently reconstructed image is not only blurry but also shows severe block artifacts, whereas the jointly reconstructed image, though blurry, shows none. These results indicate that joint reconstruction effectively reduces the artifacts and discontinuities caused by block processing and further improves the visual quality of reconstructed images.
Table II shows the average PSNR and SSIM on Set11 for the two reconstruction methods, with the highest values marked in bold. The joint reconstruction method outperforms the independent method at all four sampling rates, showing that it makes up for independent reconstruction's reliance on intra-block relationships alone: by fully exploiting inter-block relationships, it removes the artifacts caused by block processing and improves image quality. We use the joint reconstruction method in subsequent simulation experiments.

C. Comparison With Existing Algorithms
The first two groups of experiments verified that the parallel fusion method improves the deep reconstruction subnetwork the most and that joint reconstruction is the better reconstruction mode; this section combines both conclusions to complete the optimal MPDU-Net design. We compare our network with the traditional iterative reconstruction algorithm TVAL3 and two popular deep unfolding networks, AMP-Net-BM and ISTA-Net+. The sampling matrix of AMP-Net-BM is learnable; ISTA-Net+ uses a fixed random Gaussian sampling matrix without sampling matrix learning; and TVAL3 samples with an orthogonal random Gaussian matrix. To ensure a fair comparison, we train AMP-Net-BM and ISTA-Net+ with the same 91 images, optimizer, learning rate, number of epochs, and batch size, and set the number of iteration blocks to 6. Fig. 10 shows line charts of the average PSNR of the reconstructed images for the different algorithms. As can be seen from Fig. 10, the proposed MPDU-Net achieves the best results, followed by AMP-Net-BM and then ISTA-Net+. This is because MPDU-Net uses two kinds of image priors, so it reconstructs better than deep unfolding networks that use only one. TVAL3 performs much worse than the three deep unfolding networks, confirming the advantages of the deep unfolding approach. Table III

D. Imaging Results on SPI System
In our previous research work, we built a single-pixel imaging system [29], whose structure is shown in Fig. 12. Parallel light emitted from a collimator illuminates the imaging target, which is projected onto a DMD after passing through an imaging lens. The optical signal is modulated by loading the trained measurement matrix onto the DMD. The modulated light is focused by a converging lens and received by a PMT, which converts it into discrete electrical pulse signals to obtain the measurements. Finally, these measurements are fed into the reconstruction network to reconstruct the image. To apply the network to the SPI system, the measurement matrix must be trained with binary weights so that it can be loaded onto the DMD; the network hyperparameters and training dataset remain unchanged. The previous comparative experiments show that MPDU-Net with parallel fusion and joint reconstruction achieves the best reconstruction results, so this section compares it with the mainstream reconstruction algorithm TVAL3. To ensure a fair comparison, we tested TVAL3 with a binary random Gaussian measurement matrix. We imaged the "camera" and "aircraft" patterns on a custom calibration board with the SPI system at four measurement rates and reconstructed the images using the different algorithms; the reconstructed image size is 33 × 33. Fig. 13 shows the imaging results of MPDU-Net and TVAL3 at different measurement rates. MPDU-Net achieves higher PSNR than TVAL3, and its reconstructions show more image detail, consistent with the previous simulation results.
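The binarization needed for the DMD can be sketched as follows. DMD micromirrors only take on/off states, so the learnable real-valued matrix must be thresholded to {0, 1} in the forward pass; during training, a straight-through estimator would pass gradients to the real-valued copy unchanged. This is a hedged illustration of the general technique — the paper's exact binarization scheme may differ, and the shapes below are assumptions:

```python
import numpy as np

def binarize(Phi_real):
    """Forward pass only: threshold the learnable real-valued measurement
    matrix to {0, 1} so each row can be displayed as a DMD mirror pattern.
    (In training, gradients would bypass the threshold via a
    straight-through estimator and update Phi_real directly.)"""
    return (Phi_real >= 0).astype(np.float32)

rng = np.random.default_rng(1)
Phi_real = rng.standard_normal((272, 33 * 33))   # assumed: M = 272, 33x33 blocks
Phi_bin = binarize(Phi_real)
print(np.unique(Phi_bin))                        # only the two DMD states remain
```

The quantization gap between Phi_real and Phi_bin is the source of the performance loss from binarized training that the conclusion discusses.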
To verify the imaging performance of MPDU-Net on different objects, we imaged additional targets. As shown in Fig. 14, we imaged the English letters "N" and "U" on the custom calibration board. MPDU-Net cannot reconstruct clear images at a measurement rate of 0.01, but it reconstructs clear images at the other measurement rates. This demonstrates that the proposed network generalizes to different objects.

V. CONCLUSION
A novel deep unfolding network, MPDU-Net, incorporating multiple kinds of prior information is proposed for SPI systems. To fully utilize the different priors, we design SPIM with a sparse prior and DPIM with a denoising prior. In the deep reconstruction sub-network, we propose three fusion methods and input the sampling matrix as a learnable parameter into the subsequent deep reconstruction sub-network. The experimental results show that parallel fusion of SPIM and DPIM exploits the different priors most effectively and thus yields higher-quality reconstructed images. We also introduce a joint reconstruction approach that capitalizes on the correlation between image blocks, which mitigates block artifacts in the reconstructed image and improves reconstruction quality. To further optimize performance, we add binary and orthogonality constraints on the sampling matrix and a sparsity constraint on the image to the loss function. Experimental results show that MPDU-Net outperforms currently available algorithms.

MPDU-Net effectively reduces the number of parameters in the initialization process; however, combining SPIM and DPIM, two modules containing different prior information, requires a certain number of parameters so that the deep reconstruction sub-network retains enough flexibility and expressive power. As a result, although MPDU-Net excels in reconstruction quality, the parameter count and running time of the entire network are not superior to those of other deep unfolding networks. In future studies, we will further explore parameter optimization and model simplification. Training the sampling matrix to be binary allows MPDU-Net to be applied to SPI systems, but this binarized training adversely affects network performance. In the future, we will study how to reduce the impact of measurement-matrix binarization on network performance and further improve the quality of image reconstruction.
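The constraint terms mentioned in the conclusion can be sketched as a composite loss. The penalty forms and weights below are illustrative assumptions, not the paper's exact formulation: an orthogonality penalty ||ΦΦᵀ − I||²_F, a binary penalty that vanishes only when every entry of Φ is 0 or 1, and an L1 sparsity penalty on transform coefficients z:

```python
import numpy as np

def total_loss(x_rec, x_true, Phi, z, lam_orth=0.01, lam_bin=0.01, lam_sp=0.001):
    """Illustrative composite loss: reconstruction MSE plus penalties
    enforcing an orthogonal, binary sampling matrix and a sparse image
    representation. Weights are placeholders, not tuned values."""
    mse = np.mean((x_rec - x_true) ** 2)
    M = Phi.shape[0]
    orth = np.sum((Phi @ Phi.T - np.eye(M)) ** 2)   # rows of Phi orthonormal
    binp = np.sum((Phi * (1.0 - Phi)) ** 2)         # zero iff entries are in {0, 1}
    sp = np.sum(np.abs(z))                          # L1 sparsity on coefficients
    return mse + lam_orth * orth + lam_bin * binp + lam_sp * sp

Phi = np.eye(4)[:2]                  # toy 2x4 binary, row-orthonormal matrix
x = np.ones(4)
print(total_loss(x, x, Phi, np.zeros(4)))   # every term vanishes here
```

Each penalty pulls the learnable Φ toward a form that is both DMD-loadable (binary) and well-conditioned for reconstruction (orthogonal rows).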

Fig. 6. Schematic diagram of the deep reconstruction subnetwork structure based on the parallel fusion of SPIM and DPIM.

Fig. 7 shows a schematic diagram of the deep reconstruction subnetwork structure based on the sequential cascade fusion of SPIM and DPIM, in which the two modules are arranged in two cascaded stages.
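The difference between the parallel and cascade fusion strategies can be illustrated with toy stand-ins for the two modules. The `spim`/`dpim` functions and the blending weight below are hypothetical placeholders, not the paper's learned modules:

```python
import numpy as np

def spim(x):
    """Toy stand-in for the sparse-prior module: soft thresholding,
    the classic proximal operator enforcing sparsity."""
    return np.sign(x) * np.maximum(np.abs(x) - 0.1, 0.0)

def dpim(x):
    """Toy stand-in for the denoising-prior module: a trivial smoother."""
    return 0.5 * x + 0.5 * x.mean()

def parallel_fusion(x, w=0.5):
    """Parallel fusion: both prior modules see the same input and their
    outputs are blended (the weight would be learned in MPDU-Net)."""
    return w * spim(x) + (1.0 - w) * dpim(x)

def cascade_fusion(x):
    """Sequential cascade fusion: stage 1 (SPIM) feeds stage 2 (DPIM)."""
    return dpim(spim(x))

x0 = np.linspace(-1.0, 1.0, 5)          # toy intermediate reconstruction
print(parallel_fusion(x0).shape, cascade_fusion(x0).shape)
```

In parallel fusion each module works on an undistorted input, which is one intuition for why the experiments find it most effective; in cascade fusion the second module can only refine what the first has already committed to.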

Fig. 10. The average PSNR line graphs of images reconstructed by different algorithms.
Table III shows the PSNR and SSIM values of the natural images reconstructed from the Set11 dataset by the different reconstruction methods; the data in bold are the highest PSNR and SSIM values among the four networks. Fig. 11 shows the reconstruction results of MPDU-Net and AMP-Net-BM on "Parrots" at the four measurement rates. The first column is the original image, and the second to fifth columns are the network reconstructions, where (a) is the reconstruction result of AMP-Net-BM and (b) is that of MPDU-Net. Partially enlarged views are also given for observation and analysis. As can be seen from Fig. 11, the proposed MPDU-Net reconstructs more details and sharper edges.

Fig. 14. SPI results on other objects using MPDU-Net. (a) Reconstruction results of the "N" pattern. (b) Reconstruction results of the "U" pattern.
In 2018, Zhang et al. proposed a new deep unfolding network, ISTA-Net [21]. Inspired by ISTA, it optimizes a general L1-norm CS reconstruction model, and its reconstruction performance surpasses existing iterative optimization and deep learning reconstruction methods. In 2021, Zhang et al. proposed a deep unfolding model called AMP-Net [22]. Rather than learning regularization terms as in traditional reconstruction algorithms, it is built by unfolding the iterative denoising process of the approximate message passing algorithm, and it achieves better reconstruction accuracy than other deep unfolding network methods.
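The iteration that ISTA-Net unfolds is the classic ISTA update x ← soft(x − ρΦᵀ(Φx − y), θ), where soft(·) is the L1 proximal operator; the unfolded network replaces the fixed ρ, θ, and sparsifying transform with learned, per-stage versions. A minimal sketch of the plain iterative algorithm (step size, threshold, and the toy test problem below are illustrative choices):

```python
import numpy as np

def soft(v, theta):
    """Soft-thresholding: the proximal operator of the L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def ista(y, Phi, rho=0.1, theta=0.01, iters=100):
    """Plain ISTA for min_x 0.5 * ||Phi x - y||^2 + lambda * ||x||_1.
    ISTA-Net unfolds a fixed number of these updates into network
    stages with learned step sizes, thresholds, and transforms."""
    x = Phi.T @ y                          # standard Phi^T y initialization
    for _ in range(iters):
        grad = Phi.T @ (Phi @ x - y)       # gradient of the data-fidelity term
        x = soft(x - rho * grad, theta)    # gradient step + sparsity proximal
    return x

Phi = np.eye(4)                            # identity sensing for an easy check
y = np.array([1.0, 0.0, 0.0, 0.005])
x_hat = ista(y, Phi)
print(x_hat)   # large entry shrunk toward y - theta/rho, tiny entry zeroed
```

With identity sensing, the fixed point is easy to verify by hand: the large measurement converges to 1 − θ/ρ = 0.9, while the tiny 0.005 entry falls below the threshold after the gradient step and stays at exactly zero, which is the sparsity-promoting behavior the L1 prior encodes.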

TABLE II
AVERAGE PSNR (dB)/SSIM OF RECONSTRUCTED SET11 DATASET IMAGES USING DIFFERENT RECONSTRUCTION METHODS

TABLE III
PSNR (dB)/SSIM OF RECONSTRUCTING SET11 DATASET IMAGES BY DIFFERENT ALGORITHMS