Optimization of light fields in ghost imaging using dictionary learning

Ghost imaging (GI) is a novel imaging technique based on the second-order correlation of light fields. Due to limited number of samplings in practice, traditional GI methods often reconstruct objects with unsatisfactory quality. To improve the imaging results, many reconstruction methods have been developed, yet the reconstruction quality is still fundamentally restricted by the modulated light fields. In this paper, we propose to improve the imaging quality of GI by optimizing the light fields, which is realized via matrix optimization for a learned dictionary incorporating the sparsity prior of objects. A closed-form solution of the sampling matrix, which enables successive sampling, is derived. Through simulation and experimental results, it is shown that the proposed scheme leads to better imaging quality compared to the state-of-the-art optimization methods for light fields, especially at a low sampling rate.

Different from conventional imaging techniques that are based on the first-order correlation of light fields, GI extracts information of an object by calculating the second-order correlation between the light fields of the reference and the object arms [4,5]. Theoretically, calculation of the second-order correlation requires infinite number of samplings of the light fields at both arms. In practice, however, the number of samplings is always finite, which often leads to reconstructed images of degraded signal-to-noise ratio (SNR) [19][20][21]. To address this issue, much effort has been made in designing more effective reconstruction methods for GI. On the one hand, approaches improving the second-order correlation have been proposed, which can increase the SNR of reconstructed images with theoretical guarantees [21][22][23][24]. On the other hand, by exploiting sparsity of the objects' images in transform bases (e.g., wavelets [25]), methods built upon the compressed sensing (CS) theory [26][27][28] have also been developed [29,30]. In general, the CS based methods have superior performance over those relying on the second-order correlation, especially for imaging smooth objects [31].
While improving the reconstruction methods has greatly promoted the practical applications of GI, there has been increasing evidence that the reconstruction quality of GI may be fundamentally restricted by the sampling efficiency [32,33], i.e., how well information of objects is acquired in the samplings. To enable a satisfactory reconstruction from limited number of samplings in GI, a natural way is to enhance the sampling efficiency. In fact, this can be realized by optimizing the light fields of GI; see [32][33][34][35] and the references therein. Considering an orthogonal sparsifying basis, Xu et al. [35] optimized the sampling matrix in order for their product, so called the equivalent sampling matrix, to have the minimum mutual coherence, which results in much refinement of the imaging quality. Though orthogonal basis is widely suitable for sparse representation of natural images, for images from a specific category, dictionary learning [36,37] usually produces much sparser representation coefficients, suggesting room for further improvements of the reconstruction quality. Motivated by this, in this paper we propose to optimize the light fields of GI for a sparsifying basis obtained via dictionary learning. By minimizing the mutual coherence of the equivalent sampling matrix, the proposed scheme enhances the sampling efficiency and thus achieves an improved reconstruction quality. In comparison with the state-of-the-art optimization methods for light fields in GI, the superiority of our scheme is confirmed via both simulations and experiments. The main advantages of the proposed scheme is summarized as follows: • Inspired from some previous researches in CS [38,39], we formulate the problem of minimizing the mutual coherence of the equivalent sampling matrix in GI as a Frobeniusnorm minimization problem, which yields a closed-form solution that depends on the sparsifying basis only. To the best of our knowledge, the suggested solution of the light fields is the first closed-form result in the GI optimization field.
• The proposed scheme enables successive samplings. In GI, successive samplings means that when more samplings are available (or needed), one can simply augment new rows to the currently optimized sampling matrix in order to form a new one, without the need to perform additional optimization over the entire matrix. Such feature can bring great convenience to the practical applications of GI and was not addressed in previous works.
It is worth mentioning that matrix optimization based on dictionary learning has also been studied in the CS literature, see, e.g., [38][39][40]. However, the optimizations in [38,39] were carried out over sampling matrices of fixed sizes, which does not allow successive samplings. Although Duarte's method [40] dealt with the matrix optimization problem of alterable sampling size, it is also not compatible to GI because of the demanding quantization accuracy. Moreover, those methods all fail to cope with the non-negative nature of sampling matrices in GI.

The Proposed Scheme
The detection process in GI can be approximately formulated as [30] where y ∈ R M stands for the signal measured by the detector in the object arm, Φ ∈ R M×N is the sampling matrix consisting of the light-field intensity distribution recorded by the detector in the reference arm, x ∈ R N signifies the object's information to be retrieved, and n denotes the detection noise. Let Ψ be the sparsifying basis obtained via dictionary learning, in which x can be sparsely represented as x = Ψz, where z is the sparse coefficient vector. Also, consider the equivalent sensing matrix D := ΦΨ. Then, (1) can be rewritten as Evidences from the CS theory have revealed that a matrix D well preserving information of the sparse vector z guarantees a faithful reconstruction [26][27][28]. As a powerful measure of information preservation, the mutual coherence µ(D) characterizes how incoherent each column pairs in D are [26,41], namely, with d i being the i-th column of D, K the number of columns in D and · 2 the 2 -norm. For its simplicity and ease of computation, the mutual coherence µ (D) has been widely used to describe the performance guarantees of CS reconstruction algorithms. For example, exact recovery of sparse signals via orthogonal matching pursuit (OMP) [42] is ensured by µ (D) < 1 2k−1 [43], where k is the sparsity level of input signals. In this work, with the aim of enhancing the sampling efficiency, we employ µ(D) as the objective function to be minimized in our optimization scheme. In particular, our proposed scheme consists of the following two main steps: • Firstly, an over-complete dictionary Ψ is learned from a collection of images, under the constraint that its first column has identical entries N −1/2 , while each of the other columns has entries summing to zero. Specifically, given X = [x (1) , x (2) , · · · , x (K) ] ∈ R N ×L , in which each column is a reshaped vector of the training image sample, the sparsifying dictionary Ψ ∈ R N ×K is learned by solving the following problem: where · F and · 0 are the Frobenius-and 0 -norm, respectively, Z = z 1 , z 2 , · · · , z L ∈ R K×L represents the sparse coefficient matrix of training images, and T 0 denotes the predetermined sparsity level of vectors z i . In this work, we shall employ K-SVD as a representative method to perform the dictionary learning task, which results in simultaneous sparse representation of input images in the learned dictionary Ψ. Readers are referred to [37] for more details of the K-SVD method.
• Secondly, the sampling matrix Φ is optimized by minimizing the mutual coherence of the equivalent sampling matrix D. Put formally, The non-negative constraint Φ i j ≥ 0 is imposed due to the fact that the intensity of light fields is always non-negative.
We now proceed to solve the optimization problem in (5). Without loss of generality, assume that matrix D has 2 -normalized columns, that is, d i 2 = 1 for i = 1, · · · , K. Then, To optimize µ(D), it suffices to minimize the off-diagonal entries of the Gram matrix D D, each of which corresponds to the coherence between two different columns in D (i.e., In particular, we would like the Gram matrix to be as close to the identity matrix as possible, namely, Ψ Φ ΦΨ ≈ I. Since replacing the identity matrix with Ψ Ψ yields a sampling matrix robust to the sparse representation error of images [44], we propose to optimize Φ via By multiplying Ψ and Ψ on the left-and right-hand sides of both terms inside the Frobenius norm, respectively, one has After substituting ΨΨ with its eigenvalue decomposition VΛV , and also denoting W := ΛV Φ , (8) can be rewritten as or equivalently, Denoting Λ = [r 1 , · · · , r N ], (10) further becomes Clearly problem (11) has the solution W = Λ 1 , where Λ 1 is the matrix consisting of the first M columns of Λ, which is obtained by setting w k = r k , k = 1, · · · , M. Recalling that W := ΛV Φ , the optimized sampling matrix Φ can be simply calculated as where matrix V 1 consists of the first M columns of V. When more samplings become available, interestingly, it suffices to update Φ by augmenting more rows of V to the previous one, thereby enabling a successive sampling. As aforementioned, the feature of successive sampling is of vital importance to the practical applications of GI. Furthermore, due to the fact that the intensity of light fields is always non-negative, additional treatments are needed to make sure that elements of the sampling matrix are non-negative (NN). To the end, we propose a NN lifting, which adds a constant matrix to the optimized matrix Φ in (12) as where 1 M×N is an M-by-N matrix with entries being ones and As aforementioned, the first column of Ψ has identical entries and other columns have entries summing to zero. Thus, It can be noticed that after the NN lifting, D and ΦΨ differ only in the first column. Nevertheless, the mutual coherence µ( D) is not much affected by the NN lifting, as confirmed by our extensive empirical test.

Simulations
To evaluate the effectiveness of the proposed optimization scheme, both simulations and experiments are performed. MNIST handwritten digits of size 28 × 28 pixels [45] are chosen to  be the imaging objects, and the dictionary Ψ is learned based on 20,000 digits randomly selected from the training set. Moreover, the optimized sampling matrix Φ is obtained from (12), followed by the NN lifting. A subset of atoms in the learned dictionary Ψ and the optimized light-field intensity distributions Φ are shown as Fig. 1(a) and Fig. 1(b), respectively. For comparative purposes, our simulation includes other four methods: 1) Gaussian method, 2) Duarte's method [40], 3) Xu's method [35] and 4) normalized GI (NGI) method [24]. Table 1 gives a brief summary of the methods under test. In the Gaussian method, the sampling matrices are random Gaussian matrices, whose entries are drawn independently from the standard Gaussian distribution (Φ i j ∼ N (0, 1)). To meet the NN constraint Φ i j ≥ 0 of GI, the matrices Φ's generated from the Gaussian and Duarte's methods are also inflicted with the NN lifting. For the Gaussian, Duarte's and our proposed methods, the images are retrieved in two steps. Firstly, the sparse coefficient vectorẑ of image under the learned dictionary Ψ is obtained by solving the 0 -minimization problem: via the OMP algorithm [42]. Secondly, the object's image is reconstructed as For Xu's method, the Discrete Cosine Transform (DCT) basis is chosen as the orthogonal basis. And the images in methods 3) and 4) are reconstructed via approaches proposed in their corresponding references. In our simulation, we first adopt matrices with entries of double-type in MATLAB. Fig. 2 shows the simulation results of different methods. In Fig. 2(a), the reconstructed images at the different sampling ratios (SR's) (i.e., SR = 0.10, 0.20, and 0.51) are displayed, where the SR is computed by dividing the number of samplings by the number of image pixels. Fig. 2(b) and 2(c) depict the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) index of the reconstructed images as functions of the SR, respectively. Given the reference image X and the reconstructed image Y , the PSNR and SSIM are defined as follows, PSNR (X, Y ) = 10 log 10 where the pixel size of image is m × n, B denotes the dynamic range of image pixels, which takes the value 255 in this paper, (µ X , σ X ) and (µ Y , σ Y ) are the means and variances of X and Y , respectively, σ XY is the covariance of X and Y , c 1 = (0.01B) 2 and c 2 = (0.03B) 2 . These two metrics measure the difference and similarity between the reconstructed images and the original ones, respectively. For each SR under test, the PSNR and SSIM are averaged over 500 reconstructed digit images to plot the curves. By comparing them, the reconstruction quality of different methods are compared empirically. From Fig. 2, it can be observed that the reconstruction quality of the proposed scheme is uniformly better than the other methods under test. In particular, it achieves 2dB to 4dB gain of PSNR and up to 10% higher SSIM over the Gaussian method, owing to the optimized light fields. Compared to Xu's method [35], the Gaussian method have a notable advantage in the low SR region, which gradually converges as the SR approaches one. The performance gap is mainly attributed to the utilization of dictionary learning that can better incorporate the sparsity prior of images. Among all test methods, the PSNR and SSIM of the NGI method lie in the lowest level. This is mainly due to the image noise, which often happens to the correlation-based reconstruction methods of GI, especially when the SR is low. It can also be observed from Fig. 2 that Duarte's method [40] performs comparably with the proposed method in the low SR region, but deteriorating dramatically when the SR increases. Such phenomenon seems unreasonable at first glance, but can be interpreted from the condition number perspective. To be specific, the sampling matrix Φ of Duarte's method has larger condition number as the SR increases (see detailed explanations in Footnote 7 of [40]). Thus, when Φ multiplies with the representation error e = x − Ψz, it could significantly amplify this error, and eventually degrade the reconstruction quality. Indeed, in this case one would need to reconstruct the sparse vector z from the samplings y = Φx + n = ΦΨz + Φe + n = Dz + (Φe + n), which can be difficult since the largely amplified error Φe essentially becomes part of noise for the reconstruction.
In practice, detectors measure the intensity signals with quantization, which means that Φ is actually a quantized sampling matrix. Thus, we also simulate the case where the sampling matrices are quantized to 8-bit of precision and plot the results in Fig. 3. Similarly, Fig. 3(b) and 3(c) show the curves of PSNR and SSIM with averaged values over 500 reconstruction trails, respectively. We observe that the overall behavior is similar to that of Fig. 2(b) and 2(c) except that both the PSNR and SSIM curves of Duarte's method [40] fluctuate in the lowest level for the whole SR region. Accordingly, Duarte's method [40] also fails to retrieve the images in Fig. 3(a). This is mainly because Duarte's method [40] is demanding in the quantization accuracy. When large quantization errors are introduced, sparse coefficients of the test images may not be correctly calculated by the reconstruction algorithm. The PSNR and SSIM curves of the NGI method lie in a low level similar to that of Duarte's method, and the images are only vaguely reconstructed, as shown in Fig.3(a). We also observe that the performance of Xu's method [35] becomes worse in the quantized case, although the images can still be retrieved. Overall, our proposed method performs the best for both the high accurate as well as the quantized scenarios.

Experimental Results
We also experimentally compare the proposed method with the Gaussian method, Duarte's method [40] and NGI method [24]. The schematic diagram of experimental setup is shown as Fig. 4. The light-field patterns are first displayed on the digital micro-mirror device (DMD) after preloaded via the computer. Next, light from a light-emitting diode (LED) source is modulated by a Kohler illumination system to be evenly incident on DMD. The light reflected by DMD is then projected onto the imaging object by a lens system. Finally, the whole light reflected from the object is collected by the lens and measured by the detector. In our experiments, the light-field patterns are displayed at a rate of 10Hz to avoid frame dropping of the detector, so that the sampling procedure lasts for one minute or so. After the samplings, the subsequent image   Fig. 5(a), where the ground truth is obtained by pixel-wise detection and serve as a reference image. By comparing the reconstructed images with the ground truth, again, the PSNR and SSIM are calculated and plotted in Fig. 5(b) and 5(c), respectively. Overall, the experimental results demonstrate that the reconstructed quality of the proposed optimization scheme is superior to that of other methods under test, which well matches our simulation results.

Discussions
We would like to point out some interesting points that arise from the simulation and experimental results.
• Firstly, the superiority of the proposed method is mainly owing to two factors: i) optimization of sampling matrix and ii) dictionary learning. Indeed, the proposed optimization scheme outperforms the Gaussian method, even though they share the same spasifying basis obtained by dictionary learning. This is because our method essentially performs a "global" optimization of light fields that incorporates the image statistics captured in the dictionary learning process, thereby enhancing the sampling efficiency. Besides, the improvement of the proposed method over Xu's method can be attributed to the use of both dictionary learning and our optimized sampling matrix.
• Secondly, the PSNR and SSIM curves of the proposed method tend towards flat after the SR reaches a critical value. This in turn implies that the inherent information of the imaging object acquired at this very SR value already suffices to produce a satisfactory reconstruction. The critical SR can thus be utilized to evaluate the capability of information acquisition and also allows the comparison of different approaches.
• Thirdly, we would like to point out a practical limitation of the proposed scheme in handling images of large size due to the use of dictionary learning. Specifically, while dictionary learning in our scheme can bring in some performance gain, it is usually demanding in the requirements of storage and computational cost. Thus the patch size used in dictionary learning should not be large, which, however, poses a limitation to the image size that we can handle. Nevertheless, efficient dictionary learning methods dealing with images of larger scales have recently been proposed [46][47][48], in which the handled image size can go beyond 64 × 64 pixels. To demonstrate the effectiveness of the proposed method for images of larger size, we carry out simulations over the LFWcrop database [49], which consists of more than 13, 000 images of 64 × 64 pixels. The dictionary is trained offline using the algorithm in [47] with 12, 000 images, which takes about 20 hours in our industrial computer. The results of the proposed method and the Gaussian method, which involve dictionary learning, are shown as Fig. 6 for comparison.
• Finally, we mention that if one wish to deal with images of even larger size such that existing dictionary learning methods fail to handle or cannot learn the images offline, then the proposed light-field optimization scheme can still be applied by using explicit dictionaries (e.g., Cropped Wavelets [47]) to incorporate the sparse prior of images.

Conclusion
In this paper, an optimization scheme of light fields has been proposed to improve the imaging quality of GI. The key idea is to minimize the mutual coherence of the equivalent sampling matrix in order to enhance the sampling efficiency. A closed-form solution of the sampling matrix has been derived, which enables successive sampling. Simulation and experimental results have shown that the proposed scheme is very effective in improving the reconstruction quality of images, compared to the state-of-the-art methods for GI. The proposed scheme can thus be used to imaging specific targets with higher quality. We would also like to point out a technical limitation in our scheme. Recall that we have employed a NN lifting to cope with the constraint Φ i j ≥ 0. This operation, however, may severely influence the incoherence of the equivalent sampling matrix in the worst case, though such situation rarely happens as confirmed by our empirical test. Deriving analytical results can better address the non-negative issue while also would require a bit more effort, and our future work will be directed towards this avenue.