Snapshot hyperspectral imaging via spectral basis multiplexing in Fourier domain

Hyperspectral imaging is an important tool that has been applied in various fields, but its use in observing dynamic scenes remains limited. In this paper, we propose a snapshot hyperspectral imaging technique that exploits both the spectral and spatial sparsity of natural scenes. Under the computational imaging scheme, we apply spectral dimension reduction and spatial frequency truncation to the hyperspectral data cube and capture it in a single low-cost shot. Specifically, we modulate the spectral variations with several broadband spectral filters, and then map the modulated images into different regions of the Fourier domain. The encoded image, compressed in both the spectral and spatial dimensions, is finally collected by a monochrome detector. Correspondingly, the reconstruction is essentially a Fourier domain extraction followed by a spectral back projection, both of low computational load. This Fourier-spectral multiplexing on a 2D sensor simplifies both the encoding and decoding processes, and makes hyperspectral data acquisition inexpensive. We demonstrate the high performance of our method by quantitative evaluation on simulation data, and build a prototype system for further experimental validation.

Since only 1D and 2D commercial imaging sensors are available, hyperspectral imaging has primarily been implemented in a scanning mode, along either the spatial or the spectral [13,14] dimension. Although the advancing pixel count and sensitivity of array detectors boost the spatial and spectral resolution of scanning-based hyperspectral imaging systems, the requirement of steady scanning limits their speed and robustness. Meanwhile, there is broad scientific interest in spectral imaging of dynamic samples, such as combustion processes [15] or fluorescent probes used in biological and biomedical imaging [16,17], which makes snapshot hyperspectral imaging one of the main research focuses in this area.
The typical way to perform snapshot hyperspectral imaging is to map different spectral bands (either single narrow bands or multiplexed ones) to different positions and then collect them with one or more detectors. So far, various methods have been proposed to implement such band mapping. Integral field spectroscopy (IFS) [18,19], multispectral beamsplitting (MSBS) [20], the image replicating image spectrometer (IRIS) [21,22] and the multispectral Sagnac interferometer (MSI) [23] are paradigms that directly use stacked optical components to split spectral channels, which complicates the system and imposes strict limitations on building the light path. Another drawback of the direct mapping approach is the limited number of spectral bands. For example, the spectral band number of MSBS depends on the beam splitter and is upper bounded by 4, and the spectral resolution of IRIS is limited to 16 wavelength bands due to the lack of large-format, sufficient-birefringence Wollaston polarizers. In comparison, the computed tomography imaging spectrometer (CTIS) [24], multi-aperture filtered camera (MAFC) [25], image mapping spectrometry (IMS) [26] and snapshot hyperspectral imaging Fourier transform spectrometer (SHIFT) [27] are more compact but need customized components, which are usually expensive and difficult to fabricate. Introducing new reconstruction algorithms might inspire new imaging schemes and raise the final performance. The coded aperture snapshot spectral imager (CASSI) [28] is the first imager exploiting compressive sensing theory to recover the hyperspectrum. However, no longer a simple combination of optical elements, this method demands careful calibration and a heavy computational load, which limits its broad application in scenarios where online reconstruction is needed. In sum, a snapshot spectral imager with simple implementation, low computational load, and high reconstruction performance is worth studying.
One inspiring work in this direction is the three-channel multispectral imaging in [29], which is based on the statistical observation that natural scenes are sparse in the Fourier domain [30,31]. However, three color channels cannot suffice for applications demanding finer spectral resolution. In this paper, making use of both the spatial and spectral sparsity of natural hyperspectral data, we propose a snapshot Fourier-spectral-multiplexing (FSM) method for hyperspectral imaging using a monochrome camera.
Our snapshot hyperspectral imaging method achieves high performance and is quite promising for building a compact setup. In summary, we make the following contributions:
• We introduce FSM as a new hyperspectral imaging technique.
• We validate our method in simulation both quantitatively and qualitatively.
• We build a snapshot hyperspectral imaging prototype to verify this approach and demonstrate its practical benefits.
The remainder of this paper is organized as follows. In section 2, we introduce our imaging scheme and formulation. In section 3, we set the parameters and quantitatively evaluate the effectiveness in simulation. In section 4, we demonstrate our method with physical experiments.
In section 5, we discuss the advantages and limitations of our method, the potential compact hyperspectral implementation, and the future work of our imaging system.

Imaging scheme
The architecture of the snapshot hyperspectral imaging system is shown in Fig. 1, including both the encoding and decoding modules. During encoding, the 3D hyperspectral data cube passes through a series of wide-band spectral filters (here the number of spectral filters is 6) and is thereby projected to a low dimensional space, yielding the projections x_1, x_2, · · · , x_6. Since the spectra of natural scenes are of low intrinsic dimension, principal component analysis (PCA) is a widely used technique for dimension reduction of hyperspectral images, and six bases are sufficient for a high fidelity representation of hyperspectral data (see [32]). The projections are then modulated by different sinusoidal patterns s_1, s_2, · · · , s_6, respectively. Mathematically, the sinusoidal modulation is represented as a point-wise product "⊙", which shifts the Fourier spectrum of each projection into a specific region of the Fourier domain. The sinusoidal patterns are designed to ensure little overlap among the Fourier distributions of {x_i}, as shown in the bottom of Fig. 1(b). Finally, the coded image is recorded by a grayscale camera in an add-up way, as shown in the top of Fig. 1(b).

Fig. 1. The scheme of the proposed hyperspectral imaging system. (a) The hyperspectral data is spectrally filtered and projected into six images x_1, x_2, · · · , x_6, which are then codified by six sinusoidal patterns s_1, s_2, · · · , s_6, respectively. (b) The six sinusoidal patterns shift the Fourier distributions of the six projected images away from the origin into six different regions, and the grayscale camera captures the modulated images in an add-up way. The hyperspectral images are reconstructed through a two-stage method: Fourier spectrum demultiplexing and linear system reconstruction.

Corresponding to the coding procedure, the decoding is straightforward. We first transform the captured image to the Fourier domain and separate the Fourier spectrum of each projection x_i according to the shifting effect of the corresponding sinusoidal modulation s_i. Then we transform each separated Fourier spectrum back to the spatial domain. Finally, we back project the reconstructions to the high dimensional data cube by solving a linear system. In Sec. 2.2, we introduce the formulation of the whole encoding and decoding process in detail.
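The encoding chain above (spectral filtering, sinusoidal modulation, add-up detection) can be sketched numerically. The sensor size, carrier frequencies, and random stand-in projections below are illustrative assumptions, not the parameters of our prototype:

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 256                  # illustrative sensor resolution
n_ch = 6

# Stand-ins for the six spectrally filtered projections x_1..x_6.
x = rng.random((n_ch, H, W))

# Illustrative carrier frequencies (cycles per frame) chosen to land the
# shifted spectra in six different regions of the Fourier plane.
freqs = [(40, 0), (0, 40), (40, 40), (-40, 40), (80, 0), (0, 80)]

yy, xx = np.mgrid[0:H, 0:W]
# Sinusoidal patterns s_i, kept in [0, 1] as a physical modulator requires.
s = np.stack([0.5 * (1 + np.cos(2 * np.pi * (fy * yy + fx * xx) / H))
              for fy, fx in freqs])

# Point-wise modulation followed by add-up detection on a monochrome sensor.
y = (s * x).sum(axis=0)
```

Each modulation duplicates the spectrum of its projection at the origin and at ±ω_i, which is what allows the single image y to carry all six channels.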

Formulation
After going through the ith spectral filter, the spectrum of the target scene is filtered by the corresponding spectral transmission, which can be represented as

x_i = \int I_i(\lambda) r(\lambda) \, d\lambda, (1)

in which I_i(λ) is the spectral transmission of the ith spectral filter and r(λ) denotes the spectral reflectance/transmission of the scene. Here we omit the 2D spatial coordinate for brevity. Statistically, the spectra of natural materials can be represented as a linear summation of a few (e.g., J = 6) characteristic spectral bases [33], i.e.,

r(\lambda) = \sum_{j=1}^{J} \alpha_j b_j(\lambda), (2)

where b_j is the jth spectral basis and α_j is the corresponding coefficient. Substituting Eq. (2) into Eq. (1) [34,35], we get

x_i = \sum_{j=1}^{J} \alpha_j \int I_i(\lambda) b_j(\lambda) \, d\lambda. (3)

In Eq. (3), I_i(λ) is precalibrated and b_j(λ) is trained from a hyperspectral database [33], so the hyperspectrum of the surface can be represented by the J parameters α_j.

The six sinusoidal patterns {s_i} for Fourier domain multiplexing of the six projected images {x_i} are designed to have as little overlap in the Fourier domain as possible. Generally, a sinusoidal pattern can be written as

s_i(\mathbf{p}) = \tfrac{1}{2} \left[ 1 + \cos(\boldsymbol{\omega}_i \cdot \mathbf{p}) \right], (4)

where p is the 2D spatial coordinate and ω_i (i = 1, 2, · · · , 6) represents the spatial frequency. The Fourier transform of Eq. (4) comprises three delta functions δ(ω + ω_i), δ(ω), δ(ω − ω_i) in the Fourier domain. After applying such a sinusoidal modulation to the target scene, its spatial spectrum is duplicated into three replicas centered at ω = −ω_i, 0, ω_i, respectively. Research on natural image statistics suggests that the Fourier spectrum of natural images is sparse [46], which enables efficient coding and sampling of natural images [31]. Taking advantage of this prior, if the shift distance ω_i is set properly, the three replicas can be separated with little aliasing. To set the sinusoidal patterns statistically, we first analyze the Fourier spectra of natural scenes and then estimate the proper shift distances and the corresponding cropping radius accordingly (details are discussed in Sec. 3). The encoded image is recorded by the grayscale camera as

y = \sum_{i=1}^{6} s_i \odot x_i. (5)

For reconstruction, we first transform the captured image y to the Fourier domain and demultiplex each image by cropping the Fourier spectrum according to each sinusoidal pattern. After an inverse Fourier transform back to the spatial domain, we obtain estimations of the six spectrally filtered images {x_i}.
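The cropping-based demultiplexing described above can be sketched as follows: recover each projection by cropping a disc around its carrier +ω_i in the Fourier plane, recentring it at DC, and inverse transforming. The band-limited toy projections, carrier frequencies, and cropping radius are illustrative assumptions:

```python
import numpy as np

def demultiplex(y, freqs, radius):
    """Crop a disc of `radius` around each carrier +omega_i in the
    (fftshift-ed) Fourier plane of y, shift it back to DC, and inverse
    transform. The factor 4 undoes the 1/4 sideband amplitude of a
    pattern of the form s_i = (1 + cos)/2."""
    H, W = y.shape
    Y = np.fft.fftshift(np.fft.fft2(y))
    cy, cx = H // 2, W // 2
    yy, xx = np.mgrid[0:H, 0:W]
    out = []
    for fy, fx in freqs:
        mask = (yy - (cy + fy)) ** 2 + (xx - (cx + fx)) ** 2 <= radius ** 2
        crop = np.where(mask, Y, 0)
        crop = np.roll(crop, (-fy, -fx), axis=(0, 1))  # recenter at DC
        out.append(4 * np.real(np.fft.ifft2(np.fft.ifftshift(crop))))
    return np.stack(out)

# Band-limited toy projections are recovered exactly when the shifted
# replicas do not overlap (frequencies and radius are illustrative).
H = W = 256
yy, xx = np.mgrid[0:H, 0:W]
freqs = [(60, 0), (0, 60), (60, 60), (-60, 60), (120, 0), (0, 120)]
x = np.stack([0.5 + 0.2 * np.cos(2 * np.pi * ((i + 1) * yy + i * xx) / H)
              for i in range(6)])
s = np.stack([0.5 * (1 + np.cos(2 * np.pi * (fy * yy + fx * xx) / H))
              for fy, fx in freqs])
y = (s * x).sum(axis=0)
x_hat = demultiplex(y, freqs, radius=25)
```

Real natural scenes are only approximately band-limited, which is why the residual aliasing still needs the refinement described next.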
To further remove the aliasing, we use generalized alternating projection (GAP) to improve the reconstruction, using these six estimations as the initial value. GAP is an extension of the alternating projection algorithm that solves the compressive sensing problem in a transformed domain, e.g., the discrete cosine or wavelet domain [36,37]. The optimization problem is formulated as

\min_{W} \|W\|^{G^{\beta}}_{2,1}, \quad \text{subject to } Y = SX \text{ and } X = TW, (6)

where the capital Y is the diagonalization of the vectorized form of y [i.e., Y = diag(vec(y))], S = [S_1, S_2, · · · , S_6] and X = [X_1, X_2, · · · , X_6]^T, with S_i = diag(vec(s_i)) and X_i = diag(vec(x_i)) correspondingly. T is the wavelet transformation matrix and W is the corresponding coefficient in the transformed domain. The term \| \cdot \|^{G^{\beta}}_{2,1} is the weighted group-(2,1) norm, calculated as

\|W\|^{G^{\beta}}_{2,1} = \sum_{k=1}^{m} \beta_k \|W_{G_k}\|_2,

where W_{G_k} is the subvector of W containing the components indexed by group G_k, β_k is the weight of that group, and m is the number of groups. Eq. (6) converges to the desired accuracy after a number of iterations. After this de-aliasing, we obtain high quality estimations of the six spectrally filtered images {x_i}.
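The weighted group-(2,1) norm, and the group-wise soft-thresholding that GAP-style algorithms apply to the coefficients W at each iteration, can be sketched as below. The grouping, weights, and threshold are illustrative; the full GAP iteration alternates this shrinkage with a projection onto the data-consistency constraint Y = SX, which is not shown here:

```python
import numpy as np

def weighted_group_21_norm(W, groups, beta):
    """Weighted group-(2,1) norm: sum over groups of the weighted l2 norm
    of each subvector W_{G_k}. `groups` is a list of index arrays."""
    return sum(b * np.linalg.norm(W[g]) for g, b in zip(groups, beta))

def group_shrink(W, groups, beta, lam):
    """Group-wise soft-thresholding: shrink each group's magnitude by
    lam * beta_k, zeroing groups that fall below the threshold."""
    out = W.copy()
    for g, b in zip(groups, beta):
        n = np.linalg.norm(W[g])
        out[g] = 0 if n == 0 else W[g] * max(1 - lam * b / n, 0)
    return out
```

Groups typically collect wavelet coefficients at the same scale, with larger weights β_k on finer scales, which is what promotes the transform-domain sparsity that the norm encodes.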

Parameter setting and quantitative evaluation
The modulation number (i.e., the number of sinusoidal patterns) is set to 6 to trade off between Fourier space overlapping and reconstruction quality. We conduct a statistical analysis over the multispectral database built by the CAVE laboratory at Columbia University [38] to evaluate the influence of the Fourier cropping based demultiplexing, in terms of peak signal-to-noise ratio (PSNR), root mean square error (RMSE) and structural similarity (SSIM). Setting the cropping circle radius to 1/8 of the vertical/horizontal pixel resolution, these three metrics average 32.0 dB, 0.03 and 0.94, respectively. Therefore, although multiplexing six channels in the Fourier domain reduces the vertical/horizontal pixel resolution to 1/4 (Fig. 1(b)), the abundant pixel resolution of current array sensors and the sparsity of natural images keep the loss from spatial frequency cropping small, and the reconstruction quality remains reasonably high.
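The effect of the cropping radius can be reproduced with a simple low-pass experiment. The toy image below, a sum of two cosines with one frequency inside and one outside the cropping disc, is an illustrative stand-in for the CAVE images:

```python
import numpy as np

def lowpass_crop(img, radius):
    """Zero out all Fourier components outside a disc of `radius` around DC,
    i.e. the per-channel truncation the cropping-based demultiplexing implies."""
    H, W = img.shape
    F = np.fft.fftshift(np.fft.fft2(img))
    yy, xx = np.mgrid[0:H, 0:W]
    mask = (yy - H // 2) ** 2 + (xx - W // 2) ** 2 <= radius ** 2
    return np.real(np.fft.ifft2(np.fft.ifftshift(np.where(mask, F, 0))))

def psnr_rmse(ref, est, peak=1.0):
    """PSNR (dB) and RMSE between a reference and an estimate."""
    rmse = np.sqrt(np.mean((ref - est) ** 2))
    return 20 * np.log10(peak / rmse), rmse

H = W = 64
yy, xx = np.mgrid[0:H, 0:W]
img = (0.5 + 0.2 * np.cos(2 * np.pi * 4 * yy / H)      # survives cropping
           + 0.1 * np.cos(2 * np.pi * 20 * yy / H))     # removed by cropping
crop = lowpass_crop(img, radius=H // 8)                 # radius = 1/8 of resolution
p, r = psnr_rmse(img, crop)
```

Only the high-frequency term is lost, so the error equals that term's energy; on natural images, whose spectra decay rapidly, the analogous loss is what the 32.0 dB average PSNR quantifies.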
During demultiplexing of the spatial projections from the recorded encoded measurement, as mentioned above, we use the GAP algorithm instead of simple cropping to suppress aliasing. We also make use of the symmetric distribution of natural images' spatial spectra, averaging each pair of corresponding regions before transforming back to the spatial domain. Simulation on the CAVE multispectral database reveals a 3.05 dB average improvement from introducing the GAP optimization. Fig. 2 displays the result on an example image with and without GAP optimization; the improvement is clear in terms of PSNR, RMSE, and SSIM.
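The symmetric-pair averaging exploits the Hermitian symmetry of a real image's spectrum. A sketch of such an averaging step, operating on an fftshift-ed spectrum (the exact implementation in our pipeline may differ), could look like this:

```python
import numpy as np

def symmetrize(F):
    """Average an fftshift-ed spectrum with the conjugate of its point
    reflection about DC. For the spectrum of a real image the two coincide,
    so averaging the +omega_i / -omega_i replica pairs preserves the signal
    while averaging down independent aliasing or noise."""
    H, W = F.shape
    refl = np.conj(F[::-1, ::-1])
    # Align the reflection so DC maps onto DC (even-sized grids need a 1-pixel roll).
    refl = np.roll(refl, (1 - H % 2, 1 - W % 2), axis=(0, 1))
    return 0.5 * (F + refl)

# For a real image, symmetrizing leaves the spectrum unchanged.
rng = np.random.default_rng(1)
img = rng.random((32, 32))
F = np.fft.fftshift(np.fft.fft2(img))
```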
To evaluate the performance of our hyperspectral imaging system quantitatively, we test the imaging accuracy on the CAVE multispectral image database. For each example, we simulate the encoded image based on the six sinusoidal patterns {s_i} and the color wheel's spectral responses {I_i} shown in Fig. 4(a). The reconstruction results are quite promising, with PSNR above 28.4 dB on average, as shown in the left subfigure of Fig. 3. For a clearer demonstration, we also display the two scenes with the lowest PSNR in the right part, Beads and Sponges, each with the ground truth (upper row) and the corresponding reconstruction (lower row) shown for comparison. High fidelity reconstructions are achieved.

We build a prototype to verify our imaging system, as shown in Fig. 4(a). A broadband point light source (Energetic EQ-99) is collimated by a reflective beam collimator (Thorlabs RC04SMA-F01). A grayscale sinusoidal pattern module (GSM) and a synchronized rotating color wheel then modulate the incoming light at high speed. To generate a sequence of sinusoidal projections within a single camera exposure, we adopt the fast modulation method proposed in [39], with the scheme illustrated in Fig. 4(b). Specifically, the left half of the DMD modulates the incident light beam with a dithered sinusoidal pattern, and the right half of the DMD optically filters the spatial spectrum to remove the approximation error introduced by the dithering algorithm. The filter selects the three dominant frequencies of an ideal sinusoidal pattern, i.e., the -1, 0, and +1 orders. Here the Fourier transform is performed by a concave mirror, designed so that the right region of the DMD lies in the Fourier plane of the left counterpart. This module achieves 20 kHz sinusoidal patterning, which is sufficient for our imaging scheme.
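The dithered sinusoidal pattern on the DMD can be emulated in software. Floyd-Steinberg error diffusion is used here as an illustrative stand-in for the dithering algorithm of [39]; the pattern size and carrier frequency are arbitrary:

```python
import numpy as np

def floyd_steinberg(img):
    """Binarize a grayscale pattern by Floyd-Steinberg error diffusion,
    approximating a sinusoidal fringe with on/off mirror states."""
    out = img.astype(float).copy()
    H, W = out.shape
    for i in range(H):
        for j in range(W):
            old = out[i, j]
            new = 1.0 if old >= 0.5 else 0.0
            out[i, j] = new
            err = old - new
            # Diffuse the quantization error to unvisited neighbors.
            if j + 1 < W:
                out[i, j + 1] += err * 7 / 16
            if i + 1 < H:
                if j > 0:
                    out[i + 1, j - 1] += err * 3 / 16
                out[i + 1, j] += err * 5 / 16
                if j + 1 < W:
                    out[i + 1, j + 1] += err * 1 / 16
    return out

# A binary approximation of one sinusoidal pattern; the subsequent optical
# low-pass filtering keeps only the -1, 0 and +1 orders of the carrier.
H = W = 128
yy, xx = np.mgrid[0:H, 0:W]
pattern = floyd_steinberg(0.5 * (1 + np.cos(2 * np.pi * 8 * xx / W)))
```

Error diffusion pushes the quantization error to high spatial frequencies, which is exactly what the optical Fourier filter on the right half of the DMD then removes.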
The off-the-shelf six-segment color wheel, driven by a high speed motor (a 60k rpm Walkera Super CP motor), is placed at the Fourier plane of the 4f system for spectral modulation. After modulation, the light beam is focused by the 4f system onto the sample. The spot on the color wheel is small enough that the temporal overlap between two adjacent segments can be neglected. Finally, the encoded sample information is collected by a grayscale camera (Point Grey GS3-U3-15S5M-C) after being converged by a lens.

Imaging prototype and experiment on captured data
In our experiment, the pixel resolution of both the sinusoidal patterns displayed on the DMD and the detector is 768 × 768. Accordingly, we crop the Fourier spectrum regions with a radius of 96 pixels (12.5% of 768). We use external trigger mode for synchronization: the trigger-out signal per revolution of the color wheel triggers the DMD to display the sinusoidal patterns, and the trigger-out signal of the DMD in turn triggers the exposure of the camera.
To validate the reconstruction accuracy of the hyperspectral images, we use a static color scene to evaluate our imaging system quantitatively. The test scene is uniform within each strip, so we can calibrate the spectrum of each strip with a spectrometer. From the results in Fig. 5, we can see that the reconstructed spectrum of each patch is highly consistent with the ground truth spectrum.
To demonstrate the dynamic acquisition capability of our method, we conduct hyperspectral reconstructions of the two dynamic scenes shown in Fig. 6. The first is a color film of Santorini Island mounted on a translation stage moving at a constant speed of 3.7 mm/s, with the grayscale camera working at 24 frames per second. We display two frames in Fig. 6(a), and the hyperspectral video of 274 frames in total is available online (Visualization 1). The second is a colorful scene (Fig. 6(b)) fixed on a rotary stage with an angular speed of 0.023 rad/s. The hyperspectral video showing the whole process is also available online (197 frames in total, see Visualization 2). From the two reconstructions, we can clearly see that the proposed method works well for hyperspectral imaging of dynamic scenes.

Summary and discussions
In summary, we propose a snapshot hyperspectral imaging technique that jointly utilizes the spectral and spatial redundancy of the hyperspectral data cube. Specifically, we conduct spectral dimension reduction and spatial frequency multiplexing under the computational imaging scheme. For reconstruction, we can resolve the spectral transmission/reflectance in real time with low computational load. Benefiting from a recently proposed fast sinusoidal modulation [40] working at up to 20 kHz and a rotator working at 1k revolutions per second (rps), our imaging speed is mainly limited by the frame rate of the adopted camera, and thus suffices for common dynamic scenes.
The spatial multiplexing scheme reduces the pixel resolution of our system. To counter this resolution loss, we can introduce super resolution techniques, either single-image algorithms or sequence-based methods, the latter coming at the expense of slightly reduced imaging speed. Moreover, by utilizing the temporal redundancy of natural scenes, the imaging speed can be further improved by introducing random modulation based on compressive sensing [41-45]. In short, our technique is quite promising as the basis of a high performance hyperspectral imaging system.

Funding
This work was supported by the National Key Foundation for Exploring Scientific Instrument of China under Grant 2013YQ140517, and by the National Natural Science Foundation of China under Grants 61327902 and 61631009.