Compressive spectral image classification using 3D coded convolutional neural network

Hyperspectral image classification (HIC) is an active research topic in remote sensing. Hyperspectral images typically form large data cubes, posing big challenges in data acquisition, storage, transmission, and processing. To overcome these limitations, this paper develops a novel deep learning HIC approach based on the compressive measurements of coded-aperture snapshot spectral imagers (CASSI), without reconstructing the complete hyperspectral data cube. A new deep learning strategy, namely the 3D coded convolutional neural network (3D-CCNN), is proposed to efficiently solve the classification problem, where the hardware-based coded aperture is regarded as a pixel-wise connected network layer. An end-to-end training method is developed to jointly optimize the network parameters and the coded apertures with periodic structures. The classification accuracy is effectively improved by exploiting the synergy between the deep learning network and the coded apertures. The superiority of the proposed method over state-of-the-art HIC methods is assessed on several hyperspectral datasets.


Introduction
Hyperspectral imaging acquires hundreds of image planes spanning wavelengths from the visible to the infrared. The rich spectral information of hyperspectral images has been widely employed in a range of remote sensing applications, such as ecological science, geological science, hydrological science, and precision agriculture [1,2]. Hyperspectral image classification (HIC) technology plays a crucial role in these applications, where a label is assigned to each spatial pixel of the scene based on its spectral signature. A large number of HIC methods have been proposed based on k-nearest-neighbors, the maximum likelihood criterion, logistic regression, and the support vector machine (SVM) [3-6]. Over the past several years, deep learning has become one of the most efficient signal processing approaches, with great potential in hyperspectral imaging and classification [7-9].
Traditional HIC methods need three-dimensional (3D) spatio-spectral datasets, which are captured by whisk-broom or push-broom scanning spectral imaging systems [10,11]. However, these systems require a time-consuming scanning process in the spatial or spectral domain, and produce large data volumes to be stored, transmitted, and processed. To overcome these limitations, Wagadarikar et al. introduced the concept of the coded aperture snapshot spectral imaging (CASSI) system based on compressive sensing (CS) theory [12]. The CASSI system simultaneously senses and compresses the 3D spectral data cube with just a single or a few two-dimensional (2D) projection measurements [13-16]. The complete 3D spectral data cube can be reconstructed from the compressive measurements, and then used for classification and processing. However, spectral image classification based on CASSI is a challenging task, since the reconstruction procedure is very time-consuming and noise sensitive.
Recently, a supervised compressive spectral image classifier was proposed for the CASSI system, where a hyperspectral pixel is approximately represented as a sparse linear combination of samples in an overcomplete training dictionary [17]. The sparse coefficients were recovered from a set of CASSI compressive measurements to determine the clusters of the unknown pixels. Although this method does not need to reconstruct the complete 3D spectral data cube, recovering the sparse coefficients for all hyperspectral pixels is still computationally intensive. In addition, to improve the accuracy of compressive spectral image classifiers, improvements were made in two stages of the compressive classifier system: the measurement stage and the classification stage. In the measurement stage, the coded apertures were optimized based on the restricted isometry property (RIP), which is a widely used criterion to obtain optimal reconstruction performance in CS theory [17]. In the classification stage, the sparse dictionary was optimized and different sparsity-based classifiers were proposed [18-20]. However, coded apertures with optimal reconstruction performance are not necessarily the best for achieving the highest classification accuracy, since the reconstruction itself may introduce unexpected artifacts that are not supported by the compressive measurements. In addition, current methods ignore the codependency between the measurement stage and the classification stage, which limits the improvement of classification accuracy. Recently, the combination of optics and deep learning has become a trend, and several end-to-end optimization approaches for optics and image processing have been proposed [21-23]. This paper proposes a novel deep learning approach, namely the 3D coded convolutional neural network (3D-CCNN), which efficiently solves the HIC problem directly in the compressive domain without reconstruction, and jointly optimizes the coded apertures. As shown in Fig.
1(a), the CASSI system with a dual-disperser architecture (DD-CASSI) is used in the measurement stage to capture the compressive measurements of a target scene. In DD-CASSI, the hyperspectral data cube is first shifted by the front dispersive element, then modulated by a coded aperture in the spatial domain, and finally shifted back by the second dispersive element [14]. After that, the encoded hyperspectral data cube is projected onto a 2D integrating detector. In the measurement process, we switch among different coded apertures to capture a set of compressive measurements. Different from the single-disperser-based CASSI system [12,13], the compressive measurements in DD-CASSI have the same spatial dimensions as the target scene. Each detector element receives information from all spectral bands with different codes. These characteristics enable us to decompose the imaging model of DD-CASSI into patch-based models whose dimensionality is consistent with the subsequent classification network, since the classification is implemented in a patch-based manner.
As shown in Fig. 1(b), the classification stage consists of a 3D convolutional neural network (3D-CNN) that predicts the classification map directly in the compressive domain, without reconstructing the complete 3D data cube. Given the correlation of hyperspectral data across both the spatial and spectral dimensions, the 3D-CNN takes the compressive measurement patches as its input. In order to obtain the optimal coding from a small training subset of the hyperspectral data, the coded apertures are designed as periodic patterns to reduce the number of independent optimization variables. Thus, we only need to optimize one period of the coded apertures, and then periodically extend it to the entire coded pattern. Taking full advantage of the patch-based model, an end-to-end training method is proposed to jointly optimize the coded apertures and the classification network parameters. In this work, the coded aperture optimization and the hyperspectral image classification are concatenated into a single system, dubbed 3D-CCNN, which effectively increases the degrees of optimization freedom and improves the classification accuracy.
The main contributions of this paper are twofold. First, we integrate deep learning with CASSI to solve the classification problem directly in the compressive domain, thus avoiding the time-consuming reconstruction procedure and alleviating the influence of reconstruction artifacts. Second, the hardware-based coded aperture and the software-based classification network are unified into one framework, coined 3D-CCNN. This paper thus bridges the gap between coded aperture design and classification to increase the degrees of optimization freedom. Then, the end-to-end training method is used to jointly optimize the coded apertures and the network parameters, which effectively improves the classification accuracy. The superiority of the proposed method over some state-of-the-art approaches is verified by a set of simulations. This is the updated version of the articles arXiv:2009.11948v1 [eess.IV] and arXiv:2009.11948v2 [eess.IV], which are respectively entitled "Compressive Spectral Image Classification using 3D Coded Neural Network" and "Joint coded aperture optimization and compressive hyperspectral image classification using 3D coded neural network". This updated version includes several improvements over the previous versions: the simulation results have been corrected and updated, the method to generate the coded apertures has been modified, and several datasets, images, and descriptions have been improved. The contributions in the paper and the modifications in the revision have been provided by the corresponding author Xu Ma and all of the coauthors.

Forward imaging model of DD-CASSI system
As shown in Fig. 1(a), the DD-CASSI system employs two opposite dispersers and a coded aperture to encode the hyperspectral data cube in both the spatial and spectral domains [14]. Let $f_0(x, y, \lambda)$ be the hyperspectral data cube of the target scene, where $x$ and $y$ are the spatial coordinates and $\lambda$ is the spectral coordinate. The hyperspectral data cube is first laterally shifted as a function of wavelength by the front disperser to form a skewed data cube, which is then projected by an imaging lens onto the coded aperture plane. The skewed data cube is modulated in the spatial domain by the coded aperture, whose transmission function is denoted by $T(x, y)$. Subsequently, the coded source planes are shifted back into a standard cube by the second disperser, and integrated along the $\lambda$ axis on the 2D focal plane array (FPA) detector. The dispersion effect enables the coded aperture to introduce distinguishable spatial modulations in different spectral bands. The measurement intensity on the FPA detector can be formulated as [14]:

$$g(x, y) = \int T\big(x,\; y + \alpha(\lambda - \lambda_c)\big)\, f_0(x, y, \lambda)\, d\lambda, \qquad (1)$$

where $\alpha$ and $\lambda_c$ are the linear dispersion rate and the center wavelength of the prisms, respectively.
Due to the pixelated nature of the detector array, the continuous model in Eq. (1) can be transformed into a discrete form. Suppose we take $K$ snapshots in total with different coded apertures, and $\mathbf{T}^k$ represents the coded aperture pattern used in the $k$th snapshot. Then, the $k$th snapshot measurement on the FPA is given by

$$Y^k_{i, j} = \sum_{l=0}^{L-1} T^k_{i,\, j+l}\, F_{i, j, l}, \qquad (2)$$

where $i$ and $j$ are the pixel coordinates in the spatial domain, $l$ is the pixel coordinate in the spectral domain, and $F$ is the discrete hyperspectral data cube. In vectorized form,

$$\mathbf{y}_k = \mathbf{H}_k \mathbf{f} + \boldsymbol{\omega}_k, \qquad (3)$$

where $\mathbf{H}_k$ is the system matrix representing the effect of the $k$th coded aperture and the dispersers, and $\boldsymbol{\omega}_k$ is the vector of measurement noise. Taking into account all of the $K$ snapshots, the measurements can be concatenated together, and the forward imaging model becomes [15,24,25]:

$$\mathbf{y} = \mathbf{H} \mathbf{f} + \boldsymbol{\omega}. \qquad (4)$$
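For a concrete feel of the discrete snapshot model above, a single DD-CASSI measurement can be simulated in a few lines of NumPy. The helper name `dd_cassi_measure` and the convention that dispersion shears the code along the column axis are assumptions made for this sketch, not the authors' code.

```python
import numpy as np

def dd_cassi_measure(F, T):
    """Simulate one DD-CASSI snapshot: Y_{i,j} = sum_l T_{i, j+l} F_{i,j,l}.

    F: (N, M, L) hyperspectral data cube.
    T: (N, M + L - 1) coded aperture (wide enough for the dispersion shift).
    Returns the (N, M) measurement on the detector.
    """
    N, M, L = F.shape
    Y = np.zeros((N, M))
    for l in range(L):
        # band l sees the coded aperture shifted by l columns (dispersion),
        # then all coded bands are integrated on the detector
        Y += T[:, l:l + M] * F[:, :, l]
    return Y
```

With an all-ones aperture the snapshot simply integrates the cube along the spectral axis, which matches the model when every code entry equals 1.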

3D-CCNN approach for hyperspectral image classification
In this section, we build a seven-layer 3D-CNN to solve the classification problem directly in the compressive domain. Then, we decompose the forward imaging model of DD-CASSI into patch-based models, and introduce the periodic design of the coded apertures. The coded aperture and the classification network are further connected into a uniform framework, namely 3D-CCNN. The joint training method of 3D-CCNN is presented at the end of this section. A sketch of the 3D-CCNN framework is shown in Fig. 3.

Compressive spectral image classification using 3D-CNN
The DD-CASSI system is used to acquire several compressive measurements with different coded apertures. Initially, random coded apertures are used in DD-CASSI. Assume the hyperspectral data cube of the target scene consists of $N \times M$ spatial pixels and $L$ spectral bands. Taking $K$ snapshots, we can obtain a compressive measurement data cube with dimension $N \times M \times K$. The goal is to solve the HIC problem directly from the compressive measurements without reconstruction.
Recently, deep learning has been proved to render accurate semantic interpretation of the underlying datasets [29]. Given the 3D nature of the compressive measurement data cube, the 3D-CNN framework is chosen to perform the classification task, since it can simultaneously exploit information from all measurement slices with different codings, which is essential to improve the classification performance [8,9,29-31].
As shown in Fig. 3, the HIC problem is pixel-based, where each spatial pixel of the hyperspectral image is associated with a specific classification label. Note that the pixels inside a small neighborhood often reflect relevant information about the underlying objects or materials. Thus, the measurement data surrounding a pixel is helpful to improve the classification accuracy of that pixel. For each pixel under consideration, we truncate a small patch around it from the compressive measurement data cube. The dimension of the patch is $P \times P \times K$, where $P \times P$ is the spatial size, and $K$ is equal to the number of compressive measurements. The dimension $P$ is often chosen as an odd number to keep the symmetry. The center of the patch is located on the pixel to be classified. Then, the patch is used as the input of the 3D-CNN, and the output is the classification label of the central pixel. Next, we describe the structure of the 3D-CNN in more detail.
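The patch truncation described above can be sketched as follows; `extract_patch` is a hypothetical helper, and border pixels would additionally require padding, which is not handled here.

```python
import numpy as np

def extract_patch(G, i, j, P):
    """Truncate a P x P x K patch centered at pixel (i, j) from the
    compressive measurement cube G of shape (N, M, K); P must be odd
    and (i, j) must lie at least P//2 pixels away from the border."""
    q = P // 2
    return G[i - q:i + q + 1, j - q:j + q + 1, :]
```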
The choice of depth and width of the 3D-CNN is a rich debate that raises many questions. However, it has recently been shown that one of the keys to better performance is finding the right balance between the network's depth and width [29]. To balance the cost and accuracy of a deep network, the 3D-CNN built in this paper consists of 7 layers, including 6 convolutional layers and 1 fully connected layer. The first and second layers have 20 filters each, whereas the remaining convolutional layers have 35 filters. As shown in Fig. 3(b), the convolutional layers transform the input data into a series of 3D feature maps, which are gradually reduced to a 1D feature vector. The 1D feature vector is input to a fully connected layer, whose output is then fed into a softmax classifier to calculate the classification result. From the first layer to the sixth layer, the 3D convolution kernels have dimension $3 \times 3 \times 3$ (i.e., two spatial dimensions and one spectral dimension). Denote $\mathbf{W}$ as the parameter set of the 3D-CNN, including all the convolution kernels, weights, and biases.
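A minimal PyTorch sketch of such a seven-layer network is given below. The padding choices in layers 3 to 6 and the ReLU activations are assumptions made so that the dimensions work out for a $7 \times 7 \times 5$ input patch; the paper does not specify them.

```python
import torch
import torch.nn as nn

class CNN3D(nn.Module):
    """Sketch of the 6-conv + 1-FC network; exact padding is an assumption."""
    def __init__(self, K=5, P=7, num_classes=9):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 20, 3), nn.ReLU(),              # layer 1: 20 filters, 3x3x3
            nn.Conv3d(20, 20, 3), nn.ReLU(),             # layer 2: 20 filters
            nn.Conv3d(20, 35, 3, padding=1), nn.ReLU(),  # layers 3-6: 35 filters,
            nn.Conv3d(35, 35, 3, padding=1), nn.ReLU(),  # padded so the small
            nn.Conv3d(35, 35, 3, padding=1), nn.ReLU(),  # feature maps survive
            nn.Conv3d(35, 35, 3, padding=1), nn.ReLU(),
        )
        # two unpadded 3x3x3 convs shrink (K, P, P) to (K-4, P-4, P-4)
        feat = 35 * (K - 4) * (P - 4) * (P - 4)
        self.fc = nn.Linear(feat, num_classes)           # layer 7; softmax follows

    def forward(self, x):
        # x: (batch, 1, K, P, P) measurement patch
        return self.fc(self.features(x).flatten(1))
```

Feeding a $7 \times 7 \times 5$ patch (as a `(batch, 1, 5, 7, 7)` tensor) produces one logit vector per patch, to which the softmax classifier is applied.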

Patch-based model with periodic coded apertures
To keep the dimensionality consistent, we first decompose the forward imaging model of DD-CASSI into patch-based models. As shown in Fig. 4, we first divide the hyperspectral data cube and the compressive measurements in Eq. (4) into small patches. Let $\mathbf{y}^i$ with dimension $P \times P \times K$ be the $i$th measurement patch truncated from the compressive measurement data cube, and let $\mathbf{y}^i_k$ be the $k$th slice of $\mathbf{y}^i$. Then, we can trace $\mathbf{y}^i_k$ from the detector back through the DD-CASSI system, and find the corresponding 3D patch $\mathbf{s}^i$ in the original hyperspectral data cube $\mathbf{F}$. The patch $\mathbf{s}^i$ is a $P \times P \times L$ data cube, where $L$ denotes the number of spectral bands. Due to the first disperser, the hyperspectral patch $\mathbf{s}^i$ is sheared into a parallelepiped, and then modulated by a coded aperture patch $\mathbf{t}^i_k$ with dimension $P \times (P + L - 1)$, where $\mathbf{t}^i_k$ represents the coded aperture patch associated with $\mathbf{s}^i$ at the $k$th snapshot. In detail, each spectral band of $\mathbf{s}^i$ is modulated by a different $P \times P$ coding template due to the dispersive effect, and every coding template can be regarded as a laterally shifted window of the coded aperture patch $\mathbf{t}^i_k$. Let $(x_0, y_0)$ be the central coordinate of the patch and $q = \lfloor P/2 \rfloor$. The $l$th spectral band of $\mathbf{s}^i$ and the corresponding coding template can be written as

$$S^i_{m, n, l} = F_{x_0 - q + m,\; y_0 - q + n,\; l}, \qquad t^{i,k}_{m, n, l} = T^k_{x_0 - q + m,\; y_0 - q + n + l}, \qquad (5), (6)$$

so that, analogous to Eq. (2), the $k$th measurement slice is

$$Y^{i,k}_{m, n} = \sum_{l=0}^{L-1} t^{i,k}_{m, n, l}\, S^i_{m, n, l}. \qquad (7)$$

Since each coded aperture is cyclically filled by a basic block $\mathbf{C}_k \in \mathbb{R}^{B \times B}$, i.e., $T^k_{i, j} = C_k(i \,\%\, B,\; j \,\%\, B)$, where $\%$ is the remainder operation, Eq. (7) can be rewritten as

$$Y^{i,k}_{m, n} = \sum_{l=0}^{L-1} C_k\big((x_0 - q + m) \,\%\, B,\; (y_0 - q + n + l) \,\%\, B\big)\, S^i_{m, n, l}. \qquad (8)$$

The basic blocks of all $K$ snapshots can be concatenated together to form

$$\mathcal{C}(i) = [\mathbf{C}_1, \mathbf{C}_2, \ldots, \mathbf{C}_K], \qquad (9)$$

where $i$ indicates the input patch $\mathbf{s}^i$. The set $\mathcal{C}(i)$ consists of all the entries in the $K$ basic blocks associated with $\mathbf{s}^i$. As shown in Fig. 3(a), the coded aperture can be regarded as a pixel-wise connected layer that encodes the input data cube to obtain the measurement patches, which are then input to the 3D-CNN proposed in Section 3.1. The effect of the coded apertures is equivalent to a virtual connected layer in the 3D-CCNN framework. Then, we can jointly optimize the basic blocks of the coded apertures and the other network parameters using an end-to-end supervised training method.
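The cyclic tiling of a basic block into a full coded aperture can be sketched as follows; the helper name `tile_coded_aperture` is illustrative.

```python
import numpy as np

def tile_coded_aperture(C, rows, cols):
    """Cyclically tile a B x B basic block C into a rows x cols coded
    aperture, so that T[i, j] = C[i % B, j % B]."""
    B = C.shape[0]
    i = np.arange(rows)[:, None] % B
    j = np.arange(cols)[None, :] % B
    return C[i, j]
```

Because every entry of the full aperture maps back to one of the $B \times B$ block entries, only the block needs to be optimized, and gradients from all tiled copies accumulate on the same variables.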

Joint training method of 3D-CCNN
This section proposes a supervised training method for 3D-CCNN. The set of all parameters to be optimized is denoted as $\Theta = [\mathbf{W}, \mathcal{C}(i)]$, where $\mathbf{W}$ represents the parameters of the seven-layer 3D-CNN model described in Section 3.1, and $\mathcal{C}(i)$ represents the parameters of the coded apertures defined in Eq. (9). In the hyperspectral data cube, we randomly choose 30% of the labeled samples as the training data, and the remaining 70% of the pixels are used for testing.
In this work, the softmax loss is used as the objective function in the training process. Suppose the input of the softmax classifier is a vector $\mathbf{x}$; the predicted probability of the $j$th class is

$$p_j = \frac{\exp(x_j)}{\sum_i \exp(x_i)},$$

where $x_i$ and $x_j$ represent the $i$th and the $j$th elements of the input vector $\mathbf{x}$, respectively. The loss function is minimized using the back-propagation method. The network parameters are updated as

$$\Theta^{v+1} = \Theta^{v} - \eta\, \nabla_{\Theta} \mathcal{L}(\Theta^{v}),$$

where $v$ indicates the iteration number, $\eta$ is the learning rate, and $\nabla_{\Theta}\mathcal{L}$ represents the gradient of the loss function with respect to the variables. After the training process, the learned basic block $\mathbf{C}_k$ can be tiled to form the complete coded aperture $\mathbf{T}^k$.
In this paper, the coded apertures are greyscale. At the end of the training process, we use a clipping function to limit the values of the coded apertures to the range [0, 1].
As shown in Fig. 5, the blue arrows represent the end-to-end training process, and the red arrows represent the testing process. In the testing process, the optimized coded apertures obtained by the training method are first manufactured and installed in the DD-CASSI system. A set of compressive measurements is captured by the detector. Subsequently, the compressive measurement data cube is decomposed into patches, which are then input into the 3D-CNN to obtain the classification results.

Experimental results
In this section, we evaluate the 3D-CCNN method on two public hyperspectral datasets, the Pavia University dataset and the Salinas Valley dataset [7]. In addition, we compare the proposed method with several competitive methods, including convolutional neural network and support vector machine (SVM) classifiers [32,33]. In the comparative experiments, we use either random coded apertures or blue noise coded apertures in the DD-CASSI system. Note that the blue noise coding strategy has been proved to be optimal for the reconstruction in CASSI [34]. In this paper, the transmittance of all random coded apertures is set to 0.5. The blue noise coded apertures are generated based on the method in [34].
In the practical implementation of the classification system, we need to calibrate and maintain the alignment between the coded aperture pattern and the detector. We can make a cross mark on the coded aperture, and use a standard whiteboard to replace the target. The cross mark is then imaged on the detector, and we can adjust the position of the cross-mark image to align the coded aperture with the detector. More details of the calibration method in the CASSI system can be found in [35]. In addition, the modulation of the coded aperture in a real CASSI system cannot be regarded as ideal coding. Thus, we can first capture the images of the coded apertures on the detector, and then use these images to calibrate the transmission functions of the coded apertures.
The first four comparative methods are defined as follows:
(1) "Rand-compress-3D-CNN": use the random coded apertures to obtain the compressive measurements, and then perform the hyperspectral classification using the seven-layer 3D-CNN model described in Section 3.1.
(2) "Bluenoise-compress-3D-CNN": use the blue noise coded apertures to obtain the compressive measurements, and then perform the hyperspectral classification using the seven-layer 3D-CNN model.
(3) "Rand-compress-SVM": use the random coded apertures to obtain the compressive measurements, and then perform the hyperspectral classification using the SVM classifier.
(4) "Bluenoise-compress-SVM": use the blue noise coded apertures to obtain the compressive measurements, and then perform the hyperspectral classification using the SVM classifier.
All of the methods above perform the classification in the compressive domain. It is noted that the hyperspectral data cube of the target scene can also be reconstructed from the compressive measurements by solving an l1-norm minimization problem; the details of the reconstruction methods have been published in [13,36,37]. It is natural to ask whether the classification accuracy can be improved by using the reconstructed hyperspectral data cube instead of the compressive measurements. To answer this question, we compare the proposed method with the following comparative methods:
(5) "Rand-construct-3D-CNN": use the random coded apertures to obtain the compressive measurements, and then use the 3D-CNN to perform the classification on the reconstructed hyperspectral data cube.
(6) "Bluenoise-construct-3D-CNN": use the blue noise coded apertures to obtain the compressive measurements, and then use the 3D-CNN to perform the classification on the reconstructed hyperspectral data cube.
(7) "Rand-construct-SVM": use the random coded apertures to obtain the compressive measurements, and then use the SVM to perform the classification on the reconstructed hyperspectral data cube.
(8) "Bluenoise-construct-SVM": use the blue noise coded apertures to obtain the compressive measurements, and then use the SVM to perform the classification on the reconstructed hyperspectral data cube.
Furthermore, the proposed method is also compared with classifiers for which the original hyperspectral data cube is assumed to be available:
(9) "Original-3D-CNN": use the 3D-CNN to perform the classification directly on the original hyperspectral data cube of the target scene.
(10) "Original-SVM": use the SVM to perform the classification directly on the original hyperspectral data cube of the target scene.
In the following simulations, several indices are used to quantitatively assess the classification performance, including the overall accuracy (OA), average accuracy (AA), and Kappa coefficient (Ka). The OA is defined as the ratio of correctly classified samples over all testing samples. The AA is the mean accuracy over the categories. The Ka is a statistical metric that measures the agreement between the ground truth map and the classification map [7].
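The three indices can be computed from a confusion matrix; the following sketch assumes every class appears at least once among the test labels.

```python
import numpy as np

def oa_aa_kappa(y_true, y_pred, num_classes):
    """Overall accuracy, average (per-class) accuracy, and Kappa coefficient
    from ground-truth and predicted label sequences."""
    M = np.zeros((num_classes, num_classes))
    for t, p in zip(y_true, y_pred):
        M[t, p] += 1                                # confusion matrix
    n = M.sum()
    oa = np.trace(M) / n                            # correct / total
    aa = np.mean(np.diag(M) / M.sum(axis=1))        # mean per-class accuracy
    pe = np.sum(M.sum(axis=0) * M.sum(axis=1)) / n**2  # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa
```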

Simulation result on Pavia University dataset
The Pavia University dataset was collected by the Reflective Optics System Imaging Spectrometer (ROSIS) over the University of Pavia, Italy [7]. The spectral image in this dataset has a high spatial resolution (1.3 m per pixel), comprising 640 × 340 spatial pixels and 103 spectral reflectance bands in the wavelength range from 0.43 μm to 0.86 μm. In the following, a 256 × 256 × 103 cube truncated from the entire dataset is used as the original hyperspectral data cube. Figure 6(a) shows the false-color composite image of the Pavia University spectral data. Figure 6(b) shows the ground truth of the classification map, which consists of nine distinct classes with different colors. Each class label corresponds to a different kind of object in the urban cover, and the black regions represent the unlabeled pixels. From the image, 30% of the labeled pixels are randomly chosen as the training samples, and the remaining 70% of the pixels are used for testing. Figure 7(a) illustrates one of the randomly initialized coded aperture patterns, and Fig. 7(b) illustrates the optimized coded aperture pattern after the joint optimization of 3D-CCNN. The difference between the initial and the optimized coded apertures is shown in Fig. 7(c). Figure 8 shows the classification results on the Pavia University dataset using the (a) proposed 3D-CCNN method, and the first four comparative methods, including the (b) Rand-compress-3D-CNN method, (c) Bluenoise-compress-3D-CNN method, (d) Rand-compress-SVM method, and (e) Bluenoise-compress-SVM method. The number of snapshots is 5, which means that the compression ratio of DD-CASSI is about 5%. In the 3D-CCNN framework, the spatial dimension of each patch in both the training and testing sets is 7 × 7. Thus, the patch size of the 3D-CNN input is 7 × 7 × 5.
Table I shows the classification performance of the proposed method and the first four comparative methods on the Pavia University dataset. These metrics are calculated by averaging over several runs of the experiments. From the second row to the tenth row, the table shows the percentage of accurate classification for each kind of object. The last three rows provide the OA, AA, and Ka of the overall classification result. The above simulations show that the proposed 3D-CCNN method outperforms the other methods that work directly on the compressive measurements. The gain of the proposed method is mainly attributed to the joint optimization of the coded apertures and the network parameters. In addition, both the 3D-CNN and SVM classifiers perform better with the blue noise coded apertures than with the random coded apertures. That is because the blue noise coding strategy achieves more uniform sampling than the random one, and is beneficial to capture more structural information from the target scene. Based on the same type of coded apertures, the 3D-CNN outperforms the SVM classifier, which demonstrates the superior prediction capacity of the deep learning approach.
Figure 9 shows the classification results on the Pavia University dataset using the (a) Original-3D-CNN method, (b) Rand-construct-3D-CNN method, (c) Bluenoise-construct-3D-CNN method, (d) Original-SVM method, (e) Rand-construct-SVM method, and (f) Bluenoise-construct-SVM method. These methods perform the classification based on the original hyperspectral data cube or the reconstructed data cube. Table II provides the classification performance metrics for these methods. From Fig. 9 and Table II, it is observed that the classifiers with blue noise coded apertures outperform those with random coded apertures, since the blue noise coding strategy achieves higher reconstruction quality than random coding. Although the 3D-CNN applied to the reconstructed data cube outperforms the proposed 3D-CCNN method, it is important to note the computational complexity of solving the reconstruction problem. In our simulations, the reconstruction process takes about 1168 s, whereas the 3D-CCNN only takes 57 s to calculate the entire classification map. That is, the proposed 3D-CCNN method achieves more than 20-fold acceleration compared to the reconstruction-based methods. What is more interesting, the performance of 3D-CCNN with only a 5% compression ratio is even better than that of the well-known SVM classifier applied to the reconstructed full data cube.

Figure 11 shows the classification results on the Salinas Valley dataset using the (a) proposed 3D-CCNN method and the first four comparative methods, including the (b) Rand-compress-3D-CNN method, (c) Bluenoise-compress-3D-CNN method, (d) Rand-compress-SVM method, and (e) Bluenoise-compress-SVM method. The number of snapshots is 10, and the compression ratio of DD-CASSI is about 5%. The patch size of the 3D-CNN input is 7 × 7 × 10. Table III provides the classification
performance metrics for the different methods in Fig. 11. These metrics are calculated by averaging over several runs of the experiments. It is noted that the proposed 3D-CCNN outperforms the other classification methods that work directly on the compressive measurements, and we can draw conclusions similar to those obtained from Fig. 8 and Table I. Figure 12 shows the classification results on the Salinas Valley dataset using the (a) Original-3D-CNN method, (b) Rand-construct-3D-CNN method, (c) Bluenoise-construct-3D-CNN method, (d) Original-SVM method, (e) Rand-construct-SVM method, and (f) Bluenoise-construct-SVM method. The classification performance metrics for these methods are provided in Table IV. From Fig. 12 and Table IV, we can draw conclusions similar to those obtained from Fig. 9 and Table II. It is noted that the proposed 3D-CCNN with an approximately 5% compression ratio even outperforms the well-known SVM classifier based on the full data cube.

Conclusion
This paper develops an efficient 3D-CCNN method to perform hyperspectral classification directly on the DD-CASSI compressive measurements. The proposed 3D-CCNN method avoids the time-consuming reconstruction procedure and the influence of reconstruction artifacts. In addition, the hardware-based coded apertures and the software-based 3D-CNN are combined into a uniform framework, which is then jointly optimized by an end-to-end training method to increase the degrees of optimization freedom. Based on a set of simulations, the proposed 3D-CCNN is shown to outperform the 3D-CNN and SVM classifiers based on the compressive measurements. Also, the performance of 3D-CCNN with only about a 5% compression ratio is comparable to or even better than that of the SVM classifier based on the full data cube.

Fig. 1 .
Fig. 1. Sketches of (a) the DD-CASSI system and (b) the proposed 3D-CCNN framework. The DD-CASSI is used in the measurement stage, and the 3D-CNN is used in the classification stage. The imaging model of DD-CASSI can be decomposed into patch-based models to keep the dimensionality consistent with the classification network. The 3D-CCNN system combines the coded aperture optimization and the HSI classification into a single framework.

It is noted that the hyperspectral data cube is correlated across the spatial and spectral domains, and is sparse in some representation basis $\boldsymbol{\Psi}$ [26-28]. Then, $\mathbf{f}$ in Eq. (4) can be represented as $\mathbf{f} = \boldsymbol{\Psi}\boldsymbol{\theta}$, where $\boldsymbol{\Psi} = \boldsymbol{\Psi}_1 \otimes \boldsymbol{\Psi}_2$ is a 3D representation basis, $\boldsymbol{\Psi}_1$ is the 2D wavelet Symmlet-8 basis depicting the correlation in the spatial domain, $\boldsymbol{\Psi}_2$ is the one-dimensional (1D) DCT basis in the spectral domain, $\otimes$ is the Kronecker product, and $\boldsymbol{\theta}$ is the coefficient vector. Substituting $\mathbf{f} = \boldsymbol{\Psi}\boldsymbol{\theta}$ into Eq. (4) yields $\mathbf{y} = \mathbf{H}\boldsymbol{\Psi}\boldsymbol{\theta} + \boldsymbol{\omega}$. It is noted that the matrix $\mathbf{H}$ is sparse and highly structured, including a set of diagonal line structures determined by the coded aperture entries $T^k_{i,\, j+l}$. An illustrative example of the matrix $\mathbf{H}$ is shown in Fig. 2, where $K = 2$, $N = M = 6$, $L = 3$, and the coded aperture patterns obey a Bernoulli distribution with 50% transmittance.
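For illustration, the 1D DCT basis and the Kronecker construction can be sketched as follows; an identity matrix stands in for the 2D Symmlet-8 wavelet basis, which would require a wavelet library.

```python
import numpy as np

def dct_basis(L):
    """Orthonormal 1D DCT-II basis; rows are the basis vectors."""
    k = np.arange(L)[:, None]
    n = np.arange(L)[None, :]
    Psi = np.sqrt(2.0 / L) * np.cos(np.pi * (n + 0.5) * k / L)
    Psi[0, :] /= np.sqrt(2.0)   # DC row normalization
    return Psi

# Kronecker combination of a spatial basis and the spectral DCT basis;
# np.eye(9) is a placeholder for the 2D wavelet basis on 3 x 3 pixels.
Psi2 = dct_basis(4)             # 4 spectral bands
Psi = np.kron(np.eye(3 * 3), Psi2)
```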

Fig. 2 .
Fig. 2. An illustrative example of the matrix $\mathbf{H}$ for the Bernoulli random coded apertures, where $K = 2$, $N = M = 6$, and $L = 3$.

Fig. 3
Fig. 3. Sketch of the 3D-CCNN framework, which connects (a) the measurement stage of the DD-CASSI system and (b) the 3D-CNN classification network. The coded apertures and the classification network are jointly trained in an end-to-end supervised manner.

Fig. 4 .
Fig. 4. The patch-based imaging model of DD-CASSI. For each snapshot, a 3D hyperspectral patch $\mathbf{s}^i \in \mathbb{R}^{P \times P \times L}$ corresponds to a compressive measurement patch $\mathbf{y}^i \in \mathbb{R}^{P \times P \times K}$ on the detector. The hyperspectral patch $\mathbf{s}^i$ is modulated by a coded aperture patch $\mathbf{t}^i_k$, where $(x_0, y_0)$ is the central coordinate. Then, the central pixel of the hyperspectral patch $\mathbf{s}^i$ is $F_{x_0, y_0, l}$,

where $q = \lfloor P/2 \rfloor$ and $\lfloor \cdot \rfloor$ is the rounding operator. Let $F_{m, n, l}$ be the $(m, n)$th pixel in the $l$th spectral band of $\mathbf{s}^i$, and let $Y^k_{m, n}$ be the $(m, n)$th pixel in the $k$th band of $\mathbf{y}^i$. The spatial dimensions of the input patch $\mathbf{s}^i$ and the measurement patch $\mathbf{y}^i$ are the same, and according to Eq. (6) the patch subscripts $(m, n)$ are related to the global coordinates through the central coordinate $(x_0, y_0)$ and the offset $q$. The basic blocks contain all of the coded aperture entries to be optimized. If all the coded aperture variables were independent of each other, it would be impossible to train them using only a small set of training samples from the hyperspectral data cube, because the training problem becomes underdetermined when the number of variables exceeds the number of training samples. To circumvent this problem, we design the coded apertures with periodic patterns, where each coded aperture is cyclically filled by a basic block with dimension $B \times B$, and different coded apertures have different basic blocks. Denote $\mathbf{C}_k \in \mathbb{R}^{B \times B}$ as the basic block for the $k$th coded aperture. In the softmax loss, the label entry $l_j = 1$ if the pixel under consideration actually belongs to the $j$th class, and $l_j = 0$ otherwise. The term $p_j$ is the $j$th element of the output vector, which represents the probability that the pixel belongs to the $j$th class.

Fig. 6 .
Fig. 6. (a) False-color composite image of the Pavia University spectral data and (b) ground truth of the classification map including nine distinct classes, where black regions represent the unlabeled pixels.

Fig. 7 .
Fig. 7. Illustration of coded apertures for the Pavia University spectral data: (a) the initial coded aperture pattern, (b) the optimized coded aperture pattern using 3D-CCNN, and (c) the difference between the initial and optimized coded aperture patterns.
Figure 10(a) shows the false-color composite image of the Salinas Valley dataset, and Fig. 10(b) shows the ground truth of the classification map including 8 distinct categories, each of which corresponds to a different type of crops.From the image, 30% of the labeled pixels are randomly chosen to be used as the training samples, and the remaining 70% pixels are used for testing.

Fig. 10. Fig. 11.
Fig. 10. (a) False-color composite image of the Salinas Valley spectral data, and (b) the ground truth of the classification map including nine distinct classes, where black regions represent the unlabeled pixels.
Fig. 11. Classification results on the Salinas Valley dataset, where panel (b) corresponds to the Rand-compress-3D-CNN method.

Table II . The classification performance of the last six comparative methods using the Pavia University dataset
Next, we test and evaluate all the classification methods on another dataset, the Salinas Valley dataset. This dataset was acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) over the Salinas Valley, USA [7]. This spectral image exhibits a spatial resolution of 3.7 m per pixel with 512 × 217 spatial pixels, and 192 spectral bands in the wavelength range from 0.24 μm to 2.40 μm. A 216 × 216 × 192 data cube truncated from the entire dataset is used in the experiments.