DDSR: Degradation-Aware Diffusion Model for Spectral Reconstruction from RGB Images

Abstract: The reconstruction of hyperspectral images (HSIs) from RGB images is an attractive low-cost way to recover hyperspectral information. However, existing approaches learn an end-to-end mapping between RGB images and their corresponding HSIs with neural networks, which makes generalization difficult to ensure because the networks are trained on data with a specific degradation process. As a new paradigm of generative models, the diffusion model has shown great potential in image restoration, especially in noisy contexts. To address the unstable generalization of end-to-end models while exploiting the power of the diffusion model, we propose a degradation-aware diffusion model. The degradation process from HSI to RGB is modeled as a combination of multiple degradation operators, which guide the inverse process of the diffusion model through a degradation-aware correction. By integrating this correction into the diffusion model, we obtain an efficient solver for spectral reconstruction that is robust to different degradation patterns. Experimental results on various public datasets demonstrate that our method achieves competitive performance and shows promising generalization ability.


Introduction
A hyperspectral image (HSI) records the spectrum of a real-world scene in multiple bands, each reflecting information at a specific spectral wavelength, making it possible to detect the unique spectral signatures of individual objects at different spatial locations and thus to identify substances indistinguishable to the human eye. Compared with a traditional RGB image, a hyperspectral image has more spectral bands, so it stores richer information and provides more details of the scene. Based on these advantages, HSIs are very useful in many applications, such as medical image processing [1], remote sensing [2], anomaly detection [3], automatic driving [4], and other fields [5,6]. However, acquiring HSIs with rich spectral information is very costly and complicated, which poses significant limitations on the development of various applications, particularly in dynamic or real-time scenarios. Since the acquisition of RGB images is easy and cheap [6], extracting spectral information from RGB images has become a recent research hotspot, also known as spectral reconstruction (SR). The reconstructed HSI has recently shown potential in real-world tasks [7,8]; for example, it has been used for field disease detection with a 6.14% improvement over baseline methods. However, collecting a large amount of well-calibrated pairwise data, for example with dual cameras, is not trivial. Inverse mapping of existing HSIs onto RGB images, also known as the degradation of HSIs, is currently an effective way to obtain training data.
Early SR methods are mainly prior-based, exploring priors (such as sparsity or spectral correlation [9][10][11]) in HSIs. However, due to the poor representation ability of these priors, such methods only perform well on data in specific domains. With the development of deep learning [12][13][14], deep neural networks have become a powerful tool for solving the SR problem; they learn an end-to-end mapping from RGB images to HSIs, as in AWAN [15], HSCNN+ [16], and DRCRNet [17], and are also called data-driven methods. Transformers have also been used for SR tasks [18,19], achieving impressive results with the multi-head self-attention (MSA) mechanism [20,21]. However, these end-to-end methods are clearly limited to a specific degradation process; data augmentation or ensemble learning could alleviate this problem, but neither is a flexible solution. Moreover, the implicit mappings learned in an unconstrained space are often not optimal, especially when the data are insufficient, and the models usually face overfitting.
The diffusion model is a highly flexible and easy-to-train generative model that consists of a forward and an inverse process. The basic idea is that the forward process sequentially perturbs the data distribution, while the inverse process restores the data distribution gradually. Previous works have demonstrated the powerful data modeling capability of the diffusion model, enabling flexible mappings from randomly sampled Gaussian noise to complex target distributions such as text, images, and speech [22][23][24][25][26]. Therefore, we propose to use the diffusion model to perform efficient spectral reconstruction from an RGB image. However, for the SR task, the inverse process recovers hyperspectral information starting from random noise with the same size as the HSI; the noise space is therefore large, which increases the prediction uncertainty in the diffusion inverse process. Fortunately, we find that the core part of the degradation process from HSI to RGB lies in linear spectral downsampling, also known as the spectral response function, which is useful for spectral reconstruction [9,12,27,28]. The spectral response function provides important prior information for modeling the degradation process, and, based on the degradation process, we can reduce the error caused by the prediction uncertainty in the inverse diffusion process. Following this observation, we propose a degradation-aware diffusion model (DDSR) for spectral reconstruction from an RGB image.
Specifically, the real-world degradation process from HSI to RGB is modeled as a combination of multiple degradation operators: linear spectral downsampling, random noise, and nonlinear JPEG compression. These degradation operators are used to guide the inverse process of diffusion; the key is to reduce the error caused by the prediction uncertainty in each inverse prediction step by utilizing our degradation-aware correction. We thus obtain a degradation-aware diffusion model that is easily adapted to different degradation processes while ensuring generalization ability.
To the best of our knowledge, this is the first degradation-based diffusion model for spectral reconstruction. By integrating the degradation-aware correction into the diffusion model, we obtain an efficient solver for spectral reconstruction. The main contributions of this work are summarized below:

•
We propose DDSR, a diffusion-based spectral reconstruction architecture that utilizes a degradation-aware correction to reduce the error caused by the prediction uncertainty in each inverse step.

•
Since realistic scenarios are usually noisy, we propose a noise-related correction method, motivated by adapting the correction process to the noise level of the current image, to reduce the effect of noise.

•

JPEG compression is a common nonlinear degradation, and we propose to extend the correction further for JPEG-related scenarios.

•
Quantitative experiments on various public datasets demonstrate that our method can achieve competitive performance and shows promising generalization ability.

Related Work
Recovering HSIs from RGB images is a highly ill-posed problem. Previous methods, which can usually be categorized into prior-based methods and data-driven methods, have achieved relatively impressive results.

Prior-Based Methods
Prior-based methods explore statistical information (such as sparsity, spatial structural similarity, and spectral correlation) as a prior in an HSI dataset to learn a mathematical model that explains the correlation between an RGB image and an HSI in a subspace. For example, Arad et al. [9] proposed a dictionary learning method based on sparse coding, but its computational complexity grows as the dataset expands, which limits its application. Nguyen et al. [11] proposed a learning method based on a radial basis function network, which uses RGB white balance to normalize the scene illumination and restore the reflectance of the scene. Jia et al. [10] analyzed a large set of datasets with nonlinear dimensionality reduction techniques, showed that the spectra of natural scenes lie on an intrinsically low-dimensional manifold, and proposed a manifold-based reconstruction pipeline that accurately maps RGB images to their corresponding HSIs.

Data-Driven Methods
Data-driven methods take advantage of the ability of neural networks to extract features from RGB and HSI datasets to fit optimal solutions, and various neural network architectures have been proposed recently to improve reconstruction accuracy. HSCNN+ [16] is a distinct SR architecture that replaces residual blocks with dense blocks under a novel fusion scheme. The Adaptive Weighted Attention Network (AWAN [15]) was proposed to better capture channel dependencies through a trainable Adaptive Weighted Channel Attention (AWCA) module. Zhao et al. [29] proposed a four-level Hierarchical Regression Network (HRNet) with a PixelShuffle layer as an inter-level interaction to recover HSIs. Zhang et al. [28] proposed an unsupervised framework that first estimates the degradation process from HSI to RGB by progressively capturing the difference between the input RGB image and the RGB image reprojected from the recovered HSI. To enable training without paired HSI and RGB images, an adversarial learning scheme was employed; however, the training of an adversarial model may suffer from mode collapse. MST++ [19] is a Transformer-based model that exploits the self-similarity between HSI spectra, using the channels as tokens to perform multi-head self-attention along the spectral dimension and subsequently refining the reconstruction result step by step. These data-driven methods all learn implicit mappings from RGB images to HSIs, which are often not optimal, especially when the data are insufficient, and the models usually face overfitting.

Diffusion Model for Image Restoration
The diffusion model is a hot topic in computer vision, and many excellent diffusion-based image restoration methods have appeared recently. Specifically, Wang et al. [30] proposed an effective image restoration method based on denoising diffusion that removes noise and defects from an image while preserving its details and structure. SR3 [22] performs image restoration through a stochastic iterative denoising process using a conditional diffusion model. EDiffSR [31] proposed a novel conditional Prior Enhancement Module (CPEM) that efficiently extracts prior knowledge from a low-resolution image, which is then provided to the diffusion model for efficient hyperspectral image restoration. HSR-Diff [32] was proposed as a conditional diffusion model, and Dong et al. [33] proposed an interpretable scale-propelled diffusion (ISPDiff) model, used to generate a high-resolution HSI by fusing a high-resolution multispectral image (MSI) with a corresponding low-resolution HSI, namely HSI fusion. However, these methods recover HSI information from low-resolution HSIs, and HSI fusion methods additionally require high-resolution MSIs. They are not suitable for spectral reconstruction from RGB images, which provide very limited information compared to low-resolution HSIs, making model learning more difficult. To the best of our knowledge, the diffusion model has been little explored for spectral reconstruction, and our work aims to fill this research gap. In our work, we combine the diffusion model with a degradation-aware correction to obtain an efficient SR solver, shown in Figure 1, which is robust to different degradation patterns.

Background
A diffusion model [30,34] is a type of probabilistic generative model that typically consists of two stages: a forward process and an inverse process. In short, the forward process perturbs the data distribution stochastically, usually by adding random noise, while the inverse process learns to recover the data distribution gradually. The forward process is composed of multiple perturbation steps in the form of a Markov chain, where low-level noise is added to the input image x_0 as the timestep increases. At timestep t, the perturbation step can be formulated as

x_t = \sqrt{1 - \beta_t}\, x_{t-1} + \sqrt{\beta_t}\, \epsilon,    (1)

where β_t is a hyperparameter that represents the noise level, ϵ is sampled from a standard Gaussian distribution, and x_t is the noised image at timestep t, which can be represented in the closed form

x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,    (2)

where α_t = 1 − β_t and \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s. Thus, x_t can be seen as a weighted sum of x_0 and ϵ; as the number of noise injections increases, the input image is gradually corrupted until it approaches standard Gaussian noise. The inverse process is to estimate q(x_{t−1} | x_t) from the noised image, which is difficult to do directly. However, the posterior distribution p(x_{t−1} | x_t, x_0) can be derived by the Bayes theorem (see Appendix A.1 for details) and can be expressed as

p(x_{t-1} | x_t, x_0) = \mathcal{N}(x_{t-1}; \mu_t, \sigma_t^2 I), \quad \mu_t = \frac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1 - \bar{\alpha}_t} x_0 + \frac{\sqrt{\alpha_t}\,(1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t} x_t, \quad \sigma_t^2 = \frac{(1 - \bar{\alpha}_{t-1})\, \beta_t}{1 - \bar{\alpha}_t},    (3)

where μ_t is the mean value [35]. Based on Equation (2), x_0 can be reparameterized as

x_{0|t} = \frac{1}{\sqrt{\bar{\alpha}_t}} \left( x_t - \sqrt{1 - \bar{\alpha}_t}\, \epsilon_\theta(x_t, t) \right),    (4)

where ϵ_θ denotes the noise predicted by the neural network with parameters θ, described in Figure 2, and x_{0|t} is the prediction of x_0 at timestep t. Finally, the inverse step that yields x_{t−1} can be expressed as

x_{t-1} = \mu_t(x_t, x_{0|t}) + \sigma_t z, \quad z \sim \mathcal{N}(0, I).    (5)

The specific forward and inverse processes can be seen in Figure 1; the forward process adds noise according to Equation (1), while the inverse process predicts x_{0|t} using a neural network and then leverages Equations (3) and (5) to recover the HSI from random noise step by step.
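The closed-form perturbation and the posterior statistics above can be sketched numerically. Below is a minimal NumPy illustration; the linear β schedule matches the implementation details later in the paper, while the toy dimensions and the idea of calling these as standalone functions are assumptions for illustration (a real model would obtain x_{0|t} from a trained network):

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 1e-2, T)      # linear noise schedule beta_1..beta_T
alphas = 1.0 - betas                    # alpha_t = 1 - beta_t
alpha_bar = np.cumprod(alphas)          # \bar{alpha}_t = prod_{s<=t} alpha_s

def forward(x0, t, eps):
    """Closed-form forward process: sample x_t directly from x_0 (Eq. (2))."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def posterior_stats(xt, x0, t):
    """Mean and variance of the posterior p(x_{t-1} | x_t, x_0) (Eq. (3))."""
    a_t = np.sqrt(alpha_bar[t - 1]) * betas[t] / (1.0 - alpha_bar[t])
    b_t = np.sqrt(alphas[t]) * (1.0 - alpha_bar[t - 1]) / (1.0 - alpha_bar[t])
    var = (1.0 - alpha_bar[t - 1]) * betas[t] / (1.0 - alpha_bar[t])
    return a_t * x0 + b_t * xt, var
```

Note that with this schedule \bar{α}_T is close to zero, so x_T is dominated by the Gaussian noise term, as the text describes.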

Architecture of Neural Network
The neural network is an important part of the inverse process; specifically, the network takes the noisy image x_t and the timestep t as inputs to predict the noise ϵ_θ, which is subsequently used to compute x_{t−1} according to Equation (5). The overall structure of the neural network is depicted in Figure 2; it uses a U-Net as the backbone and consists of a number of ResBlocks and attention blocks [20]. This design extracts features at different scales of the image, providing richer contextual information. We also use skip connections to convey feature information at different levels, which helps to fully utilize the local and global information of the image and ensures that the network can learn the noise distribution more efficiently.

Degradation-Aware Diffusion Model
For the SR task, the inverse process recovers an HSI starting from random noise with the same size as the HSI; the noise space is therefore large for conventional diffusion-based methods, which increases the uncertainty in predicting x_{t−1} in the inverse process. Thus, we leverage a degradation-aware correction, based on Range-Null Space (RNS) decomposition [36], to rectify the predicted x_{t−1}. By integrating the degradation-aware correction into the diffusion model, we obtain an efficient solver for spectral reconstruction that is robust to different degradation patterns under the guidance of the degradation process. The overall architecture of our proposed DDSR is shown in Figure 1; a degradation-aware correction module is added to each inverse step. Unlike conditional diffusion models [22,31], the neural network in DDSR does not require an RGB image as input; the RGB image is only provided as prior information for the correction, so the model can adapt to various degradation processes flexibly.

Degradation-Aware Correction
In general, the core part of the degradation from HSI to RGB can be formulated as linear spectral downsampling:

y = Hx,    (6)

where x ∈ R^{D×1} denotes the HSI, y ∈ R^{d×1} denotes the degraded RGB image, and H ∈ R^{d×D} denotes the spectral downsampling operator, also known as the spectral response function; d and D are the numbers of channels of the RGB image and the HSI, respectively. The goal of HSI reconstruction here is to recover x from y given H. An ideal recovered x_r needs to satisfy the consistency constraint y = H x_r. However, the HSI reconstructed by end-to-end models optimized with a pixel loss against the ground truth x usually fails to satisfy this constraint. Therefore, we propose a degradation-aware correction to rectify x_{t−1} in each inverse step, while trying to ensure that x_{0|t} satisfies consistency throughout the inverse process. Specifically, given H and y, based on RNS decomposition [36], x can be decomposed into two parts:

x = H^{\dagger} y + (I − H^{\dagger} H) x,    (7)

where H^{\dagger} is the pseudo-inverse of the spectral downsampling operator H, satisfying H H^{\dagger} H = H; H^{\dagger} y lies in the range space and (I − H^{\dagger} H) x lies in the null space. There are two noteworthy points: (i) the range part of x is known given H and y; and (ii) if we replace x with x_{0|t} in the null part of Equation (7), the resulting \hat{x}_{0|t} still satisfies consistency, which can be formulated as

\hat{x}_{0|t} = H^{\dagger} y + (I − H^{\dagger} H) x_{0|t},    (8)

where \hat{x}_{0|t} is the result after the degradation-aware correction; the error in the range part of the generative prediction is removed, while \hat{x}_{0|t} always satisfies the consistency constraint. The architecture of the degradation-aware correction is shown in Figure 1. We rectify x_{t−1} in each inverse step to achieve efficient spectral reconstruction; the whole sampling process is summarized in Algorithm 1.
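The correction in Equation (8) is a single matrix operation per step. A minimal NumPy sketch, with a random toy matrix standing in for a real spectral response function and a random vector standing in for the network prediction, shows that the corrected estimate satisfies the consistency constraint exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
D, d = 31, 3                      # HSI and RGB channel counts
H = rng.random((d, D))            # toy stand-in for the spectral response function
H_pinv = np.linalg.pinv(H)        # H† (Moore-Penrose pseudo-inverse)

x_true = rng.random(D)            # ground-truth spectrum at one pixel
y = H @ x_true                    # noise-free degraded observation (Eq. (6))

x0_t = rng.random(D)              # imperfect "network" prediction of x_0
# Degradation-aware correction (Eq. (8)): keep the known range part,
# take only the null-space part from the prediction.
x0_hat = H_pinv @ y + (np.eye(D) - H_pinv @ H) @ x0_t
```

After the correction, `H @ x0_hat` reproduces `y` up to numerical precision, no matter how inaccurate `x0_t` is; the prediction only contributes the component that the degradation cannot observe.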
Algorithm 1 Simple sampling
1: Input: x_T ∼ N(0, I), degraded image y = Hx
2: Output: Reconstructed HSI x_r
3: for t = T, ..., 1 do
4:    x_{0|t} = (x_t − \sqrt{1 − \bar{\alpha}_t}\, \epsilon_\theta(x_t, t)) / \sqrt{\bar{\alpha}_t}    ▷ Equation (4)
5:    \hat{x}_{0|t} = H^{\dagger} y + (I − H^{\dagger} H) x_{0|t}    ▷ Equation (8)
6:    x_{t−1} ∼ p(x_{t−1} | x_t, \hat{x}_{0|t})    ▷ Equation (5)
7: end for
8: return x_0

Degradation in realistic scenarios is usually noisy due to the effects of the sensor, lighting, and exposure time, so special modifications are required for noisy scenarios. In the noisy case, y = Hx + n, where n ∼ N(0, σ_y^2 I) is sampled from a Gaussian distribution; if we substitute y into Equation (8), we obtain

\hat{x}_{0|t} = H^{\dagger}(Hx + n) + (I − H^{\dagger} H) x_{0|t} = H^{\dagger} H x + (I − H^{\dagger} H) x_{0|t} + H^{\dagger} n,    (9)

where H^{\dagger} n is the noise bias, which is amplified along the Markov chain during the inverse process. Given that n is sampled from a Gaussian distribution, H^{\dagger} n still obeys a Gaussian distribution with variance ω^2 σ_y^2, where ω can be estimated given H. To reduce the impact of the noise bias, the sampling needs to be modified as

\hat{x}_{0|t} = x_{0|t} + λ_t H^{\dagger}(y − H x_{0|t}),    (10)
x_{t−1} = μ_t(x_t, \hat{x}_{0|t}) + κ_t z, \quad z ∼ N(0, I),    (11)

where λ_t and κ_t are subject to two constraints: (i) λ_t needs to be as close to 1 as possible to ensure consistency; and (ii) the x_{t−1} sampled from Equation (11) should keep the same variance σ_t^2 as in Equation (3). If we substitute the modified \hat{x}_{0|t} of Equation (10) into μ_t in Equation (3), there is an additional variance (ω a_t λ_t σ_y)^2 due to the bias λ_t H^{\dagger} n, where a_t = \sqrt{\bar{\alpha}_{t−1}}\, β_t / (1 − \bar{\alpha}_t) is the coefficient of x_{0|t}. So, constraint (ii) can be formulated as κ_t^2 + (ω a_t λ_t σ_y)^2 = σ_t^2. Combining the above two constraints, we set λ_t and κ_t as follows:

λ_t = 1, \quad κ_t = \sqrt{σ_t^2 − (ω a_t σ_y)^2}, \quad if σ_t ≥ ω a_t σ_y;
λ_t = σ_t / (ω a_t σ_y), \quad κ_t = 0, \quad otherwise.    (12)

The whole process of noise-related sampling is summarized in Algorithm 2. The motivation is to ensure consistency when the current step noise σ_t is higher than the bias, while keeping the variance of Equation (11) the same as that of Equation (3) when the noise level is low.

Algorithm 2 Noise-related sampling
1: Input: x_T ∼ N(0, I), degraded image y = Hx + n
2: Output: Reconstructed HSI x_r
3: for t = T, ..., 1 do
4:    Update κ_t, λ_t via Equation (12)
5:    x_{0|t} = (x_t − \sqrt{1 − \bar{\alpha}_t}\, \epsilon_\theta(x_t, t)) / \sqrt{\bar{\alpha}_t}    ▷ Equation (4)
6:    \hat{x}_{0|t} = x_{0|t} + λ_t H^{\dagger}(y − H x_{0|t})    ▷ update \hat{x}_{0|t} via Equation (10)
7:    x_{t−1} ∼ p(x_{t−1} | x_t, \hat{x}_{0|t})    ▷ Equation (11)
8: end for
9: return x_0
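The two branches of Equation (12) can be sketched as a small helper. The linear β schedule follows the implementation details later in the paper; the toy value of ω and the specific test timesteps are assumptions for illustration:

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 1e-2, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

def noise_aware_coeffs(t, sigma_y, omega):
    """Sketch of the rule in Eq. (12): keep lambda_t = 1 while the step
    noise sigma_t can absorb the amplified observation-noise bias,
    otherwise shrink lambda_t and add no extra noise (kappa_t = 0)."""
    a_t = np.sqrt(alpha_bar[t - 1]) * betas[t] / (1.0 - alpha_bar[t])
    sigma_t = np.sqrt((1.0 - alpha_bar[t - 1]) * betas[t] / (1.0 - alpha_bar[t]))
    bias = omega * a_t * sigma_y
    if sigma_t >= bias:
        return 1.0, np.sqrt(sigma_t**2 - bias**2)   # full consistency
    return sigma_t / bias, 0.0                      # damped consistency
```

In both branches the total variance injected at step t equals σ_t², matching constraint (ii).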

JPEG-Related Correction
JPEG compression [37] is a common nonlinear operator in realistic scenarios. As shown in Figure 4, the degradation process in this paper can be extended from the noisy case as follows:

y = \mathrm{decoder}(\mathrm{encoder}(Hx)) + n,    (13)
\mathrm{encoder}(\mathrm{decoder}(\mathrm{encoder}(x))) \approx \mathrm{encoder}(x),    (14)

where encoder(·) and decoder(·) denote the encoding and decoding processes of JPEG compression; as Equation (14) shows, they have properties similar to those of the linear operator H. We propose to extend the correction to JPEG-related scenarios by replacing the \hat{x}_{0|t} in Equation (10) with

\hat{x}_{0|t} = x_{0|t} + λ_t H^{\dagger}\big(y − \mathrm{decoder}(\mathrm{encoder}(H x_{0|t}))\big).    (15)

In the following sections, the JPEG compression quality factor is denoted as q, which controls the compression ratio of the image.
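The JPEG-extended correction replaces the linear reprojection with the full nonlinear degradation. Below is a hedged sketch in which a uniform quantizer stands in for the JPEG encoder/decoder pair; the real codec, its quality factor handling, and the toy `H` are not reproduced here, but the quantizer shares the approximate idempotence property of Equation (14):

```python
import numpy as np

rng = np.random.default_rng(1)
D, d = 31, 3
H = rng.random((d, D))            # toy spectral response function
H_pinv = np.linalg.pinv(H)

def codec(v, step=0.05):
    """Toy nonlinear stand-in for decoder(encoder(.)): uniform quantization.
    It is idempotent, mimicking the property in Eq. (14)."""
    return np.round(v / step) * step

x_true = rng.random(D)
y = codec(H @ x_true)             # degraded observation in the style of Eq. (13)

x0_t = rng.random(D)              # "network" prediction of x_0
lam = 1.0                         # lambda_t from the noise-aware rule
# Correction in the spirit of the JPEG-extended update: project the residual
# between y and the re-degraded prediction back through H†.
x0_hat = x0_t + lam * H_pinv @ (y - codec(H @ x0_t))
```

Because the codec is (approximately) idempotent, re-degrading `x0_hat` lands within one quantization step of the observation `y`.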

Experiments

Dataset
In this work, we trained our DDSR on the ARAD-1K dataset [27], which provides images measuring 482 × 512, where each HSI consists of 31 channels corresponding to wavelengths from 400 nm to 700 nm. The ARAD-1K dataset contains 1000 HSI and RGB pairs, which are divided into subsets of 900, 50, and 50 for training, testing, and validation, respectively. We further validated the effectiveness of our method on five additional publicly available datasets, namely CAVE [38], Foster [39], KAUST [40], NUS [11], and ICVL [9]. For each dataset, we chose the channels corresponding to 400-700 nm. We used all the HSIs of the CAVE and Foster datasets, and selected 12, 25, and 100 HSIs from the NUS, ICVL, and KAUST datasets, respectively. For a fair comparison, the same degradation process was adopted to obtain the training and testing RGB images. To further demonstrate the generalization capability of our method, we adopted two spectral response functions, one provided by the 2022 NTIRE Spectral Recovery Challenge and the other being the CIE 1931 color matching function.

Implementation Detail
The total timestep in the diffusion model was set to T = 1000. β_1 and β_T in Equation (1) were set to 1 × 10^−4 and 1 × 10^−2, respectively, and β_{1:T} in Equation (1) increased linearly with the step in both the training and testing phases. The number of ResBlocks in each layer of the U-Net was set to two and the numbers of channels in the U-Net were set to [64, 128, 256, 512]. We used Adam as the optimizer with β = (0.9, 0.999); the initial learning rate was set to 1 × 10^−4 and was decayed by the CosineAnnealing schedule until it reached 1 × 10^−6. We used four NVIDIA Tesla P100s as the hardware platform for the entire experiment, with the software platform implemented in PyTorch 1.8.0; the model was first trained on images measuring 256 × 256 with a batch size of 16 and then fine-tuned on images measuring 482 × 482 with a batch size of four. Additionally, each RGB-HSI pair was rescaled to the range [−1, 1] for training.
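The learning-rate decay described above follows the standard cosine-annealing formula; the following is a minimal sketch of that schedule, where the total number of decay steps is an assumption for illustration (the actual training iteration count is not stated here):

```python
import math

lr_max, lr_min, total_steps = 1e-4, 1e-6, 300   # total_steps is an assumed value

def cosine_annealing(step):
    """Cosine-annealed learning rate from lr_max down to lr_min."""
    cos = math.cos(math.pi * step / total_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cos)
```

In PyTorch this corresponds to `torch.optim.lr_scheduler.CosineAnnealingLR` with `eta_min=1e-6`.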
During the testing phase, the ARAD-1K, CAVE, and KAUST datasets were tested at full image size, and the remaining datasets were cropped to 512 × 512. To objectively evaluate the performance of our proposed method, we adopted three metrics for quantitative evaluation, as used in the 2022 Spectral Recovery Challenge. The first metric is the mean relative absolute error (MRAE), which computes the pixel-wise relative distance between all channels of the reconstructed and ground-truth HSI:

\mathrm{MRAE} = \frac{1}{N} \sum_{i=1}^{N} \frac{|x_i - \bar{x}_i|}{\bar{x}_i},    (16)

where x ∈ R^{H×W×C_λ} denotes the reconstructed HSI cube, N = H × W × C_λ denotes the number of pixels in the cube, and \bar{x} is the ground truth with the same size as x. The second metric is the root mean square error (RMSE), defined as

\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x}_i)^2}.    (17)

The last metric is the Peak Signal-to-Noise Ratio (PSNR):

\mathrm{PSNR} = 10 \log_{10} \frac{\max(\bar{x})^2}{\mathrm{MSE}},    (18)

where MSE is the mean squared error between x and \bar{x}. Obviously, if the value of \bar{x} is too small (a sparse or dark region), it may lead to a relatively large MRAE value, thus causing a gradient problem. Therefore, we chose the L_2 loss to train the neural network (see Appendix A.2 for details).
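The three evaluation metrics take only a few lines of NumPy. The small `eps` in the MRAE denominator and the assumption that images are scaled to a [0, 1] data range for PSNR are ours, not from the challenge definition:

```python
import numpy as np

def mrae(x, x_gt, eps=1e-8):
    """Mean relative absolute error; eps guards against near-zero pixels."""
    return np.mean(np.abs(x - x_gt) / (x_gt + eps))

def rmse(x, x_gt):
    """Root mean square error over all pixels and channels."""
    return np.sqrt(np.mean((x - x_gt) ** 2))

def psnr(x, x_gt, data_range=1.0):
    """Peak signal-to-noise ratio in dB for a given data range."""
    mse = np.mean((x - x_gt) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)
```

The division by `x_gt` makes MRAE explode in sparse or dark regions, which is exactly the behavior the paper observes and the reason it trains with the L_2 loss instead.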

Baseline
We compared our proposed DDSR with five state-of-the-art methods, including four deep learning-based spectral reconstruction algorithms (AWAN [15], HDNet [41], HRNet [29], and MST++ [19]) and an efficient image restoration model, Restormer [18]; MST++ and AWAN were the winners of the NTIRE 2022 and 2020 Spectral Reconstruction Challenges, respectively. To evaluate our method more fully, we also compared it with a conditional diffusion model [22] and a state-of-the-art diffusion model for hyperspectral image restoration [31]. For a fair comparison, all methods were trained using the same data and settings as our method.

Quantitative Result
We compared our method with five state-of-the-art methods on six datasets. In testing, the RGB images were generated using the spectral response function provided by the 2022 NTIRE Spectral Recovery Challenge, the same as in the training setting. Overall, Tables 1-3 present the quantitative results of our DDSR and the compared models on all datasets. All metrics are averaged per dataset; our method maintains the best or second-best RMSE and PSNR results on most of the datasets, which demonstrates its effectiveness.
The quantitative results for the KAUST and NUS datasets are shown in Table 1; our method clearly outperforms the compared methods. Specifically, compared with the second-best method on the KAUST dataset, our DDSR achieves 7.2% and 2.1% improvements in RMSE and PSNR, respectively. The quantitative results in Tables 2 and 3 show that, in the RMSE and PSNR metrics, our method achieves the best results on the CAVE and ICVL datasets, while achieving second-best results on the ARAD-1K and Foster datasets that remain competitive with the best.
However, the MRAE performance is slightly worse than that of other methods. This is because our method has a relatively larger error in sparse or dark regions, which leads to large MRAE values when dividing by small pixel values. The specific reason can be explained as follows: the pixel values of y in sparse and dark regions tend to 0, which means that the information of H^{\dagger} y available for the correction is limited, so the reconstruction results there are relatively worse. This is further illustrated in Figure 5; for example, the trees in the first and second columns and the windows in the fifth column, labeled by red boxes, are regions with relatively large MRAE values, although the L_1 loss in these regions is relatively small.
We also compared the results with two diffusion-based methods, as shown in Table 4. Obviously, these diffusion methods for image restoration do not perform well in spectral reconstruction. This is due to the lack of training data leading to insufficient model learning, while the input provides only an RGB image with limited information, which further increases the difficulty for the diffusion model of recovering hyperspectral information from random noise.
Figure 5. Visualization of the MRAE (second and third rows) and L_1 loss (fourth and fifth rows) heatmaps of the reconstruction results of our method at 460 and 500 nm, sampled from the ARAD-1K dataset. Note that the red boxes label regions where the MRAE loss is large but the L_1 loss is small.

Qualitative Results
In this section, we compare the reconstruction quality of the different methods visually. The high correlation and overlap between the ground-truth curves and ours in Figure 6 validate the effectiveness of our method. Figure 7 shows five slices of a reconstructed image from the KAUST dataset; the reconstructed results of our method are perceptually competitive and pleasing, whereas some of the compared methods show unpleasant artifacts and a certain contrast deviation from the ground truth. The curves in Figure 7 correspond to the average RMSE values for each channel at the green box position of the RGB image. Overall, it is clear that our method achieves better results.
The error maps of the reconstructed HSIs of all methods compared with the ground truth are presented in Figures 8-10. It can be observed that the compared methods show limitations in fine-grained reconstruction: their error is mainly concentrated in the detailed parts of the scene, while the reconstruction error of our method in these detailed regions is always kept at a low level. We infer that this is because the degradation-based correction helps the model to better capture the details of image generation while ensuring the reconstruction quality.

Ablation Study
We conducted various ablation studies to verify the effect of different modules in the proposed method, including the effect of noise-related sampling, different amounts and ranges of correction, and JPEG-related sampling.
Table 5 shows the effectiveness of our noise-related method in different noise cases. It is obvious that, compared with simple sampling, the noise-related method significantly improves the performance at the same noise level, and maintains the same level of MRAE and PSNR in the face of larger noise. Our noise-related method, even at σ_y = 0.01, still achieves the same results as simple sampling at σ_y = 0.005. This is because \hat{x}_{0|t} is adjusted according to the noise level, which helps to reduce the impact of noise.
Table 6 presents the results of our method under different amounts and ranges of correction, indicating that the more corrections performed, the better the results obtained. In addition, for the same amount of correction, the earlier it is performed, the better the results obtained. We infer that our approach helps the diffusion model to learn the distribution of HSIs more efficiently, while early correction provides more accurate x_{t−1} for the subsequent samples and finally yields a more reasonable x_0.
Table 7 shows that our JPEG-related method significantly improves MRAE, with a slight improvement in RMSE and PSNR.We infer that the proposed JPEG-related method can more effectively handle sparse or dark areas, while not affecting the accuracy of other regions.

Generalization Ability
To demonstrate the generalization ability of our DDSR to different degradation patterns, in this section we generated test images with a different spectral response function, namely the CIE 1931 color matching function. The quantitative results on various datasets are shown in Tables 8 and 9; our method outperforms the compared data-driven approaches and maintains the same level of performance under the different degradation pattern, demonstrating a promising generalization ability. This is attributed to the fact that the neural network in our method does not require RGB images as inputs; they are only provided as prior information for the correction, so the model can adapt to various degradation processes flexibly. In contrast, the end-to-end trained methods have difficulty maintaining stable performance.
Table 8. The average quantitative results on the KAUST and NUS datasets. The degradation is set to σ_y = 0.005 and q = 75. The best and second-best values are highlighted.

Conclusions
In this paper, we proposed a degradation-aware diffusion model for spectral reconstruction from RGB images, addressing the unstable generalization ability of end-to-end models while exploiting the powerful ability of diffusion models. The degradation process from HSI to RGB was modeled as a combination of multiple degradation operators, which were used to guide the inverse process of diffusion; specifically, the key was to reduce the error caused by the prediction uncertainty in each inverse prediction step by utilizing our proposed degradation-aware correction. Based on the complex degradation of realistic scenarios, we further proposed correction methods related to noise and JPEG compression. Finally, we obtained an efficient solver for spectral reconstruction, which is robust to different degradation patterns. We conducted quantitative and qualitative experiments to demonstrate the performance and generalization ability of our method, as well as ablation experiments to validate the effectiveness of the proposed method. In the future, we will focus on the following aspects: (1) how to deal with more complex degraded environments; and (2) how to improve the reconstruction accuracy for dark regions.

Figure 1.
Figure 1. Description of the overall architecture of our proposed DDSR and the degradation-aware correction.

Figure 2.
Figure 2. The architecture of the neural network with parameters θ, which uses a U-Net as the backbone and consists of a number of ResBlocks and attention blocks. We uniformly sample the timestep t ∈ {0, 1, ..., T}, encode it sinusoidally, and then pass it through fully connected layers to obtain the time embedding.
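The sinusoidal timestep encoding described in the Figure 2 caption follows the usual Transformer positional-encoding recipe; a NumPy sketch, where the embedding dimension and frequency base are assumptions (the paper does not state them):

```python
import numpy as np

def timestep_embedding(t, dim=128, base=10000.0):
    """Sinusoidal encoding of a scalar timestep t into a dim-vector:
    sin/cos pairs at geometrically spaced frequencies. In the network
    this vector is then passed through fully connected layers."""
    half = dim // 2
    freqs = np.exp(-np.log(base) * np.arange(half) / half)  # geometric frequencies
    args = t * freqs
    return np.concatenate([np.sin(args), np.cos(args)])
```

Distinct timesteps map to distinct, bounded vectors, which lets the shared U-Net condition on the noise level of its input.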

Figure 3
Figure 3 depicts the structure of the ResBlock, which contains batch normalization, SiLU activation, 2D convolution layers with a 3 × 3 kernel, and a residual connection to the output. Due to the insufficient training data for hyperspectral images, we additionally used dropout to avoid overfitting. The time embedding was added to the intermediate features to allow the network to learn different levels of noise.

Figure 3.
Figure 3. Structure of the ResBlock of the neural network; C, H, and W represent the number of channels, height, and width of the input, respectively.

Figure 4.
Figure 4. Description of the degradation process in this paper, where Hx is the linear spectral downsampling, and JPEG denotes the JPEG compression.

Figure 6.
Figure 6. Visualization of the spectral density curves of selected samples corresponding to the green box from the Foster and CAVE datasets; corr denotes the correlation with the ground-truth curve.

Figure 7.
Figure 7. Visualization of slices of a reconstructed image and RMSE curves for all methods, corresponding to the green box position in the RGB image on the KAUST dataset.

Figure 8.
Figure 8. Visualization of the error heatmap of the generated results on the NUS dataset. Please follow the color bar to find areas of large losses and zoom in for a better view.

Figure 9.
Figure 9. Visualization of the error heatmap of the generated results on the CAVE dataset. Please follow the color bar to find areas of large losses and zoom in for a better view.

Figure 10.
Figure 10. Visualization of the error heatmap of the generated results on the ICVL dataset. Please follow the color bar to find areas of large losses and zoom in for a better view.

Table 1.
The average quantitative results on the KAUST and NUS datasets. The degradation is set to σ_y = 0.005 and q = 75. The best and second-best values are highlighted.

Table 2.
The average quantitative results on the CAVE and Foster datasets. The degradation is set to σ_y = 0.005 and q = 75. The best and second-best values are highlighted.

Table 3.
The average quantitative results on the ARAD-1K and ICVL datasets. The degradation is set to σ_y = 0.005 and q = 75. The best and second-best values are highlighted.

Table 4.
The quantitative ablation results of our method and two diffusion-based methods on the ARAD-1K dataset. The degradation is set to σ_y = 0.005 and q = 100. The best and second-best values are highlighted.

Table 5.
Ablation study of our noise-related method with different noise cases on the ARAD-1K dataset, where q = 100. The best and second-best values are highlighted.

Table 6.
Ablation study of our method with different amounts and ranges of correction on the ARAD-1K dataset, where σ_y = 0.005 and q = 100. The best and second-best values are highlighted. The (a, b) in the Range column means that the correction is performed when a ≤ t < b.

Table 7.
Ablation study of our JPEG-related method on the CAVE dataset, where σ_y = 0.005. The best and second-best values are highlighted.

Table 9.
The average quantitative results on the CAVE and Foster datasets. The degradation is set to σ_y = 0.005 and q = 75. The best and second-best values are highlighted.