Reconstructing Spectra from RGB Images by Relative Error Least-Squares Regression

Spectral reconstruction (SR) algorithms attempt to map RGB images to hyperspectral images. Classically, simple pixel-based regression is used to solve for this SR mapping; more recently, patch-based Deep Neural Networks (DNNs) have been considered (with a modest performance increment). For either method, the training process typically minimizes a Mean-Squared-Error (MSE) loss. Curiously, in recent research, SR algorithms are evaluated and ranked based on a relative percentage error, the so-called Mean-Relative-Absolute Error (MRAE), which behaves very differently from the MSE loss function. The most recent DNN approaches - perhaps unsurprisingly - directly optimize for this new MRAE error in training so as to match the new evaluation criterion. In this paper, we show how pixel-based regression methods can also be reformulated so that they too optimize a relative spectral error. Our Relative Error Least-Squares (RELS) approach minimizes an error that is similar to MRAE. Experiments demonstrate that regression models based on RELS deliver better spectral recovery, with up to a 10% increment in mean performance and a 20% improvement in worst-case performance, depending on the method.


Introduction
Hyperspectral imaging devices are widely used in practical applications, such as remote sensing [27,7,11,24], medical imaging [29,30], scene relighting [16] and digital art archiving [28]. These devices record high-resolution radiance spectra at every image pixel, providing very rich information of material compositions at pixel level. However, existing technologies often suffer from low spatial resolution, low light sensitivity and/or long integration time. Moreover, these devices are often expensive and bulky.
Alternatively, spectral reconstruction (SR) algorithms attempt to recover high-quality hyperspectral images from RGB images. Assuming SR algorithms work adequately well, they clearly provide a low-cost, high-speed and high-resolution hyperspectral imaging solution.
Early SR approaches are mostly pixel-based, aiming to find (statistically speaking) the 'best' spectral estimate from each individual RGB. Many of these methods are based on regression, including linear and polynomial regressions [13,8,17] and the leading sparse coding model [2]. These methods are characterized by simple formulations and are generally optimized via closed-form solutions.
More recently, patch-based methods have come to the fore. Led by Deep Neural Network (DNN) solutions (e.g. [4,14,6]), these methods argue that incorporating 'object-level' information is key to better spectral recovery. While studies have shown that DNN algorithms do recover spectra more accurately than pixel-based regressions, the performance difference is arguably modest [2], and the cost of this performance increment is an expensive training procedure.
The general workflow followed in SR research is illustrated in Figure 1. Ground-truth hyperspectral images (the top-left 3D image cube, denoted r) are used to generate physically accurate RGB images (denoted x) using a given camera's spectral sensitivities (here we show the CIE XYZ 1964 color matching functions [9] as an example). The SR algorithm models the map from RGB to spectra (i.e. it reconstructs the spectra, Ψ(x) = r̂), and its parameters are calibrated such that a 'loss function' (denoted Loss(r, r̂)) is minimized. Lastly, after the training process, the trained models are evaluated by a given 'evaluation metric' (denoted Eval(r, r̂)). This workflow constitutes the best-case operating conditions for SR: if, for example, we recover spectra from a real camera's images, this will surely negatively impact performance. We adopt the shown workflow because it has been used in many published works, which allows us to benchmark our method against the prior art.
Intuitively, the loss and evaluation functions should be the same: metaphorically, it wouldn't make sense to optimize the design of a car for better speed and then gauge its performance by measuring its fuel efficiency. Likewise, SR algorithms optimized for one metric and yet evaluated by another will surely lead to sub-optimal performance for the evaluation metric at hand.
In recent benchmarks - led by the largest NTIRE Spectral Reconstruction Challenges in 2018 [4] and 2020 [5] - models are evaluated and ranked according to the Mean-Relative-Absolute Error (MRAE). In contrast, all (to our knowledge) pixel-based regression models and most patch-based DNNs (e.g. [19,15,12,6]) are trained to minimize the sum of squared errors (or, equivalently, the Mean-Squared-Error (MSE) loss function). While some recent patch-based DNN models have sought not to minimize MSE but rather the expected MRAE, see [23,4], pixel-based regression methods have not been upgraded to work with the MRAE error.
In this paper, we modify pixel-based spectral reconstruction algorithms - including the regression-based models [13,8,17] and the leading sparse coding model [2] - to optimize for a percentage spectral error, namely the Relative Error Least-Squares (RELS). In RELS we optimize for an L2 relative error instead of MRAE itself (though the two are similar), simply because it is an easier function to optimize (with a closed-form global optimum). The rest of the paper is organized as follows. In section 2 we review the spectral reconstruction (SR) problem. In section 3 we detail the proposed Relative Error Least-Squares optimization for SR. Experiments are reported in section 4. The paper concludes in section 5.

Background
An RGB camera captures the scene with 3 numbers per pixel, referring to the photo-responses recorded by the respective R, G and B sensors. Each sensor is described by a spectral sensitivity function, i.e. its relative sensitivity as a function of wavelength. In this paper we assume the visible spectrum runs from 400 to 700 nanometers (nm), and the hyperspectral measurements sample the visible spectrum at 10-nm intervals (therefore the hyperspectral images have 31 spectral channels).
Recently, many SR algorithms are trained on images. As opposed to research based on 'point measurements', where matched RGB and hyperspectral data can be easily captured, in image-based SR we often simulate ground-truth RGBs from hyperspectral images given an explicit formulation of the camera's processing pipeline [18]. Denoting r as the 31-dimensional spectral measurement and S as the 31 × 3 sensitivity matrix whose columns are the spectral sensitivity functions, the general form of the RGB camera response x = (R, G, B)^T is written as [13]:

x = Γ(S^T r) + ε,    (1)

where Γ is a given non-linear rendering function (accounting for mapping measured colors to display RGB, white balance, tone-mapping, etc. [18]), and ε is a 3-component error vector referring to the noise incurred during RGB capture. We remark that the formulation in Equation (1) does not include any spatial processing applied in many cameras' processing pipelines.
In this paper we consider the special case where Γ is the identity transformation and ε = 0, i.e. the noiseless simulation of the RGB camera's linear photo-response (the 'linear RGBs' or 'raw images'). We note that this setting is one of the standard protocols used in previous research.
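As a concrete sketch of this setting, the noiseless linear photo-response of Equation (1) can be simulated in a few lines of numpy. The sensitivity matrix and spectrum below are random stand-ins, not the CIE 1964 functions or real hyperspectral data:

```python
import numpy as np

# Hypothetical stand-in data: in the paper, S holds the CIE 1964 color
# matching functions and r comes from a hyperspectral image.
rng = np.random.default_rng(0)
S = np.abs(rng.normal(size=(31, 3)))  # 31 x 3 sensitivity matrix
r = np.abs(rng.normal(size=(31,)))    # 31-band ground-truth spectrum

# Equation (1) with Gamma = identity and eps = 0 (the 'linear RGB' case):
x = S.T @ r                           # 3-component camera response
assert x.shape == (3,)
```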

Pixel-based Regression Models
Pixel-based regressions formulate the RGB-to-spectrum mapping with various assumptions on the nature of the map. Hereafter we denote an SR algorithm as the function Ψ : R^3 → R^31, which maps an RGB camera response to a spectral estimate of r, such that Ψ(x) ≈ r.
The simplest Linear Regression (LR) [13] assumes a linear map from RGBs to spectra:

Ψ(x) = Mx.    (2)

Here, the matrix M is called the regression matrix and contains all the model parameters. Recall that the spectra in this paper are modeled as 31-component vectors, so M is a 31 × 3 matrix.

Then, Polynomial Regression (PR) [8] and Root-Polynomial Regression (RPR) [17] serve as simple non-linear solutions. These models calculate a series of polynomial terms from each RGB (up to a given order). Examples of the 2nd, 3rd and 4th order PR and RPR expansions are given in Table 1. Denoting ϕ : R^3 → R^p as the given polynomial expansion of PR or RPR (p is the number of polynomial terms), the spectral reconstruction is modeled by

Ψ(x) = Mϕ(x).    (3)

As an example, the 2nd order root-polynomial expansion has 6 terms (refer to Table 1, second row on the right). Given this representation, we are to find a 31 × 6 regression matrix M that maps the polynomial vectors to spectral estimates.

Yet another regression approach is the Adjusted Anchored Neighborhood Regression (abbreviated A+) [2]. Like LR, A+ assumes a linear map from RGB to spectra. But instead of modeling a single universal mapping for all data as LR does, A+ assigns different mappings locally in 'neighborhoods' (that is to say, A+ is a sparse coding algorithm). In brief, the A+ algorithm partitions the spectral and RGB space into overlapping neighborhoods, and RGBs belonging to the same neighborhood share the same mapping function (i.e. regression matrix):

Ψ(x) = M_i x,  for x in neighborhood i.    (4)

For details about the method, see [2]. Regarding Equations (2), (3) and (4), we henceforth uniformly use the ϕ notation (polynomial expansion), since LR and A+ effectively use the first-order polynomial expansion ϕ(x) = x.
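To make the polynomial expansion concrete, here is a minimal sketch of the 2nd-order root-polynomial expansion with its 6 terms (R, G, B and the square roots of the pairwise products, following Table 1); the function name rpr2 is our own:

```python
import numpy as np

def rpr2(x):
    """2nd-order root-polynomial expansion of one RGB (6 terms):
    R, G, B, sqrt(RG), sqrt(GB), sqrt(RB)."""
    R, G, B = x
    return np.array([R, G, B,
                     np.sqrt(R * G), np.sqrt(G * B), np.sqrt(R * B)])

x = np.array([0.2, 0.4, 0.1])
phi = rpr2(x)
# Every root-polynomial term scales linearly with exposure,
# which is why RPR is brightness-invariant:
assert np.allclose(rpr2(2 * x), 2 * phi)
```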

Ordinary Least-Squares (OLS) Minimization
Ordinary Least-Squares (OLS) is the standard way to solve for the regression models in section 2.1. It seeks to minimize the sum of squared errors (the MSE loss function) between ground-truth training spectra and predictions:

min_M Σ_{j=1}^{N} ||Mϕ(x_j) − r_j||²,    (5)

where the r_j's and x_j's are matched spectrum and RGB pairs in the training set (j indexes individual data pairs), and the number of training pairs is N.
Denoting Φ and R as the N × p and N × 31 data matrices respectively (whose rows are the matched ϕ(x_j)'s and r_j's), Equation (5) can be written as:

min_M ||ΦM^T − R||²_F,    (6)

where || · ||_F indicates the sum of squares (equivalently, the Frobenius norm). Then, the regression matrix M is solved in closed form (using the Moore-Penrose inverse [20]) as:

M^T = (Φ^T Φ)^{-1} Φ^T R.    (7)

Least-squares, while simple, can potentially over-fit the trained model to the training data [13]. An alternative 'robust' least-squares minimization is given by Tikhonov regularization [25,13]:

min_M ||ΦM^T − R||²_F + γ||M||²_F,    (8)

and solved in closed form [13]:

M^T = (Φ^T Φ + γI)^{-1} Φ^T R,    (9)

where I is the p × p identity matrix. Above, γ is a user-defined parameter that controls the penalty term ||M||²_F. This term effectively prevents drastic changes in the output (the spectra) on small perturbations of the input (the RGBs), such that the model can be more robust against noise and over-fitting. See [25] for more details.
Determining a suitable value of γ is, of course, crucial for model performance. However, there is no closed-form method to determine its best value; rather, an empirical search is carried out. In the literature, cross validation [10] (which we use in our experiments) and the L-curve method [10,22] are the two main ways to determine γ.
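A minimal numpy sketch of the Tikhonov-regularized closed form (Equation (9)); with γ = 0 it reduces to the plain OLS solution. The function name and the synthetic data are illustrative only:

```python
import numpy as np

def ols_tikhonov(Phi, R, gamma=0.0):
    """Closed-form regularized least-squares (Equation (9)):
    M^T = (Phi^T Phi + gamma I)^(-1) Phi^T R;
    gamma = 0 gives the plain OLS solution."""
    p = Phi.shape[1]
    Mt = np.linalg.solve(Phi.T @ Phi + gamma * np.eye(p), Phi.T @ R)
    return Mt.T                               # the 31 x p regression matrix M

# Noiseless synthetic check: OLS recovers the exact linear map.
rng = np.random.default_rng(1)
M_true = rng.normal(size=(31, 3))
Phi = rng.normal(size=(500, 3))               # N x p data matrix
R = Phi @ M_true.T                            # N x 31 spectra
M = ols_tikhonov(Phi, R)
assert np.allclose(M, M_true, atol=1e-8)
```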

Relative Error Least-Squares (RELS)
Intuitively, the selection of an evaluation metric should reflect the nature of the measuring targets. However, prior research has argued that MSE should not be used to evaluate spectral recovery [3,2,4]. These works argued that the Root-Mean-Square Error (RMSE) (the square root of MSE) tends to penalize bright pixels more than dark pixels. We remark that this bias should also affect MSE minimization (e.g. OLS) in the training stage, where less well-exposed spectra are not regarded with the same importance as bright spectra in an MSE minimization.

Measuring Relative Error
A natural metric to evaluate methods trained via MSE minimization is the Root-Mean-Square Error (RMSE), defined as:

RMSE(r, r̂) = sqrt( (1/31) Σ_{k=1}^{31} (r_k − r̂_k)² ),    (10)

where r is the ground-truth spectrum, r̂ = Ψ(x) is the recovered spectrum, and r_k is the value of the spectrum in the k-th spectral channel.
In contrast, the recent Mean-Relative-Absolute Error (MRAE) [4,5] is calculated as:

MRAE(r, r̂) = (1/31) Σ_{k=1}^{31} |r_k − r̂_k| / r_k.    (11)

Notice that the division is done channel-by-channel. In Figure 2 we show an actual example of A+ spectral recovery [2] to help clarify why MRAE is preferable. In the left and right panels of Figure 2 we show two sets of spectra, where the red curves are the ground truths and the blue curves indicate the recovered spectra. The spectra on the right are identical to those on the left, save that they are twice as bright. In the text boxes (foot of Figure 2) the RMSE and MRAE errors are shown, indicating that RMSE scales with the brightness whereas MRAE does not.
The MRAE in this example better captures the apparent fact that the recovery is equally good in both figures. In fact, the situation in Figure 2 happens very often: the same physical object can be represented by spectra scaled by different brightness factors when the exposure settings between scenes and/or the quantity of light across a scene vary.
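The scale behavior of the two metrics is easy to verify numerically. A small sketch following the RMSE and MRAE definitions above, with made-up spectra:

```python
import numpy as np

def rmse(r, r_hat):
    return np.sqrt(np.mean((r - r_hat) ** 2))

def mrae(r, r_hat):
    return np.mean(np.abs(r - r_hat) / r)    # channel-by-channel division

r = np.array([0.3, 0.5, 0.4])                # made-up ground-truth spectrum
r_hat = np.array([0.33, 0.45, 0.44])         # made-up recovered spectrum

# Doubling the brightness doubles RMSE but leaves MRAE unchanged:
assert np.isclose(rmse(2 * r, 2 * r_hat), 2 * rmse(r, r_hat))
assert np.isclose(mrae(2 * r, 2 * r_hat), mrae(r, r_hat))
```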

Minimizing Relative Error
From an optimization point of view, we could try to minimize MRAE directly:

min_M Σ_{j=1}^{N} Σ_{k=1}^{31} |m_k^T ϕ(x_j) − r_{j,k}| / r_{j,k}.    (12)

This minimization is a 'Least Absolute Deviation Regression', which can be solved using linear programming [1]. However, the number of constraints in this linear program is large (meaning that the minimization itself is non-trivial), and it cannot be solved in closed form. Hence, we instead propose to minimize an allied L2 relative error: namely, the Relative Error Least-Squares (RELS) optimization.

We carry out a separate minimization for each of the 31 spectral bands. Considering the regression models Ψ(x_j) = Mϕ(x_j), we split the regression matrix into rows: M = (m_1, m_2, ..., m_31)^T. Now, for spectral channel k, the following expression is to be minimized:

min_{m_k} Σ_{j=1}^{N} ( (m_k^T ϕ(x_j) − r_{j,k}) / r_{j,k} )²,    (13)

or, equivalently:

min_{m_k} Σ_{j=1}^{N} ( m_k^T ϕ(x_j) / r_{j,k} − 1 )².    (14)

Next, just as in Equation (6), we denote Φ as the N × p data matrix of ϕ(x_j)'s and vectorize the expression into:

min_{m_k} ||D_k Φ m_k − 1||².    (15)

Here, 1 is an N-dimensional vector with all entries equal to 1, and D_k is an N × N diagonal matrix whose entries are the reciprocals of the k-channel values of all ground-truth training spectra (i.e. the r_{j,k} in Equation (13)):

D_k = diag(1/r_{1,k}, 1/r_{2,k}, ..., 1/r_{N,k}).    (16)

Finally, the closed-form solution of RELS (by channel) is derived, again by the Moore-Penrose inverse [20,26]:

m_k = (Φ^T D_k² Φ)^{-1} Φ^T D_k 1.    (17)

As for the Tikhonov-type regularization (as discussed in Equations (8) and (9)), the RELS version is analogously:

m_k = (Φ^T D_k² Φ + γ_k I)^{-1} Φ^T D_k 1,    (18)

where γ_k is a per-channel user-defined penalty parameter. Note that γ_k is chosen to optimize the penalty for each spectral channel independently. For fair comparison, in the experiments reported later in this paper we also operate OLS channel-by-channel (with a separate penalty parameter for each channel).
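The per-channel RELS closed form above can be sketched in a few lines of numpy; the helper name rels_channel and the synthetic data are our own. On noiseless synthetic data with strictly positive spectra, the exact linear map is recovered:

```python
import numpy as np

def rels_channel(Phi, r_k, gamma_k=0.0):
    """RELS closed form for one spectral channel k:
    solve (Phi^T D_k^2 Phi + gamma_k I) m_k = Phi^T D_k 1,
    with D_k = diag(1 / r_{j,k})."""
    W = Phi / r_k[:, None]                   # D_k Phi
    p = Phi.shape[1]
    return np.linalg.solve(W.T @ W + gamma_k * np.eye(p),
                           W.T @ np.ones(len(r_k)))

# Noiseless synthetic check: RELS recovers the exact linear map.
rng = np.random.default_rng(2)
M_true = np.abs(rng.normal(size=(31, 3)))    # 31 x 3 'ground-truth' map
Phi = np.abs(rng.normal(size=(500, 3)))      # N x p expanded RGBs
R = Phi @ M_true.T                           # strictly positive spectra
m0 = rels_channel(Phi, R[:, 0])              # channel k = 0
assert np.allclose(m0, M_true[0], atol=1e-8)
```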
In Figure 3 we visualize the loss functions of the OLS (left) and RELS (right) minimizations for an arbitrary k-th spectral channel within a selected window: r_k, r̂_k ∈ [0.2, 0.5]. The dotted diagonal red line in both diagrams is the locus of zero error. It is evident that the two loss functions are quite different from one another regarding how they evolve away from the zero-error locus; this said, minimizing the OLS and RELS losses will likely generate different spectral estimates.

RGB and Hyperspectral Images
The ground-truth spectral data is from the ICVL hyperspectral database [3], which includes 201 hyperspectral and RGB images of both indoor and outdoor scenes. The dimension of these images is 1300 × 1392 × 31, where the '31' spectral channels refer to the discrete spectral measurements at 10-nm sampling intervals within the 400-to-700 nm visible range. The RGB images were created by numerical integration using the CIE 1964 color matching functions [9] as the camera spectral sensitivities and following the 'linear RGB' setting (refer to the discussion below Equation (1)).

Cross Validation
We followed a cross-validation procedure to train and evaluate the models. In conventional K-fold cross validation, the data is split into K folds: one trains on K − 1 folds and tests on the remaining fold, and the error statistics are averaged over the K experiments [21]. In our experiment, apart from training and testing, we must also select proper penalty parameters γ_k for the regularized regressions.
We split our data set into four equal folds - I, II, III and IV - and followed a four-experiment setting. For example, in the first experiment we combine folds I and II as our training data. We use fold III to 'tune' the penalty terms, that is, to try different γ_k's (grid search) and choose the one that minimizes the loss on the fold III data. Then, given the tuned penalty term, we test the models on fold IV (the test set). This training regime ensures that each fold is used for testing once. Finally, for each test set we calculated recovery performance statistics (in MRAE) and then averaged them over the four testing folds.
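The per-channel grid search on the tuning fold can be sketched as follows; the function name tune_gamma, the grid and the synthetic data are illustrative (here we tune the RELS variant, and on noiseless data the search picks the smallest penalty):

```python
import numpy as np

def tune_gamma(Phi_tr, r_tr, Phi_val, r_val, grid):
    """Hypothetical grid search: pick the penalty gamma_k that minimizes
    the relative-error loss on the tuning fold (one spectral channel)."""
    best_gamma, best_loss = None, np.inf
    p = Phi_tr.shape[1]
    W = Phi_tr / r_tr[:, None]               # D_k Phi on the training fold
    for g in grid:
        m = np.linalg.solve(W.T @ W + g * np.eye(p),
                            W.T @ np.ones(len(r_tr)))
        loss = np.mean(((Phi_val @ m - r_val) / r_val) ** 2)
        if loss < best_loss:
            best_gamma, best_loss = g, loss
    return best_gamma

# Noiseless synthetic check: the smallest penalty should win.
rng = np.random.default_rng(3)
m_true = np.array([0.5, 1.0, 2.0])
Phi_tr = np.abs(rng.normal(size=(200, 3)))
Phi_val = np.abs(rng.normal(size=(100, 3)))
r_tr, r_val = Phi_tr @ m_true, Phi_val @ m_true
best = tune_gamma(Phi_tr, r_tr, Phi_val, r_val, [0.0, 1e-4, 1e-2, 1.0])
assert best == 0.0
```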

Results and Discussion
In Table 2 we present various results. First, the Mean (left table) and 99-Percentile MRAE (right table) were calculated 'per image', and the shown numbers are the averaged results over the test-set images (cross validated). Second, under the All, Bright and Dim columns are, respectively, the results averaged over all pixels, the 50% brightest pixels and the 50% dimmest pixels in each test image (here we define 'brightness' as the L2 norm of the ground-truth spectrum). Finally, referring to [17], PR6 performs differently under different testing exposure settings (while LR, RPR6 and A+ perform exactly the same), hence we specifically tested PR6 under original, half and double testing exposures: we uniformly scale all the testing ground-truth images by a factor ξ = 1, 0.5 and 2. For more information, see [17].
We see that for the overall performance (the 'All' columns), RELS significantly improves all algorithms in terms of mean and worst-case (i.e. 99-percentile) MRAE. More specifically, for the leading sparse coding model A+ [2], our RELS minimization provides a 7.2% boost in mean MRAE performance, and for the simplest LR method [13], RELS improves the worst-case MRAE by a remarkable 19.1%. These results show that choosing a loss function matched to the evaluation metric can be crucial to the performance of pixel-based regression models. Next, we examine the brightness dependency of the MRAE performance (the 'Bright' and 'Dim' columns). Using the original OLS minimization, LR, RPR6 and A+ show a clear bias towards bright pixels, with substantial discrepancies. In contrast, our RELS minimization effectively balances the performance between bright and dim pixels (while providing better overall performance). For the PR6 model under the original exposure ξ = 1 there is no clear brightness dependency for either OLS or RELS, and the increment in overall performance from RELS results from improved performance on dim pixels. Finally, in agreement with the results in [17], the PR6 method deteriorates under varying testing exposure. This suggests that only brightness-invariant models such as LR, RPR and A+ are suitable for practical use, where scene brightness can vary both spatially and temporally.
In Figure 4 we show the MRAE error maps. We can clearly see the trade-offs in the images: the OLS models recover the spectra of the dark tree leaves very badly but perform rather well on the bright walls; in contrast, RELS narrows the MRAE gap between the tree leaves and the bright walls, yielding better overall recovery performance.

Conclusion
Reconstructing radiance spectra from RGB camera responses is often formulated as a machine learning problem. Most approaches by default minimize the sum of squared errors (a Mean-Squared-Error (MSE) loss function). However, recent large-scale benchmark challenges [4,5] instead rank models using the MRAE metric - a percentage deviation with respect to the ground-truth spectrum. This 'mismatch' between training loss and evaluation metric has so far only been addressed by some recent Deep Neural Network (DNN) solutions [23,4], where MRAE is minimized directly.
In this paper we address this issue for pixel-based regression methods. We formulate a closed-form Relative Error Least-Squares (RELS) optimization approach which minimizes a relative-error loss function that behaves much like MRAE. Our experimental results show that RELS significantly improves the MRAE performance on dim pixels, which leads to an overall performance boost for all considered models.