Automatic de-noising of close-range hyperspectral images with a wavelength-specific shearlet-based image noise reduction method

Hyperspectral imaging (HSI) has become an essential tool for exploration of different spatially-resolved properties of materials in analytical chemistry. However, due to various technical factors such as detector sensitivity, choice of light source and experimental conditions, the recorded data contain noise. The presence of noise in the data limits the potential of different data processing tasks such as classification and can even make them ineffective. Therefore, reduction/removal of noise from the data is a useful step to improve the data modelling. In the present work, the potential of a wavelength-specific shearlet-based image noise reduction method was utilised for automatic de-noising of close-range HS images. The shearlet transform is a special type of composite wavelet transform that utilises the shearing properties of the images. The method first utilises the spectral correlation between wavelengths to distinguish between levels of noise present in different image planes of the data cube. Based on the level of noise present, the method adapts the use of the 2-D non-subsampled shearlet transform (NSST) coefficients obtained from each image plane to perform the spatial and spectral de-noising. Furthermore, the method was compared with two commonly used pixel-based spectral de-noising techniques, Savitzky-Golay (SAVGOL) smoothing and median filtering. The methods were compared using simulated data, with Gaussian and Gaussian and spike noise added, and real HSI data. As an application, the methods were tested to determine the efficacy of a visible-near infrared (VNIR) HSI camera to perform non-destructive automatic classification of six commercial tea products. De-noising with the shearlet-based method resulted in a visual improvement in the quality of the noisy image planes and the spectra of simulated and real HSI. The spectral correlation was highest with the shearlet-based method. The peak signal-to-noise ratio (PSNR) obtained using the shearlet-based method was higher than that for SAVGOL smoothing and median filtering. There was a clear improvement in the classification accuracy of the SVM models for both the simulated and real HSI data that had been de-noised using the shearlet-based method. The method presented is a promising technique for automatic de-noising of close-range HS images, especially when the amount of noise present is high and in consecutive


Introduction
Close-range hyperspectral imaging (HSI) and image processing techniques are popular analytical tools in many scientific domains and are used in applications such as the exploration of food properties [1], pharmaceutical product characterisation [2,3], forensics analysis [4,5], exploration of plant traits for phenotype studies [6,7], and microbiology [8]. The major advantage of HSI over other conventional analytical techniques is its non-invasive and non-destructive nature which is further complemented by rapid data acquisition.
HSI combines two sensor modalities that are spectroscopy and imaging, where the spectroscopy provides the chemical information about the samples and the imaging adds a complementary domain of spatial information [9]. The data generated by HSI can be understood as spatial maps of spectral variation arranged in 3-D cubes (n × p × q). The first two dimensions (n × p) of the cubes are usually the spatial dimensions, and the third dimension (q) contains the spectral information. To extract the meaningful information from HS images, https://doi.org/10.1016/j.snb.2018.11.034 Received 31 October 2017; Received in revised form 2 November 2018; Accepted 7 November 2018 different data processing steps such as exploration, regression and classification are often performed. However, before any data processing, as a standard first step, the cubes are usually pre-processed to remove various types of noise from the data so as to increase the signalto-noise ratio (SNR) [10].
The information generated in HSI is often accompanied by noise, which can arise from detector sensitivity, illumination conditions (e.g. the choice of light source) and experimental conditions (e.g. interference from other light sources). The types of noise can range from small signal-independent noise such as low-level Gaussian to high-level mixed noise such as Gaussian, Poisson and spike [11]. Since the noise is in the acquired signals, it can be observed in each individual spectrum. However, it can also be observed as pixel-to-pixel intensity variations in each spatial plane. For this reason, the noise is visible in both the spatial and the spectral domains. The need for methodologies to deal with the noise present in the data cubes generated by close-range HSI has already been highlighted in [12]. A typical approach to deal with noisy signals in the chemometrics field is to remove the affected spectral range from the dataset. This approach can be seen very often when the extreme wavelengths of images are noisy, and the easiest option is to remove that part of the spectrum. However, the downside of this approach is that with the removal of noisy wavelengths, relevant information in the data might also be removed. The other common approach to remove noise from the signal (spectra) while keeping the information is to apply smoothing filters. Two commonly used filter methods are Savitzky-Golay (SAVGOL) smoothing [12][13][14] and median filtering. SAVGOL smoothing and median filtering can be used alone and independently for each spectrum corresponding to a pixel of the HS image. However, if the level of noise is too high in the spectrum, the use of SAVGOL smoothing and median filtering can become tedious because of the need to determine the optimum window width for smoothing. Also, if noise is present in successive wavelengths, SAVGOL and median filters can result in a deformed spectral profile. The deformation mainly occurs when a large number of noisy wavelengths are present inside the smoothing window, dominating the normal wavelengths. To deal with this, the filter-based methods are currently applied after removing the high noise wavelengths [15,16]. Furthermore, the main drawback of both of these methods is that they can only be used to deal independently with the noise in each spectrum of the HS image and it is not possible to take into account the spatial relations between the spectra of the pixels. Without removing the noise from the spatial domain, the scores maps resulting from classification and regression procedures can become noisy (misclassified pixels) leading to inefficient data modelling.
Methods like SAVGOL and median filtering require testing and optimisation of parameters such as the window size, order of derivative etc., which often requires expertise and visualisation skills to decide on efficacy. However, the use of HSI for process analysis, where real-time data processing is required, means that there is a need for automatic denoising methods. To the best of our knowledge, there is no existing automatic method that deals simultaneously with both the spatial and spectral noise in the data generated with close-range HSI. However, in the field of remote sensing, the problem of automatic de-noising of HS images is well understood. There are three main families of methods that are used for automatic de-noising of HSI data. The first is the family of methods that utilise the sparse representation of spatial planes such as wavelets but do not consider the spectral noise [17]. The second is the family of methods that combine decorrelation techniques such as principal components analysis (PCA) with the sparse techniques [18]. However, these methods deal with the spatial and spectral noise separately. The third is the family of methods, such as tensor decomposition methods [19], that utilise the spatial and spectral noise together and are based on the 3-D representation of data. However, the major drawback of such tensor approaches is that the spectral and spatial dimensions are treated equally whereas typically in HSI, spectral correlation is far higher than spatial correlation. Also, the type and amount of noise ranges from low signal-independent noise to mixed Gaussian, Poisson and spike noise for different wavelengths. Therefore, a method utilising both the spatial and spectral information together, and based on the type of noise present in the data would be of great use for de-noising HS images.
Recently, a wavelength-specific shearlet-based image noise reduction method was proposed for de-noising of HS images in the remote sensing domain [11]. The method perfectly fits the needs of HS image de-noising by considering both the spatial and spectral correlations and also considering the types of noise present in different image planes. The method first identifies the type of noise present in the image planes via measurement of spectral correlation. Based on the spectral correlation, the method categorises the noise into low-level Gaussian noise or high levels of mixed noise. After identification of the type of noise, the non-subsampled shearlet transform (NSST) is then performed on each image plane. Later, to de-noise the low-level Gaussian noise wavelengths, the method assumes an additive noise model and performs spatial de-noising using the BayesShrink threshold method [20]. To denoise the high-level mixed noise wavelengths, the method utilises the NSST information from the neighbouring low-level Gaussian noise wavelengths. The shearlet coefficients of adjacent low-level Gaussian noise wavelengths are fused with the details of mixed noise wavelengths utilising a weighted linear combination criterion, which results in spectral de-noising. Finally, after de-noising, the inverse of the NSST is applied to reconstruct the image planes.
The aim of the present work is to present a wavelength-specific shearlet-based image noise reduction method [11] for HS image denoising and to test its potential for de-noising close-range HS images. Furthermore, the method was compared with two pixel-based spectral smoothing techniques, i.e., SAVGOL and median filtering. The potential of the method was tested using three different sets of HS images. The first two image sets comprised supervised images containing known amounts of Gaussian and mixed noise. The third image set was a real VNIR HSI dataset generated for the classification of six commercial tea products (oolong, black, green, yellow, Pu-erh and white). The performance of the de-noising techniques was evaluated through visual inspection, spectral correlation, peak signal-to-noise ratio (PSNR), and through classifications performed with a multi-class support vector machine (SVM). The PSNR was used to quantify the improvement in the spatial domain and the spectral correlation was used to quantify the similarity of the spectra after de-noising with the corresponding spectra in the absence of noise, i.e. the clean spectra (spectral domain).

Samples and imaging sensor
De-noising and classification experiments were performed with visible-near infrared (VNIR) hyperspectral images of six different commercial tea products, which were purchased from a local market (Glasgow, United Kingdom). The samples were obtained in airtight sealed packaging and stored at ambient temperature. All samples of tea were in loose-leaf form. Black, green and white tea were from Vahdam Teas (New Delhi, India), oolong tea was from Yamamotoyama (California, USA), Pu-erh tea was from The Tea Makers of London (London, UK) and yellow tea was of an unspecified Chinese origin. Each tea sample was transferred to a black plastic circular container (diameter = 3.3 cm, depth = 1.3 cm) for analysis. The six samples were placed adjacent to each other on the translation stage so that all six samples were imaged in a single measurement. Imaging was performed using a push-broom line scan HSI system comprised of a V10E spectrograph from SPECIM (Oulu, Finland) and a CCD camera (C8484-05C, Hamamatsu Photonics, UK). The HSI system was used to acquire spatial maps consisting of 1350 × 256 pixels over the spectral range 383-1000 nm with a spectral resolution of 2.45 nm. The pixel size of the CCD camera is 6.45 × 6.45 μm 2 . Lighting was provided by two 20 W halogen light sources. The distance from the lens to the translation stage was 30 cm, and the stage was controlled by an independent stage motor connected to the computer system (Zolix TSA 200 BF). The speed of the translation stage, ∼3 mm s −1 , was optimised using a checkerboard to avoid any distortion in the shape of the image arising from the overlapping of the spectral and spatial information. A single image, comprising more than 2000 pixels per tea sample, was acquired of the six tea samples using a frame rate of 21 fps and an exposure time of 5 ms.
The acquisition and management of data were performed using inhouse code developed in Matlab (R2016b, Mathworks Inc., Natick, United States). Before data analysis, the radiometric calibration of images was performed using white and dark references. The correction was performed for every pixel in the HS image according to Eq.
where I R is the calibrated reflectance, I raw is the raw intensity measured from the test sample, I dark the intensity of the dark response, I white is the intensity of the uniform white reference, and i and j are spatial coordinates and k is the wavelength in the image.
To demonstrate the effectiveness of the HSI de-noising method, two more sets of HS images were simulated by adding different types of noise to the VNIR images. The simulation was performed by manually reducing the VNIR hypercubes to the cleanest (smoothest) spectral profile range (546-791 nm). Of the two sets of simulated images, one set was simulated with a known amount of Gaussian noise (zero mean and 0.03 variance), and the other was simulated with mixed noise comprising a combination of Gaussian (zero mean and 0.03 variance) and spike noise (density of 0.08) at 20 randomly selected image planes. The term density of 0.08 here means that the reflectance will change from zero to one with a probability of 0.08. In the following text, mixed noise (MN) will be used to represent the combination of Gaussian and spike noise and Gaussian noise (GN) will be used to represent Gaussian noise. A summary of the sets of images analysed is presented in Table 1.

De-noising methodology
The de-noising methodology has three main steps in its implementation. The first step is to identify the type of noise present (lowlevel Gaussian noise or mixed noise) to choose the de-noising techniques for that particular wavelength. Different techniques here signify the different ways of using the shearlet coefficients for de-noising. The second step is to perform the sub-sampled shearlet transform on each image plane to capture the shearlet coefficients. The third step is to utilise the shearlet transform coefficients to perform de-noising individually for each wavelength based on the type of noise identified. The detailed methodology is explained in the following Sections (2.2.1-2.2.3).

Noise characterisation
In HSI, the noise varies from wavelength to wavelength and can range from simple low-level Gaussian noise to high-level mixed noise resulting from a combination of Gaussian and spike noise. The typical additive noise model for any image plane of a data cube (n × p × q) can be understood from Eq. (2): where Y is the recorded image plane containing the useful informative signal part (X) and the noise part (N). This assumed model is usually correct if the noise present in the plane is limited to Gaussian white noise, but this is not always the case. In the methodology presented here, the nature of the noise is assumed to be unknown. To find the type of noise present in image planes, the method utilises the correlation coefficient, R, between two image planes, Y k and Y k+r , as in Eq. (3): From the correlation measurement between two image planes, it can be understood that if the two image planes are very similar, then they will have a very high correlation coefficient. However, in the presence of noise, the correlation between the image planes will be significantly reduced. Furthermore, the greater the noise, the more the correlation will decrease. To differentiate between the low-level Gaussian noise and the high-level mixed noise, the threshold for the mean correlation between the image plane and its neighbouring image planes was set. The mean correlation was obtained by choosing a window, w 1 , containing 10% of the total number of wavelengths centred around the wavelength considered, as in Eq. (4).
The values obtained for the mean correlation for low-levels of noise will be very high. Depending on the amount and the complexity of the noise, the correlation will decrease. Therefore, the values will span a heavy-tail distribution for the R. For such a distribution, the median is already known to be the best estimator to represent the central tendency of the distribution [21]. In the present methodology, the median estimated from the distribution of the correlation coefficients was chosen to be the threshold and to classify the image plane as either a low-or high-level noise image plane.

Shearlet transform
After the classification of the image planes as low-level Gaussian noise or high-level and/or mixed noise, the NSST coefficients for each image plane were calculated independently. NSST is a special type of discrete shearlet transform that provides an additional feature of invariance to the shift of the input signal [22]. The shearlet transform is a special type of composite wavelet transform in which the mother wavelet matrix is an anisotropic dilation matrix along with the shear matrix, compared to the dilated matrix associated with the scale transformation and directional transformation in the composite wavelet transform. The composite wavelet function can be understood as Eq. (5), where ψ is the mother wavelet, M is an anisotropic dilation matrix, S is a shear matrix and j, l and k are scale, directional and shift parameters, respectively. In this work, we have limited the explanation to the NSST only, however, more detailed information on the composite wavelet transform and shearlet transform can be found in [22][23][24]. The implementation of NSST to decompose the image planes requires two steps. The first is the application of non-subsampled pyramid (NSP) filter banks and the other is the use of non-subsampled shearing (NSS) filter banks. A non-subsampled filter bank has no shift variant issues as there is no down-or up-sampling during the decomposition. Furthermore, the NSP filter gives the multiscale decomposition of the original image into high-and low-frequency sub-images of the same size as the original image. The NSS part of the NSST performs directional filtering in the spatial domain and decomposes the high-frequency sub-images into directional sub-images. For a typical application, the filter banks are applied in an iterative way where the lowfrequency sub-images obtained are again decomposed to lower scale high-and low-frequency sub-images, resulting in a multi-scale and multi-directional decomposition. An example of the multi-scale and multi-directional decomposition performed by the NSST can be understood with the three-scale decomposition shown in Fig. 1. In the present case, a three-level shearlet decomposition with 16, 8 and 4 shearing directions at scales of 1, 2 and 3, respectively, was performed.

De-noising image planes
The coefficients obtained from the NSST of the image are mostly very small and close to zero. But due to the presence of the noise, the sparsity of the matrix of NSST coefficients is greatly reduced. Therefore, to perform the de-noising with NSST, the aim is to re-attain the sparsity of the matrix of NSST coefficients. To do this, a threshold is used to distinguish the coefficients corresponding to noise from the coefficients containing signal information. Different techniques were used for denoising the image planes identified with low and high levels of noise. For the low-level noisy image planes, the threshold for the shearlet coefficients, T s t , , was determined assuming an additive noise model and utilising the BayesShrink method [20], as shown in Eq. (6): where the noise variance at scale s (s = 1…j) and direction t (t = 1…l) is given by σ N s t , , 2 and is estimated as mentioned in [25]. The standard deviation (σ s t , ) of the signal measured from the sub-image Y s,t at scale s and direction t is estimated as in Eq. (7): and i and j define the spatial coordinates and N is the maximum value of (n, p).
Once the image planes with a low level of noise were de-noised, then to de-noise the high and/or mixed noise image planes, the shearlet coefficients of the adjacent low noise level image planes were fused to the details of the mixed noise sub-images. The shearlet coefficients were used to replace the details of the sub-images of the mixed noise level image planes by the weighted average of the sub-image details of the 10% closest low noise level image planes. The weights used were inversely proportional to the distance between the neighbouring wavelengths as explained in Eq. (8): where w 2 are the adjacent LGN image planes, and D MN and D LGN correspond to de-noised mixed noise and low-level Gaussian noise image planes, respectively. To reconstruct the de-noised image plane, the inverse of the NSST was applied to the coefficients. Any further classification analysis was performed on the resulting de-noised images.

Savitzky-Golay smoothing
SAVGOL smoothing is a window-based technique that utilises different polynomial functions to smooth signals [13]. To perform smoothing with SAVGOL, a window of fixed size is chosen, centred on the signal point to be smoothed, and a polynomial is fitted to the variables within the window. The value of the central variable is replaced by the value calculated by the polynomial function. The window is moved point-by-point over the signal to perform the smoothing on the complete spectrum. The window size and the polynomial function are usually chosen manually, and the optimum choice is based on visual inspection of the spectral profile. For the present work, a second order polynomial and 15-point window were used. SAVGOL smoothing was performed using the PLS Toolbox (version 8.11, Eigenvector Research Inc., USA).

Median filtering
Median filtering belongs to the family of non-linear signal filtering techniques and is often used to deal with high levels of noise such as spikes in the data. Median filtering in the spectral domain can be understood as a moving window that replaces each observation with the median value of the observations inside the window. When the number of observations inside the window is odd the median is a single value, however when the number of observations is even, then the median is the average of the two middle values. In the present work, the median filter was employed by unfolding the (n × p × q) HSI array to give a (np × q) matrix, performing the median filtering with a 4-point window and later reshaping the matrix back to the cube. To perform the filtering, the "meadfilt1″ Matlab function was used.

De-noising performance
Spectral correlation and peak signal-to-noise ratio (PSNR) were used to quantify the performance of the de-noising methods in the spectral and spatial domains, respectively.
Spectral correlation provides a measure of the similarity of spectra after de-noising with the corresponding spectra in the absence of noise (i.e., the clean spectra), and was estimated via calculation of the correlation coefficient between the de-noised and clean spectra utilising the corr function in Matlab.
The PSNR was calculated as shown in Eq. (9): where the mean square error is given by and Y 1 and Y 2 are the two image planes to compare (i.e., the image plane after de-noising and the corresponding image plane in the absence of noise, respectively) and peakvalue is either specified by the user or selected from a range that is dependent on the image datatype (e.g. 255 for a uint8 image). The PSNR was calculated utilising the PSNR function in Matlab. To perform the classification experiment on the HSI data sets, support vector machine (SVM) classifiers were developed. SVM utilises the hyperplanes to define the decision boundaries to perform classification. Furthermore, to deal with the data complexity, the SVM performs high dimensional mapping of the data using kernel functions. Mapping to higher dimensions is usually carried out to make the data linearly separable. Furthermore, in the high dimensional space, the choice has to be made to select the hyperplane that provides the largest separation of the classes. As in our case we have six different classes corresponding to six different tea products, the traditional SVM binary classifier was combined with the error correcting output code (ECOC) ensemble method. The ECOC deals with the multiclass problem by creating several independent binary classification models.
The SVM along with ECOC was implemented in Matlab via the Statistics and Machine Learning Toolbox (R2016b). The coding design used for the ECOC-SVM model was one-vs-one, where a model was developed with one class being assigned positive and another class being assigned negative and all other classes were neglected. The algorithm exhausts all combinations of class pair assignments leading to k (k-1)/2 models, where k is the number of classes to be considered. High dimensional mapping of the data was performed with a radial basis function (RBF) kernel function with a scale parameter of 2.5. The spectra from the images for six different classes were selected in a supervised way using the "roipoly" function in Matlab. The "roipoly" function provides the graphical interface to manually extract the information from the data cube. For each individual tea class, spectra were extracted from 200 pixels, which were selected at random from the image collected, leading to a total of 1200 spectra for classification model development. For the development of a robust model, the model was cross-validated with a 10-fold cross-validation method. In 10-fold cross-validation, the calibration data is divided into ten equal parts. For making the model, 9 out of 10 parts were used and to cross-validate, the 10 th part was used. This was then repeated ten times, and the average prediction accuracy was recorded. The whole process was performed with 100 iterations and the mean accuracy and standard deviation were recorded. Fig. 2 presents the spectrum extracted from the raw reflectance (2a), GN added to reflectance (2b) and MN added to reflectance (2c) data from a spatial location at (370, 135) in the simulated hyperspectral image. The spectral noise was added at 20 different random wavelengths over the data cube and its effect can be seen in Fig. 2(b,c) when compared to the raw reflectance profile in Fig. 2(a). Fig. 3 presents the effect of different de-noising methods applied to the spectra shown in Fig. 2. Fig. 3(a,b) shows the results of SAVGOL smoothing, with a 15 point window and a second order polynomial, of the spectra with Gaussian and mixed noise added. Fig. 3(c,d) shows the results from application of median filtering with a 4 point window. Fig. 3(e,f) illustrates the result of utilising the shearlet-based de-noising methodology. The spectra in red in Fig. 3 represent the clean reflectance profiles, while the solid blue and yellow lines represent the noisy and de-noised spectra, respectively.

Noisy and de-noised spectra from simulated images
In Fig. 3, it can be seen that the shearlet-based de-noising method outperformed SAVGOL smoothing and median filtering. The reason for the poor performance of SAVGOL smoothing in the case of simulated noise can be understood as being due to the window size and the smoothing function used. Since SAVGOL smooths each spectrum by fitting a polynomial to a window of adjacent wavelengths, if noise contributes significantly to several of the wavelengths, then the polynomial fitting will be less effective. Median filtering works better than SAVGOL smoothing since even if there are several outlier intensity values in the spectrum, they will not have as much influence on the smoothed value. Nevertheless, there are peaks to be found in the spectra after median filtering, due to the presence of several peaks within the window used for the smoothing, resulting in the median value being influenced by this noise. Furthermore, the peaks resulting from median filtering now appear at new wavelengths. This is because median filtering is performed for each wavelength resulting in transferral of noise to wavelengths adjacent to the noisy wavelengths. However, in the shearlet-based methodology, the de-noising of the waveband is performed using the shearlet coefficients of the adjacent low GN image planes, therefore, the high-intensity noise wavelengths do not affect the spectral de-noising as in the case of SAVGOL smoothing and median filtering. This is because the high-intensity noise wavelengths have no influence when performing the de-noising since they are not taken into account in the calculation of the weighted average of shearlet coefficients.
In Fig. 3(e,f), it can be seen that some small differences can still be found in the spectrum after de-noising. The reason for these small disturbances can be understood as resulting from the averaging of the shearlet coefficient, especially when the automatically selected GN image planes are distant, as averaging with the shearlet coefficients of these GN planes leads to small disturbances in the spectral profile. However, these disturbances are minute compared to the noise present in the spectrum after smoothing with SAVGOL or median filtering. To quantify these small disturbances and the spectral similarity, spectral correlation was used. Fig. 4 presents the spectral correlation calculated between a single spectrum extracted at a pixel location of 370,135 from, on the one hand, the noisy, SAVGOL, median filtered and shearlet de-noised images, and, on the other hand, the clean image. The blue dashed line and red solid line shows the GN and MN cases, respectively. The spectral correlation was calculated to quantify the similarity of the spectral profiles obtained from different de-noising techniques. It can be seen in Fig. 4, that the spectral correlation for the spectrum with GN added was always high for all the de-noising techniques compared to the spectrum with MN added. The reason is that the MN noise is much more complicated, resulting in a higher number of noisy wavelengths in the spectrum. Furthermore, the new shearlet-based method gave the highest correlation of 99.9% followed by median filtering and then SAVGOL smoothing. The plane corresponding to 588 nm is the de-noised band whereas 586 nm represents the clean adjacent band affected by the de-noising methods. Fig. 5(a,e) presents the original reflectance image plane with Gaussian noise added, Fig. 5(b,f) the SAVGOL smoothed data, Fig. 5(c,g) the median filtered data and Fig. 5(d,h) the image planes after application of the shearlet-based denoising method. The six circular objects in the image represent six different commercial tea products, i.e. oolong, black, green, yellow, Pu-erh and white. The presence of the Gaussian noise can be seen as a 'fog' over the image plane ( Fig. 5(a)). It can be seen clearly by visual inspection that the shearlet-based de-noising method outperformed both SAVGOL and median filtering to give a clearer image. Fig. 6 presents the image planes (for 588 and 586 nm) with MN added used in the simulation studies. Fig. 6(a,e) presents the reflectance image plane with MN added, Fig. 6(b,f) the SAVGOL smoothed data, Fig. 6(c,g) the median filtered data, and Fig. 6(d,h) the image planes after application of the shearlet-based de-noising method. The presence of mixed noise (Fig. 6(a)) can be seen as a 'fog' accompanied by some high-intensity (bright) pixels resulting from the spike noise. The shearlet-based de-noising method clearly outperformed SAVGOL and median filtering. However, median filtering seems to provide better results than SAVGOL smoothing for image planes containing GN or MN. This is because SAVGOL smoothing dilutes the noise of several consecutive wavelengths by fitting the polynomial whereas the median filter is calculated directly based on the intensities present inside the window resulting in a better de-noising. However, both the median filter and SAVGOL smoothing also affect the consecutive wavelengths by spreading the noise. The spreading is more important when the noise is present in consecutive wavelengths. In the case of SAVGOL smoothing, the noisy wavelengths dominate the shape of the polynomial and lead to spreading of the noise to the consecutive Fig. 3. Noisy and de-noised spectra from pixel location (370, 135), the red line shows the clean spectrum, the blue line shows the noisy spectrum and the yellow line shows the spectrum after de-noising (a). SAVGOL smoothing filter applied to the spectrum with Gaussian noise added, (b). SAVGOL smoothing filter applied to the spectrum with mixed noise added, (c). Median filtering applied to spectrum with Gaussian noise added, (d). Median filtering applied to spectrum with mixed noise added, (e). de-noising with shearlet-based method for spectrum with Gaussian noise added, and (f). de-noising with shearlet-based method for spectrum with mixed noise added. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article).

Noisy and de-noised image planes from simulated images
wavelengths. In the case of median filtering, a larger number of noisy wavelengths in a small window affects the calculation of the median. Also, median filtering does not use any measure to pre-identify the amount of noise. Therefore, the correction is performed for each wavelength, which results in a distribution of the noise to the adjacent wavelengths. The spreading of the noise can be seen in Figs. 5(f) and (g), 6 (f) and (g) resulting from SAVGOL and median filtering, respectively, when compared to Figs. 5(e) and 6 (e), the clean image planes. However, this was not the case with the shearlet-based method as it does not affect any adjacent wavelengths because the method starts with identifying the wavelengths containing noise resulting in wavelength-specific de-noising.
The potential of the shearlet-based de-noising method is further quantified using the PSNR as presented in Fig. 7. The PSNR represents the ratio of the maximum possible signal intensity to the corrupting noise present in the signal. Fig. 7 shows the PSNR for the 20 randomly selected image planes used to simulate the noise. The PSNRs were estimated taking the raw reflectance image planes with no added noise (i.e. clean) as the reference for the de-noised image planes. It can be seen in Fig. 7 that the shearlet-based de-noising method (in yellow and sky-blue) increased the PSNR and attained the highest levels for the majority of image planes where GN and MN had been added. Median filtering increased the PSNR more than SAVGOL smoothing and was as effective as the new shearlet-based de-noising method for several wavelengths. The PSNR obtained from the shearlet-based de-noising methodology was the same for data containing GN (in yellow) and MN (in sky blue). The PSNR obtained for SAVGOL smoothed data with added noise was higher for GN (in dashed thick blue) compared to MN (in dashed purple). Similarly, the PSNR obtained with the median filter was higher for data with added GN (in thick dashed and dot red) at several wavelengths compared to MN (in dashed and dot green). The reason for this is that MN is more complex than GN and therefore with median filtering and SAVGOL smoothing, the PSNR increased more for the GN than for the MN case. The improved PSNR indicates that the signal contains more information compared to the noise and that data modelling based on the improved PSNR should be more successful. However, a higher PSNR does not guarantee successful modelling because the affected adjacent wavelengths resulting from the de-noising method is also a concern. The improvement in the classification performance of the SVM classifier after de-noising can be noted in Table 2. The shearlet-based de-noising method gave the highest accuracy of 87.37 ± 0.30% and 87.36 ± 0.35% for data cubes with added GN and MN, respectively. The median filter was second in terms of classification accuracy with comparable accuracy obtained for the data cubes with added GN (79.00 ± 0.48%) and MN (78.23 ± 0.47%). SAVGOL smoothing resulted in the lowest accuracy of 57.18 ± 0.52% and 49.77 ± 0.44%, for the data with GN and MN added, respectively. For the data containing MN and subjected to SAVGOL smoothing, the accuracy was even lower than for the data containing MN (56.21 ± 0.41%). The poor performance of SAVGOL smoothing is due to the spreading out of the effect of the noise over several consecutive wavelengths by the     Table 2 SVM classification model accuracies (%) obtained using different de-noising methods with simulated and real datasets (see Table 1). The value given is the mean ± one standard deviation resulting from 100 iterations of a 10-fold cross validated model. fitting of the polynomial inside the window. This reason can also explain the better performance of median filter compared to SAVGOL smoothing.

De-noised VNIR image set spectra
Fig. 8 presents a complete spectrum (383-1000 nm) extracted from the raw reflectance and the de-noised VNIR hypercubes for the six different commercial tea products. Fig. 8(a) presents the spectrum after SAVGOL smoothing, Fig. 8(b) after median filtering and Fig. 8(c) shows the results of the shearlet-based de-noising method. The red line depicts the spectrum after the de-noising treatment, with the raw reflectance spectrum given in blue. It can be seen that the raw reflectance spectrum (blue) contains noise at different wavelengths over the complete range, especially at the beginning (383-500 nm) and end (900-1000 nm) of the spectrum. In Fig. 8(a) it can be seen that SAVGOL filter smooths the spectrum quite well from 500 to 900 nm. However, at the edges of the spectrum, SAVGOL smoothing does not work as well. In Fig. 8(b) it can be seen that median filtering also performed well between 500-900 nm, but was less effective than SAVGOL smoothing at the very noisy ends of the spectrum. In the case of the shearlet-based method (Fig. 8(c)), it can be seen that the spectrum around the edges (383-500 nm and 900-1000 nm) is smoother compared to both SAVGOL and median filtering. This is because the shearlet-based de-noising method utilises the shearlet coefficients of the neighbouring low Gaussian noise image planes to perform the weighted averaging. With the algorithm, a total of 59 wavelengths were identified as containing noise, and were de-noised automatically.
The VNIR (383-1000 nm) reflectance spectral profiles of food products contain different chemical information such as pigments, moisture content, and physical information such as particle size. Noisefree spectral features related to this information must be extracted to form the basis for the success of any classification modelling. In the case of tea products, the VNIR spectra contain information related to different chemical components. Some key wavelengths identified in a previous study were 485 nm corresponding to the total liquor colour, 522-625 nm to thearubigins constituent group TRS1, 688 and 732 nm to thearubigins, 706 nm to total polyphenols, 743 nm to liquor brightness and 745 nm to theaflavin [26]. Fig. 9 presents the noisy and de-noised image planes for the real VNIR cube. Fig. 9(a) shows the raw noisy image planes from the VNIR hypercube corresponding to 405 nm, where intense noise is visible. The chosen waveband used to display the image plane was automatically identified by the algorithm for de-noising. Fig. 9(b)-(d) present the results of de-noising with SAVGOL smoothing, median filtering and the shearlet-based de-noising method, respectively. In Fig. 9(d), it can be seen that after de-noising the image planes reveal six different tea samples thus demonstrating the improved performance of the presented de-noising method over SAVGOL ( Fig. 9(b)) and median filtering (Fig. 9(c)).

SVM classification on VNIR image set
The results of classification (Table 2) showed an improvement in the classification accuracy after utilising the shearlet-based de-noising method compared to raw reflectance, SAVGOL smoothed and median filtered data. The reason for this is the high signal-to-noise ratio obtained after the de-noising of the image planes. The accuracy of the model was increased from 42.90 ± 0.44% to 78.27 ± 0.55% after denoising with the shearlet-based methodology. For SAVGOL smoothing, the classification accuracy increased to 67.44 ± 0.55% and for median filtering, it increased to 59.67 ± 0.56%. In this case, SAVGOL  smoothing outperformed median filtering because at the two ends of the spectra there are many consecutive noisy peaks that are taken into account in the calculation of the median value.

Conclusions
The data generated from HSI often contain noise. For an efficient data processing strategy, it is important to deal with the noise present in the data by either reducing or removing it. However, simply removing noise (wavelengths) can lead to loss of information from the dataset. Therefore, exploiting different data pre-processing techniques to reduce the noise in the datasets is always the better option as the information in the dataset is largely retained.
Commonly used methods such as SAVGOL smoothing and median filtering can deal with a small amount of noise and if the noise is not present in neighbouring wavelengths. However, when the noise level increases and the noise is present in consecutive wavelengths, SAVGOL smoothing and median filtering can result in distorted spectral profiles leading to spreading of the noise to adjacent wavelengths. Furthermore, both SAVGOL smoothing and median filtering are performed for every waveband as no steps are present to determine the wavelengths that need to be de-noised automatically. This, results in over-smoothing of the spectral profiles. Another drawback is the need to determine the correct window size, which is often performed manually. Different window sizes might result in an improvement in the de-noising results, however, visual inspection is needed to select the optimum window size.
However, the presented shearlet-based technique deals with the noise in an intelligent, fully automatic way by first classing the wavelengths into non-noisy, low GN and MN, thus reducing the chances of over-smoothing of the wavelengths and spreading of noise to adjacent wavelengths. The method then deals with the spatial and spectral noise synergistically and also adapts to the type of noise present in the data. The de-noising of the low GN noise wavelengths is performed through retention of the sparsity of the shearlet coefficients, which is completely independent of other wavelengths. Finally, to de-noise the MN wavelengths, the method fuses the information from the shearlet coefficients of the neighbouring GN wavelengths. This study also demonstrated the potential of the shearlet based de-noising methodology as seen in the visual improvement of image planes, the increase in spectral correlation, the increase in PSNR for image planes and the improved classification accuracy of the multi-class SVM model, compared to SAVGOL smoothed and median filtered data. The shearlet-based methodology is a useful technique for automatic de-noising of close-range HSI data where the spectral domain exhibits broad signals (i.e., information from adjacent spectral bands is correlated). Hence, this methodology will provide new opportunities for the use of a wide variety of HSI techniques, e.g. NIR, UV, visible and fluorescence, for real-time decision making such as in a process environment.