Denoising Raman spectra by Wiener estimation with a numerical calibration dataset

Most denoising methods that are currently used in the processing of Raman spectra require significant user interaction in order to optimize their performance across a range of signal-to-noise ratios. In this study, we proposed a method based on the principle of spectral integration followed by Wiener estimation using a numerical calibration dataset, which eliminates the need for experimental measurements for calibration as in the previous Wiener estimation based denoising method. The new method was tested on three types of samples, including a phantom sample, human fingernail and leukemia cells. Compared to two common denoising methods, i.e. moving-average filtering and Savitzky-Golay filtering, the performance of the proposed method is significantly less sensitive to the choices of parameters. Moreover, this method provides comparable or even better denoising performance in the cases with low signal-to-noise ratios.


Introduction
Raman spectroscopy has been well accepted as a non-destructive analytical tool for chemical and biological analysis in the last few decades. It is a powerful technique for both qualitative and quantitative characterization of materials including biological samples. However, the Raman signal is inherently weak, making interesting Raman peaks susceptible to noise, especially in biological samples. There are two major sources of noise: the camera and signal itself (shot noise) [1]. As one of the most important preprocessing steps, noise reduction is usually performed before any subsequent Raman spectral analysis. The reduction of noise and recovery of the clean Raman signal from Raman measurements with low signal-to-noise ratio (SNR) is a fundamental task to obtain the valuable characteristics of the sample under study.
Smoothing algorithms are commonly used to reduce noise; however, spectral information might be lost in the smoothing process because smoothing removes noise by essentially reducing the spectral resolution of the original measurements [2]. Among various smoothing methods, moving-average (MA) filtering is one of the simplest: the value of each spectral point is replaced by the average value of all points in a predefined spectral window centred on it. However, it works well only for spectra with very broad features [2], and its performance is highly dependent on the proper selection of the window length. Another commonly used smoothing method for reducing noise in Raman spectra is the Savitzky-Golay (SG) algorithm [1,3]. SG smoothing is conducted by fitting each segment of the original Raman spectrum within a specified window to a polynomial function [4]. The performance of the SG filter is mainly determined by the polynomial order and window length [5]. Although this method is useful in noise reduction, it may also degrade the underlying Raman spectral features [1,5-7]. In particular, a short window length results in poor smoothing performance, while a long window length may degrade the spectral resolution by distorting weak spectral features. Therefore, a prudent selection of the polynomial order and window length is of great importance to ensure a tradeoff between denoising performance and spectral resolution, and this selection depends heavily on the user's experience. Another category of smoothing methods, such as Fourier transform (FT) filtering [8] and wavelet transformation [9-12], is conducted in the frequency domain. These methods decompose a noisy Raman spectrum into high- and low-frequency [...] can recover Raman spectra from noisy measurements with different noise levels, and the method is demonstrated to be significantly less sensitive to the selection of the parameters involved than the common SG method.
The proposed method provides denoising performance comparable to the SG method and surpasses the MA method especially in low SNR cases.

Methods and materials
In this study, we define the SNR as in Eq. (1) to evaluate the level of noise in a spectrum, where E refers to the root-mean-square intensity of the reference spectrum, which is assumed to be free from noise, calculated according to Eq. (2), and S_Ref(λ_i) refers to the spectral intensity at wavenumber λ_i (i = 1, ..., N) in the reference spectrum. σ is the root-mean-square error of the noisy Raman spectrum relative to the reference Raman spectrum calculated according to Eq. (3), where S_Noisy(λ_i) is the intensity value of the noisy Raman spectrum at wavenumber λ_i. The fluorescence background is not removed from the reference and noisy Raman spectra prior to the SNR calculation; however, their intensity values are scaled such that the maximum and minimum values in each spectrum fall in the same range.
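The SNR definition can be sketched in Python as follows. The sketch reads the definitions above as E = sqrt((1/N) Σ S_Ref(λ_i)²) and σ = sqrt((1/N) Σ [S_Noisy(λ_i) − S_Ref(λ_i)]²), and it assumes that Eq. (1) is the plain ratio SNR = E/σ. The ratio form, the function names and the [0, 1] target range of the scaling are assumptions made for illustration only.

```python
import numpy as np


def rescale(spectrum, lo=0.0, hi=1.0):
    """Min-max scale a spectrum into [lo, hi] (hypothetical helper)."""
    s = np.asarray(spectrum, dtype=float)
    return lo + (hi - lo) * (s - s.min()) / (s.max() - s.min())


def snr(reference, noisy):
    """SNR = E / sigma, with the ratio form assumed for Eq. (1)."""
    ref = np.asarray(reference, dtype=float)
    noi = np.asarray(noisy, dtype=float)
    e = np.sqrt(np.mean(ref ** 2))              # RMS intensity of the reference
    sigma = np.sqrt(np.mean((noi - ref) ** 2))  # RMS error of noisy vs. reference
    return e / sigma
```

In practice both spectra would be passed through `rescale` first so that their intensity ranges agree before `snr` is evaluated.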

Spectral reconstruction by Wiener estimation based on a numerical calibration dataset (NCD-WE)
Denoising of Raman measurements using Wiener estimation involves two stages. The first stage is to create a Wiener matrix from the calibration dataset, i.e. the calibration stage, and the second stage is to reconstruct a denoised spectrum from noisy Raman measurements in the test dataset using the obtained Wiener matrix, i.e. the reconstruction stage.

Stage 1: Calibration
In this work, the calibration dataset is designed to include two sets of spectra, i.e. clean spectra and noisy spectra. The set of clean spectra is made up of a series of basic components (BCs) spectra and each component spectrum contains a single peak numerically generated using a Gaussian function as shown in Eq. (4). The wavenumber range and increment in the basic components spectra should be identical to those of the measured spectra in the test dataset.
where I_max is the maximum peak intensity with two possible values, i.e. 1 and 2. λ refers to wavenumber (cm⁻¹), the range of which depends on that of the spectra to be reconstructed in the test dataset. λ_0 refers to the central peak position. σ is related to the bandwidth and is varied within the interval 10-30 cm⁻¹ with an increment of 5 cm⁻¹.
In addition, the set of noisy spectra in the calibration dataset is obtained by adding one of two kinds of random noise to each clean spectrum: 1) a Gaussian noise (GN) spectrum generated using the function 'awgn' in MATLAB, yielding an SNR range of 1.51 to 1.77 for the noisy spectra in the calibration dataset according to the definition in Eq. (1); 2) a Poisson noise (PN) spectrum generated using the function 'poissrnd' in MATLAB, where the original clean spectrum after scaling with respect to intensity is used as the rate parameter to specify the Poisson distribution at each data point. The scaling factor is set to 10² so that the SNR of the noisy spectra in the calibration dataset varies from 1.26 to 3.09 according to the definition in Eq. (1). Although Gaussian noise is commonly used as a random noise model, the Poisson noise model has been confirmed to be more suitable in Raman spectroscopy [16]. Therefore, we considered both cases to be comprehensive. In addition, we have verified that no significant difference exists in the denoising performance of our method as long as the SNR of the noisy spectra in the calibration dataset is lower than those of the spectra in the test dataset.
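A minimal sketch of how such a calibration dataset could be generated with NumPy is given here. It assumes the common Gaussian form I_max·exp(−(λ − λ_0)²/(2σ²)) for Eq. (4), uses a hypothetical `noise_std` parameter in place of the SNR argument of MATLAB's 'awgn', and divides the Poisson samples by the scaling factor to return to the original intensity scale. All three choices, and the function names, are assumptions rather than details given in the text.

```python
import numpy as np


def gaussian_peak(wavenumbers, center, sigma, i_max):
    """Single Gaussian peak; this exact form of Eq. (4) is an assumption."""
    return i_max * np.exp(-((wavenumbers - center) ** 2) / (2.0 * sigma ** 2))


def make_clean_set(wavenumbers, center):
    """Ten clean spectra: sigma = 10..30 cm^-1 (step 5), I_max = 1 or 2."""
    sigmas = np.arange(10, 31, 5)
    return np.array([gaussian_peak(wavenumbers, center, s, a)
                     for s in sigmas for a in (1.0, 2.0)])


def add_gaussian_noise(clean, noise_std, rng):
    """Additive white Gaussian noise (noise_std is a hypothetical knob)."""
    return clean + rng.normal(0.0, noise_std, size=clean.shape)


def add_poisson_noise(clean, rng, scale=100.0):
    """Poisson noise with the scaled clean spectrum as the rate parameter.

    Dividing by `scale` afterwards is an assumption of this sketch.
    """
    return rng.poisson(clean * scale) / scale


rng = np.random.default_rng(0)
wavenumbers = np.arange(600.0, 1801.0, 2.0)   # grid of the measured spectra
clean_set = make_clean_set(wavenumbers, center=1200.0)
noisy_gn = add_gaussian_noise(clean_set, noise_std=0.1, rng=rng)
noisy_pn = add_poisson_noise(clean_set, rng=rng)
```

Five bandwidths combined with two peak intensities give ten clean spectra per peak location.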
Then, principal-component analysis (PCA) is performed for the noisy spectra in the calibration dataset according to Eq. (5). The process compresses the spectra to reduce data dimension.
where S_c,L is an (m × n) matrix of the noisy Raman spectra with low SNR, m represents the number of spectra, and n represents the number of discrete wavenumbers in the Raman spectrum. Subscript c refers to the calibration dataset and L refers to low SNR. The (n × l) matrix F contains the transmission spectra of the principal component (PC) based filters, and l is the number of PCs. A truncated (m × l) matrix D_c, denoted as the PC score matrix, is thus obtained by mapping the original spectra matrix of n variables to a new space of l variables. Matrix D_c will be used for Wiener estimation in the reconstruction later. Subsequently, the Wiener matrix W is created from the calibration dataset according to Eq. (6) [17].
where S_c,H represents the set of clean spectra with high SNR in the calibration dataset, and D_c is the PC score matrix obtained earlier according to Eq. (5). The function E[·] refers to the ensemble average, and the superscripts T and −1 represent matrix transpose and matrix inverse, respectively. The Wiener matrix W associates the clean Raman spectra with the PC scores obtained from the corresponding noisy Raman spectra by PCA, which partially contributes to the denoising power of our method. As described above, the established numerical calibration dataset is made up of a series of clean and noisy spectra each with a single Gaussian peak at the same peak location. One of the generally accepted assumptions in the quantitative analysis of Raman spectroscopy is that the spectrum of the sample is the linear summation of the spectra of all basic components present in the sample [18]. Simplifying this assumption to a linear model, a spectrum with multiple peaks can be modelled as the linear summation of several single-peak spectra after proper spectral shifts. It can be predicted that Wiener matrices obtained from calibration datasets in which the spectra possess different central peak locations have identical column vectors apart from a shift in the central peak location. Therefore, the Wiener matrix obtained from a calibration dataset with multiple peaks in the spectra is equivalent to the linear summation of all Wiener matrices obtained from a series of calibration datasets each with a single peak in the spectra, after the peaks in the column vectors of the Wiener matrices are properly shifted along the spectral dimension. Figure 1 demonstrates two Wiener matrices obtained from calibration datasets with λ_1 and λ_2 as the central peak locations, respectively. The upper panel shows the clean and noisy spectra in the calibration dataset, and the Wiener matrix obtained according to Eq. (6) is presented column-wise in the lower panel, where each curve corresponds to one column vector of the Wiener matrix.
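The calibration stage can be sketched in NumPy as follows. From the stated matrix sizes, Eq. (5) is read as D_c = S_c,L F, and Eq. (6) is assumed to take the usual Wiener form W = E[S_c,H^T D_c] (E[D_c^T D_c])^−1, an (n × l) matrix whose columns are spectrum-like curves. Obtaining F from an SVD of the mean-centred noisy spectra, projecting the uncentred spectra, and the function names are choices of this sketch and are not confirmed by the text.

```python
import numpy as np


def pca_filters(noisy, n_pcs):
    """Return the (n x l) matrix F of leading principal directions."""
    centered = noisy - noisy.mean(axis=0)
    # Rows of vt are orthonormal principal directions, sorted by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_pcs].T


def wiener_matrix(clean, noisy, filters):
    """Assumed form of Eq. (6): W = E[S_H^T D] (E[D^T D])^-1."""
    d = noisy @ filters                        # (m x l) PC scores, Eq. (5)
    # Ensemble averages are taken over the m spectra; the 1/m factors cancel.
    return clean.T @ d @ np.linalg.pinv(d.T @ d)


def reconstruct(noisy, filters, w):
    """Map PC scores of noisy spectra back to (m x n) spectral estimates."""
    return (noisy @ filters) @ w.T
```

With ten calibration spectra, mean-centring leaves at most nine non-trivial principal components, which is consistent with the upper limit of 9 PCs used in this study.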
Stage 2: Reconstruction
In the reconstruction stage, the corresponding denoised Raman spectra are reconstructed from Raman spectra measured with low SNR in the test dataset using the Wiener matrix obtained in the calibration stage. This stage involves two steps. First, dimension reduction using PCA is performed on the low-SNR Raman measurement to yield the PC score matrix, according to Eq. (7).
where S_t,L is an (m × n) matrix of the low-SNR Raman spectra, m represents the number of spectra, and n represents the number of discrete wavenumbers in the Raman spectrum. Subscript t refers to the test dataset and L refers to low SNR. F contains the transmission spectra of the PC based filters, and it should be noted that the number of PCs is the same as that of the calibration dataset. The PC score matrix D_t will be used for Wiener estimation in the following step. Then, the reconstruction is performed step by step using the Wiener matrix obtained from the calibration dataset after spectral shifting with a shifting increment of 1 cm⁻¹. Taking the central location λ_j as an example, the reconstructed spectrum Ŝ_λj is obtained according to Eq. (8); then only values falling in a spectral window with a length L and a central location of λ_j are retained, and all values outside the window are set to 0. The determination of a proper window length L is discussed in a later section.
where D_t is the PC score matrix of the test dataset calculated according to Eq. (7). W_λj refers to the Wiener matrix obtained from the calibration dataset with a central location at λ_j. The final reconstructed spectrum is obtained by averaging all Ŝ_λj as shown in Eq. (9), where Ŝ is the reconstructed spectrum and N refers to the total number of shifted Wiener matrices. N should be equal to the range of wavenumbers in the Raman spectra divided by the shifting increment (1 cm⁻¹ in this work).
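The windowed reconstruction can be sketched as below, under several stated assumptions. Eq. (8) is read as Ŝ_λj = D_t W_λj^T and Eq. (9) as the sum of the windowed Ŝ_λj divided by N. Instead of shifting a single Wiener matrix, the sketch rebuilds a Gaussian-noise calibration set at every grid point, which is slower but avoids edge handling. The shifting increment is taken to be the grid spacing, the window length is counted in grid points, and `noise_std` and all function names are hypothetical.

```python
import numpy as np


def calibrate(wn, center, n_pcs, noise_std, rng):
    """Build a single-peak calibration set at `center`; return (F, W)."""
    sigmas = np.arange(10, 31, 5)
    clean = np.array([a * np.exp(-((wn - center) ** 2) / (2.0 * s ** 2))
                      for s in sigmas for a in (1.0, 2.0)])
    noisy = clean + rng.normal(0.0, noise_std, size=clean.shape)
    centered = noisy - noisy.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    f = vt[:n_pcs].T                            # (n x l) PC filters
    d = noisy @ f                               # (m x l) PC scores
    w = clean.T @ d @ np.linalg.pinv(d.T @ d)   # (n x l) Wiener matrix
    return f, w


def ncd_we_denoise(wn, spectrum, n_pcs=5, window=11, noise_std=0.1, seed=0):
    """Windowed Wiener reconstruction averaged over all peak locations."""
    rng = np.random.default_rng(seed)
    n = wn.size
    half = window // 2
    total = np.zeros(n)
    for j in range(n):
        f, w = calibrate(wn, wn[j], n_pcs, noise_std, rng)
        estimate = (spectrum @ f) @ w.T         # assumed form of Eq. (8)
        lo, hi = max(0, j - half), min(n, j + half + 1)
        total[lo:hi] += estimate[lo:hi]         # keep values inside the window
    return total / n                            # assumed form of Eq. (9)
```

Because each point receives contributions from only about `window` of the N estimates, dividing by N rescales the amplitude; this sketch leaves that scaling in place, on the expectation that a later normalization step absorbs it.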

Sample preparation and Raman measurements
In this study, Raman measurements from three samples, a phantom, human fingernail and leukemia cells, were utilized to validate the effectiveness of the numerical calibration dataset based Wiener estimation method (NCD-WE) for noise reduction. The results of the proposed method were compared with two traditional denoising techniques, i.e. Savitzky-Golay filtering (SG) and moving-average (MA) filtering. Parameter determination of the three methods will be discussed in the first part of the "Results" section below. The phantom was made by dissolving monosodium phosphate (20233-1 KG, Affymetrix, USA) in distilled water to create a solution with a concentration of 1 M. The Raman spectra of the phantom were measured over a wavenumber range from 600 to 1800 cm⁻¹ with an increment of 2 cm⁻¹, using a micro-Raman system (InVia, Renishaw, U.K.) coupled to a microscope (Alpha 300, WITec, Germany) with a 20× objective lens. The excitation wavelength was 785 nm (HPNIR785, Renishaw, UK). The maximum laser power reaching the samples was approximately 218 mW. Spectra with four levels of SNR were collected. This was achieved using 10%, 1%, 0.1%, and 0.05% of the maximum laser power, respectively, and the exposure time was kept at 10 s. In addition, a reference Raman spectrum (with high SNR) was collected from the sample using 50% of the maximum laser power with an exposure time of 20 s and 3 accumulations.
Raman measurements were taken from a human fingernail sample ex vivo using the same system as above with the laser light focused on the surface of the fingernail sample. Spectra with four levels of SNR were collected. This was achieved by using 1%, 0.5%, 0.01%, and 0.05% of the maximum laser power, respectively, and the exposure time was 10 s. A reference Raman spectrum (with high SNR) was also collected from the sample using 50% of the maximum laser power with an exposure time of 20 s and 3 accumulations.
Leukemia cells (CCL-243, USA) were cultured in Iscove's Modified Dulbecco's Medium (IMDM) supplemented with 10% fetal bovine serum and incubated at 37°C in 5% CO2. It should be noted that the cells were washed twice and immersed in PBS before Raman measurements to reduce the fluorescence background originating from the culture medium. Raman measurement of cells was conducted using the same system as above. Spectra with four levels of SNR were collected using 10%, 5%, 1%, and 0.5% of the maximum laser power, respectively, and the exposure time was 10 s. A reference Raman spectrum (with high SNR) was also collected from the sample using 50% of the maximum laser power with an exposure time of 60 s and 3 accumulations. The state of the cells was fine according to our visual observation and spectral evaluation. In fact, the integrity of cells has been confirmed in the literature when exposed to 115 mW laser power at 785 nm for more than one hour [19].
The parameters of all experimental measurements and the resulting SNRs of measured spectra are summarized in Table 1. The noisy spectra in the test dataset with four noise levels are hereinafter denoted as P1-P4 for the phantom sample, F1-F4 for fingernail, and L1-L4 for leukemia cells.

Performance assessment
The performance of noise reduction is evaluated according to the relative root mean square error (rRMSE) as defined in Eq. (10).
where Ŝ is the reconstructed spectrum from the Raman measurement with low SNR in the test dataset. S_ref refers to the reference Raman spectrum with high SNR. λ_i (i = 1, ..., N) refers to the i-th wavenumber, and the function max[·] provides the maximum value in the entire spectrum.
It should be noted that the intensity values of both the reconstructed spectra and the reference spectrum went through z-score normalization to have a mean of 0 and a standard deviation of 1 before calculating rRMSE. The purpose of normalization is to minimize the difference in spectral intensities due to varied measurement parameters. In addition, fluorescent background was also removed using a wavelet algorithm [20] to facilitate comparison and eliminate the influence of the background.
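A minimal sketch of this metric follows. It assumes that Eq. (10) is the root-mean-square error between the denoised and reference spectra divided by the maximum of the reference spectrum, with both spectra z-score normalized first. The choice of the reference spectrum as the argument of max[·] is an assumption, and the wavelet-based background removal is omitted.

```python
import numpy as np


def zscore(spectrum):
    """Normalize to zero mean and unit standard deviation."""
    s = np.asarray(spectrum, dtype=float)
    return (s - s.mean()) / s.std()


def rrmse(denoised, reference):
    """Relative RMSE; the normalization by max(reference) is assumed."""
    d = zscore(denoised)
    r = zscore(reference)
    return np.sqrt(np.mean((d - r) ** 2)) / np.max(r)
```

Because of the z-score step, a denoised spectrum that differs from the reference only by a positive scale factor and an offset gives an rRMSE of essentially zero.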

Optimization of parameters
In addition to NCD-WE, two other widely used denoising methods in Raman spectroscopy, Savitzky-Golay filtering (SG) and moving-average filtering (MA), were evaluated on the low-SNR Raman measurements for comparison. Because the performance of a method depends on both the parameters involved and the SNR of the test spectra, we analysed the performance of each method according to the rRMSE of the denoised spectrum relative to the reference spectrum, using a range of values for every parameter of each method. In particular, the parameters involved in NCD-WE include the number of PCs and the window length. The SG method also involves two parameters, i.e. the polynomial order and the window length. It is important to note that the SG smoothing algorithm performs properly only if the window length is odd and the polynomial order is smaller than the window length [6]. In an earlier study [6], a polynomial order varied from 1 to 11 and a window length varied from 3 to 25 were adopted to test the performance of the SG smoothing filter for denoising the Raman spectra of non-structural protein 1, a biomarker for diseases of flavivirus origin. The results demonstrated that a polynomial order of 3 and a window length of 13 provided the best smoothing effect while preserving most characteristics of the original Raman spectra. Inspired by the observations in that study, a polynomial order ranging from 3 to 9 and a window length ranging from 5 to 95 were considered here to ensure that at least one combination of these parameters would yield the optimal performance in denoising spectra with various SNR. For the MA method, the window length is the only adjustable parameter [21]. The range and increment of each parameter in every method are summarized in Table 2, which covers the most commonly used values.
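The two comparison filters and the SG parameter sweep can be sketched in plain NumPy as follows. The SG coefficients are obtained from a least-squares polynomial fit over a window of scaled offsets, and both filters pad the spectrum by repeating its edge values, which is a simplification chosen for this sketch. The window step of 2 in the sweep and the use of a plain RMSE as the error measure are assumptions, since the increments in Table 2 are not reproduced here.

```python
import numpy as np


def moving_average(spectrum, window):
    """Centred moving average with an odd window, edge-padded."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    padded = np.pad(np.asarray(spectrum, dtype=float), half, mode="edge")
    return np.convolve(padded, np.ones(window) / window, mode="valid")


def savitzky_golay(spectrum, window, order):
    """SG smoothing: fit a polynomial in each window, keep the centre value."""
    if window < 3 or window % 2 == 0 or order >= window:
        raise ValueError("window must be odd, >= 3 and larger than the order")
    half = window // 2
    offsets = np.arange(-half, half + 1) / half   # scaled for conditioning
    design = np.vander(offsets, order + 1, increasing=True)
    coeffs = np.linalg.pinv(design)[0]            # weights of the fit at offset 0
    padded = np.pad(np.asarray(spectrum, dtype=float), half, mode="edge")
    return np.convolve(padded, coeffs[::-1], mode="valid")


def sweep_sg(noisy, reference, orders=range(3, 10), windows=range(5, 96, 2)):
    """RMSE for every valid (order, window) pair; invalid pairs are skipped."""
    errors = {}
    for order in orders:
        for window in windows:
            if order >= window:
                continue                          # SG is undefined for this pair
            smoothed = savitzky_golay(noisy, window, order)
            errors[(order, window)] = np.sqrt(np.mean((smoothed - reference) ** 2))
    return errors
```

The skipped pairs are exactly those in which the polynomial order is not smaller than the window length, the condition under which the SG algorithm cannot be applied.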
Figures 2 and 3 display the rRMSE distribution of denoised Raman spectra measured from the phantom sample using the NCD-WE method over the ranges of number of PCs and window length, and using the SG method over the ranges of order and window length, respectively, as listed in Table 2. The graphs from left to right correspond to P1, P2, P3 and P4, respectively, in both figures. Here, we only demonstrate the rRMSE distribution of the NCD-WE method with GN added in the calibration dataset. As for PN, the best and average performance in terms of rRMSE are summarized in Table 3. The window length appears to be an important parameter. The best denoising performance was achieved with window lengths of 5, 7, 11 and 19 for P1, P2, P3, and P4, respectively. However, the method is relatively insensitive to the number of PCs for denoising Raman spectra measured from the phantom sample, especially in the range of 5 to 9. This can be clearly seen from the observation that the maximum rRMSE value is no more than 86% greater than the minimum rRMSE value at each SNR. In contrast, the performance of SG relies highly on the window length, especially for lower SNR, e.g. P3 and P4. Only a lower order combined with a longer window length can guarantee a reliable result for P3 and P4, while a short window length (less than 30) works very poorly. There is an obvious trend that a longer window length yields a better result for lower SNR. The maximum rRMSE value can be anywhere from 180% to 406% greater than the minimum rRMSE value across all SNR. Figure 4 displays the rRMSE of denoised Raman spectra measured from the phantom sample using the MA method for a range of window lengths. The lower the SNR, the longer the window length required to yield better denoising performance.

Fig. 2.
Relative RMSE (rRMSE) of denoised Raman spectra measured from the phantom sample using the NCD-WE method (calibration dataset with GN). Graphs from left to right correspond to P1, P2, P3 and P4.
The rRMSE distributions of denoised Raman spectra measured from the human fingernail and leukemia cells using the three methods are not shown here due to limited space. In general, the performance of NCD-WE is relatively consistent over the entire range of selected parameters, as indicated by the smaller range of rRMSE values compared to that for the SG method. The two parameters of the SG method depend on each other, resulting in a noticeable pattern. For the MA method, it is observed that a short window is more suitable for dealing with spectra with high SNR, while a long window yields better performance in low SNR cases.

Denoising performance
The best and average denoising performance based on rRMSE of the NCD-WE, SG and MA methods are summarized in Table 3.

Fig. 3.
Relative RMSE (rRMSE) of denoised Raman spectra measured from the phantom sample using the SG method. Graphs from left to right correspond to P1, P2, P3 and P4. The NaN (not a number) values at the right corner arise because the SG algorithm cannot be properly conducted when the window length is smaller than the polynomial order.

Fig. 4.
Relative RMSE (rRMSE) of denoised Raman spectra measured from the phantom sample using the MA method. Graphs from left to right correspond to P1, P2, P3 and P4.

In general, the MA method yields the worst performance in nearly all cases except for P1 and P2 for the phantom sample. The SG method with optimal parameters works better than the NCD-WE method for denoising spectra with higher SNR. However, the NCD-WE method is comparable to or even better than the SG method in denoising spectra with lower SNR. From the average rRMSE data we can see that the performance of NCD-WE is relatively consistent because its average rRMSE spans a smaller range from SNR 1 to SNR 4 compared to the SG method. Comparing NCD-WE (GN) and NCD-WE (PN), the performance of NCD-WE (PN) is slightly better for the phantom sample and leukemia cells in higher SNR cases (e.g. P1, P2 and L2) but worse in lower SNR cases (e.g. P3, P4, F3 and L4). Figure 5 presents the best denoised results of Raman spectra measured from the phantom sample using the NCD-WE (GN) method. Graphs from left to right correspond to P1-P4, respectively. In the upper panel, the blue line shows the low-SNR test spectrum, and the red dotted line shows the denoised spectrum. In the lower panel, the grey line and the red dotted line display the reference spectrum and the denoised spectrum, respectively; in both, the fluorescence background has been removed using a wavelet algorithm [20] for easy comparison.
It can be observed that the spectral shape and the positions of the main peaks are accurately recovered using the NCD-WE method, even in the very low SNR case (P4), where the peaks are overwhelmed by noise in the original test spectrum. Another observation is that the major peak intensity values are slightly underestimated, but this problem could be ameliorated by performing normalization on the denoised spectrum. In addition, some artificial peaks are observed in the relatively flat regions in the cases of P3 and P4. The best denoised results of Raman spectra measured from human fingernail and leukemia cells are not shown here due to limited space. Generally, the spectral shape and the main peaks of the spectrum are well reconstructed, except for the very narrow peak at 1000 cm⁻¹ in the spectra of leukemia cells. Only some small spectral features are not accurately recovered.

Discussion
In this study, we developed a denoising method using Wiener estimation based on a calibration dataset created numerically rather than experimentally. This overcomes the disadvantages of the experimental approach in cost, complexity and applicability, because experimental measurements of real samples can be costly and the Wiener matrix obtained in this manner is specific to the calibration samples. The clean spectra in the numerical calibration dataset are composed of a series of basic component spectra each with a single Gaussian peak possessing a different width. The noisy spectra in the numerical calibration dataset are obtained by adding random noise to the clean spectra. Two types of commonly used noise are taken into consideration, i.e. Gaussian noise (GN) and Poisson noise (PN). An actual spectrum with multiple peaks is treated as the linear summation of several basic component spectra each with a single Gaussian peak after proper shifting of the central location. A similar treatment, in which the Raman spectrum of a sample is the linear summation of the spectra of all basic components, has been applied earlier in cell and tissue spectra studies [18,22,23]. Furthermore, we show that the Wiener matrix obtained from calibration spectra with multiple peaks is equivalent to the linear summation of all Wiener matrices obtained from calibration spectra each with a single peak after proper peak shifting along the spectral dimension. This finding significantly reduces the complexity and time of generating Wiener matrices.

Guideline on the parameter selection
Our method involves two parameters that may affect the denoising performance. One is the number of principal components in PCA, and the other is the window size used in the reconstruction stage. The number of principal components determines the percentage of variance explained by the components. Since PCA plays a role in not only dimension reduction but also denoising, the components that capture more variance do not necessarily yield better denoising performance. Similarly, an appropriate choice of the window length would ensure the denoising performance while preventing weak peaks from being smoothed out.
We systematically analysed the denoising performance in terms of rRMSE by varying the two parameters one at a time. The number of PCs, ranging from 3 to 9 with an increment of 1, was varied to test our method. Since the set of noisy spectra in the numerical calibration dataset contains 10 spectra, including five basic components each with a different Gaussian bandwidth and two variants for every basic component, the maximum number of PCs obtained from the PCA of the calibration dataset is 9. The smallest number of PCs was set to 3 because the first three components account for more than 80% of the total variance. Taking the spectra of the phantom sample reconstructed using the GN based calibration dataset as an example, the upper panel of Fig. 6 demonstrates the effect of the number of PCs on the denoising performance for each SNR case, where the error bar indicates the standard deviation of rRMSE when various window lengths are used in the reconstruction. It can be seen that the performance with 3 PCs is not as good as that with other numbers of PCs, which may be due to the loss of useful spectral information in the principal component analysis. In general, 5-9 PCs yield equally good denoising performance regardless of the SNR of the Raman spectra, so that the number of PCs can be selected freely in this range for denoising Raman spectra with unknown noise level. The same trend can be observed from the data for human fingernail and leukemia cells (results not shown for conciseness).
For each selected number of PCs, a series of window lengths varied from 1 to 20 was used in the reconstruction stage. The lower panel of Fig. 6 demonstrates the effect of window length on the denoising performance for each SNR case, where the error bar indicates the standard deviation of rRMSE when various numbers of PCs are used. It can be seen that a window length of 5-10 is more suitable in high SNR cases, e.g. P1 and P2, while the denoising performance is generally less dependent on the selection of window length for low SNR cases. Overall, a window length within 5-16 can provide satisfactory results for all SNR (P1-P4) without significant difference. This rule of thumb is also applicable to human fingernail and leukemia cells.
One implication regarding the selection of two parameters is that a user does not have to make a perfect choice on the parameters of the NCD-WE method to obtain average denoising performance for a noisy spectrum with unknown SNR, which establishes a great advantage of our method over existing approaches, such as SG.

Comparison between GN and PN model
Many studies have been carried out on noise modelling in Raman spectral data. While the Gaussian noise model has been widely used [1,24-26], Van de Sompel et al. reported in an earlier study [16] that noise in Raman spectra follows a Poisson distribution more closely than a Gaussian distribution. They developed a new version of the 'hybrid reference spectrum and principal component analysis' algorithm by incorporating a Poisson noise model. Their results indicated that both noise models yield comparable quantification performance for measured data; nevertheless, the Poisson noise model consistently outperformed the Gaussian noise model for the simulated data. For comparison, we take into account both the Gaussian noise and Poisson noise models when creating the calibration dataset in this study. In Table 3, no significant difference was observed between the two noise models in terms of either the best or the average rRMSE values, except that the Poisson model often performed slightly better in higher SNR cases.

Comparison between NCD-WE and UCD-WE methods
In addition to modelling a Raman spectrum as the summation of many single-peak spectra as shown in this study, it is also feasible to model it as the linear summation of multiple component spectra assuming that the underlying biochemical components are known, and their Raman spectra have been experimentally measured a priori. These component spectra constitute a universal calibration dataset [15] for Wiener estimation (UCD-WE). We applied the UCD-WE method to denoising and compared the performance of denoising the Raman spectra of leukemia cells between the proposed NCD-WE method and the UCD-WE method (data not shown due to limited space). It was found that the NCD-WE method yields superior performance compared to the UCD-WE method, but the reconstruction of the narrow peak at 1000 cm⁻¹ is more accurate using the UCD-WE method due to the involvement of the experimental measurement of albumin and actin in the calibration dataset. Furthermore, the denoising performance of the UCD-WE method is highly dependent on the complete knowledge of the constituent components, which determines its applicability to different types of samples.

Comparison of the NCD-WE method with two common denoising methods
Comparison is carried out among the NCD-WE, SG and MA methods from the perspectives of denoising performance and spectral shape recovery. The best denoised spectra (according to rRMSE) reconstructed from the measurements of the phantom sample and leukemia cells using the NCD-WE, SG and MA methods are displayed in Fig. 7 and Fig. 8, respectively, where (A)-(D) correspond to P1-P4 and L1-L4. In each graph, the black line shows the noisy Raman measurement without any treatment; the red bold line is the reference spectrum; the magenta line and blue line represent NCD-WE with the GN model and PN model-based calibration datasets, respectively; the orange line and green line are denoised spectra reconstructed using the SG and MA methods, respectively. It should be mentioned that the fluorescence background of the reference spectrum and denoised spectra has been removed using a wavelet algorithm [20] for easy comparison. It can be observed that in the high SNR cases, all methods yield good results, and the performance degrades as the SNR of the original spectrum decreases. In low SNR cases (P3, P4 and L3, L4), NCD-WE and SG are still capable of recovering the main peaks of the original spectrum accurately, although noise is not completely removed in some relatively flat regions. In contrast, MA works very poorly in these cases: some spectral regions with large noise are mistakenly retained as peaks and, more importantly, there are artificial shifts in the main peaks between the recovered spectrum and the reference spectrum. This is despite the fact that MA shows an advantage over the other two methods in recovering a very narrow peak, e.g. the peak of the leukemia cell spectra at 1000 cm⁻¹, in high SNR cases (L1 and L2).

Fig. 7.
Comparison of the best denoised spectra reconstructed from measurements of the phantom sample using NCD-WE, SG and MA methods. (A)-(D) correspond to P1-P4, respectively. The test spectrum refers to the noisy spectrum with various SNR without fluorescence background removal. The reference spectrum refers to the clean spectrum with fluorescence background removed, which is meant to be compared to the denoised result using each method.

Fig. 8.
Comparison of the best denoised spectra reconstructed from measurements of leukemia cells using NCD-WE, SG and MA methods. (A)-(D) correspond to L1-L4, respectively. The test spectrum refers to the noisy spectrum with various SNR without fluorescence background removal. The reference spectrum refers to the clean spectrum with fluorescence background removed, which is meant to be compared to the denoised result using each method.

The computational times for denoising a single spectrum using the NCD-WE, SG and MA methods are 0.82 s, 2.8×10⁻³ s and 7×10⁻⁴ s, respectively. The computational efficiency of the NCD-WE method could be improved by performing the reconstruction using the Wiener matrix (after spectral shifting) with an adaptive shifting increment according to the bandwidth of Raman peaks.
In the future, we believe it is possible to improve the proposed algorithm performance using a multi-scale approach, which is similar to wavelet reconstruction. In this approach, multiple calibration datasets will be created each with a different peak width in the initial single-peak spectrum. The reconstruction will be performed in multiple steps and the calibration datasets associated with progressively smaller peak widths will be sequentially used in these steps to reconstruct the details of the target spectrum in a series of spectral scales. This approach is expected to yield more accurate details at the cost of longer computation.

Conclusion
In this study, we proposed a method based on the principle of spectral integration followed by Wiener estimation using a numerical calibration dataset, which eliminates the need for experimental measurements for calibration as required in the previous Wiener estimation based denoising method. This method was tested on three types of samples, including a phantom sample, human fingernail and leukemia cells. Compared to two commonly used denoising methods, i.e. moving-average filtering and Savitzky-Golay filtering, the performance of the proposed method is significantly less sensitive to the choices of parameters. Moreover, this method provides comparable or even better denoising performance in the cases with low signal-to-noise ratios.