PLS Subspace-Based Calibration Transfer for Near-Infrared Spectroscopy Quantitative Analysis

To enable a calibration model to be transferred effectively among multiple instruments and to correct the differences between spectra measured by different instruments, this paper proposes a new feature transfer model based on the partial least squares regression (PLS) subspace (PLSCT). First, the PLS model of the master instrument is built, and a PLS subspace is constructed from its feature vectors. Then the master spectra and the slave spectra are projected into the PLS subspace, which also extracts the features of the spectra. In the subspace, the pseudo predicted feature of the slave spectra is transferred by the ordinary least squares method so that it matches the predicted feature of the master spectra. Finally, a feature transfer relationship model is constructed through the feature transfer in the PLS subspace. This PLS-based subspace transfer provides an efficient way to perform calibration transfer with only a small number of standard samples. The performance of PLSCT was compared with that of slope and bias correction (SBC), piecewise direct standardization (PDS), the calibration transfer method based on canonical correlation analysis (CCACT), generalized least squares (GLSW), and multiplicative signal correction (MSC) on three real datasets, and the differences were statistically tested with the Wilcoxon signed rank test. The experimental results indicate that PLSCT is more stable and yields more accurate predictions.


Introduction
In the past few decades, near-infrared (NIR) spectroscopy has been widely used in various fields because it is fast and does not damage the sample. These fields include pharmaceutical [1][2][3], biomedical [4], petrochemical [5], agricultural [6,7], and food [8][9][10] applications. In NIR analysis, the most frequently used multivariate calibration techniques are partial least squares regression (PLS) [11,12] and principal component regression (PCR) [13,14]. However, an established calibration model often becomes outdated or unsuitable for new samples because of the diversity of measuring instruments and measuring environments, as well as the variability of the materials being measured. New samples are any samples not included in the calibration model, such as samples collected at different times or with different instruments. Frequent recalibration is undesirable because establishing a calibration model consumes a large amount of time and resources. A more practical option is to carry out calibration transfer.
Numerous calibration transfer methods have been proposed in the literature. In general, these methods can be divided into two types: methods that require transfer standards and methods that do not. The former require the same standard samples to be measured on the master instrument and the slave instrument. According to the stage at which the adjustment occurs, methods of this type are further divided into four categories.
The first type is the method of correcting the slave spectra. In the standard samples, the slave spectra are made as close as possible to the corresponding master spectra by a transfer matrix. The most widely used are direct standardization (DS) and piecewise direct standardization (PDS) methods [15,16]. In the PDS method, the transfer relationship between the master spectra and the slave spectra from the sliding window is established at each wavelength of the master spectra, and finally a band-shaped transfer matrix is formed for correcting the slave spectra.
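The banded transfer matrix described above can be sketched in a few lines of NumPy. This is only an illustration under simplifying assumptions: the local regression at each wavelength is solved by plain least squares without an intercept (published PDS variants usually use a PCR or PLS step), and the function name `pds_transfer_matrix`, the `half_window` parameter and the synthetic spectra are hypothetical.

```python
import numpy as np

def pds_transfer_matrix(X_master_std, X_slave_std, half_window=1):
    """Build a banded matrix F so that X_slave @ F approximates X_master.

    For each master wavelength j, the master response is regressed on the
    slave responses inside a window centred on j (window size 2*half_window+1).
    """
    n_wl = X_master_std.shape[1]
    F = np.zeros((n_wl, n_wl))
    for j in range(n_wl):
        lo, hi = max(0, j - half_window), min(n_wl, j + half_window + 1)
        coef, *_ = np.linalg.lstsq(X_slave_std[:, lo:hi], X_master_std[:, j], rcond=None)
        F[lo:hi, j] = coef  # only the band around the diagonal is filled
    return F

# Synthetic example (assumption): the slave differs by a wavelength-dependent gain.
rng = np.random.default_rng(0)
X_master = rng.normal(size=(12, 20))
gain = np.linspace(0.8, 1.2, 20)
X_slave = X_master * gain

F = pds_transfer_matrix(X_master[:8], X_slave[:8], half_window=1)  # 8 standard samples
X_slave_corrected = X_slave[8:] @ F                                # 4 held-out samples
```

With `half_window=1` the window size is 3, which corresponds to the PDS(3) setting used later in the comparison.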
The second type is the method of simultaneously correcting the master spectra and the slave spectra. A commonly used example is calibration transfer by the generalized least squares (GLSW) method [17,18]. GLSW uses the difference between the standard sets of the master instrument and the slave instrument to build a weight matrix, and then uses the weight matrix to down-weight the spectral features that are to be suppressed. A detailed description of the weight matrix is provided in [17] and [18].
The third type is the method of correcting the predicted values. The main example is the slope and bias correction (SBC) method [19], which assumes a linear relationship between the response variable and the values predicted for the slave spectra by the master model. This relationship is usually estimated by the ordinary least squares method and is then used to correct the predicted values.
The fourth type is the projection method. An example is the calibration transfer method based on canonical correlation analysis (CCACT) [20], which uses CCA to find the canonical variables that are maximally correlated between the standard sets of the master instrument and the slave instrument, and then explores the transfer relationship between the two sets of canonical variables.
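A minimal sketch of the slope and bias correction just described is given below. The helper names `fit_sbc` and `apply_sbc` and the toy numbers are assumptions made for illustration; the only technique used is an ordinary least squares fit of a straight line.

```python
import numpy as np

def fit_sbc(y_pred_slave_std, y_ref_std):
    """Fit y_ref ~ slope * y_pred + bias by ordinary least squares."""
    slope, bias = np.polyfit(y_pred_slave_std, y_ref_std, deg=1)
    return slope, bias

def apply_sbc(y_pred_slave, slope, bias):
    """Correct predictions obtained by applying the master model to slave spectra."""
    return slope * np.asarray(y_pred_slave, dtype=float) + bias

# Toy example (assumption): slave predictions are systematically scaled and shifted.
y_ref = np.array([10.0, 10.5, 11.0, 11.5, 12.0])
y_pred = 0.8 * y_ref + 1.0
slope, bias = fit_sbc(y_pred, y_ref)
y_corrected = apply_sbc(y_pred, slope, bias)
```

Note that, unlike PLSCT, this correction needs the reference values of the standard samples.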
In practical applications, it is difficult or even impossible to measure the same samples on two instruments due to the position of the measuring instrument and the stability of the samples, etc. At this time, it is necessary to use a method that does not require measurement of the same standard samples, that is, a non-standard method. These methods are mainly divided into two types.
One is the signal preprocessing method, which removes the baseline offset and linearly sloped baselines by simple mathematical operations such as the first derivative and the second derivative. Common methods include multiplicative signal correction (MSC) [21], finite impulse response (FIR) filtering [22], generalized moving window MSC (MW-MSC) [21], and orthogonal signal correction (OSC) [23,24], where FIR and MW-MSC are variants of MSC. However, these simple preprocessing methods cannot handle complex changes between the master spectra and the slave spectra.
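As a concrete instance of such preprocessing, the following sketch implements basic MSC: each spectrum is regressed on a reference spectrum and the fitted offset and slope are removed. Using the mean spectrum as the reference is a common convention and is assumed here; the function name `msc` and the toy spectra are hypothetical.

```python
import numpy as np

def msc(spectra, reference=None):
    """Multiplicative signal correction.

    Each spectrum x is modelled as x ~ offset + slope * reference and is
    corrected to (x - offset) / slope.
    """
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, x in enumerate(spectra):
        slope, offset = np.polyfit(ref, x, deg=1)
        corrected[i] = (x - offset) / slope
    return corrected, ref

# Toy example (assumption): one underlying band shape with different gains and offsets.
base = np.linspace(0.1, 1.0, 50) ** 2
spectra = np.stack([1.0 * base, 1.2 * base + 0.05, 0.9 * base - 0.02])
corrected, ref = msc(spectra)
```

In this idealized case every corrected spectrum collapses onto the reference, which also shows the limitation noted above: only additive and multiplicative effects are removed.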
The other is the projection method. It includes transfer component analysis (TCA) [25] and kernel principal component analysis (KPCA) [26]. TCA projects the master spectra and the slave spectra into a common feature space in which the distribution of the master spectra and the slave spectra are as similar as possible while retaining the key properties of the spectra. TCA and KPCA use different kernels, so they can cope with nonlinear and more complex changes in the spectra.
In this paper, a novel projection method is proposed, namely a feature transfer model based on the PLS subspace (PLSCT). PLSCT first establishes the PLS model on the calibration set of the master instrument, which constructs a low-dimensional PLS subspace, that is, a feature space spanned by the spectral feature vectors. The PLS model is then used to extract the predicted features of the master spectra and the pseudo predicted features of the slave spectra, that is, to project all spectra of the master instrument and the slave instrument into this PLS subspace. Finally, the ordinary least squares method is used to estimate the relationship between the two sets of features in the same PLS subspace, and this relationship is used to construct the feature transfer relationship model.
Note that the pseudo predicted feature of the slave spectra is obtained with the PLS model established on the master instrument rather than with a PLS model of the slave instrument. Moreover, PLSCT does not need the response variable corresponding to the standard set. In addition, compared with PDS, PLSCT corrects the features of the spectra rather than the spectra themselves. In contrast to CCACT, PLSCT uses PLS to find the covariance between the spectra and the response variable, instead of using CCA to find the correlation between the master spectra and the slave spectra.
In order to validate the performance of the PLSCT model, we not only compare its prediction results against those of the SBC, PDS, CCACT, GLSW, and MSC methods, but also apply the Wilcoxon signed rank test [27] to determine whether PLSCT is statistically significantly superior to other models. The experiment was conducted in three real near-infrared datasets. By analyzing all the experimental results, we conclude that the PLSCT can significantly reduce the prediction error.

The Analysis of the Corn Dataset
Table 1 lists the number of latent variables (LVs) and the root mean square error of prediction (RMSEP) for Calibration, Direct transfer and Recalibration. The RMSEP was 0.010156 when the calibration model of the master instrument was used to predict the test-set spectra measured on the master instrument. However, when the same model was applied directly to the test-set spectra measured on the slave instrument, the RMSEP was 1.41931, which indicates that applying the master model directly to the slave instrument produces a large prediction error. The number of factors used to construct the pseudo predicted feature matrix from the standard set of the slave spectra ($T^{s\_m}_{std}$) and the predicted feature matrix from the standard set of the master spectra ($T^{m}_{std}$), which is a key parameter of the PLSCT model, was determined by leave-one-out cross-validation. Figure 1A,B illustrates the effect of this number of factors on the cross-validation error when the standard set contains 25 and 30 samples. In both cases, the root mean square error of cross-validation (RMSECV) reached its minimum, and PLSCT achieved its best performance, with 3 factors, so the number of factors was set to 3.
In order to evaluate the effect of the number of samples in the standard set on the different calibration transfer methods, 5, 10, 15, 20, 25, and 30 standard samples were considered in the experiment. As can be seen from Table A1 in Appendix A, the RMSEP of MSC was relatively large, and the predictive ability of CCACT and GLSW was better than that of PDS, SBC and MSC. From 5 to 30 standard samples, the RMSEP of PLSCT was smaller than that of PDS, SBC, CCACT, GLSW and MSC. Moreover, the RMSEP of PLSCT gradually stabilized as the number of standard samples increased from 20 to 30. We therefore conclude that PLSCT had significantly better predictive performance than the other models.
To further compare PLSCT with the other models, the RMSEP improvement and the p-values of the Wilcoxon signed rank test are listed in Table A2. In addition, Figure 2 compares the measured moisture content of the corn dataset with the values predicted by the different models when the standard set contains 30 samples. In each plot, the straight line has a slope of 1, so a point on the line indicates that the predicted value equals the measured value. As shown in Figure 2, PLSCT exhibited the smallest differences between the measured values and the predicted values. This is attributed to the feature transfer in the PLS subspace, which is detailed in Figure 3.
Figure 3 displays the relationship between the first pseudo predicted feature of the slave instrument and the first predicted feature of the master instrument before and after transfer in the PLS subspace. In the two plots, the blue dots represent the feature before transfer, and the red dots represent the feature after transfer. The closer the dots are to a straight line, the smaller the differences between the pseudo predicted feature of the slave instrument and the predicted feature of the master instrument. Figure 3A,B depicts the differences between the features in the standard set and the test set, respectively. After transfer, the differences between the first pseudo predicted feature of the slave instrument and the first predicted feature of the master instrument were clearly reduced, not only in the standard set but also in the test set.
Figure 2. Predicted versus measured moisture content of the corn dataset with 30 standard samples, obtained by (A) piecewise direct standardization with a window size of 3 (PDS(3)), (B) piecewise direct standardization with a window size of 5 (PDS(5)), (C) piecewise direct standardization with a window size of 7 (PDS(7)), (D) slope and bias correction (SBC), (E) calibration transfer method based on canonical correlation analysis (CCACT), (F) generalized least squares (GLSW), (G) multiplicative signal correction (MSC), (H) Recalibration and (I) partial least squares regression subspace based calibration transfer (PLSCT).

The Analysis of the Wheat Dataset
Table 1 shows that, when no calibration transfer method was used, the difference between the RMSEP obtained by directly applying the calibration model and the RMSEP of Recalibration was much smaller than in the corn dataset, in part because the difference between the two instruments in the wheat dataset was relatively small. Figure 4 displays the comparison of the measured values and the values predicted by the different models. In these plots, the differences between the measured and predicted values for PLSCT were only slightly larger than for Recalibration and smaller than for any other method.
Since the spectral difference between the master instrument and the slave instrument was small in the wheat dataset, the effect of feature transfer in the PLS subspace is not obvious in Figure 5. However, the difference between the first pseudo predicted feature after transfer and the first predicted feature is still slightly smaller. The number of samples in the standard set in Figure 5A was 30.
The performance of the different methods on the wheat samples is also shown in Appendix A Table A1. Table A2 shows clearly that PLSCT had a much lower prediction error than PDS, SBC, GLSW and MSC when the number of samples in the standard set was 10, 25 and 30. When the number of samples in the standard set was 30, the minimum RMSEP obtained by PLSCT was 0.6604. The RMSEP of Recalibration2 fluctuated greatly, probably because there were outliers in the standard set of the slave instrument. These outliers also affected the performance of SBC, as shown in Figure 4D.

The Analysis of the Pharmaceutical Tablet Dataset
As in the previous cases, the LVs and RMSEP of Calibration, Direct transfer and Recalibration are shown in Table 1. The RMSEP of Calibration was 3.123115, that of Direct transfer was 4.514284, and that of Recalibration was 3.31598.
In the PLSCT model, the number of factors for constructing $T^{s\_m}_{std}$ and $T^{m}_{std}$ was 4 when the number of samples in the standard set was set to 25 and 30, as shown in Figure 1C,D. For a standard set of 30 samples, the comparison between the predicted values and the measured values is shown in Figure 6. The results show that PLSCT achieved the best performance. Figure 7 compares the first pseudo predicted feature of the slave instrument before and after transfer in the PLS subspace for the standard set and the test set, where the number of samples in the standard set in Figure 7A was 30. In both plots of Figure 7, the first pseudo predicted feature after transfer was clearly closer to the predicted feature of the master instrument, in the standard set as well as in the test set of the slave instrument.
According to Appendix A Table A1, the performance of PLSCT gradually improved as the number of samples in the standard set increased. The RMSEP of PLSCT became stable when the number of samples in the standard set was 25 and 30, where PLSCT significantly outperformed PDS, SBC, CCACT, GLSW and MSC. According to Table A2, when the number of samples in the standard set was greater than 20, the RMSEP of PLSCT was already lower than that of Recalibration.
Compared with the other models, the RMSEP improvement of PLSCT can reach up to 16.3743%, 15.12146%, 14.35178%, 40.04516%, 16.81376%, 41.83697%, 24.21448%, 23.82937% and 2.908651%, respectively. Furthermore, the differences between PLSCT and the other models are all statistically significant at the 95% confidence level (shown in Appendix A Table A2).

Corn Dataset
The first dataset was the corn dataset, which is available at http://www.eigenvector.com/data/Corn/. The dataset is composed of 80 corn samples. Three near-infrared spectrometers were used to measure these samples, with a wavelength range from 1100 nm to 2498 nm at 2 nm intervals (700 channels). The dataset contains the moisture, oil, protein and starch values of the corn samples. In this paper, the moisture content was chosen as the property of interest. We choose M5 as 'master instrument', MP5 as 'slave instrument'. The difference between the spectra measured on M5 instrument and MP6 instrument can be observed in Figure 8A.

Wheat Dataset
The second dataset was the wheat dataset, which consisted of 248 samples measured by three instruments of manufacturer A. This dataset was the shootout data of the International Diffuse Reflectance Conference (IDRC) in 2016 and is available at http://www.idrc-chambersburg.org/content.aspx?page_id=22&club_id=409746&module_id=191116. The wavelength range of the instruments of manufacturer A was 730 nm to 1100 nm at 0.5 nm intervals. The dataset only provides the reference protein values. In this paper, we take the first instrument of manufacturer A (A1) as the 'master instrument' and the second instrument (A2) as the 'slave instrument'. Figure 8B shows the difference between the spectra measured on the A1 and A2 instruments.

Pharmaceutical Tablet Dataset
The third dataset came from the IDRC shootout 2002 and contains 655 pharmaceutical tablets measured on two spectrometers, with a wavelength range from 600 to 1898 nm at 2 nm intervals. It is available at http://www.eigenvector.com/data/tablets/index.html. There are three reference values associated with this dataset, but we were only interested in the weight content of each sample. The difference between the spectra in the pharmaceutical tablet dataset is shown in Figure 8C.

Dataset Division
We adopted the Kennard and Stone algorithm [28] to split the datasets. First, all samples were split into a calibration set and a test set. The test set accounted for 20% of the samples, and the remaining 80% were used as the calibration set. The corn dataset was divided into 64 samples for the calibration set and 16 samples for the test set. The wheat dataset was divided into 198 samples for the calibration set and 50 samples for the test set. For the pharmaceutical tablet dataset, we first merged the three parts into which it had already been divided, and then divided the merged data into 524 samples for the calibration set and 131 samples for the test set. The standard samples were selected from the calibration set via the Kennard and Stone algorithm.
Note that the Kennard and Stone algorithm was applied to the master spectra when splitting the calibration set and the test set, whereas it was applied to the slave spectra when extracting the standard samples.
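For readers unfamiliar with the Kennard and Stone selection, a compact sketch follows. It uses Euclidean distances on the raw spectra, which is the usual choice but is an assumption here, as are the function name `kennard_stone` and the toy data.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Return the indices of n_select samples chosen by the Kennard-Stone rule."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    # Pairwise Euclidean distances computed from the Gram matrix.
    sq = (X ** 2).sum(axis=1)
    dist = np.sqrt(np.clip(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0, None))
    # Start with the two most distant samples.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_select:
        # Pick the sample whose nearest selected neighbour is farthest away.
        nearest = dist[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return selected

# Toy one-dimensional example.
X_toy = np.array([[0.0], [1.0], [2.0], [10.0], [5.0]])
chosen = kennard_stone(X_toy, 3)  # -> [0, 3, 4]
```

Under the split described above, one plausible usage is to select 80% of the samples from the master spectra as the calibration set, keep the rest as the test set, and then run the same function on the slave spectra of the calibration samples to pick the standard samples.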

Determination of the Optimal Parameters
The number of latent variables used in the PLS model was selected by 10-fold cross-validation. To avoid over-fitting caused by the inclusion of redundant latent variables, the optimal number of latent variables was determined with the statistical F-test [29] (α = 0.05).
The predicted feature from the standard set of the slave instrument is a pseudo predicted feature $T^{s\_m}_{std}$ constructed by the PLS model of the master instrument. Compared with the predicted feature $T^{s}_{std}$ constructed by a PLS model of the slave instrument, $T^{s\_m}_{std}$ may contain some noise, which strongly influences the solution of the transfer matrix $\xi$ and thus the performance of the PLSCT model. To optimize the model, we used leave-one-out cross-validation to select the best number of factors in the standard set according to the minimum root mean square error of cross-validation (RMSECV). The response variable of the standard set used in the cross-validation was the predicted value of the master instrument standard set obtained by the PLS model of the master instrument.
For the PDS method, its window sizes were set to 3, 5, and 7, respectively.

Model Performance Evaluation
To assess the prediction performance of the different calibration models, we calculated the root mean square error of prediction (RMSEP):
$$\mathrm{RMSEP} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{1}$$
where $y_i$ is the measured value of the i-th test sample, $\hat{y}_i$ is its final predicted value, and $n$ is the number of samples in the test set.
To compare the prediction performance of the proposed model and the other models more directly, Equation (2) was used to calculate the RMSEP improvement of the PLSCT method over each of the other methods:
$$\mathrm{Improvement} = \frac{\mathrm{RMSEP}_{other} - \mathrm{RMSEP}_{PLSCT}}{\mathrm{RMSEP}_{other}} \times 100\% \tag{2}$$
where $\mathrm{RMSEP}_{PLSCT}$ is the prediction error of the PLSCT method and $\mathrm{RMSEP}_{other}$ is the prediction error of the method being compared. In addition, the Wilcoxon signed rank test at the 95% confidence level was applied to the prediction errors of the different models to determine whether there was a significant difference between PLSCT and the other methods. In Python, we used the wilcoxon function of the scipy package to calculate the p-value between the two sets of prediction errors. If p > 0.05, there is no significant difference between the two methods; otherwise, the difference is significant.
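The evaluation quantities described above can be written out as follows. This is a sketch, not the authors' script: the function names are hypothetical, and pairing the absolute per-sample prediction errors of the two models in the Wilcoxon test is an assumption, since the text does not specify exactly which error vectors were paired. `scipy.stats.wilcoxon` is imported inside the helper so that the rest of the block runs with NumPy alone.

```python
import numpy as np

def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def rmsep_improvement(rmsep_plsct, rmsep_other):
    """Relative RMSEP reduction of PLSCT over another method, in percent."""
    return (rmsep_other - rmsep_plsct) / rmsep_other * 100.0

def wilcoxon_p_value(y_true, y_pred_a, y_pred_b):
    """p-value of the Wilcoxon signed rank test on paired absolute errors (assumed pairing)."""
    from scipy.stats import wilcoxon
    y_true = np.asarray(y_true, dtype=float)
    err_a = np.abs(y_true - np.asarray(y_pred_a, dtype=float))
    err_b = np.abs(y_true - np.asarray(y_pred_b, dtype=float))
    return float(wilcoxon(err_a, err_b).pvalue)

# Toy numbers (assumption): model A has half the error of model B.
y_true = [1.0, 2.0, 3.0, 4.0]
pred_a = [1.1, 1.9, 3.1, 3.9]
pred_b = [1.2, 1.8, 3.2, 3.8]
improvement = rmsep_improvement(rmsep(y_true, pred_a), rmsep(y_true, pred_b))
```

A p-value at or below 0.05 from `wilcoxon_p_value` would then be read as a significant difference between the two models, following the rule stated above.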

Notation
In this paper, the spectral matrix is denoted by $X$ and has size $n \times p$, where $n$ is the number of samples and $p$ is the number of variables; $x_i$ denotes the spectral variables of the i-th sample. The response variable is denoted by $y$ and the predicted values by $\hat{y}$. To distinguish the spectra collected on the two instruments, a superscript is added to the matrix: $X^{m}$ denotes the spectra from the master instrument and $X^{s}$ the spectra from the slave instrument, $T^{m}$ is the predicted feature matrix of the master spectra obtained by the master instrument calibration model, and $T^{s\_m}$ is the pseudo predicted feature matrix of the slave spectra obtained by the master instrument calibration model.
A subscript is added to distinguish the different data sets. For instance, $X^{m}_{cal}$, $X^{m}_{std}$, and $X^{m}_{test}$ represent the calibration set, standard set and test set of the master instrument, respectively, and $X^{s}_{cal}$, $X^{s}_{std}$, and $X^{s}_{test}$ represent the calibration set, standard set and test set of the slave instrument, respectively.

Overview of PLS
PLS is a widely used multivariate calibration technique. PLS uses score vectors to model the relationship between $X$ and $y$. It projects $X$ and $y$ into a PLS subspace, a low-dimensional space defined by a small number of score vectors. The mean-centered $X$ and $y$ are decomposed as follows:
$$X = TP^{\mathrm{T}} + E \tag{3}$$
$$y = Tq + F \tag{4}$$
where $T$ is the score matrix, $P$ and $q$ are the loadings for $X$ and $y$, respectively, and $E$ and $F$ are the residuals corresponding to $X$ and $y$. The matrix of regression coefficients is:
$$\beta = W\left(P^{\mathrm{T}}W\right)^{-1}q \tag{5}$$
where $W$ is the weight matrix.
With the regression coefficient matrix $\beta$, the predicted values are obtained as:
$$\hat{y} = X\beta \tag{6}$$
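To make the roles of the scores, loadings, weights and regression coefficients concrete, here is a minimal single-response PLS fitted with the NIPALS algorithm in NumPy. The paper does not state which PLS implementation was used, so the algorithm choice, the function names `pls1_fit` and `pls1_predict`, and the synthetic data are all assumptions. The sketch computes the coefficients as β = W (PᵀW)⁻¹ q on mean-centered data.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS on mean-centered X and y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean              # residuals, deflated step by step
    n_samples, n_vars = X.shape
    W = np.zeros((n_vars, n_components))       # weights
    P = np.zeros((n_vars, n_components))       # X loadings
    q = np.zeros(n_components)                 # y loadings
    T = np.zeros((n_samples, n_components))    # scores
    for a in range(n_components):
        w = E.T @ f
        w = w / np.linalg.norm(w)
        t = E @ w
        tt = t @ t
        p_a = E.T @ t / tt
        q_a = f @ t / tt
        E = E - np.outer(t, p_a)
        f = f - q_a * t
        W[:, a], P[:, a], T[:, a] = w, p_a, t
        q[a] = q_a
    R = W @ np.linalg.inv(P.T @ W)             # maps centered spectra to scores
    return {"x_mean": x_mean, "y_mean": y_mean, "T": T, "R": R, "q": q, "beta": R @ q}

def pls1_predict(model, X):
    """Predict the response for new spectra."""
    return (np.asarray(X, dtype=float) - model["x_mean"]) @ model["beta"] + model["y_mean"]

# Synthetic data (assumption), only to exercise the functions.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=30)
model = pls1_fit(X, y, n_components=2)
y_hat = pls1_predict(model, X)
```

The matrix `R` is the quantity that later projects both master and slave spectra into the same PLS subspace, since the scores satisfy T = (X − mean) R.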

Proposed PLSCT method
In the PLSCT, the PLS model was built on the calibration set of the master instrument to construct the PLS subspace, which is also the feature space constructed by the feature vectors of the spectra of the master instrument calibration set. The number of latent variables (LVs) in the PLS model is determined by cross-validation.
On the basis of this PLS model, the predicted feature matrix of the standard set of the master instrument $X^{m}_{std}$ can be calculated, that is, the spectra of the master instrument can be projected into the PLS subspace:
$$T^{m}_{std} = X^{m}_{std}W\left(P^{\mathrm{T}}W\right)^{-1} \tag{7}$$
Similarly, the pseudo predicted feature matrix of the standard set of the slave instrument $X^{s}_{std}$ can be calculated with the same PLS model, in other words, the spectra of the slave instrument can be projected into this PLS subspace:
$$T^{s\_m}_{std} = X^{s}_{std}W\left(P^{\mathrm{T}}W\right)^{-1} \tag{8}$$
The two feature matrices are derived from the same PLS model of the master instrument, that is, all spectra are projected into the identical PLS subspace constructed from the master instrument. In this subspace, a linear relationship between the two feature matrices is assumed, so $T^{s\_m}_{std}$ and $T^{m}_{std}$ are related by:
$$T^{m}_{std} = T^{s\_m}_{std}\,\xi \tag{9}$$
The transfer matrix $\xi$ can be solved by the ordinary least squares method:
$$\xi = \left(\left(T^{s\_m}_{std}\right)^{\mathrm{T}}T^{s\_m}_{std}\right)^{-1}\left(T^{s\_m}_{std}\right)^{\mathrm{T}}T^{m}_{std} \tag{10}$$
Once $\xi$ is computed, the predicted values for the test set of the slave instrument $X^{s}_{test}$ are calculated with Equation (11):
$$\hat{y}^{s}_{test} = X^{s}_{test}W\left(P^{\mathrm{T}}W\right)^{-1}\xi\,q \tag{11}$$
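The whole procedure can be sketched end to end as below. This is an illustration under stated assumptions rather than the authors' code: PLS is fitted with a small NIPALS routine, slave spectra are centered with the master calibration mean, all latent variables are used when solving for ξ (the paper instead selects the number of factors by leave-one-out cross-validation), the first calibration samples serve as the standard set, and the data are synthetic rank-3 spectra in which the slave differs by a wavelength-dependent gain. All function and variable names are hypothetical.

```python
import numpy as np

def fit_pls1(X, y, n_components):
    """NIPALS PLS1; returns the means, R = W (P^T W)^-1 and the y loadings q."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w = w / np.linalg.norm(w)
        t = E @ w
        p = E.T @ t / (t @ t)
        q_a = f @ t / (t @ t)
        E, f = E - np.outer(t, p), f - q_a * t
        W.append(w)
        P.append(p)
        q.append(q_a)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return x_mean, y_mean, W @ np.linalg.inv(P.T @ W), q

def project(X, x_mean, R):
    """Project spectra into the PLS subspace of the master model."""
    return (X - x_mean) @ R

def fit_plsct(X_master_std, X_slave_std, x_mean, R):
    """Solve T_m_std = T_sm_std @ xi by ordinary least squares."""
    T_m = project(X_master_std, x_mean, R)
    T_sm = project(X_slave_std, x_mean, R)
    xi, *_ = np.linalg.lstsq(T_sm, T_m, rcond=None)
    return xi

def predict_plsct(X_slave, x_mean, y_mean, R, q, xi):
    """Transfer the slave features with xi, then apply the master y loadings."""
    return project(X_slave, x_mean, R) @ xi @ q + y_mean

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Synthetic master/slave spectra (assumption): three latent factors, no noise.
rng = np.random.default_rng(0)
n_cal, n_test, n_std, n_wl = 40, 10, 8, 50
S = rng.normal(size=(n_cal + n_test, 3))
S -= S[:n_cal].mean(axis=0)                      # calibration mean spectrum = baseline
L = rng.normal(size=(3, n_wl))
baseline = np.linspace(1.0, 2.0, n_wl)
gain = 1.0 + 0.3 * np.sin(np.linspace(0.0, 3.0, n_wl))
y = S @ np.array([1.0, -0.5, 2.0]) + 10.0
X_master = baseline + S @ L
X_slave = baseline + (S @ L) * gain              # instrument difference

x_mean, y_mean, R, q = fit_pls1(X_master[:n_cal], y[:n_cal], n_components=3)
xi = fit_plsct(X_master[:n_std], X_slave[:n_std], x_mean, R)

y_direct = project(X_slave[n_cal:], x_mean, R) @ q + y_mean   # no transfer
y_plsct = predict_plsct(X_slave[n_cal:], x_mean, y_mean, R, q, xi)
rmse_direct = rmse(y[n_cal:], y_direct)
rmse_plsct = rmse(y[n_cal:], y_plsct)
```

Only spectra of the standard samples enter `fit_plsct`; their reference values are never used, which matches the property of PLSCT noted in the text. On real data the relationship between the two feature matrices is only approximately linear, so the error reduction will be smaller than in this idealized example.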

Conclusions
In this paper, a calibration transfer method based on the PLS subspace is proposed. PLSCT uses the same PLS model to project the spectra of both instruments into the identical PLS subspace. In this subspace, a feature transfer model is constructed by narrowing the differences between the predicted feature of the master instrument and the pseudo predicted feature of the slave instrument via the ordinary least squares method. Additionally, PLSCT does not need the response variable corresponding to the standard set. Experimental results on three real datasets show that, compared with PDS, SBC, CCACT, GLSW, and MSC, the PLSCT model is more stable and obtains more accurate prediction results. The reason is that, when the spectra of the slave instrument are projected into the subspace, noise effects such as scattering that are unrelated to the response variable are removed from the spectra, so the feature transfer in the identical PLS subspace can more accurately narrow the differences between the predicted feature of the master instrument and the pseudo predicted feature of the slave instrument.