Introduction

Wood products require accurate identification of structural performance and reliability, especially for construction material. The modulus of rupture (MOR) and modulus of elasticity (MOE) are important mechanical properties. Traditional wood-bending mechanical measurements are accurate and reliable but are destructive, time-consuming, and contribute to material waste. Near-infrared (NIR) spectroscopy, coupled with multivariate data analysis, has been shown to be a promising technique for measuring the physical, chemical and mechanical properties of wood (Via et al. 2003). NIR spectroscopy measures the absorbance of near-infrared light (780–2500 nm), and the spectra are mainly composed of absorptions based on stretching vibrations of hydrogen-containing groups (C–H, O–H and N–H) (Schimleck et al. 2001). Research using NIR spectroscopy has identified wood properties such as density, moisture content, lignin content, MOR, and MOE (Downes et al. 2014; Fujimoto et al. 2015; Tsuchikawa and Kobori 2015). For MOR or MOE determination, Kelley et al. (2004) used NIR spectroscopy on six softwoods. It was also used to predict mechanical properties of species such as Pinus taeda L. (Schimleck et al. 2005) and Chinese fir (Cunninghamia lanceolata (Lamb.) Hook. (Yu et al. 2009). Horvath et al. (2011) used transmittance NIR spectroscopy to predict the MOE of one- and two-year-old transgenic and wild-type aspen (Populus tremuloides Michx.). Kothiyal and Raturi (2011) reported the strong relationship between laboratory measurements and NIR predictions for five-year-old Eucalyptus tereticornis. Schimleck et al. (2011) illustrated MOR and MOE using NIR spectra collected from the transverse surface of pernambuco (Caesalpinia echinata). Xu et al. (2011) developed calibration models to predict MOE of oven-dried and green wood samples of balsam fir (Abies balsamea) and black spruce (Picea mariana). Todorović et al. (2015) evaluated whether thermally modified beech wood (Fagus moesiaca (K. Maly) C.) could enhance the accuracy of prediction models. The NIR spectra are usually obtained from radial, tangential and transverse surfaces of solid wood samples. Zhao et al. (2012) investigated the effects of average NIR spectra from radial and tangential surfaces. Unfortunately, recent research shows that the coefficient of determination (R2) and the ratio of performance to deviation (RPD) of mechanical properties are lower than other physical and chemical properties. The reason is that the partial least squares (PLS) regression model is only well adapted to linear relationships between NIR spectra and properties. In a non-linear relationship such as MOE, the accuracy will be reduced.

In order to improve the accuracy of MOE determination, different data pre-processing methods were evaluated such as, multiplicative scatter correction (MSC), first derivative (1stDer), second derivative (2ndDer), Savitzky–Golay smooth (S–G) (Andrade et al. 2010; Bächle et al. 2010). In the discipline of mathematics many improved PLS models have been reported, including moving window PLS (MWPLS), and synergy interval PLS (SiPLS) (Yang et al. 2015; Deng et al. 2016). These models commonly use partial spectra bands to select linear data. Saul and Roweis (2003) implemented a non-linear method to reduce data dimensionality and as a tool to simplify and accelerate machine learning in high dimensional spaces. Roweis and Saul (2012) used one of the manifold non-linear methods, locally linear embedding (LLE), to use the lower dimensionality within the Euclidean distance, but errors might be larger while certain neighbor points map into a lower dimensionality space. For this problem, Nguyen et al. (2015) showed that using correlation to seek neighbor points is better than using Euclidean distance. While somewhat effective, the disadvantage of these methods is that information related to mechanical properties may be missed when only relevant spectra are selected.

This study uses NIR spectroscopy to predict the MOR and MOE of Mongolian oak wood. A correlation linear embedding PLS (CLE–PLS) model was implemented to conform to the non-linear relationship between NIR spectra and MOR and MOE, which could be applied as a nondestructive identification of wood properties.

Materials and methods

Materials

Mongolian oak (Quercus mongolica) was used to build the near-infrared prediction model. It is a deciduous species distributed in the northeast of China and is commonly used as structural material. Ten trees were randomly chosen from the ‘Chong He’ forest farm located in Wuchang, Heilongjiang province. The average diameter at breast height was 16 cm and the average height was 13 m. All the logs were taken above 1.3 m from the ground and cut into 1-m-long or 2-m-long pieces. After air-drying, each log (40 logs in total) was cut into bending mechanical samples (no pith) of 300 mm × 20 mm × 20 mm. Samples were numbered 1–4, with 1 and 4 taken from the sapwood and 2 and 3 from the heartwood (Fig. 1).

Fig. 1
figure 1

Process of sample production

After removing samples with visible defects and deformations, 135 bending mechanical samples remained. These were numbered from 1 to 135; sapwood and heartwood samples had different serial numbers. All samples were conditioned to 12% moisture content prior to testing.

Property tests

Mechanical test

Samples were measured in four-point bending according to Chinese national standards of the Method for Determination of the MOR in Static Bending of Wood (GB1936.1-2009) and the Method for Determination of the MOE in Static Bending of Wood (GB1936.2-2009).

NIR spectroscopy test

NIR spectra were collected with a one-chip spectrometer produced by INSION Co., Germany. This spectrometer is equipped with an optical fiber probe covering wavelengths from 900 to 1700 nm, with a 7.0 nm resolution. The NIR spectra were acquired by an optical fiber probe 5 mm in diameter (Fig. 2).

Fig. 2
figure 2

NIR measurement process

SPEC View 7.1 collected and recorded the spectra. For each scanning point, 30 scans were collected and averaged into a single spectrum. For each sample (300 mm × 20 mm × 20 mm), eight longitudinal spectra on the radial surface and eight longitudinal spectra on the tangential surface of the bending test samples were averaged to one spectrum. The sampling points were scanned in equal spacing 30 mm on both of the surfaces (Fig. 3). In addition, the diameter of the optical fiber probe was 5 mm.

Fig. 3
figure 3

Sampling points for each sample

NIR spectra pre-processing methods

Matlab 2014a software is a commonly used pre-processing method. The scattering effect of diffuse reflectance measurements can be reduced by multiplicative scatter correction (MSC), and the ground area of baseline shift can be modified by first or second derivatives, and high frequency noise can be smoothed by Savitzky–Golay (S–G).

NIR CLE–PLS calibration models development

CLE–PLS calibrations were developed by PLS analysis. The coefficient of determination \(R_{c}^{2}\) is for the calibration set, and \(R_{p}^{2}\) is for the validation set. The standard error of calibration (SEC), the standard error of prediction (SEP) and the ratio of performance to deviation (RPD) were used to evaluate the performance of the calibration model. The RPD is the ratio of the standard deviation (SD) to the original data and the SEP.

Correlation local embedding is a non-liner dimension reduce method. First, neighbor points are found for each spectra by correlation. Second, a local reconstruction weight matrix is defined and the mapping invariance is used to calculate the lower dimension data by the local reconstruction weight matrix and the nearest neighbor points. Finally, lower dimension data are used and measured to establish the PLS calibration model (Fig. 4).

Fig. 4
figure 4

The block diagram of CLE–PLS

The detailed algorithm development process is:

Step 1 Input spectra data of the calibration set, \(X = \left\{ {x_{i} |i = 1,2, \ldots ,n} \right\}\) and n is the number of samples, \(X \in R^{N}\).

Step 2 Set a value to k as the number of neighbor points.

Step 3 Calculate the correlation coefficient of two random spectra by Eq. (1):

$$\rho \left( {x_{i} ,x_{j} } \right) = \frac{{\left( {x_{i} - \bar{x}} \right)^{T} \cdot \left( {x_{j} - \bar{x}} \right)}}{{\sqrt {\left( {x_{i} - \bar{x}} \right)^{T} \left( {x_{i} - \bar{x}} \right) \cdot } \sqrt {\left( {x_{j} - \bar{x}} \right)^{T} \left( {x_{i} - \bar{x}} \right)} }}$$
(1)

where, \(\bar{x}\) is the average spectra of calibration, \(j = 1,2, \ldots ,n\).

Step 4 Define a local reconstruction weight matrix: G.

First, an error function is designed as Eq. (2):

$${ \hbox{min} }{\kern 1pt} {\kern 1pt} {\kern 1pt} J (G )= \sum\limits_{i = 1}^{n} {\left| {x_{i} - \sum\limits_{j = 1}^{k} {g_{ij} x_{ij} } } \right|}^{2}$$
(2)

where, \(x_{ij}\) is the neighbor points of \(x_{i}\), and \(g_{ij}\) is weight. Constraint condition:

$$\sum\limits_{j = 1}^{k} {g_{ij} } = 1$$
(3)

Second, a covariance matrix \(H^{i}\) is designed as Eq. (4):

$$H_{{}}^{i} = (x_{i} - x_{ij} )^{\text{T}} \cdot (x_{i} - x_{im} )$$
(4)

where \(x_{ij}\) and \(x_{im}\) are both neighbor points of \(x_{i}\)

Finally, use a Lagrange multiplier algorithm to combine Eqs. (3) and (4). Calculate the \(g_{ij}\) (Eq. 5) of G.

$$g_{ij} = \frac{{\sum\nolimits_{m = 1}^{k} { (H^{i} )^{ - 1} } }}{{\sum\nolimits_{s = 1}^{k} {\sum\nolimits_{t = 1}^{k} { (H_{st}^{i} )^{ - 1} } } }}$$
(5)

Step 5 Preserve the mapping invariance to calculate the lower dimension data by neighbor points and G. Assuming C is d dimensional data (d ≪ N), and the constraint of mapping is provided in Eq. (6):

$${ \hbox{min} }{\kern 1pt} {\kern 1pt} J (C )= \sum\limits_{i = 1}^{n} {\left| {c_{i} - \sum\limits_{j = 1}^{k} {g_{ij} c_{ij} } } \right|}^{2}$$
(6)

where, \(J (C )\) is the objective function, \(c_{i}\) is the output vector of \(x_{i}\), and \(c_{ij}\) is neighbor points of \(c_{i}\) in lower dimensional space. Equation (6) can be simplified as Eq. (7):

$${ \hbox{min} }{\kern 1pt} {\kern 1pt} J (C )= \sum\limits_{i = 1}^{n} {\left| {CI - CG_{i} } \right|^{2} } = \left| {C (I - G )} \right|^{2}$$
(7)

Constraint condition:

$$\sum\limits_{i = 1}^{n} {c_{i} } = 0$$
(8)
$$\frac{1}{n}\sum\limits_{i = 1}^{n} {c_{i} c_{i}^{\text{T}} } = I$$
(9)

where, \(I\) is \(d \times d\) dimensional unit matrix.

Therefore, Eq. (7) can be simplified continually. It is shown in Eq. (10):

$${ \hbox{min} }{\kern 1pt} {\kern 1pt} J (C )= \left| {C (I - G )} \right|^{2} = (I - G )^{\text{T}} (I - G )C^{\text{T}} C$$
(10)

Step 6 Output the lower dimensional data C. C is the vectors of d minimum non-zero eigenvalues calculated by \((I - G )^{\text{T}} (I - G )\). C is \(n \times d\) (d ≪ N) dimensional matrix.

Step 7 Use C and the measured value to establish the PLS regression model.

Results and discussion

Pre-processing and PLS model

Spectra data were collected and averaged from the radial and tangential surfaces of 135 samples (Fig. 5). The original spectra are shown in Fig. 5a. The spectra were pre-rpocessed with different preprocessing methods such as MSC, Fig. 5b, first derivative (1stDer), Fig. 5c, second derivative (2ndDer), Fig. 5d, and S–G smooth, Fig. 5e. Among these methods, 1stDer could eliminate the baseline drift and reduce background interference, and S–G smooth could suppress high frequency noise. In order to achieve the best prediction of mechanical properties, these methods were combined in different ways and the spectra data after preprocessing are shown in Fig. 5f. The number of windows for S–G can be set as 7, 9, 11, and 13, in which we found that 9 was the best. The weight coefficient was [− 21, 14, 39, 54, 59, 54, 39, 14, − 21].

Fig. 5
figure 5

Original spectrum and pre-processed spectrum

Using these pre-processing methods, the spectra are consistent by MSC, the background of gentle area can be expressed more effectively by 1stDer and 2ndDer, and the useless noise or irrelevant absorption peak can be eliminated when combined with S–G smooth (Table 1). In the PLS model, the same number of factors was set at 4.

Table 1 Results of PLS Model by different pre-processing methods

The performance of the PLS calibration model corresponds to different pre-processing methods in a different manner. The most frequently used method is 2ndDer of the spectra. We found that using 2ndDer is better than 1stDer, a result that is consistent with the research on Eucalyptus pellita. For the spectra of Mongolian oak, 1stDer + S–G is the best of the above methods as shown by the higher \(R_{c}^{2}\) and \(R_{p}^{2}\), as well as the lower SEC and SEP (Table 1).

Although we used pre-processing method (1stDer + S–G) here, the \(R_{c}^{2}\) and \(R_{p}^{2}\) are only 0.70 and 0.62 for MOR, and the \(R_{c}^{2}\) and \(R_{p}^{2}\) are only 0.67 and 0.60 for MOE. For MOR and MOE, RPD are both 1.5 < RPD < 2, indicating that the model cannot provide an adequate prediction. Compared with other studies of wood samples with NIR spectra, the coefficient of determination (Table 1) is a slightly lower because of the linearity of the PLS model. Higher correlation could be found for MOE using various species of wood. For example, in Schimleck’s paper, \(R_{p}^{2}\) values are 0.75 (Rp is 0.87 of MOR, factors = 5), 0.70 (Rp is 0.84 of MOE, factors = 5) for six softwoods, and Zhao et al. obtained results of \(R_{p}^{2}\) = 0.77 (Rp is 0.88) for MOR, \(R_{p}^{2}\) = 0.79 (Rp is 0.89) for MOE by using both radial and tangential surface. The \(R_{p}^{2}\) in these studies are still lower than 0.80, and it could be even lower if considering other wood properties such as density, moisture, and some chemical properties.

CLE–PLS model

The pre-processed spectra by 1stDer + S–G were regarded as the input data of the calibration model. As two significant parameters of CLE–PLS, the lowest dimension d and the number of neighbor points k need to be selected. The effects for SEC of different values are shown in Fig. 6.

Fig. 6
figure 6

Effects of different d and k for calibrations

The minimum SEC (standard error of calibration) has been determined and d and k were selected when SEC reach a minimum. For MOR, SEC (9.96) is minimum when d = 23 and k = 18. For MOE, SEC (0.81) is minimum when d = 25 and k = 17. The parameter values were calculated through 90 random samples and were used to establish the CLE–PLS model. The number of factors of the PLS model is 4. The results are shown in Table 2 and the correlation between predicted and measured values of the CLE–PLS validation model is shown in Fig. 7.

Table 2 CLE–PLS model results
Fig. 7
figure 7

CLE–PLS validation model. a The correlation of MOR between predicted and measured values; b The correlation of MOE between predicted and measured values

In order to verify the robustness of the CLE–PLS model, cross-validation was applied to divide calibration and validation sets. The calibration sets, including 90 samples and the validation sets, including 45 samples, were both randomly generated 20 times. The comparisons of PLS and CLE–PLS models is shown in Table 3.

Table 3 Comparisons of PLS and CLE–PLS models

Compared with the PLS (partial least square) model, CLE–PLS performed better for MOR and MOE as it had the higher \(R_{p}^{2}\) and RPD, as well as the lower SEP (Fig. 7). Using the CLE–PLS model, the values of \(R_{p}^{2}\) (0.80 for MOR and 0.79 for MOE) were both more than 0.75, and the values RPD (2.23 for MOR and 2.15 for MOE) were both more than 2.0. This suggests that the CLE–PLS model could be used for preliminary screening (2 < RPD < 3) of Mongolia oak wood products. In the prediction of Mongolian oak mechanical bending properties, the results show that the CLE–PLS model is more effective.

Conclusions

Near-infrared (NIR) spectra, ranging from 900 to 1700 nm predicted the modulus of rupture (MOR) and modulus of elasticity (MOE) of Mongolian oak. The truth values of MOR and MOE were determined by the four-point bending method, and the spectra were collected from radial and tangential surfaces of samples. Compared with five pre-processing methods, 1stDer + S–G is better. Meanwhile, correlation-local embedding (CLE) dimensional methods were used to improve the performance of the partial least squares model (PLS). CLE–PLS [improved 0.18 determination coefficient] for both MOR and MOE in contrast with the PLS model. In addition, the ratio of performance to deviation (RPD) values for MOR and MOE were both more than 2.0 ([separately improved by 0.61 and 0.57] respectively) by using the CLE–PLS model. The CLE algorithm is effective in non-linear dimensionality, and the CLE–PLS model can be used for preliminary screening of Mongolian oak, (without visible defects and deformations), to predict MOR and MOE. Our results also indicate that non-linear dimensional methods to develop NIR spectra prediction models may be more applicable.