Near infrared spectroscopy (NIRS) data analysis for a rapid and simultaneous prediction of feed nutritive parameters

This paper describes a dataset on near infrared spectroscopy (NIRS) used as a rapid and robust method to predict several nutritive parameters of animal feed simultaneously. Near infrared spectra were acquired and recorded in the wavelength range from 1000 to 2500 nm with 64 co-added scans per sample measurement. Reference values of the nutritive parameters were measured using standard laboratory procedures: in vitro organic matter digestibility (IVOMD), in vitro dry matter digestibility (IVDMD), neutral detergent fibre (NDF) and acid detergent fibre (ADF). The near infrared dataset can be enhanced with several spectral correction methods to improve prediction accuracy and robustness. Nutritive parameters of animal feed can then be determined simultaneously and rapidly with prediction models built by principal component regression (PCR), partial least squares regression (PLSR) and other regression approaches.


Specifications Table
Subject: Agricultural and Biological Sciences; Animal Sciences
Specific subject area: Spectroscopy; non-destructive testing for animal feed quality evaluation
Type of data: Table, Graph, Spectroscopic data
How data were acquired: Spectral datasets of animal feed samples were acquired using a benchtop Fourier transform near infrared (FT-NIR) spectrometer (Thermo Nicolet Antaris II™). Spectra were recorded as absorbance in the wavelength range from 1000 to 2500 nm, with 32 co-added scans and a resolution window of 0.2 nm. Reference NDF and ADF values were obtained with the standard laboratory procedures proposed in Ref. [1]: feed samples were boiled sequentially in neutral and acid detergent solutions for 1 h, and NDF and ADF were expressed in percent. IVOMD and IVDMD were determined by subtracting the organic matter and dry matter residues, respectively, from the amounts present before fermentation. The in vitro incubation was carried out in three runs with two serum bottles per run.

Value of the Data
The spectral dataset of animal feed samples can be used to build calibration models that predict nutritive parameters. It provides a rapid, non-destructive and simultaneous approach to determining the nutritive attributes of biological materials such as animal feed. The data can benefit the animal feed industry in the quality inspection of feed products.
This dataset can also be reused to develop prediction models for other nutritive parameters such as starch, protein and pH. The spectral dataset can be enhanced using various pre-processing approaches and transferred to established NIRS instruments. Prediction performance may vary depending on the spectral enhancement and regression approaches applied during calibration and prediction model development.

Data
The near infrared spectral dataset of feed samples was acquired and recorded as absorbance spectra in the wavelength range from 1000 to 2500 nm (Fig. 1). Near infrared spectra are typically represented as a function of either the wavenumber (cm⁻¹, proportional to photon energy) or the wavelength (nm) of the electromagnetic radiation. Spectra contain chemical information that can be revealed through calibration by means of regression approaches. Besides relevant information, spectra may also contain irrelevant background information, known as noise, caused by light scattering [2]. This noise can reduce the accuracy and robustness of the resulting calibration. To eliminate or minimize it, spectra can be corrected and enhanced with pre-processing techniques such as smoothing, normalization, multiplicative scatter correction (MSC), standard normal variate (SNV), orthogonal signal correction (OSC), spectral derivatives, de-trending, and combinations of these [3]. The choice of pre-processing method should be guided by knowledge of the sample characteristics, the measurement protocol, the interaction between radiation and sample, and the requirements of the analytical problem [4]. Nutritive parameters such as NDF, ADF, pH, starch content, moisture content, IVOMD and IVDMD are embedded in the spectral pattern across the near infrared region. Specific wavelengths correspond to certain nutritive attributes and represent the amount of light absorbed, reflected or transmitted. Light absorption and scattering of NIR radiation are the two main phenomena shaping the spectrum [5]. Spectra can also be transformed into their derivatives, as presented in Fig. 2, to reveal spectral features in more detail and to detect potential noise along the near infrared region. Spectral derivatives are an effective way to correct these distortions and are commonly computed with the Savitzky-Golay algorithm.
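The Savitzky-Golay derivative mentioned above can be sketched in Python with SciPy's `savgol_filter`. This is an illustration only: the window length (11 points), polynomial order (2) and the 2 nm grid are assumptions, not settings reported for this dataset, and the spectra are synthetic.

```python
import numpy as np
from scipy.signal import savgol_filter


def sg_first_derivative(spectra, wavelength_step_nm, window_length=11, polyorder=2):
    """Savitzky-Golay first derivative of each spectrum (one spectrum per row)."""
    # A polynomial of order `polyorder` is fitted in a moving window of
    # `window_length` points and differentiated; `delta` is the spacing
    # between points, so the result is in absorbance units per nm.
    return savgol_filter(spectra, window_length, polyorder,
                         deriv=1, delta=wavelength_step_nm, axis=1)


# Synthetic spectra: the same absorption band on two different constant offsets.
wavelengths = np.linspace(1000, 2500, 751)   # 2 nm spacing (hypothetical grid)
step = wavelengths[1] - wavelengths[0]
band = np.exp(-((wavelengths - 1940) / 30) ** 2)
spectra = np.vstack([0.2 + band, 0.5 + band])
d1 = sg_first_derivative(spectra, step)
# The constant offset vanishes, so both rows of d1 are (numerically) identical.
```

Because the first derivative of a constant is zero, an additive baseline offset disappears, which is one reason derivatives help with baseline distortions.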
The main aim of NIRS applications is to establish models that predict nutritive or quality attributes of the studied samples. Multiple linear regression (MLR) was the first approach applied, using a few original variables that had previously been transformed and selected to carry the relevant chemical information. Nowadays, regression approaches such as principal component regression (PCR) and partial least squares regression (PLSR) are usually employed [6]. These two methods have proven well suited to NIRS since the field emerged. Both seek the best correlation between the NIRS spectra and the respective nutritive or quality parameters, which in this case are NDF, ADF, IVOMD and IVDMD. Prediction models were developed by regressing the actual nutritive attributes obtained by standard laboratory procedures (Y variables) on the spectra (X variables). The predicted results were then compared with the actual values to judge prediction performance, as shown in Fig. 3.
Prediction models can be developed directly from either raw (untreated) or enhanced (treated) spectra [7]. As mentioned previously, near infrared spectra need to be corrected and pre-processed to achieve more accurate and robust prediction results. Table 1 shows a comparison of prediction performance between raw and corrected spectra for the determination of feed nutritive parameters.
Prediction performance may vary depending on the spectral pre-processing algorithm and regression approach used. As presented in Table 2, prediction performance varied among the spectral correction methods applied in this analysis, namely standard normal variate (SNV), baseline shift correction (BSC) and de-trending (DT), and generally improved when the models were constructed using corrected spectra.

Instrument setup
Near infrared spectra of feed samples were acquired using a benchtop NIR instrument (Thermo Nicolet Antaris II™). The instrument was controlled and configured with the integrated software Thermo Integration® and Thermo Operation®. Measurement tasks were defined by establishing a workflow in Thermo Integration. High-resolution measurement with an integrating sphere was chosen as the spectral acquisition method [8].
For each measurement, the workflow required a sample label before acquisition so that individual feed samples could be distinguished. Spectra were acquired and recorded as absorbance in the wavelength range from 1000 to 2500 nm and saved in three file formats: Nicolet (.spa), JCAMP (.jdx) and comma-separated values (.csv). Standard laboratory methods were also prepared for the four nutritive parameters mentioned above.
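A minimal sketch of reading the exported .csv spectra into one samples-by-variables matrix is shown below. The file layout (two comma-separated columns, x-axis value and absorbance, no header) and the possibility that the x-axis is exported in wavenumbers are assumptions about the export, not facts stated here; the helper names are hypothetical. The conversion itself is exact: wavelength (nm) = 10⁷ / wavenumber (cm⁻¹), so 10000 cm⁻¹ corresponds to 1000 nm and 4000 cm⁻¹ to 2500 nm.

```python
import io

import numpy as np


def load_spectrum_csv(source):
    """Read one exported spectrum (path or file-like object), assumed to hold
    two comma-separated columns, x-axis value and absorbance, with no header."""
    data = np.loadtxt(source, delimiter=",", ndmin=2)
    return data[:, 0], data[:, 1]


def wavenumber_to_wavelength_nm(wavenumber_cm1):
    """Convert wavenumber (cm^-1) to wavelength (nm): nm = 1e7 / cm^-1."""
    return 1e7 / np.asarray(wavenumber_cm1, dtype=float)


def load_spectra(sources, x_is_wavenumber=True):
    """Stack spectra into a samples x variables matrix X, sorted by wavelength.
    All files are assumed to share the same x-axis grid."""
    x_axis, first = load_spectrum_csv(sources[0])
    rows = [first]
    for source in sources[1:]:
        x_other, y = load_spectrum_csv(source)
        if x_other.shape != x_axis.shape or not np.allclose(x_other, x_axis):
            raise ValueError("spectra do not share the same x-axis grid")
        rows.append(y)
    wavelengths = wavenumber_to_wavelength_nm(x_axis) if x_is_wavenumber else x_axis
    order = np.argsort(wavelengths)          # wavenumber order is reversed in nm
    return wavelengths[order], np.vstack(rows)[:, order]


# Usage with two in-memory "files" holding synthetic values.
demo = [io.StringIO("4000,0.51\n7000,0.42\n10000,0.30\n"),
        io.StringIO("4000,0.61\n7000,0.47\n10000,0.33\n")]
wl, X = load_spectra(demo)
```

In practice, the list of in-memory files would be replaced by the paths of the exported .csv files, one per feed sample.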

Spectra data acquisition
Near infrared spectra were first acquired with the NIR instrument for all feed samples, which were made from eight agro-industrial residues and by-products (sago residue, coconut meal, soybean-ketchup by-product, coffee pulp, cacao pod, sago tree, corncob and rice bran). Approximately 33 g of each bulk feed sample was placed centrally on the sample holder. Each sample was positioned by hand directly over the incoming-light aperture (1 cm in diameter) of the light source to ensure direct contact and to minimize noise due to light scattering. Absorbance spectra in the wavelength range from 1000 to 2500 nm were acquired with 64 co-added scans. The sample was rotated during acquisition to ensure uniform measurement.

IVOMD, IVDMD, NDF and ADF measurements
Once spectral acquisition was completed, the feed samples were analysed directly to determine the actual nutritive parameters. For NDF and ADF measurement, feed samples were boiled sequentially in neutral detergent and acid detergent solutions for 1 h. NDF was determined without α-amylase and sodium sulfite, whereas both were used for ADF. NDF and ADF were determined on the basis of neutral detergent insoluble crude protein (NDICP) and acid detergent insoluble crude protein (ADICP) contents, respectively, and expressed exclusive of residual ash [9].
In vitro rumen fermentation was then performed to determine IVDMD and IVOMD. Briefly, rumen fluid was collected in the morning before feeding from a rumen-fistulated cow and filtered through four layers of gauze before use [10]. A 125 ml serum bottle was filled with 0.75 g of sample and 75 ml of buffered rumen fluid (rumen fluid to buffer ratio 1:4 v/v). Incubation took place in a water bath at 39 °C for 48 h. After incubation, the supernatant was analysed for pH, and the residue was further incubated with 75 ml of 0.2 N pepsin-HCl solution for another 48 h [1,10]. IVOMD and IVDMD were determined by subtracting the residues of organic matter (OM) and dry matter (DM), respectively, from the initial amounts before fermentation. The in vitro incubation was conducted in three replicates with two serum bottles per replicate. Descriptive statistics of the measured IVOMD, IVDMD, NDF and ADF are shown in Table 3.
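The subtraction described above can be written as a small helper, shown here as a sketch. The optional blank correction and the way replicates are combined (bottles averaged within a run, then run means averaged) are assumptions for illustration, and all numbers in the example are hypothetical.

```python
from statistics import mean


def in_vitro_digestibility(initial_g, residue_g, blank_residue_g=0.0):
    """Digestibility (%) = (initial - corrected residue) / initial * 100.

    Use DM amounts for IVDMD or OM amounts for IVOMD. `blank_residue_g`
    (residue from a bottle without sample) is an optional correction that is
    an assumption here, not part of the described procedure."""
    if initial_g <= 0:
        raise ValueError("initial amount must be positive")
    digested_g = initial_g - (residue_g - blank_residue_g)
    return 100.0 * digested_g / initial_g


def mean_of_runs(values_by_run):
    """Average the bottles within each run, then average the run means
    (assumed way of combining three runs x two bottles)."""
    return mean(mean(bottles) for bottles in values_by_run)


# Hypothetical example: 0.75 g sample at 90% DM gives 0.675 g DM incubated.
ivdmd = in_vitro_digestibility(0.75 * 0.90, residue_g=0.270)         # ≈ 60.0 (%)
overall = mean_of_runs([[60.0, 62.0], [59.0, 61.0], [61.0, 63.0]])  # 61.0
```

The same function serves both parameters; only the basis of the inputs (DM or OM) changes.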

Sample outlier detection
In practice, NIRS users often face problems in selecting the most representative samples for calibration, in splitting a large dataset into calibration and validation subsets, or in identifying samples that differ considerably from the majority of the remaining samples, known as outliers [4,11]. Outliers can be found in the datasets used for model construction and validation, or can arise among new samples when those models are used for independent prediction.
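The text does not name a sample-selection algorithm. As one common option, shown here only as an illustration, the Kennard-Stone algorithm chooses calibration samples that cover the spectral space uniformly: it starts from the two most distant samples and repeatedly adds the sample farthest from those already selected. This sketch uses Euclidean distances on the raw variables; applying it to PCA scores or using another distance is an equally valid variant.

```python
import numpy as np
from scipy.spatial.distance import cdist


def kennard_stone(X, n_select):
    """Indices of n_select rows of X chosen by the Kennard-Stone algorithm."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    dist = cdist(X, X)                       # pairwise Euclidean distances
    # Start from the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_select:
        # For each remaining sample, distance to its nearest selected sample...
        nearest = dist[np.ix_(remaining, selected)].min(axis=1)
        # ...and add the sample for which that distance is largest.
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return selected


# Toy example on one variable: the extremes are picked first, then the gap is filled.
print(kennard_stone([[0.0], [1.0], [2.0], [10.0]], 3))  # [0, 3, 2]
```

The selected indices would form the calibration set and the remaining samples the validation set.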
Several methods can be used to detect and remove outliers. One of the most common is the Hotelling T² ellipse, which defines statistical boundaries assuming that the scores of a principal component analysis (PCA) are normally distributed. Outliers are typically identified as samples lying outside the ellipse confidence limit, usually set at 95% [12,13]. Another recommended approach uses the Mahalanobis distance (leverage) together with the spectral residual to detect outliers in the raw spectral dataset.
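A minimal NumPy/SciPy sketch of these diagnostics follows. It computes PCA scores, Hotelling T² (the squared Mahalanobis distance in score space), the widely used F-distribution approximation of the 95% limit, leverage and the Q (spectral residual) statistic. The number of components and the synthetic data are assumptions, and no confidence limit is computed for Q here.

```python
import numpy as np
from scipy.stats import f


def pca_outlier_stats(X, n_components, alpha=0.05):
    """Hotelling T^2, its (1 - alpha) limit, leverage and Q residual per sample."""
    X = np.asarray(X, dtype=float)
    n, A = X.shape[0], n_components
    Xc = X - X.mean(axis=0)                       # mean-centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :A] * s[:A]                     # PCA scores
    score_var = s[:A] ** 2 / (n - 1)              # variance of each score
    t2 = np.sum(scores ** 2 / score_var, axis=1)  # squared Mahalanobis distance
    # Common F-based confidence limit for T^2 (95% when alpha = 0.05).
    t2_limit = A * (n - 1) / (n - A) * f.ppf(1 - alpha, A, n - A)
    leverage = 1.0 / n + t2 / (n - 1)             # hat-matrix diagonal
    residual = Xc - scores @ Vt[:A]               # spectral part not explained
    q = np.sum(residual ** 2, axis=1)             # Q (spectral residual)
    return t2, t2_limit, leverage, q


# Synthetic data: 30 spectra of 50 variables, sample 5 shifted on purpose.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 50))
X_demo[5] += 8.0
t2, t2_limit, leverage, q = pca_outlier_stats(X_demo, n_components=3)
flagged = np.flatnonzero(t2 > t2_limit)          # candidates to investigate
```

In line with the next paragraph, `flagged` is a list of samples to investigate, not a list to delete automatically.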
Once outliers have been detected and identified by any of these methods, they should not simply be removed from the dataset; instead, the reasons for their presence must be verified [14,15]. This increases knowledge about the dataset and indicates how its quality can be improved to achieve better model performance.

Infrared spectra data corrections
Before developing prediction models, spectra need to be pre-processed and corrected to achieve more accurate and robust prediction results. Several correction methods are available. The choice should be based on the sample characteristics, the effects visible in the spectra and other relevant knowledge, which must be established before correction. In this study, three spectral correction methods were employed: baseline shift correction (BSC), standard normal variate (SNV) and de-trending (DT). The three corrections were then compared to assess their impact on the prediction of the nutritive parameters. In NIRS practice, spectral corrections can also be combined to produce a better prediction model.
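The three corrections can be sketched as row-wise operations on a samples-by-wavelengths matrix. SNV centres and scales each spectrum. The exact BSC and DT settings used in this study are not stated, so two choices here are assumptions. BSC is taken to subtract each spectrum's minimum, and DT is taken to remove a second-order polynomial fitted against wavelength. The combination of SNV followed by DT is shown only as an example.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def baseline_shift_correction(X):
    """Subtract each spectrum's minimum so its lowest point sits at zero
    (one common definition of baseline offset; assumed here)."""
    X = np.asarray(X, dtype=float)
    return X - X.min(axis=1, keepdims=True)


def snv(X):
    """Standard normal variate: centre and scale each spectrum (row)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mean) / std


def detrend(X, wavelengths, degree=2):
    """Remove a polynomial trend (second order by default, an assumption)
    fitted to each spectrum as a function of wavelength."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(wavelengths, dtype=float)
    w = (w - w.mean()) / w.std()          # rescale for a well-conditioned fit
    coeffs = P.polyfit(w, X.T, degree)    # one coefficient column per spectrum
    trend = P.polyval(w, coeffs)          # shape: (n_samples, n_wavelengths)
    return X - trend


# Synthetic spectra with different offsets, scales and a sloping baseline.
wl = np.linspace(1000, 2500, 301)
band = np.exp(-((wl - 1940) / 40) ** 2)
X_raw = np.vstack([band + 0.3, 1.5 * band + 0.6 + 2e-4 * (wl - 1000)])
X_bsc = baseline_shift_correction(X_raw)
X_snv = snv(X_raw)
X_snv_dt = detrend(X_snv, wl)  # SNV followed by de-trending
```

SNV removes additive offsets and multiplicative scaling per spectrum, while de-trending removes the curvilinear baseline that SNV leaves behind.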

Prediction models
The core of NIRS practice is to construct models that predict the desired nutritive or quality attributes of the studied samples. These attributes can then be predicted rapidly and simultaneously after a process called calibration, in which the actual measured nutritive attributes (Y variables) are regressed on the NIR spectra (X variables). Ideally, the samples used in the regression stage should be representative of both the present samples and future samples to be predicted. That is, all expected sources of variability must be covered in both the calibration and validation datasets.
In common NIRS practice, partial least squares regression (PLSR) is one of the most widely used regression methods for constructing prediction models, and it continues to be the workhorse of regression in NIRS applications. The original PLSR is a linear method that assumes a linear relationship between the modelled nutritive parameters or concentrations and the spectral variations. Weak nonlinearities may be handled by increasing the number of latent variables (LVs) included in the PLSR model [12,16]. Another popular choice for developing NIRS prediction models is principal component regression (PCR). PCR is a related multivariate method: it performs PCA on the spectra and then applies multiple linear regression (MLR) to the resulting principal component scores.
Prediction performance was evaluated with the following statistics:
- the correlation coefficient (r) and coefficient of determination (R²) between predicted and measured nutritive parameters or quality attributes;
- the prediction error, defined as the root mean square error (RMSE);
- the residual predictive deviation (RPD), defined as the ratio of the standard deviation (SD) of the actual IVOMD, IVDMD, NDF and ADF values in the population to the RMSE of the predicted values [1,6].

According to the literature, a good NIRS model should have r and R² above 0.75 and an RPD above 2.5 [4,17,18]. The higher the RPD, the greater the likelihood that the model will predict the desired nutritive parameters or chemical concentrations accurately and robustly [11,19].
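The four statistics and the guideline thresholds quoted above can be computed as in the following sketch. Two assumptions are made here. R² is computed as 1 − SS_res/SS_tot, although some software reports r² instead. The SD in the RPD uses the sample standard deviation (n − 1 denominator). The example values are made up.

```python
import numpy as np


def prediction_statistics(y_measured, y_predicted):
    """r, R^2, RMSE and RPD between measured and predicted values."""
    y = np.asarray(y_measured, dtype=float)
    y_hat = np.asarray(y_predicted, dtype=float)
    r = np.corrcoef(y, y_hat)[0, 1]                 # Pearson correlation
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot                      # coefficient of determination
    rmse = np.sqrt(np.mean((y - y_hat) ** 2))       # root mean square error
    rpd = np.std(y, ddof=1) / rmse                  # SD of reference / RMSE
    return {"r": r, "R2": r2, "RMSE": rmse, "RPD": rpd}


def meets_guideline(stats, min_r=0.75, min_r2=0.75, min_rpd=2.5):
    """Check the r, R^2 > 0.75 and RPD > 2.5 guideline."""
    return stats["r"] > min_r and stats["R2"] > min_r2 and stats["RPD"] > min_rpd


stats = prediction_statistics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
# r ≈ 0.983, R2 = 0.8, RMSE = 0.5, RPD ≈ 2.582, so meets_guideline(stats) is True
```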