Prediction of Lard in Palm Olein Oil Using Simple Linear Regression (SLR), Multiple Linear Regression (MLR), and Partial Least Squares Regression (PLSR) Based on Fourier-Transform Infrared (FTIR)

Fourier-transform infrared (FTIR) spectroscopy offers the advantages of rapid analysis with minimal sample preparation. FTIR in combination with a multivariate approach, particularly partial least squares regression (PLSR), has been widely used for adulterant analysis. Few studies have compared PLSR with other regression strategies. In this paper, we apply simple linear regression (SLR), multiple linear regression (MLR), and PLSR for prediction of lard in palm olein oil. Pure palm olein oil was adulterated with lard at different concentrations and analysed with FTIR. The marker bands distinguishing lard and palm olein oil were determined using Fisher's weights. The marker regions were then subjected to regression analysis, with the models verified based on 100 training/test sets. The prediction performance was measured based on the percentage root mean square error (%RMSE). The absorption bands at 3006 cm⁻¹, 2852 cm⁻¹, 1117 cm⁻¹, 1236 cm⁻¹, and 1159 cm⁻¹ were identified as the marker bands. The bands at 3006 and 1117 cm⁻¹ showed satisfactory predictive ability, with PLSR giving the better prediction at %RMSE of 16.03 and 13.26%, respectively.


Introduction
Adulteration of oils is a persistent issue in the market [1]. In 2013, a company in Taiwan was found to market cheaper oils as premium class oils. This was followed by an incident of lard-based cooking oil being adulterated with gutter oil, in which more than 1,300 food products were affected [2,3]. Consumer Voice [4] further reported that 47.09% of 1,015 edible oil samples tested from 14 states in India were not in compliance with the Food Safety and Standards Regulations.
Lard is considered one of the cheaper oils in the food industry. It can be blended effectively with other oils to reduce the production cost. The presence of lard in cooking oil matters from two perspectives: economic considerations and religious restrictions. Religions such as Islam and Judaism forbid the consumption of swine and any of its derivatives [1,5]; hence lard should not be present in halal-labelled products. From the economic perspective, the credibility of Malaysia as a major producer and exporter of palm oil would be at risk should its products be found adulterated. A company in Malaysia was charged with allegedly intending to export palm oil adulterated with fatty acid to Sri Lanka [6].
Various methods have been developed to identify the adulteration of cooking oil; these include gas chromatography-mass spectrometry (GC-MS), high-performance liquid chromatography-mass spectrometry (HPLC-MS), Fourier-transform infrared (FTIR) spectroscopy, and nuclear magnetic resonance (NMR). The advantages and disadvantages of these analytical methods for adulterant analysis are summarized in Table 1.
Most of these techniques are costly and time-consuming. FTIR is inexpensive and offers the advantages of rapid analysis with minimal sample preparation.
This technique, integrated with a statistical approach, particularly partial least squares (PLS), has demonstrated promising sensitivity for adulterant analysis [12,13,14]. FTIR coupled with PLS has been used for detection of adulterants in edible oils including avocado oil, sunflower oil, and palm oil, with a detection limit as low as 2-3% [15,16,17]. In some cases, the detection level may be much higher; for example, the quantification of hazelnut oil in virgin olive oil is reported at 25% or higher. PLS regression has commonly been coupled with FTIR for prediction of adulterants; however, few studies have explored other regression strategies. Hence, in this paper, we apply simple linear regression (SLR), multiple linear regression (MLR), and partial least squares regression (PLSR) for prediction of lard in palm olein oil using FTIR. This will provide fundamental knowledge on the performance of different regression models for adulterant analysis, contributing toward quality control.

Sample Preparation.
Readily available palm olein cooking oil was purchased from the local market. Pure lard was extracted from adipose tissues of swine purchased from the local market. The adipose tissues were cut into small pieces and heated in an oven at 90 °C for 2 hours. The liquid fat was ladled into a glass jar and left to cool to room temperature before storage. Prior to use, the lard was preheated with a block heater (Stuart SBH200D) at 50 °C for 1 hour, until the solidified lard turned into liquid. Pure lard, pure palm olein oil, and palm olein oil adulterated with lard at 20% and 50% were analysed. The samples were agitated with a vortex mixer (VELP Scientifica Model ZX4) for 1 minute to ensure homogeneity [1,17,18].

FTIR Spectra Measurement.
The samples were scanned with a Fourier-transform infrared spectrophotometer (Thermo Scientific Nicolet iS10) equipped with a diamond crystal attenuated total reflectance (ATR) accessory. The spectra were acquired at a resolution of 4 cm⁻¹ with 64 scans in the range of 4000-525 cm⁻¹. Each spectrum was ratioed against a fresh background spectrum recorded from the bare ATR plate.
Prior to collection of each background spectrum, the ATR plate was cleaned with pure ethanol. At each concentration level, a total of 20 replicates were scanned, yielding 80 spectra. The spectra were saved in csv format for further analysis using Matlab R2013a.

Spectra Processing.
The spectra were baseline corrected and subjected to peak detection according to the first derivative approach. The peaks detected were then matched across samples to produce a peak table with rows and columns representing samples and variables (in wavenumber), respectively. For brevity, the reader is referred to [19] for details of the algorithm. The resultant peak table was analysed to deduce the marker bands differentiating pure and adulterated samples.

Variable Selection.
Fisher Weights, a multiclass variable selection method, was employed to determine the variable(s) with discriminatory ability. The weight, $w_m$, for each variable, $m$, across the classes ($c = 1 \ldots C$) was calculated from $\bar{x}_{mc}$ and $\bar{x}_m$, the mean of the variable in class $c$ and the overall mean of the variable, respectively; $S_m$, the pooled standard deviation; and $N_c$, the number of members in class $c$.
A variable with a higher magnitude of weight has greater discriminatory ability [20]. Such variables are called the marker bands and are used for prediction of lard adulteration in palm olein oil using SLR, MLR, and PLSR.
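The weight calculation can be illustrated with a short sketch. The exact equation is not reproduced in this text, so the form used here is an assumption: the between-class sum of squares, weighted by $N_c$, divided by the pooled within-class variance $S_m^2$. The function name `fisher_weight` is hypothetical.

```python
from statistics import mean


def fisher_weight(values, labels):
    """Fisher weight of one variable (e.g. one peak) across C classes.

    ASSUMED form (the source equation is not available here):
        w_m = sum_c N_c * (mean_mc - mean_m)**2 / S_m**2
    where S_m**2 is the pooled within-class variance.
    """
    classes = sorted(set(labels))
    overall_mean = mean(values)
    between = 0.0    # class-size-weighted squared distance of class means
    within_ss = 0.0  # within-class sum of squares, pooled over all classes
    for c in classes:
        members = [v for v, lab in zip(values, labels) if lab == c]
        class_mean = mean(members)
        between += len(members) * (class_mean - overall_mean) ** 2
        within_ss += sum((v - class_mean) ** 2 for v in members)
    # Pooled variance; this is zero (and the division fails) only if every
    # class has no spread at all.
    pooled_var = within_ss / (len(values) - len(classes))
    return between / pooled_var


# Toy example: peak "a" separates the two classes, peak "b" does not.
labels = [0, 0, 0, 1, 1, 1]
peak_a = [1.0, 1.1, 0.9, 2.0, 2.1, 1.9]
peak_b = [1.0, 2.0, 1.5, 1.1, 1.9, 1.5]
weights = {"a": fisher_weight(peak_a, labels), "b": fisher_weight(peak_b, labels)}
```

Ranking the columns of the peak table by this weight gives the candidate marker bands; under this assumed form, peak "a" ranks far above peak "b".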

Simple Linear Regression (SLR).
The peak area of a marker band was calculated as the sum of the signal from peak start to peak end. The vector of peak areas, X, is assumed to have a linear relationship with the corresponding lard concentration, C. The regression is expressed as $\hat{C} = b_1 X + b_0$, where $b_1$ and $b_0$ are the regression coefficients and $\hat{C}$ is the predicted concentration.
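A minimal sketch of this step is given below, assuming ordinary least squares is used to estimate the coefficients (the text does not state the estimator). The helper names `peak_area`, `fit_slr`, and `predict_slr` are hypothetical.

```python
def peak_area(spectrum, start, end):
    """Sum the absorbance from index `start` to index `end` (inclusive)."""
    return sum(spectrum[start:end + 1])


def fit_slr(x, c):
    """Least-squares estimates (b1, b0) for the model c = b1 * x + b0."""
    n = len(x)
    x_mean = sum(x) / n
    c_mean = sum(c) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxc = sum((xi - x_mean) * (ci - c_mean) for xi, ci in zip(x, c))
    b1 = sxc / sxx
    b0 = c_mean - b1 * x_mean
    return b1, b0


def predict_slr(x, b1, b0):
    """Predicted concentration for each peak area in x."""
    return [b1 * xi + b0 for xi in x]


# Toy data: peak areas and concentrations that follow c = 2x + 1 exactly.
areas = [1.0, 2.0, 3.0, 4.0]
conc = [3.0, 5.0, 7.0, 9.0]
b1, b0 = fit_slr(areas, conc)
predicted = predict_slr([5.0], b1, b0)
```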

Multiple Linear Regression (MLR).
The calibration model was built using the spectral data, X (a matrix), with its corresponding lard concentrations, C, in which $C = X \cdot B$ and $B = (X'X)^{-1} X'C$, considering only the linear terms [21].
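The least-squares solution for B can be sketched as follows. This is an illustrative implementation using only the Python standard library, not the routine used in the study; it solves the normal equations $(X'X)B = X'C$ by Gaussian elimination rather than forming the inverse explicitly. The function names are hypothetical, and the column of ones in the example (to model an intercept) is an assumption.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular (collinear variables or too few samples)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def fit_mlr(X, c):
    """Coefficients B from the normal equations (X'X) B = X'c."""
    n_vars = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(n_vars)]
           for i in range(n_vars)]
    xtc = [sum(row[i] * ci for row, ci in zip(X, c)) for i in range(n_vars)]
    return solve(xtx, xtc)


def predict_mlr(X, B):
    """Predicted concentrations X @ B."""
    return [sum(xi * bi for xi, bi in zip(row, B)) for row in X]


# Toy data: first column of ones (assumed intercept), c = 1 + 2 * x.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
c = [3.0, 5.0, 7.0, 9.0]
B = fit_mlr(X, c)
```

The singular-matrix check reflects the known limitation of MLR: when variables are collinear or outnumber the samples, $X'X$ cannot be inverted.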

Partial Least Squares Regression (PLSR).
The PLS calibration model was developed using the spectral data, X, and its corresponding lard concentration, C, based on two PLS components. The PLS algorithm assumes a linear relationship between X and C. They are decomposed into the models $X = T \cdot P + E$ and $C = T \cdot q + f$, where E and f are the noise, T is the scores matrix common to X and C, and P and q are the loadings. The PLS algorithm involves the projection of X onto the weight vector to get a scores vector, t. X is then projected onto the scores to get the loadings, p. After every PLS component, the X matrix is deflated by subtracting $t \cdot p$ from X. The PLS algorithm according to NIPALS (non-linear iterative partial least squares) is explained in detail in [21].
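The projection and deflation steps can be sketched for a single response (PLS1). This is a minimal illustration under stated assumptions, not the implementation in [21]: X and C are taken to be already centred (or standardized), the weight vector is the normalized $X'C$, and the names `fit_pls1` and `predict_pls1` are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fit_pls1(X, c, n_components=2):
    """NIPALS-style PLS1; X and c are assumed centred beforehand.

    Returns one (w, p, q) tuple per component: weights, X loadings, c loading.
    """
    X = [row[:] for row in X]  # work on copies, the inputs stay untouched
    c = c[:]
    n_vars = len(X[0])
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X most covariant with c.
        w = [dot([row[j] for row in X], c) for j in range(n_vars)]
        norm = dot(w, w) ** 0.5
        if norm < 1e-12:  # nothing left to explain, stop early
            break
        w = [wj / norm for wj in w]
        t = [dot(row, w) for row in X]  # scores: X projected onto w
        tt = dot(t, t)
        p = [dot([row[j] for row in X], t) / tt for j in range(n_vars)]
        q = dot(c, t) / tt
        # Deflation: remove what this component explains from X and c.
        X = [[xij - ti * pj for xij, pj in zip(row, p)] for row, ti in zip(X, t)]
        c = [ci - q * ti for ci, ti in zip(c, t)]
        components.append((w, p, q))
    return components


def predict_pls1(x, components):
    """Predict (centred) concentration for one centred spectrum x."""
    x = x[:]
    c_hat = 0.0
    for w, p, q in components:
        t = dot(x, w)
        c_hat += q * t
        x = [xj - t * pj for xj, pj in zip(x, p)]
    return c_hat


# Toy centred data with c = 2 * x1 + 3 * x2.
X = [[-2.0, -1.0], [-1.0, 1.0], [0.0, -1.0], [3.0, 1.0]]
c = [-7.0, 1.0, -3.0, 9.0]
model = fit_pls1(X, c, n_components=2)
c_new = predict_pls1([1.0, 2.0], model)
```

With two variables and two components the toy model reproduces the least-squares fit, so `c_new` comes out at about 8. With real spectra the number of components is much smaller than the number of variables, which is what allows PLSR to cope with collinear data.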

Model Evaluation.
The models were built using the training samples and validated with the test samples. Two-thirds of the 80 spectra were used as the training samples, with an equal number from each class, whilst the remainder served as the test samples. The samples were split randomly for 100 iterations, and these 100 training/test sets were subjected to SLR, MLR, and PLSR according to the selected spectral regions for prediction of lard. For PLSR, the matrix of training samples was additionally standardized, and the corresponding concentration, C, was mean-centred; the test set was standardized using the mean and standard deviation of the training samples.
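The splitting and scaling procedure can be sketched as follows. This is an illustration only (the study used a Matlab routine): it assumes four concentration levels of 20 spectra each with 13 per level drawn for training, a fixed random seed for reproducibility, and the sample standard deviation for scaling. The names `stratified_split`, `make_splits`, and `standardize` are hypothetical.

```python
import random
from statistics import mean, stdev


def stratified_split(labels, n_train_per_class, rng):
    """Return (train_idx, test_idx) with equal training counts per class."""
    train, test = [], []
    for c in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        rng.shuffle(idx)
        train += idx[:n_train_per_class]
        test += idx[n_train_per_class:]
    return train, test


def make_splits(labels, n_train_per_class=13, n_iterations=100, seed=0):
    """Generate the repeated random training/test splits."""
    rng = random.Random(seed)  # seed is an assumption, for reproducibility
    return [stratified_split(labels, n_train_per_class, rng)
            for _ in range(n_iterations)]


def standardize(train_rows, test_rows):
    """Scale both sets with the mean and SD of the TRAINING set only.

    Assumes no training column is constant (its SD would be zero).
    """
    cols = list(zip(*train_rows))
    means = [mean(col) for col in cols]
    sds = [stdev(col) for col in cols]  # sample SD (n - 1), an assumption

    def scale(rows):
        return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

    return scale(train_rows), scale(test_rows)


# 20 replicates at each of four lard levels (0, 20, 50, 100 %).
labels = [0] * 20 + [20] * 20 + [50] * 20 + [100] * 20
splits = make_splits(labels)
train_s, test_s = standardize([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]], [[2.0, 20.0]])
```

Scaling the test set with training-set statistics keeps information from the test samples out of the calibration, which is the point of the procedure described above.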
The prediction performance was evaluated based on the percentage root-mean-square error (%RMSE); a lower %RMSE signifies better prediction. Typically, the training samples are predicted better than the test samples. However, if a model predicts exceptionally well for the training samples but not for the test samples, it implies that the model is overfitted. Figure 1 illustrates the flow chart of the training/test set splitting for regression analysis. The process was programmed as a routine, and all analyses were performed in Matlab R2013a.
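As a rough illustration of the error measure, the sketch below computes a root-mean-square error between predicted and actual concentrations. The exact %RMSE formula is not reproduced in this text; the sketch assumes that concentrations are expressed in percent lard, so that the RMSE itself carries percent units. The function name `rmse` is hypothetical.

```python
def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual values.

    ASSUMPTION: with concentrations given in % lard, this value is read
    as the %RMSE; the source may normalise the error differently.
    """
    n = len(actual)
    return (sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n) ** 0.5


# Toy check of overfitting: compare training and test errors.
train_error = rmse([0.5, 19.0, 51.0, 99.5], [0.0, 20.0, 50.0, 100.0])
test_error = rmse([8.0, 35.0, 38.0, 80.0], [0.0, 20.0, 50.0, 100.0])
possibly_overfitted = test_error > 5 * train_error  # threshold is arbitrary
```

A large gap between the two errors, as in this toy example, is the pattern the text describes as overfitting.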
Analysis of Variance (ANOVA) with Tukey's test was performed to evaluate the %RMSE attained based on different spectral regions over 100 training/test splits, to determine if there is a significant difference at the 95% confidence level.

Results and Discussion
The spectral patterns of pure and adulterated oil are shown in Figure 2; they are considerably similar, with several major absorption peaks identified in the regions of 3000-2800 cm⁻¹, 1700-1600 cm⁻¹, and 1500-900 cm⁻¹. These characteristic peaks are likewise reported by [1], with some discrepancies; the peak at 2954 cm⁻¹ is shifted to 2922 cm⁻¹ and that at 914 cm⁻¹ is inconsistently detected.
Based on Fisher Weights, five peaks at 3006 cm⁻¹, 2852 cm⁻¹, 1117 cm⁻¹, 1236 cm⁻¹, and 1159 cm⁻¹ were identified as the variables with the most significant discriminatory ability, agreeing with [22]. These peaks were reported to reduce in intensity with increasing concentration of lard; this observation is only partly supported by the present study. The peak at 3006 cm⁻¹ was seen to increase with lard concentration, opposing the findings of [22]. For the other marker bands, an inverse relationship between the peak intensity and the concentration of lard is demonstrated, as reported. Figure 3 illustrates the spectral regions of the five variables with the most significant discriminating ability.
The peak at 3006 cm⁻¹ is attributed to the stretching of the cis C=CH bond in unsaturated fatty acids, whereby the more abundant the bond is, the higher the peak intensity [23]. As stated on the label of the palm olein oil used in this study, the product contains 43% saturated fats, 43% monounsaturated fats, and 14% polyunsaturated fats. In comparison, lard contains 48% and 11% mono- and polyunsaturated fats, respectively, as reported by [5]; lard is therefore expected to be richer in cis C=CH bonds. This offers an explanation for the positive correlation between the peak intensity and lard concentration. The peak at 2852 cm⁻¹ is characteristic of C-H stretching, where the intensity is governed by the abundance of long-chain saturated fatty acids [24]. Typically, lard contains higher amounts of stearic acid (18:0); nevertheless, its total saturated fatty acid content (42%) is lower than that of palm olein oil (45.8%), supporting the reduced intensity at 2852 cm⁻¹ as the lard concentration increases. The peak at 1117 cm⁻¹, in contrast, is attributed to out-of-plane CH bending; according to [13], a higher abundance of oleic acyl groups (18:1) in oil would reduce the peak intensity. Lard typically contains 42% oleic acid whilst palm olein oil comprises 38% [25]; this is consistent with the inverse relationship between the peak intensity and lard concentration. The other peaks at 1236 cm⁻¹ and 1159 cm⁻¹ are linked to the stretching of the C-O group in esters. According to [1], the fingerprint region at 1500-1000 cm⁻¹ is the most suitable for discrimination of pure oil from the admixture of lard.
Two-thirds of the 80 spectra were randomly assigned as the training samples (n = 52) to develop the calibration model whilst the remaining 28 samples were used to test the model. Note that, for the training set, each concentration level has an equal number of samples. A total of 100 training and test sets were used to ensure the model is consistent and reliable for prediction. These 100 training/test sets were subjected to SLR, MLR, and PLSR according to the spectral regions of 3006 cm⁻¹, 2852 cm⁻¹, 1117 cm⁻¹, 1236 cm⁻¹, and 1159 cm⁻¹.
Table 2 summarizes the %RMSE of prediction according to spectral regions and training/test sets using the various regression models. Evidently, the spectral regions with better predictive ability are those at 1130-1100 cm⁻¹ and 3020-2990 cm⁻¹, where the peak maxima are recorded at 1117 and 3006 cm⁻¹, respectively. This is demonstrated in PLSR and MLR, with the former outperforming the latter, whilst SLR exhibits exceptionally poor prediction across all regions and presumably has no predictive ability. According to PLSR, the %RMSE increases across the regions at 1159, 1236, and 2852 cm⁻¹, in that order, indicative of diminishing predictive ability. An extensive review on infrared spectroscopic techniques for adulteration of food lipids [26] corroborated the aforementioned effective regions at 3020-2990 cm⁻¹ and 1130-1100 cm⁻¹ for prediction of lard [1, 13, 27-31].
Among the three regression models, PLSR demonstrates more reliable and consistent prediction; this approach has been widely used for prediction of adulterants, exhibiting superior accuracy over other strategies such as principal component regression, ordinary least squares, and ridge regression [32,33]. MLR is a linear approach that models the relationship between a dependent variable and more than one explanatory (independent) variable. This approach falls short when the number of independent variables exceeds the number of samples, as with spectral data, and when the variables are not independent. Besides, if the variables are characterized by profound noise, the prediction may be very susceptible to changes [34]. SLR, on the other hand, is very sensitive to outliers and tends to be overfitted. Figure 4 illustrates the predicted concentration versus the expected concentration of test samples based on the three different models (SLR, MLR, and PLSR) with specific reference to the spectral regions of 3006 and 1117 cm⁻¹.

Conclusion
In this paper, we compared three different regression models (SLR, MLR, and PLSR) for prediction of lard in palm olein oil. The marker bands for differentiation of lard and palm olein oil were identified at 3006 cm⁻¹, 2852 cm⁻¹, 1117 cm⁻¹, 1236 cm⁻¹, and 1159 cm⁻¹. The regions with promising predictive ability were confirmed at 3006 and 1117 cm⁻¹, with PLSR demonstrating better accuracy.

Figure 1: Flow chart of the training/test sample splitting for regression analysis.

Figure 2: The infrared spectral profile of pure and adulterated oil.

Figure 3: Spectral regions of the five variables with the most significant discriminating ability.

Figure 4: Predicted concentration versus the expected concentration of test samples based on three different models (SLR, MLR, and PLSR) with specific reference to the spectral regions of 3006 and 1117 cm⁻¹.

Table 1: The advantages and disadvantages of some common analytical methods for adulterant analysis in edible oils.

Table 2: %RMSE of prediction according to spectral regions and training/test sets using various regression models.