Study of adulteration of extra virgin olive oil with peanut oil using FTIR spectroscopy and chemometrics

Abstract
A methodology based on Fourier transform infrared spectroscopy with the attenuated total reflectance sampling technique, combined with multivariate analysis, was developed to monitor the adulteration of extra virgin olive oil (EVOO) with peanut oil (PEO). Based on the spectral data of 192 samples, principal component regression (PCR) and partial least squares regression (PLS-R) were used to quantify the percentage of adulteration, and linear discriminant analysis (LDA) was used to classify the samples. Wavenumbers associated with the biochemical differences among several types of edible oils were investigated by principal component analysis. Two sets of frequencies were selected in order to establish a robust regression model. Set A consisted of the frequency regions from 600 to 1,800 cm⁻¹ and from 2,750 to 3,050 cm⁻¹. Set B comprised 17 discrete peak absorbance frequencies for which the communality value was higher than 0.6. Analysis of an external set of 25 samples allowed the validation and evaluation of the predictive ability of the models. When using the set of discrete peak absorbance frequencies (Set B), the R² coefficients for the prediction were 0.960 and 0.977, and the root mean square errors (RMSE) were 1.49 and 1.05% V/V for the PCR and PLS-R models, respectively. LDA was successful in the binary classification (presence/absence) of PEO in adulterated EVOO containing 5% V/V or less of PEO, providing 92.3% correct classification for the calibration set and 88.3% correct classification when cross-validated. The lowest detectable concentration of PEO in EVOO was the lowest adulteration level studied, 0.5% V/V.


PUBLIC INTEREST STATEMENT
Analysis of the quality and purity of edible oils in general, and of extra virgin olive oil (EVOO) in particular, is of great importance. Adulteration of pure EVOO with low-priced oils, degraded used oils, or toxic mineral oils can have great economic and social impact, and is a serious public health problem because harmful substances are ingested by unwary consumers.
Several techniques, such as enzymatic and polymerase chain reaction (PCR)-based methodologies, can be used in the authentication of edible oils and other foodstuffs. Absorption spectroscopy in the visible, near-infrared, or mid-infrared region has enormous advantages over classical methods because it provides real-time analysis.
In this work, infrared spectroscopy combined with statistical techniques was used to develop methodologies for the quantitative analyses of mixtures of EVOO with peanut oil in order to predict the adulteration level.

Introduction
Extra virgin olive oil (EVOO) is an edible vegetable oil obtained from healthy, intact fruits of the olive tree (Olea europaea L.) solely by mechanical means (crushing, malaxation, and centrifugation), and it can be consumed directly without refining. No chemicals are used in this extraction process, so the oil keeps its original characteristics and constituents, which are lost in refined oils (Nieto, Hodaifa, & Lozano Peña, 2010). EVOO has been extensively studied because of its biological and sensory properties and because it is an agricultural product of paramount reputation.
The detection of adulteration of food products is important for consumers, industries, and retailers. Analysis of the quality and purity of edible oils in general, and of EVOO in particular, is of great relevance and has been the subject of research by several authors (Ben-Ayed, Kamoun-Grati, & Rebai, 2013). Adulteration of pure, expensive edible oils with low-priced oils, degraded used oils, or toxic mineral oils can have great economic and social impact and is a serious public health problem (Johnson, 2014), because it introduces harmful substances into foodstuffs supplied to unwary consumers. In particular, peanuts and their derivatives, such as peanut butter and peanut oil (PEO), are known sources of allergies (Al-Muhsen, Clarke, & Kagan, 2003). Consequently, the development of fast and inexpensive analytical methodologies able to detect such adulterations in EVOO is a current topic of research.
Several techniques have been applied to quantify the adulteration of EVOO, for example, Fourier transform infrared (FTIR) spectroscopy, real-time mass spectrometry, gas chromatography, and high-performance liquid chromatography (Kataoka, Lord, & Pawliszyn, 2000). Optical techniques, such as Raman, fluorescence, and absorption spectroscopy, are reagentless, non-destructive analytical techniques with an increasing number of applications in the study of foodstuffs. In particular, near-infrared and FTIR spectroscopy have found applications across a wide range of fundamental (Movasaghi, Rehman, & ur Rehman, 2008) and applied sciences (De Luca, Oliverio, Ioele, & Ragno, 2009) and in industrial production lines (Roggo et al., 2007).
FTIR combined with chemometric methods is a powerful analytical approach. It requires minimal sample preparation, particularly when used in conjunction with attenuated total reflectance (ATR) (Vlachos et al., 2006). It has been recognized as a fast analytical technique to detect and quantify adulterants because it provides important information about the presence of specific functional groups. It is also considered a "fingerprint technique," meaning that no two oils have the same FTIR spectrum, either in the number of peaks or in the maximum peak intensities.
Principal component regression (PCR) and partial least squares regression (PLS-R) are well-known statistical methods, often used in quantitative prediction methodologies based on spectroscopic data (Martens & Naes, 1989). Both techniques are widely accepted in many scientific fields, mainly because they were designed for situations with many, possibly correlated, predictor variables and few samples. Such situations are common, especially in food science, where developments in spectroscopy since the seventies have revolutionized chemical analysis. Both methods are based on the reduction of data dimensionality and on inverse calibration, which makes it possible to calibrate for the desired component while implicitly modeling other sources of variation (Miller & Miller, 2005). Both techniques can be applied to the full infrared (IR) spectrum; however, some regions of the IR spectrum contain little or irrelevant information, and useless signals arising from interferences or instrumental drift must be ignored (Centner et al., 1996; Wentzell & Vega Montoto, 2003).
In this work, FTIR-ATR spectroscopy combined with PCR and PLS-R was used to develop methodologies for the quantitative analysis of mixtures of EVOO with PEO in order to predict the adulteration level of EVOO. To determine frequency regions useful for distinguishing EVOO from other edible oils, the infrared spectra of soybean oil (SOO), corn oil (CO), palm oil (PAO), sunflower oil (SFO), and grapeseed oil (GSO) were also measured. To investigate the possibility of creating a robust regression model, a set of continuous spectral bands and a set of discrete wavenumbers were selected.

Sample preparation
Five brands of commercial EVOO were bought from local producers (samples were collected at the olive mills). Five brands of PEO were acquired from producers specialized in cold-pressed oils. Four brands of each of these oils were used to prepare EVOO samples with different concentrations of PEO, constituting the so-called "calibration set", i.e., the set of samples used to build and validate the PCR and PLS-R models. The remaining brand of each oil was reserved to prepare a set of samples, referred to below as the "external set", to test the predictive ability of the PCR and PLS-R models.
The EVOO/PEO samples were prepared by mixing EVOO with PEO in different volume proportions, from 0.5 to 5% V/V in 0.5% V/V steps and from 6 to 30% V/V in 2% V/V steps (a total of 23 sampling points), and were mixed using a vortex mixer to ensure complete homogenization. Each sampling point of the calibration set was represented by eight samples: two of the four PEO brands were randomly selected and mixed with all four EVOO brands. The infrared absorption spectra of 192 samples (184 EVOO/PEO, 4 pure EVOO, and 4 pure PEO) were measured and constituted the calibration spectral data-set. The remaining EVOO and PEO brands were used to prepare an external set of 25 spectra (23 EVOO/PEO, 1 pure EVOO, and 1 pure PEO). The pure oils were kept in their original packaging, and the mixed samples were stored in polyethylene terephthalate flasks. Five other types of commercially available edible oils were also acquired: SOO, CO, PAO, SFO, and GSO.
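The sample counts above follow directly from the design, and can be checked with a few lines of Python. This is a minimal sketch: the brand labels and the random seed are hypothetical placeholders, and the code only enumerates the design (which brands go into which mixture); it says nothing about how the mixtures were physically prepared.

```python
import itertools
import random

# Adulteration levels (% V/V PEO in EVOO) as described in the text:
# 0.5-5 % in 0.5 % steps, then 6-30 % in 2 % steps.
low_levels = [round(0.5 * k, 1) for k in range(1, 11)]   # 0.5, 1.0, ..., 5.0
high_levels = list(range(6, 31, 2))                        # 6, 8, ..., 30
levels = low_levels + high_levels

# Hypothetical brand labels for the calibration set (four brands of each oil).
evoo_brands = ["EVOO-1", "EVOO-2", "EVOO-3", "EVOO-4"]
peo_brands = ["PEO-1", "PEO-2", "PEO-3", "PEO-4"]

def calibration_design(seed=0):
    """Return (level, evoo_brand, peo_brand) tuples for the mixed samples."""
    rng = random.Random(seed)
    design = []
    for level in levels:
        # Two of the four PEO brands are drawn at random for each level...
        chosen_peo = rng.sample(peo_brands, 2)
        # ...and each is mixed with all four EVOO brands: 2 x 4 = 8 samples.
        for evoo, peo in itertools.product(evoo_brands, chosen_peo):
            design.append((level, evoo, peo))
    return design

mixtures = calibration_design()
n_calibration = len(mixtures) + len(evoo_brands) + len(peo_brands)  # + pure oils
n_external = len(levels) + 1 + 1  # one mixture per level + pure EVOO + pure PEO

print(len(levels), len(mixtures), n_calibration, n_external)  # 23 184 192 25
```

The arithmetic is 10 + 13 = 23 levels, 23 × 8 = 184 mixtures, and 184 + 4 + 4 = 192 calibration spectra.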

FTIR-ATR measurement
Infrared spectra were collected with a "Unicam Research Series" FTIR spectrometer equipped with a single-reflection "Golden Gate" diamond ATR module, a deuterated L-alanine-doped triglycine sulfate detector, and a KBr beamsplitter. The instrument was connected to a computer and controlled by WinFirst 1.1 software (Madison, USA).
FTIR-ATR measurements were performed by pipetting a small drop (~5 μl) of edible oil onto the ATR baseplate, which was kept at 30°C. All infrared spectra were recorded in absorbance mode in the 500–4,000 cm⁻¹ region, co-adding 128 interferograms at a resolution of 4 cm⁻¹; the collection time was approximately 2 min. Each measurement was repeated three times and averaged using the instrument control software. Five replicates of each oil sample were analyzed.
The background spectrum was subtracted from each sample spectrum, and a new background spectrum was taken every three scans. The ATR base was carefully cleaned in situ by scrubbing with ethanol and dried with soft tissue before the next sample was measured. The cleaning was verified by collecting a background spectrum and comparing it with the previous one.

Mathematical treatment
Baseline drifts of the spectra were corrected using a fourth-order polynomial. The spectra were smoothed with the Savitzky-Golay algorithm, using a third-order polynomial and seven-point frames. The data were then mean-centered and standardized using the standard normal variate transformation. Principal component analysis (PCA) was first used to inspect differences between samples. PCA transforms a large number of potentially correlated variables into a smaller number of uncorrelated factors (principal components, PCs), and thus reduces the dimensionality of the data-set. PCA allowed the identification of the most important variables (wavenumbers) associated with the differences between the edible oils. Eliminating uninformative spectral variables is important to obtain more robust and less complex models that are still able to predict EVOO adulteration.
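A possible implementation of this preprocessing chain with NumPy and SciPy is sketched below. The Savitzky-Golay settings (third-order polynomial, seven-point frame) follow the text. How the fourth-order baseline is fitted is an assumption, since the text does not say which points define the baseline: here an iterative "modified polyfit" is used, in which points above the current fit are clipped to it until the polynomial settles under the absorption bands. SNV is applied row-wise, i.e., each spectrum is centred on its own mean and divided by its own standard deviation.

```python
import numpy as np
from scipy.signal import savgol_filter

def baseline_correct(wavenumbers, spectrum, order=4, n_iter=50):
    """Subtract a polynomial baseline fitted by iterative clipping
    ('modified polyfit'; an assumed choice of baseline-fitting scheme)."""
    # Rescale x so that the fourth-order fit is numerically well conditioned.
    x = (wavenumbers - wavenumbers.mean()) / wavenumbers.std()
    work = spectrum.astype(float).copy()
    for _ in range(n_iter):
        fitted = np.polyval(np.polyfit(x, work, order), x)
        # Points above the fit are treated as peaks and clipped down to it,
        # so the polynomial progressively moves under the absorption bands.
        work = np.minimum(work, fitted)
    return spectrum - fitted

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std

def preprocess(wavenumbers, spectra):
    """Baseline correction -> Savitzky-Golay smoothing -> SNV, row by row."""
    corrected = np.array([baseline_correct(wavenumbers, s) for s in spectra])
    # Savitzky-Golay smoothing: third-order polynomial, 7-point frame.
    smoothed = savgol_filter(corrected, window_length=7, polyorder=3, axis=1)
    return snv(smoothed)
```

Because SNV is applied last, every output spectrum has zero mean and unit standard deviation, which removes multiplicative intensity differences such as those caused by small variations in sample contact with the ATR crystal.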
For qualitative analysis, the principal components contributing to the variance of the data-set were subjected to linear discriminant analysis (LDA) in order to predict the likelihood of a sample belonging to a previously defined group. LDA is a statistical method that finds linear combinations of features that characterize or separate two or more classes of objects or observations. The resulting combinations may be used as a linear classifier or for dimensionality reduction.
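To illustrate how LDA acts both as a classifier and as a dimensionality reduction, the sketch below applies scikit-learn's `LinearDiscriminantAnalysis` to randomly generated data standing in for the three groups (EVOO, EVOO/PEO, PEO). It is not the XLSTAT procedure used in this work, and the data are synthetic. With three classes, LDA yields at most two discriminant factors, which is why two factors (DF1, DF2) are enough to describe all of the between-class separation.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
# Synthetic stand-in for absorbances at 17 wavenumbers (NOT the paper's data):
# three groups with randomly shifted class means and common noise.
n_per_class, n_features = 20, 17
class_means = rng.normal(0, 1, size=(3, n_features))
X = np.vstack([m + rng.normal(0, 0.5, size=(n_per_class, n_features))
               for m in class_means])
y = np.repeat(["EVOO", "EVOO/PEO", "PEO"], n_per_class)

lda = LinearDiscriminantAnalysis()
scores = lda.fit(X, y).transform(X)   # discriminant factor scores (DF1, DF2)
print(scores.shape)                   # (60, 2): at most n_classes - 1 factors
# Share of between-class variance carried by each discriminant factor.
print(lda.explained_variance_ratio_)
# Used as a classifier: predicted class labels for the first samples.
print(lda.predict(X[:3]))
```

Plotting `scores[:, 0]` against `scores[:, 1]` gives the kind of similarity map described for Figure 5.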
For quantitative analysis, PCR and PLS-R were used to regress the percentage of PEO on the factors (principal components or latent variables, respectively) that contribute considerably to the variance of the data-set.
The calibration methodology for quantifying adulteration relied on two steps, calibration and validation. In the calibration step, a mathematical model was built to relate the matrix of FTIR spectra (predictor variables) to the concentration of the analyte of interest (response variable), using a set of observations usually called the calibration set. In the validation step, the model was used to calculate the concentration of samples not used to set up the model (De Luca et al., 2009).
The performance of the calibration model is evaluated by the root mean square error of calibration (RMSEC), the root mean square error of cross-validation (RMSECV), and the coefficient of determination (R²). The selected model is then used to determine the concentration of the samples of an independent prediction set (the external set), and its predictive ability is evaluated by the root mean square error of prediction (RMSEP). The lower the RMSEP, the more accurate the predictions of the calibration model. PCA, LDA, PCR, and PLS-R calculations were performed using the Excel-based "XLSTAT" V2006.06 package (Addinsoft, Inc., New York, USA) and the "Unscrambler" V9.6 statistical software (Camo, Oslo, Norway).
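For reference, these figures of merit can be written as two small functions. RMSEC, RMSECV, and RMSEP share the same formula and differ only in which predictions are passed in: fitted values for the calibration samples, cross-validated predictions, or predictions for the external set. The R² below is 1 - SS_res/SS_tot; some software instead reports the squared Pearson correlation between measured and predicted values, which can differ when predictions are biased, so this choice is an assumption. The numbers in the usage lines are illustrative, not data from this study.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error. Gives RMSEC, RMSECV or RMSEP depending on
    which samples and which predictions are supplied."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

measured = [0.5, 1.0, 5.0, 10.0, 30.0]   # % V/V PEO (illustrative values)
predicted = [1.0, 0.5, 6.0, 9.0, 30.0]
print(rmse(measured, predicted))          # sqrt(2.5 / 5), about 0.707 % V/V
print(r2(measured, predicted))
```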

FTIR spectra analysis
All spectra were collected in the mid-infrared region between 500 and 4,000 cm⁻¹. Figure 1 displays the average FTIR spectra of five replicates of each of the seven edible oils in the range 600–3,050 cm⁻¹. All spectra look very similar because vegetable oils are essentially composed of triacylglycerols (92%), di- and monoacylglycerols (approximately 5%), and small amounts of other components. There are minor differences between the oils, namely between EVOO and PEO. A closer look (inset of Figure 1) reveals small band shifts and changes in the relative peak intensity (absorbance), especially at about 1,117, 2,922, and 3,007 cm⁻¹. The band at 3,007 cm⁻¹ can be attributed to the C–H stretching vibration of cis double bonds, while the peak at 1,117 cm⁻¹ corresponds to the C–O group vibration. These small differences in relative intensity were exploited in the classification and quantification of EVOO adulterated with PEO, as discussed below.

Exploratory principal component analysis
Two regions of each spectrum were retained for analysis: from 600 to 1,800 cm⁻¹ and from 2,750 to 3,050 cm⁻¹. The regions left out of the analysis presented an overall low signal-to-noise ratio and were found to induce misclassification. In addition, variations in the laboratory atmosphere during measurement induce random spectral changes in the region between 1,800 and 2,750 cm⁻¹.
PCA was then applied to the two selected spectral regions to investigate the similarities and differences between EVOO and the other vegetable oils. The total variance of the data-set was spread over 29 principal components; the first six, each with an eigenvalue higher than 1, explained approximately 91.6% of the total variance.
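The eigenvalue-greater-than-one rule (Kaiser criterion) applies to a PCA of the correlation matrix, i.e., of standardized variables, whose eigenvalues sum to the number of variables. The minimal NumPy sketch below assumes rows are spectra and columns are wavenumbers; it is not the XLSTAT routine used in this work, and the test data are invented.

```python
import numpy as np

def pca_correlation(X):
    """PCA of the correlation matrix of X (rows = spectra, columns = wavenumbers).

    Returns the eigenvalues in descending order and the matching
    eigenvectors as columns."""
    R = np.corrcoef(X, rowvar=False)      # covariance of standardized variables
    eigvals, eigvecs = np.linalg.eigh(R)  # eigh: R is symmetric
    order = np.argsort(eigvals)[::-1]     # sort components by explained variance
    return eigvals[order], eigvecs[:, order]

def kaiser_retained(eigvals):
    """Number of components with eigenvalue > 1, and the fraction of the total
    variance (equal to the number of variables) that they explain."""
    keep = int(np.sum(eigvals > 1.0))
    return keep, float(eigvals[:keep].sum() / eigvals.sum())
```

A component with an eigenvalue below 1 explains less variance than a single standardized variable, which is the rationale for discarding it.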
In statistics, the communality of an original variable is the sum of its squared principal component loadings, that is, the variance in that variable accounted for by the principal components. In other words, the communality measures the proportion of variance in a given variable explained jointly by the principal components and may be interpreted as the consistency of the indicator (Abdi & Williams, 2010). By definition, the initial communality in PCA, computed over all principal components, is 1. Small communality values after extraction, i.e., computed over the retained components only, indicate variables that do not fit the principal component solution well and should be dropped from the analysis (Field, 2005). According to Stevens (2002), a lower limit of 0.6 should be used.
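The communality-based selection can be sketched as follows, under the assumption (consistent with the definition above) that the loadings are the correlations between variables and components, i.e., eigenvectors of the correlation matrix scaled by the square roots of their eigenvalues. Summed over all components, each communality equals 1; summed over the retained components only, it drops for variables poorly described by the solution. The helper names `communalities` and `select_variables` are hypothetical.

```python
import numpy as np

def communalities(X, n_components):
    """Communality of each variable (column of X): the sum of its squared
    loadings over the first n_components principal components."""
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Loadings = variable-component correlations; tiny negative eigenvalues
    # from round-off are clipped to zero before taking the square root.
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return np.sum(loadings[:, :n_components] ** 2, axis=1)

def select_variables(X, n_components, threshold=0.6):
    """Indices of the variables whose communality reaches the threshold."""
    h2 = communalities(X, n_components)
    return np.flatnonzero(h2 >= threshold)
```

With six retained components and the 0.6 threshold, `select_variables(X, 6)` would return the column indices of the candidate wavenumbers.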
Wavenumbers (predictor variables) whose communality over the six retained principal components was higher than or equal to 0.6 were considered to explain meaningfully the variance of the spectral data-set, and were therefore taken as prospective wavenumbers associated with the chemical differences among the seven types of vegetable oils considered. These wavenumbers belong to 17 bands with peaks at 3,007, 2,952, 2,922, 2,853, 1,745, 1,701, 1,463, 1,417, 1,376, 1,361, 1,234, 1,160, 1,117, 1,099, 1,037, 850, and 723 cm⁻¹, which were then selected for a second PCA. The peak at 723 cm⁻¹ is associated with a CH₂ rocking mode, while the peaks at 1,099, 1,117, 1,160, and 1,234 cm⁻¹ correspond to C–O stretching vibrations. The peak at 1,463 cm⁻¹ is associated with a CH₂ bending (scissoring) vibration, and the peak at 1,376 cm⁻¹ is ascribed to the symmetric bending of CH₃. The large peak at 1,745 cm⁻¹ is due to C=O stretching vibrations. The absorbances at 2,853 and 2,922 cm⁻¹ are caused by the symmetric and asymmetric stretching vibrations of CH₂, respectively (Downey, 1998; Guillén & Cabo, 1997; Lerma-García, Ramis-Ramos, Herrero-Martínez, & Simó-Alfonso, 2010). Figure 2 shows the score plot obtained from a PCA using the absorbance at these peaks. The first principal component (F1) explained 54.0% of the variance, while the second (F2) and the third (F3) explained 29.4 and 9.1%, respectively; therefore, approximately 92.6% of the variance can be described by only three principal components.
Since each oil occupies a different position in the F1/F2/F3 space of the score plot (Figure 2), PCA allows qualitative discrimination between the edible oils under analysis. PEO, PAO, and EVOO are clearly separated from the cluster formed by CO, SOO, SFO, and GSO. In particular, PEO and EVOO are well apart in the F1/F2/F3 space (they lie on opposite sides of the F2 axis), which makes these oils distinguishable from each other. CO and SOO are very close to each other and would therefore be very difficult to distinguish using infrared spectroscopy.
The loading plot in Figure 3 (a two-dimensional plot, for the sake of clarity) reveals that the frequencies of 3,007, 2,922, 2,853, and 1,117 cm⁻¹ and, to a much smaller extent, those of 1,745, 1,160, and 1,099 cm⁻¹ are the most important for the formation of principal components F1 and F2. Together, they explain 84.6% of the total variance.

Supervised analysis of EVOO adulteration with PEO
Figure 4 shows the average absorption spectra of EVOO samples adulterated with PEO and the spectra of the pure oils; for the sake of clarity, only the spectra of the 5, 10, 15, 20, 25, and 30% V/V PEO samples are represented. A sub-set of samples, constituted by EVOO adulterated with 5% V/V or less of PEO, pure EVOO, and pure PEO, was classified by LDA using the absorbance at the 17 frequencies identified in the previous section. This sub-set is part of the calibration set and was formed by four aliquots of each type of admixture. LDA was based on the a priori classification of each sample as EVOO, EVOO/PEO, or PEO, which constitutes the categorical (dependent) variable. Figure 5 shows the similarity map defined by the discriminant factors DF1 and DF2, which together explained the total variance.
The classification resulting from the LDA (Table 1) provided 92.3% correct classification for the calibration set and 88.3% correct classification when cross-validated. The cross-validation results show that 10% of the EVOO/PEO samples were misclassified as pure EVOO; these samples belong to the set adulterated with 0.5% V/V of PEO. Therefore, the lowest detectable concentration of PEO in EVOO was the lowest adulteration level studied, 0.5% V/V. This value is not a limitation of the experimental or theoretical methodology followed in this work; it is a consequence of the fact that this was the smallest adulteration level used. For the detection of adulteration of EVOO with cottonseed and rapeseed oils, detection limits of 1.4 and 1.32% V/V, respectively, were obtained (Gurdeniz & Ozen, 2009), values slightly higher than that obtained in the present work.
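The two classification rates reported above can be computed as sketched below with scikit-learn: the calibration rate by resubstitution (training and predicting on the same samples) and the cross-validated rate from out-of-fold predictions. The text does not state which cross-validation scheme was used for LDA, so leave-one-out is an assumption here, and the function name `lda_rates` is hypothetical.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import LeaveOneOut, cross_val_predict

def lda_rates(X, y):
    """Percent correct classification by resubstitution (calibration) and by
    leave-one-out cross-validation, plus the cross-validated confusion matrix."""
    labels = np.unique(y)
    # Calibration: fit on all samples and classify the same samples.
    fitted = LinearDiscriminantAnalysis().fit(X, y).predict(X)
    # Cross-validation: each sample is classified by a model fitted without it.
    cv_pred = cross_val_predict(LinearDiscriminantAnalysis(), X, y,
                                cv=LeaveOneOut())
    return (100.0 * np.mean(fitted == y),
            100.0 * np.mean(cv_pred == y),
            confusion_matrix(y, cv_pred, labels=labels))
```

The off-diagonal entries of the confusion matrix show where errors occur, e.g., the row for EVOO/PEO and the column for EVOO count adulterated samples classified as pure EVOO. Resubstitution rates are usually optimistic, which is why the cross-validated rate is the lower of the two.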

Model for prediction of olive oil adulteration based on FTIR spectral data
For the PCR and PLS-R calibration models, the linearity of the methods was evaluated to verify a proportional relationship between the predictor variables (band intensities) and the percentage of EVOO adulteration with PEO.
The quality of the fit was assessed by the RMSEC, the coefficient of determination (R², where R is the correlation coefficient), and the RMSECV (Wang, Lee, Wang, & He, 2006). The optimum number of factors (principal components or latent variables for the PCR and PLS-R models, respectively) was determined by leave-one-out cross-validation: the RMSECV is plotted against the number of factors, and the number of factors that minimizes the RMSECV is selected (Naes, Fearn, & Davies, 2002) for both the PCR and PLS-R models. The ability of the models to predict the percentage of EVOO adulteration with PEO for external samples was assessed by the RMSEP.
To build the PCR and PLS-R models, regions containing significant information were selected, and noisy signals arising from external interferences were ignored (Wang et al., 2006). The frequency regions used for quantification should be chosen for their ability to provide a high correlation between the actual PEO concentration and the corresponding FTIR-predicted levels.
Based on this reasoning, two sets of frequencies were selected to establish an optimized regression model. Set A consisted of the frequency regions from 600 to 1,800 cm⁻¹ and from 2,750 to 3,050 cm⁻¹; Set B comprised the 17 peak absorbance frequencies on which the LDA was based.
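Extracting the two variable sets from a spectral matrix amounts to column selection. In the sketch below, the region limits and the 17 peak positions are taken from the text. The wavenumber grid used when trying it out is hypothetical, and picking the recorded wavenumber nearest to each nominal peak is an assumption about how the discrete frequencies map onto the sampled spectrum.

```python
import numpy as np

SET_A_REGIONS = [(600, 1800), (2750, 3050)]   # cm^-1, inclusive limits
SET_B_PEAKS = [3007, 2952, 2922, 2853, 1745, 1701, 1463, 1417, 1376,
               1361, 1234, 1160, 1117, 1099, 1037, 850, 723]   # cm^-1

def set_a_mask(wavenumbers):
    """Boolean mask that keeps only the Set A spectral windows."""
    wn = np.asarray(wavenumbers)
    mask = np.zeros(wn.shape, dtype=bool)
    for lo, hi in SET_A_REGIONS:
        mask |= (wn >= lo) & (wn <= hi)
    return mask

def set_b_indices(wavenumbers, peaks=SET_B_PEAKS):
    """Index of the recorded wavenumber closest to each Set B peak."""
    wn = np.asarray(wavenumbers)
    return np.array([int(np.argmin(np.abs(wn - p))) for p in peaks])
```

Given a spectral matrix `X` whose columns follow the wavenumber vector `wn`, the two predictor matrices are `X[:, set_a_mask(wn)]` and `X[:, set_b_indices(wn)]`.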
The RMSEC and RMSECV were calculated as a function of the number of factors for the PCR and PLS-R models and for both sets of frequencies. The higher the number of factors, the lower the RMSEC value. However, a model built with too many factors would overfit: it would give a very low RMSE for the calibration samples but high RMSE values for an external set of samples. Table 2 shows the number of factors corresponding to the minimum RMSECV for PCR and PLS-R and for both sets of variables. Figure 6 plots the measured percentage of adulteration against the values predicted from the FTIR measurements, using Sets A and B of variables, which reflects the accuracy and performance of the PCR and PLS-R models. Table 2 also displays the quality parameters of the four multivariate calibration models in terms of RMSEC, RMSECV, RMSEP, and R² coefficients.
The coefficient R² measures the correlation between the measured values and the values predicted by the model; the closer its value to 1, the higher the correlation. For the external set of samples and Set B of variables, R² values are 0.960 and 0.977 for the PCR and PLS-R models, respectively. As a general remark, both PCR and PLS-R offer low RMSEP values compared with other regression techniques, such as multiple linear regression (MLR) (Martens & Naes, 1989); PLS-R, however, gives the better results, with an RMSEP value of 1.04% V/V. The quantification of CO in EVOO was studied by Rohman and Man (2012b). The lowest RMSEC value was 0.019% V/V; however, the model showed a high RMSEP of 2.34% V/V, higher than the values obtained in the present work. In addition, the number of factors used was high (8 factors), suggesting that the model was over-fitted. For binary mixtures of EVOO with sunflower, corn, soybean, and hazelnut oils, Lerma-García et al. (2010) developed MLR models capable of detecting the low-cost oil content in EVOO with RMSEP values between 1.5 and 2.0% V/V, of the same order of magnitude as in this work. The number of factors corresponding to the optimum RMSECV is lowest for Set B and PLS regression (4 factors). It may therefore be concluded that it is not necessary to use data from the whole mid-infrared spectrum, but only from the discrete frequencies that compose Set B. Moreover, the RMSE values for calibration, cross-validation, and prediction are lower for PLS-R and Set B. Under these conditions, PLS-R provides an accurate estimate of the concentration of PEO in EVOO.

Conclusions
In this work, we presented an exploratory study of the applicability of FTIR-ATR spectroscopy to the prediction of the adulteration level of EVOO with PEO. The PCR and PLS-R multivariate regression techniques were found suitable as the basis of a practical experimental methodology.
Using PCA, we were able to establish the most informative wavenumbers for distinguishing between edible oils. PCR and PLS-R were applied both to the full infrared spectrum and to the selected set of wavenumbers.
The models were validated, and their predictive ability evaluated, by analyzing a set of external samples. A root mean square error of prediction of ~1% V/V was obtained for a four-factor PLS regression model based on the discrete set of wavenumbers. Furthermore, LDA of the spectral data was able to differentiate between pure EVOO and EVOO adulterated with 5% V/V or less of PEO. The lowest detectable concentration was the lowest concentration studied in this work, 0.5% V/V. Additional research is necessary to determine whether a lower limit is attainable.
Our results indicate that FTIR spectroscopy combined with PLS-R applied to the wavenumbers of 3,007, 2,922, 2,853, 1,745, 1,160, and 1,117 cm⁻¹ is a reliable methodology for the quantification and discrimination of PEO in EVOO.
This study opens the way for future research aimed at further decreasing the number of wavenumbers necessary to discriminate edible oils. The use of a standard bulk FTIR spectrometer