Prediction of fatty acid composition in intact and minced fat of European autochthonous pigs breeds by near infrared spectroscopy

The fatty acid profile has played a decisive role in recent years, owing to the technological, sensory and health demands of producers and consumers. Applying the NIRS technique to fat tissues could make quality control more efficient, practical and economical. The aim of this study was to assess the accuracy of Fourier transform near infrared spectroscopy (FT-NIRS) in determining the fatty acid composition of fat from 12 European local pig breeds. A total of 439 backfat spectra were collected from both intact and minced tissue, and the samples were then analyzed by gas chromatography. Predictive equations were developed using 80% of the samples for calibration, followed by full cross validation, and the remaining 20% for external validation. NIRS analysis of minced samples gave a better response for fatty acid families and n6 PUFA, and it is promising both for the quantification of n3 PUFA and for the screening (high or low values) of the major fatty acids. Prediction on intact fat, although less accurate, seems suitable for PUFA and n6 PUFA, while for the other families it allows only a discrimination between high and low values.

As noted, the shapes of the spectra are homogeneous and almost overlapping. However, as expected, the absorbance of intact samples was slightly higher than that of minced samples at almost all wavenumbers. In all breeds, the absorbance peaks between 5100 and 5200 cm⁻¹, attributed to the combination band of O-H stretching, could be associated with water 29. In contrast, N-H vibrational overtones, linked to protein content, were not evident at their typical wavenumbers (4415, 5917, 6623, 8425 cm⁻¹) 30, probably because the fat samples had a low protein content. The C-H absorption bands that characterize fat could be identified between 5700 and 5800 cm⁻¹, corresponding to the first overtone of C-H stretching, and between 8200 and 8500 cm⁻¹, corresponding to the second overtone of C-H stretching. Regarding fat, the spectra also showed peaks around 7100 cm⁻¹, consecutive peaks between 4200 and 4400 cm⁻¹ linked to combinations of C-H stretching and, finally, consecutive absorption peaks in the 4500-4600 cm⁻¹ region. [...] Table 1 as for calibration and validation set.
The fatty acid profile of the samples showed wide variability, which may be associated with the genetic and production-system diversity that characterizes each autochthonous breed. This variability, evident in both data sets, is important for NIRS models, especially when the reliability and reproducibility of NIRS predictions are to be tested across the whole range of variability.
NIRS statistics results. The statistics obtained from the calibration, cross validation and external validation models for intact and minced samples are summarized in Tables 2 and 3, respectively. For each parameter, the tables report the range within the NIR spectrum, the optimal number of PLS factors and the mathematical pre-treatment used. The wavenumbers selected to achieve the best models are reported for each parameter. For both intact and minced samples, a high number of PLS factors was necessary to develop the models. Standard normal variate followed by de-trending as baseline correction was the treatment that gave the most accurate models. In addition, for some parameters in intact samples it was necessary to apply a Savitzky-Golay polynomial filter (SG) to reduce additive and multiplicative effects on the spectral data 31, although the lowest possible number of pre-processing treatments was used. For almost all parameters, the best models were obtained by considering specific regions of the near infrared spectrum. The width and position of the selected region appeared to be linked to the group of fatty acids: individual and total SFA gave the best models in the 5400-7500 cm⁻¹ region, while total PUFA and the related fatty acids gave the best models between 5400-6100 and 7400-8400 cm⁻¹. In contrast, MUFA showed the best models when considering the full spectrum or a slightly restricted region in which only the initial and final tails were cut. As regards RPD, which is the ratio between the SD and the RMSE (here, that of cross validation), in intact samples the cross validation values were between 1.5 and 2.5 for C16:0, C18:0, SFA, C18:3 n3, C20:2 n6, C20:3 n3 and n3 PUFA. Values higher than 2.5 were achieved for C18:1, SFA and MUFA, while C18:2 n6, PUFA and n6 PUFA reached the best RPD, ranging from 3.9 to 4.3. All the other fatty acids showed RPD values below 1.5 in cross validation.
Table 2. Prediction statistics of fatty acid profile of intact fat. SNV standard normal variate, DT de-trending, SG Savitzky-Golay filter, nPLS number of partial least square terms, R²c coefficient of determination in calibration, RMSEC root mean square error of calibration, R²cv coefficient of determination in cross-validation, RMSEcv root mean square error of cross validation, RPDcv residual prediction deviation in cross validation, R²v coefficient of determination of external validation, RMSEv root mean square error of external validation, RPDv residual prediction deviation in external validation, RER range error ratio in external validation, SFA saturated fatty acids, MUFA monounsaturated fatty acids, PUFA polyunsaturated fatty acids.

In the external validation, the RPDs were generally lower, with values between 1.5 and 2.5 for C16:0, C18:0, SFA, C18:1, MUFA, C18:3 n3, C20:2 n6, C20:3 n6, C20:4 n6 and n3 PUFA. In prediction, too, the best RPDs were achieved by C18:2 n6, PUFA and n6 PUFA (values between 3.2 and 3.7). Finally, the RER (indicative of the suitability of models to categorize or quantify the samples) in the models developed for intact samples was above 4, allowing discrimination for all groups and individual fatty acids. The RER limit of 9 was reached by C13:0, C16:0, C18:0, C18:1, C18:2 n6, C18:3 n3, C20:3 n3, SFA, MUFA, PUFA, n6 PUFA and n3 PUFA, although for C13:0 it was linked to a very low R², indicating that the model was not applicable.
Realistically, an RER above 10, linked both to an RPD close to 3 and to an R² above 0.87 (in both validation models), was reported for C18:2 n6, PUFA and n6 PUFA while C18:1, SFA, MUFA [...]. It thus seems easier to achieve accurate models in cross validation, with some loss of accuracy in terms of RPD in external validation.

Minced fat result.
In minced samples, among the SFA, C16:0 and C18:0 showed the highest R², about 0.80 in calibration, while in cross validation and external validation R² was slightly lower, between 0.76 and 0.79. The other SFA showed modest R² values, between 0.18 and 0.48 in calibration and between 0.15 and 0.46 in both validation models. The sum of SFA presented an R² of 0.89 in calibration and 0.87 in validation. With regard to MUFA, the highest R² was obtained for C18:1, with 0.89 in calibration and 0.87 in both validation models. The other MUFA showed models with R² between 0.40 and 0.56 in calibration and between 0.34 and 0.50 in validation, except for C20:1, which exhibited a lower value. Total MUFA had a calibration R² of 0.90 and slightly lower values in cross and external validation (0.89 and 0.88, respectively). For individual PUFA, as in intact fat, the highest calibration R² values were achieved for C18:2 n6 (0.95), followed by C18:3 n3 (0.85) and C20:2 n6 (0.79), while in cross validation and external validation R² ranged between 0.74 [...]

Table 3. Prediction statistics of fatty acid profile of minced fat. SNV standard normal variate, DT de-trending, nPLS number of partial least square terms, R²c coefficient of determination in calibration, RMSEC root mean square error of calibration, R²cv coefficient of determination in cross-validation, RMSEcv root mean square error of cross validation, RPDcv residual prediction deviation in cross validation, R²v coefficient of determination of external validation, RMSEv root mean square error of external validation, RPDv residual prediction deviation in external validation, RER range error ratio in external validation, SFA saturated fatty acids, MUFA monounsaturated fatty acids, PUFA polyunsaturated fatty acids.

As expected, root mean square errors were lower in calibration than in cross validation or external validation, while R² values showed the inverse pattern.
The differences in root mean square error seem to depend on the variability of the parameter considered: fatty acids present at low concentrations, and therefore less variable, showed lower errors than the most abundant fatty acids, which take higher values.

Discussion
The features of spectra in the NIR region arise from the absorption produced by combinations of harmonics and overtones of the fundamental frequencies of the functional groups. Visual inspection is recommended in NIRS studies in order to detect the presence of compounds and to narrow the spectral region from which useful information is extracted, even if individual chemical compounds cannot always be recognized. In the NIR spectra studied here, the visual evidence of the main tissue constituents (moisture/water, fat and protein) was confirmed, but the characteristic bands showed a slight shift along the wavenumber axis depending on the type of sample or instrument 32,33.
The absorption bands of water, always found in biological samples, could easily be distinguished thanks to the presence of first stretch overtones and the absorption valley at 5500-6200 cm⁻¹ that follows the first harmonic transitions of C-H bonds 33,34. Typical fat bands were evident around the characteristic wavenumbers, in accordance with previous studies on meat 32,35,36. The consecutive peaks at 5670 and 5800 cm⁻¹ could be indicative of the cis double bonds of unsaturated fatty acids, according to Pieszczek et al. 36 and Garrido-Varo et al. 37. As for proteins, their very low content in the fat samples, taking the 5% limit proposed by ElMasry et al. 33 as a reference, represents a modest contribution to the characteristics of the spectra. Furthermore, protein absorption was likely masked by the strong peaks of water and fatty acids in the same wavenumber regions, as reported by Tsai et al. 38 and ElMasry and Nakauchi 33. The development of predictive models showed that the best relationships were obtained in specific regions of the near infrared spectrum. In almost all cases, the importance of selecting a specific spectral region was confirmed within each group of fatty acids. In agreement with our results, different studies on pig loin 4,32,39 reported the best predictive models with a selected spectral range. In contrast, other studies 15,40 suggested that optimizing the spectral regions demands additional processing time without yielding considerable improvements. In particular, Cáceres-Nevado et al. 15, comparing the full spectrum with a selected range, did not find a statistically significant difference between the calibration approaches. In all cases, pre-treatment of the spectra with SNV and DT proved useful to remove the effects of scattering and to reduce multicollinearity. In addition, the confounding effects of baseline shift and curvature were likely reduced by the spectral difference calculations 1.
However, the lowest possible number of mathematical pre-processing treatments was always applied to the spectra, in order to avoid complicating the interpretation and losing some information and the minor structural differences among very similar signal profiles 41. In the various autochthonous breeds used in this research, spectral behavior was similar, the exception being Crna Slavonska, for which a spectral discrepancy in absorbance occurred in minced samples. The difference in absorption between intact and minced samples was consistent with previous studies on meat 4,15, which reported the effect of the loss of tissue structure. Cozzolino et al. 42 and Fan et al. 43, working on lamb and pork muscles, respectively, suggested that grinding disrupts the structure of muscles and thus affects light absorbance. The wide variability in fatty acid values, especially for the fatty acids present in greater quantities, was related to the diverse production systems and diets to which the different local pig breeds are subjected in their respective farms and countries 44. This variability could be useful for the development of predictive models by NIRS technology 45,46. As expected, and as mentioned by other authors 15,43, the best coefficients of determination for cross-validation and external validation were observed in the minced presentation mode, even if the trend of the results was the same. Sample preparation conditions are recognized as one of the key factors influencing the capabilities of NIRS, and it is well known that homogenization improves the accuracy of NIRS 26. However, mincing the fat is time consuming, and the fat can be difficult to homogenize with a mixer because the composition changes and tissue components that stick to the equipment can create errors 26. [...] depth of light penetration), chemometric models, parameters used and environmental conditions present 39.
NIRS studies on the prediction of fatty acids are more abundant and widespread for minced samples than for intact samples. However, even for minced samples, studies often considered only one breed (Iberian pig samples), obtaining validation statistics with R² similar to or slightly higher than those of the present work but moderately lower errors 27, probably due to the different set sizes. Prevolnik Povše 24, who studied the fat of two local breeds (Slovenian Krškopolje pig and Croatian Turopolje pig), reported similar or slightly lower cross validation R² for the SFA, MUFA and PUFA groups, while the cross-validation errors were slightly higher in our research. In the study by Müller et al. 40 on pork fat from different carcass batches, a similar R² was found in prediction for MUFA, PUFA, C18:1 and C18:3 n3, while a lower R² was obtained for C18:2 n6 and a higher one for SFA and C18:0. In this case, too, the prediction errors were lower than those obtained by our models, except for C18:3 n3. Previous studies on melted fat obtained better results than the present study in all cases 25,26,47, with errors of cross-validation or prediction ranging from 0.26 to 0.87 for C16:0, from 0.27 to 0.64 for C18:0, from 0.20 to 0.59 for C18:1 and from 0.15 to 0.36 for linoleic acid. Those results were probably linked to the melted condition of the fat, which can affect the precision and accuracy of the results. Flåtten et al. 48 also reported that better results were achieved in purified fat, even if in their study LC PUFA were predicted by mid-infrared transmission. Regarding the NIRS prediction results (external validation), the values obtained were of the same order as those obtained with cross validation, confirming the goodness of the models proposed in the present work.
Even if the higher number of samples considered in our study positively affected the applicability of NIRS, the considerable number of sources of variability within the sample sets (diet, rearing systems, etc.) has probably affected the accuracy and precision of the estimation statistics, with direct effects on the errors. Moreover, in our study the variability of each breed was directly affected by the different traditional breeding conditions of each country.
Considering the main objective of this work, the evaluation of the NIRS method for the simultaneous measurement of fatty acid composition in the backfat of different autochthonous pig breeds, the RPD and RER indexes used to evaluate the capacity of the models suggested that the NIR equations for C18:1, C18:2 n6, SFA, MUFA, PUFA and n6 PUFA could be considered usable in most applications, including quality control, when starting from minced fat samples. Promising results were also obtained for the quantification of C18:3 n3 and n3 PUFA. For some of the major constituents (C16:0, C18:0, C20:2 n6, C20:3 n3), the RPD achieved, linked to an RER > 9 and an R² > 0.62, allowed discrimination between high, medium and low values, which could be useful for categorization in quality control 49. In line with Müller et al. 40, the calibrations of minor fatty acids were generally poorer in terms of R², RMSE, RPD and RER, suggesting that NIRS cannot be used to quantify all individual fatty acids simultaneously, even when minced samples are used. In addition, as reported by Gjerlaug-Enger et al. 26, NIRS has the best predictive ability for organic components present in large amounts. However, RPD, RER and R² are all highly dependent on the range of values in the calibration. Finally, even if the RPD statistic is widely used in NIRS research to assess prediction efficiency 50, Cáceres-Nevado et al. 15 suggested that this criterion cannot be generalized to all types of products or all NIRS instruments.
Considering the results obtained for intact fat, Pérez-Marín et al. 39, working on skin-free subcutaneous intact fat of Iberian pigs, reported in cross validation a higher R² than ours for C16:0 (0.88), C18:0 (0.80) and C18:1 (0.92), and lower errors on average. González-Martín et al. 1, working on subcutaneous fat of Iberian pigs, achieved in external validation both slightly higher R² and lower errors for C16:0, C18:0 and C18:1. On the contrary, the coefficients of determination of C18:2 and especially C18:3 n3 are higher in our study than in the research mentioned. Pérez-Marín et al. 39 also reported poorer calibration models than ours for C18:2 (R² 0.42), attributing these results to a less variable data set, as shown by a standard deviation markedly lower than ours (0.75 vs. 3.0%). Minor fatty acids are rarely reported in that research and the results are often inconsistent: González-Martín et al. 1 reported a higher R² than ours for C14:0 and lower values for C20:1.
Regarding the fatty acid groups, in external validation González-Martín et al. 1 achieved lower R² for MUFA and PUFA, and higher R² and lower error for PUFA. However, it must be noted that the variation of fatty acids in Iberian pigs 1,39 was generally lower than that considered in our research, because in Spain pigs are fed several diets but on the basis of the same extensive feeding programs and similar strategies. Gjerlaug-Enger et al. 26, working on fat layers from Norwegian Landrace and Duroc pigs cut into small pieces (brick size: 3-5 mm), obtained slightly higher R², RPD and RER for both the groups and the individual fatty acids. Nevertheless, only a limited variation was considered in the study by Gjerlaug-Enger et al. 26, because the calibrations were built from pigs fed almost the same diet and tested in two experimental stations. It is stressed that in NIRS prediction the variation should cover the population in which the calibrations will be used for subsequent predictions: a larger variability in fatty acids could be obtained if pigs came randomly from different rearing systems, even if NIRS capacity and accuracy may then tend to decrease 26. However, Prieto et al. 46 suggested the use of specific prediction equations within each breed, as breed differences in the NIRS estimation of meat fatty acids were found in finished animals fed a similar dietary regime and sourced from a single experimental farm. These authors reported genetic differences between breeds as the most influential factor in the accuracy of fatty acid estimation, which could also be associated with differences in adipocyte size between breeds, linked to the absorbance in the collected NIR spectra. At present, more research is needed to validate the patterns and results of NIRS estimation within different breeds.
To assess the suitability of the models for estimating fatty acids from intact fat, the simultaneous consideration of R², RPD and RER indicates that the models were efficient for the practical quantification of C18:2 n6, PUFA and n6 PUFA. The models for the SFA and MUFA groups, as well as for C18:1, could be considered suitable for screening purposes 50-52. Finally, the possibility of categorizing samples by discriminating between high and low fatty acid values with acceptable precision seems promising for C16:0, C18:0 and C18:3 n3, as well as for C20:2 n6 and n3 PUFA.

Methods
Ethics approval and consent to participate. Animal Care and Use Committee approval was not necessary because backfat samples were collected after slaughtering of animals. The authors did not have direct control over the care of the animals because the experimentation of this study did not include the analysis of the subjects' life stages.

Sample collection.
A total of 439 backfat samples were collected after slaughter from animals belonging to 12 European local pig breeds, in the frame of the H2020 project TREASURE (Table 4). Subcutaneous fat (backfat) was sampled 1-2 days after slaughter from the left half-carcasses between the second and the fifth lumbar vertebrae, individually vacuum packed, frozen at −20 °C and sent to the University of Florence laboratory. After thawing, intact fat samples were scanned by FT-NIRS. Subsequently, the samples were minced with an electric meat grinder and scanned again by FT-NIRS. Once the scans were acquired, the same samples were used for gas chromatographic analysis. For each animal, all analyses were performed in duplicate.
Reference analysis. Total lipid content was determined using the method of Folch et al. 53; the fatty acid profile of total lipids, using the modified technique of Morrison and Smith 54. Fatty acid (FA) methyl esters were analyzed by gas chromatography using a Varian 430 apparatus (Varian Inc., Palo Alto, CA, USA) equipped with a flame ionisation detector. FA separation occurred in a Supelco Omegawax™ 320 capillary column (30 m length; 0.32 mm internal diameter; 0.25 µm film thickness; Supelco, Bellefonte, PA, USA). The chromatographic conditions were an initial temperature of 160 °C, which was then increased by 2 °C/min until the temperature reached 220 °C. One microliter of sample in hexane was injected with the carrier gas (helium) at a constant flow of 1.5 mL min⁻¹ and at a split ratio of 1:20. The detector temperature was set at 260 °C. The chromatograms were recorded using computing integrator software (Galaxie Chromatography Data System 1.9.302.952; Varian Inc.). The percentage of each fatty acid was calculated on the total of fatty acids detected and expressed as g/100 g of FAMEs. Fatty acid groups were obtained as the sum of all saturated fatty acids (SFA) detected, the sum of all monounsaturated fatty acids (MUFA) and the sum of all polyunsaturated fatty acids (PUFA).
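The last two steps of the reference analysis (expressing each fatty acid as a share of the total detected, then summing by family) can be sketched as follows. This is only an illustration: the peak areas are hypothetical, the assignment of fatty acids to families covers just a few of those named in the text, and the sketch assumes that integrated peak areas can be used directly as relative masses, which the study does not state.

```python
# Illustrative family membership; only a subset of the fatty acids in the paper.
SATURATED = {"C14:0", "C16:0", "C18:0"}
MONOUNSATURATED = {"C16:1", "C18:1", "C20:1"}
POLYUNSATURATED = {"C18:2 n6", "C18:3 n3", "C20:2 n6"}


def fatty_acid_profile(peak_areas):
    """Express each fatty acid as g/100 g of the total detected FAMEs.

    Assumption: peak areas are proportional to mass (no response factors).
    """
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {name: 100.0 * area / total for name, area in peak_areas.items()}


def group_sums(profile):
    """Sum the individual percentages into SFA, MUFA and PUFA."""
    return {
        "SFA": sum(v for k, v in profile.items() if k in SATURATED),
        "MUFA": sum(v for k, v in profile.items() if k in MONOUNSATURATED),
        "PUFA": sum(v for k, v in profile.items() if k in POLYUNSATURATED),
    }


# Hypothetical chromatogram (arbitrary area units), not data from the study.
areas = {"C16:0": 240.0, "C18:0": 120.0, "C18:1": 440.0,
         "C18:2 n6": 180.0, "C18:3 n3": 20.0}
profile = fatty_acid_profile(areas)
groups = group_sums(profile)
```

Because every value is divided by the same total, the individual percentages always add up to 100, and so do the three family sums whenever each detected fatty acid belongs to exactly one family.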
FT-NIRS data pre-treatment and chemometric analysis. Spectra were processed with a chemometric approach using Unscrambler CAMO® software. To optimize the accuracy of calibration, several mathematical pre-treatments were applied. Multiplicative scatter correction (MSC) and standard normal variate (SNV), with or without the de-trending (DT) option, were applied to correct scatter effects in the spectra. A Savitzky-Golay polynomial derivative filter (SG), including a smoothing step before derivation (with 10 smoothing points on the left side and 9 on the right side) to avoid reducing the signal to noise ratio, was applied when necessary. Furthermore, to optimize the extraction of useful information, the spectra were selected or reduced by analyzing them at specific wavenumbers. Outliers were detected by observing both the line plots of the spectra and the results of principal component analysis (PCA). Possible outliers were identified as samples with high residual values and a high Hotelling's T² statistic referred to the spectral range (T > 2.5, as often reported for the removal of outliers) 55. Scatterplots of leverage for intact (Supplementary Fig. S2) and minced (Supplementary Fig. S3) samples were also considered in order to detect outliers. The resulting data set was split into two stratified data sets: a training (calibration) set with 80% of the samples and a validation set with the remaining 20%. All breeds were included in both sets, guaranteeing that 20% of the animals of each breed were present in the validation set. All models were built using partial least square regression (PLS), after other models such as principal component regression (PCR) had been evaluated and discarded because of their lower predictive ability. To develop the model for each parameter, the optimum number of PLS factors (nPLS) was selected as the one that gave the lowest error in cross validation, thus avoiding overfitting.
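As an illustration of the scatter correction described above, the following is a minimal sketch of SNV followed by de-trending for a single spectrum. It is not the Unscrambler implementation used in the study: it assumes the common textbook form of both steps, in which SNV centres and scales each spectrum by its own mean and sample standard deviation, and de-trending subtracts a second-order polynomial fitted by least squares along the spectral axis. The function names are hypothetical.

```python
import math


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and SD."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = math.sqrt(sum((a - mean) ** 2 for a in spectrum) / (n - 1))
    return [(a - mean) / sd for a in spectrum]


def _solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (a[i][3] - sum(a[i][j] * x[j] for j in range(i + 1, 3))) / a[i][i]
    return x


def detrend(spectrum):
    """Subtract a least-squares quadratic baseline (assumed second-order DT)."""
    n = len(spectrum)
    # Point positions rescaled to [-1, 1] keep the normal equations well conditioned.
    t = [2.0 * i / (n - 1) - 1.0 for i in range(n)]
    s = [sum(ti ** k for ti in t) for k in range(5)]  # power sums of t
    b = [sum(y * ti ** k for y, ti in zip(spectrum, t)) for k in range(3)]
    normal = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    c0, c1, c2 = _solve3(normal, b)
    return [y - (c0 + c1 * ti + c2 * ti * ti) for y, ti in zip(spectrum, t)]


def snv_dt(spectrum):
    """SNV followed by de-trending, the order reported for the best models."""
    return detrend(snv(spectrum))
```

Applied to every spectrum before regression, `snv_dt` removes per-sample offset and scale differences first and the remaining curved baseline second. A spectrum that is purely a quadratic baseline is reduced to zeros, which is a convenient sanity check.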
An internal cross-validation using the leave-one-out method was applied to the training set, and both the coefficient of determination of cross validation (R²cv) and the root mean square error of cross validation (RMSEcv) were obtained. All calibrations were evaluated on the basis of both the entire NIR spectrum and specific regions, considering previous studies and the bands and overtones present in our spectral data set 32,35,36. The best model for each trait was chosen based on the highest coefficient of determination in calibration (R²) and in external validation or prediction (R²v), as well as on the lowest root mean square error in calibration (RMSE) and prediction (RMSEv). The residual prediction deviation (RPD) index was calculated as the ratio between the standard deviation (SD) of the set of samples and the RMSE, in cross validation (RPDcv) and in external validation (RPDv), in order to evaluate goodness of fit and model accuracy. The ratio between the range of the reference data for the collective calibration (Ymax − Ymin) and the RMSEv, known as the range error ratio (RER) index, was calculated as one of the statistical indicators with the greatest weight in the precision of a NIRS calibration model 50. The model performance can be considered sufficient for rough screening if RPD is between 1.5 and 2.5 52. Williams and Sobering 51 suggested an 'accurate estimation capacity' if RPD values were higher than the limit of 2.5, even though the limit for the accuracy evaluation was later increased to 3 52, because the error of prediction is then reduced by a factor of more than three 56. An RER between 4 and 8 suggests the possibility of discriminating high values from low ones, RER values in the range of 8-10 represent the possibility of predicting quantitative data, and an RER above 10 or 12 indicates good predictability 5,49.
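The two indexes and the interpretation ranges quoted above can be written down compactly. The sketch below is illustrative only: the function names and the text labels are hypothetical, the labels for values below the quoted ranges (RPD under 1.5, RER under 4) are an assumption because the text gives no wording for them, and the numbers in the example are invented.

```python
import math
import statistics


def rmse(reference, predicted):
    """Root mean square error between reference and predicted values."""
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rpd(sd, rmse_value):
    """Residual prediction deviation: SD of the sample set divided by the RMSE."""
    return sd / rmse_value


def rer(y_min, y_max, rmse_value):
    """Range error ratio: (Ymax - Ymin) of the reference data divided by the RMSE."""
    return (y_max - y_min) / rmse_value


def rpd_label(value):
    """Map an RPD to the ranges quoted in the text (2.5 limit, later raised to 3)."""
    if value < 1.5:
        return "below screening range"  # assumed label, not from the text
    if value <= 2.5:
        return "rough screening"
    if value <= 3.0:
        return "accurate under the 2.5 limit only"
    return "accurate under both limits"


def rer_label(value):
    """Map an RER to the ranges quoted in the text."""
    if value < 4:
        return "below discrimination range"  # assumed label, not from the text
    if value < 8:
        return "high/low discrimination"
    if value <= 10:
        return "quantitative prediction"
    return "good predictability"


# Invented reference and predicted values (g/100 g), not data from the study.
reference = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted = [11.0, 11.0, 15.0, 15.0, 19.0]
error = rmse(reference, predicted)
rpd_value = rpd(statistics.stdev(reference), error)
rer_value = rer(min(reference), max(reference), error)
```

Both indexes divide a measure of spread in the reference data by the same prediction error, which is why they rise and fall with the range of values in the calibration set as well as with model quality.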

Conclusion
In conclusion, it seems possible to use NIRS technology to predict the principal fatty acid families (SFA, MUFA and PUFA, as well as n6 PUFA) and some individual fatty acids, such as C18:1 and C18:2 n6, in a large population of European autochthonous pig breeds if minced fat samples are used. Homogenization of the fat is promising for the quantification of C18:3 n3 and n3 PUFA and allows the screening (high and low values) of some major constituents (C16:0, C18:0, C20:2 n6, C20:3 n3), while prediction seems to be more difficult for other fatty acids.
Prediction on intact fat samples, although displaying lower predictive ability, has the advantage of being instantaneous and could be applied to marketable products. It seems suitable for PUFA and n6 PUFA as well as for C18:2 n6, while for the other families (SFA and MUFA) and for C18:1 a discrimination between high and low values would be feasible.
Studying the specific wavenumbers at which NIR absorption is closely associated with fatty acid group composition proved useful for achieving accurate calibrations. Moreover, the large variability of fatty acids in the samples used in this study could have affected the robustness of the models. NIR spectroscopy will become more widely used in quality control, industry and breeding programs as more attention is given to reducing errors.

Data availability
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.