1H NMR Spectroscopy to Characterize Italian Extra Virgin Olive Oil Blends, Using Statistical Models and Databases Based on Monocultivar Reference Oils

During the last few years, the global demand for extra virgin olive oil (EVOO) is increased. Olive oil represents a significant percentage of world fat consumption determining an important development of its market. In this context, the problems related to counterfeiting and product fraud is becoming extremely relevant. Thus, the quality and authenticity control of EVOOs is nowadays mandatory. In this study we focused on the use of 1H NMR technique associated with multivariate statistical analysis to characterize Italian EVOOs commercial blends. In particular, a specific database including 126 monocultivar EVOOs reference samples, was used to characterize a total of 241 Italian EVOOs blends over four consecutive harvesting years. Moreover, the effect of the minor components (phenolic compounds) on the qualitative characterization of blended EVOOs was also evaluated. The correlation analysis of classification scores obtained using two pairwise orthogonal partial least square-discriminant analysis models (built with major and combined major–minor components NMR data) revealed that both could be profitably used to generally classify the studied Coratina containing blends.


Introduction
To date, due to its sensorial, nutraceutical, and other well-known health properties, extra virgin olive oil (EVOO) is considered one of the most important classic products of the Mediterranean countries, often called "the yellow gold" [1][2][3]. World olive oil production is still essentially concentrated in the Mediterranean basin, in particular in Spain and Italy, two countries representing almost all world exports (60% Spain and 20% Italy). Italian product covers on average 15% of world production (compared to 45% of Spain) [4]. Italian EVOO is worldwide recognized as a high value product thanks to its well established "health appeal", quantitatively limited production and organoleptic characteristics [1,5]. Indeed, Italian food market is widespread known for its richness of varieties and high quality products [6]. Over the years, the high national and international commercial value of Italian EVOOs has led to their adulteration with low quality foreign olive oils and in some cases, also with the addition of other low cost edible vegetable oils of uncertain origin. Therefore, the establishment and systematic implementation of a reliable quality control methodology for certifying the EVOO authenticity is a priority issue. The European Union Regulation 182 of 6 March 2009 [7], declared the OPLS-DA models were therefore built using both the standard zg NMR spectra and the combined zg noesy NMR spectra. These latter better accounts for the phenolic components responsible of the blends bitterness/pungency characteristics essentially due to their expected Coratina content. The studied blend samples were classified by prediction on these two models and the correlation observed for the resulting classification scores suggests a reasonable efficiency of both models. This NMR-bases method could be profitably used as a tool to classify commercial oil samples, putting a gate around high quality blend EVOOs and defining their characteristics with respect to the blend constituents.

Materials and Methods
Chemicals: All chemicals for analysis were of analytical grade: CDCl 3 (99.8 atom %D) and tetramethylsilane, TMS (0.03 v/v%) were purchased from Armar Chemicals (Döttingen, Switzerland). The different size of the reference samples number was chosen according to their provider declared use in the blends production. Being the model designed for Coratina based blend classification, the number of Coratina oils samples was higher in order to take into account the prevalence of Coratina cultivar and its variability within Apulia Region [27]. All the samples were stored in sealed dark glass bottles at room temperature in the dark until NMR analysis. The information reported on the label of each sample was described in detail in Table 1.

Sample Preparation for NMR Analyses
Each sample was prepared by dissolving~140 mg of olive oils in a volume of deuterated chloroform (CDCl 3 ), calculated on the basis of the ratio of 13.5% oil/86.5% CDCl 3 , w/w (standard Bruker methodology). Then, 600 µL of the prepared mixture was transferred into a 5 mm diameter NMR tube and subjected to spectroscopic analysis. Oil samples were provided by the producers before bottling. According to the previously published work flow chart illustrating the procedures sequence for blend EVOO origin assessment [12] 1 H NMR analyses were performed within one month of receiving the samples, within the optimal shelf life [30].

Acquisition and Processing of 1 H NMR Spectra
1 H NMR spectra were acquired using a Bruker Avance spectrometer (Bruker Italia, Milano, Italy) operating at proton frequency of 400.13 MHz, T = 300 K, equipped with a PABBI 5-mm inverse detection probe with a z axis gradient coil. The experiments were conducted under full automation, after loading individual samples on a Bruker Automatic Sample Changer (BACS-60), interfaced with the IconNMR software (Bruker Italia, Milano, Italy). In order to characterize fatty acids signals and to enhance signals of the minor components by the suppression of these strong fatty acids signals respectively, for each sample, both a standard 1 H zg NMR and a multi-suppressed 1 H noesygpps NMR (with suppression of strong fatty acids signals) experiments were performed.
Each 1 H NMR spectrum was acquired following the conditions: zg pulse program, 64 k time domain, spectral amplitude 20.5555 (8223.685 Hz), p1 12.63 µs, pl1 −1.00 db, 16 repetitions. Each multi-suppressed spectrum was acquired under the conditions: noesygpps1d.comp2 pulse program, 32 k time domain, spectral amplitude 20.5555 (8223.685 Hz), p1 12.63 µs, pl1 −1.00 db, 32 repetitions. The chemical shifts of sample signals were calculated according to the internal standard (TMS), whose signal was set at 0 ppm. The spectra were acquired and processed using the Topspin 3.1 software (Bruker Italia, Milano, Italy). 1 H NMR spectra were segmented in rectangular fixed (0.04 ppm width) buckets and integrated by Amix 3.9.15 (Analysis of Mixture, Bruker BioSpin GmbH, Rheinstetten, Germany) software. Bucketing of 1 H zg and 1 H noesygpps NMR spectra were performed within the range 10.0-0.5 ppm and 10.0-5.6 ppm, respectively. In both cases, the spectral region between 7.6 and 6.9 ppm was excluded to eliminate from the analysis the residual solvent signal peaks area. In order to minimize small differences due to olive oil concentration and/or experimental conditions among samples the total sum normalization was then applied [31]. The Pareto scaling procedure, performed by dividing the mean-centered data by the square root of the standard deviation, was applied to the variables. The data tables obtained with all the aligned buckets row reduced spectra were used for statistical analysis.

Multivariate Statistical Analysis Applied to NMR Spectroscopy Data
After the data processing, multivariate statistical analysis, was then performed by using the Simca-P version 14 (Sartorius Stedim Biotech, Umeå, Sweden) software. In particular, unsupervised principal components analysis (PCA) and supervised partial least squares discriminant analysis (PLS-DA) and orthogonal partial least squares discriminant analysis (OPLS-DA) were performed. PCA is a chemometric method applied in order to extract the maximum information from a multivariate data structure, reducing it in a few linear combinations of the variables [32]. The PCA is used, in the first data processing step, to obtain a general overview of the samples distribution and their possible grouping in homogeneous clusters also identifying the presence of possible outliers [33]. The correlation between the clusters distribution of the analyzed samples and the considered classes (such as variety and/or geographical origin), is therefore assessed by supervised multivariate statistical analyses. PLS-DA is the most used supervised analysis for the discrimination between samples classes with different characteristics [34]. The PLS-DA is performed in order to obtain the maximum separation between groups of observations and information about the variables responsible for the observed separation, by rotating the main components (the axes that express the variance of the data) [35]. OPLS-DA is a modification of the PLS-DA technique which filters out variation not directly related to the discriminating response. This is realized by separating the portion of the variance useful for predictive purposes from the non-predictive variance (which is made orthogonal). The result is a model characterized by an improved interpretability [36]. In our study when we considered six categories (all the different cultivars) PLS-DA rather than OPLS-DA analysis was preferred for further classification purposes [37]. On the other hand, OPLS-DA analysis was used in the pairwise comparisons between Coratina and Sweetener cultivars (considered as a single class) in order to better specify the molecular components responsible for the observed discrimination, being a superior discriminating tool for two classes models [37]. The robustness of the predictive ability for the OPLS/PLS-DA model was evaluated by the misclassification tables and classification list. Classification list provides the overall classification results, according to the predicted values for Y variables (YPredPS) of observations in the prediction set. Membership of a class depends upon matching the value of YPredPS (classification score). A value close to one indicates membership of the workset class. Class membership was defined as follows; YPredPS < 0.35 (also negative values); observation does not belong to the class; 0.35 < YPredPS < 0.65; observation is borderline; YPredPS > 0.65; observation belongs to the class [38]. The misclassification table is complementary to the classification list but presents the classification results in a more compact format. The internal cross-validation default method (7-fold) and the permutation test (40 permutations), both available on the SIMCA-P software [38,39], were used in order to validate the statistical models. The quality of the obtained models was described by the R 2 and Q 2 parameters. The first (R 2 ) is a cross validation parameter indicating the portion of data variance explained by the models and represents the goodness of fit. R 2 X and R 2 Y indicate the fraction of variance of the X and Y matrix, respectively. The second (Q 2 ) is a goodness-of-prediction parameter representing the portion of variance in the data predictable by the model. Values of R 2 X(cum), R 2 Y(cum) and Q 2 (cum) represent cumulative R 2 X, R 2 Y and Q 2 up to the specified component. The minimal number of necessary components can be defined since R 2 (cum) and Q 2 (cum) parameters show completely diverging behavior as the model complexity increases [39]. For each of these three model parameters a value greater than 0.5 indicates good model quality [25]. The variables responsible for the observed discrimination (loadings) were identified by using the statistical tool S-line plot. The S-line plot, which creates a plot of the loading vectors for two components (usually the first two) is specifically tailor-made for NMR spectroscopy data. Indeed, this plot resembles a NMR spectrum where the original buckets representing the loading and colored according to the absolute value of the correlation p(corr) [1] are displaced in opposite direction depending on their values and the considered components.

New Reference Model: PLS-DA Analysis
Starting from a previously published work [12] we used the same methodology: a reference model built with monocultivar EVOOs samples from specific geographical origins was used to obtain an indication of the composition of a test set Coratina based blends expected to contain the same cultivars from the same geographical areas. Thus, we built a new reference model with 126 reference oils mostly used to produce commercial EVOOs. These were represented by monocultivar EVOO samples from specific geographical origins (Coratina, Cima di Mola, Ogliarola, and Cellina from Apulia; Carolea and Rossanese from Calabria). The supervised PLS-DA analysis performed on the zg NMR spectra ( Figure S1 (6 components explained 95.3% and 75% of the total variance of the X and Y matrix respectively and 71.1% of predicted variance (R 2 X = 0.953; R 2 Y = 0.750; Q 2 = 0.711)). PLS-DA model was validated testing for non-casualty by the permutation test performed with 40 cycles of Y variables random permutations for the considered model [34]. The model is considered successfully validated when the R 2 -intercept and Q 2 -intercept do not overcome 0.3-0.4 and 0.05, respectively [38]. The permutation test exhibited Y-intercept and Q 2 -intercept pair values at 0.0163 (R 2 ) and -0.307 (Q 2 ), 0.0259 (R 2 ) and -0.212 (Q 2 ), 0.0236 (R 2 ) and -0.237 (Q 2 ), 0.066 (R 2 ) and −0.13 (Q 2 ), and 0.0119 (R 2 ) and -0.238 (Q 2 ) for Coratina, Ogliarola, Cima di Mola, Rossanese, Cellina di Nardò, respectively, thereby demonstrating that the PLS-DA classification model was successfully validated ( Figure S2). NMR signals indicating the molecular constituents distinctive for each extra virgin olive oil class and discriminating along the t[1] component were defined by examining the loading line plot for the model (Figure 1b). The variables, indicating signals with chemical shift (δ H ) at 1.3, 2.02, and 5.34 ppm, corresponding to the methylene (n-CH 2 ), allylic (-CH 2 CH=CH-) and olefinic (-CH=CH-) protons respectively of oleic acid are higher for Coratina oil with respect to the sweetener cultivars according to literature data [40]. As known, oleic acid (C18:1) is the main monounsaturated fatty acid (MUFA) in olive oil, and it is known to play a protective effect against several diseases, such as liver dysfunction and gut inflammation [41]. Interestingly, signal ascribable to squalene (1.7 ppm) was observed in the loading line revealing higher relative content for Coratina class samples of this triterpene known for important health properties, also able to improve the olive oil stability, and thus shelf life [42,43]. On the contrary, resonance at δ H 1.26 ppm corresponding to the methylene protons (n-CH 2 ) of the saturated acyl groups showed higher values for the sweetener cultivars than for the Coratina class. The analysis of the variable trend plot for the discriminating 1.26 ppm variable (Figure 2), showed a significant higher content of saturated fatty acid, associated with sweetness flavour characteristic of the Carolea samples class. The discrimination among the sweetener cultivars, from the analysis of the 1.26 ppm bucket trend plot. Carolea cultivar is known to be characterized by high level of palmitic (C16) and stearic acids [28].
The analysis of the variable trend plot for the discriminating 1.26 ppm variable (Figure 2), showed a significant higher content of saturated fatty acid, associated with sweetness flavour characteristic of the Carolea samples class. The discrimination among the sweetener cultivars, from the analysis of the 1.26 ppm bucket trend plot. Carolea cultivar is known to be characterized by high level of palmitic (C16) and stearic acids [28]. The obtained model was used to perform the 1 H NMR analysis of commercial blend samples from four different harvesting years (Experimental section). All the samples were declared as Coratina based blends containing also Ogliarola, Cellina, Cima di Mola, Rossanese, and Carolea as "sweeteners" cultivars. In order to analyze the resulting classification scores with respect to the reference classes, each prediction set, constituted by the Italian commercial blends for a specific harvesting year, was predicted into the PLS-DA model (Figure 3a-d). Each blend sample was classified according to its classification score which reflects the blend content (Table 2). Classification scores above a fixed value limit of 0.65 resulted in a sample assignment to that specific class. In all the other case, the samples were ranked as borderline between the classes resulting with a classification score below 0.65 for that sample. On the other hand, all the analyzed oil test sets resulted essentially mixed composition blends based on Coratina (with smoother cultivars) according to their classification score for Coratina higher than 0.35 (Table S1). The PLS-DA class discrimination occurs according to the differences in the spectral fingerprints for sweeteners with respect to Coratina cultivar essentially observed along t [1] component and already discussed ( Figure  1b). Moreover the discriminating ability of the PLS-DA model also results from the differences in the spectral fingerprints among the sweetener cultivars observed along t[2] component ( Figure S3). Specific information could be obtained also by examining the line plot for the model, indicating the 1 H NMR chemical shifts of the signals, characteristic of specific metabolites, discriminating the classes along t [2]. Higher relative content of saturated fatty acids (1.26 ppm corresponding to the methylene of the saturated acyl group) were observed for Carolea cultivar, whereas higher level of polyunsaturated acyl groups (PUFA), (signals at 1.34, 2.38, 2.78 ppm) were observed for Cima di Mola cultivar ( Figure S3).  The obtained model was used to perform the 1 H NMR analysis of commercial blend samples from four different harvesting years (Experimental section). All the samples were declared as Coratina based blends containing also Ogliarola, Cellina, Cima di Mola, Rossanese, and Carolea as "sweeteners" cultivars. In order to analyze the resulting classification scores with respect to the reference classes, each prediction set, constituted by the Italian commercial blends for a specific harvesting year, was predicted into the PLS-DA model (Figure 3a-d). Each blend sample was classified according to its classification score which reflects the blend content (Table 2). Classification scores above a fixed value limit of 0.65 resulted in a sample assignment to that specific class. In all the other case, the samples were ranked as borderline between the classes resulting with a classification score below 0.65 for that sample. On the other hand, all the analyzed oil test sets resulted essentially mixed composition blends based on Coratina (with smoother cultivars) according to their classification score for Coratina higher than 0.35 (Table S1). The PLS-DA class discrimination occurs according to the differences in the spectral fingerprints for sweeteners with respect to Coratina cultivar essentially observed along t [1] component and already discussed (Figure 1b). Moreover the discriminating ability of the PLS-DA model also results from the differences in the spectral fingerprints among the sweetener cultivars observed along t[2] component ( Figure S3). Specific information could be obtained also by examining the line plot for the model, indicating the 1 H NMR chemical shifts of the signals, characteristic of specific metabolites, discriminating the classes along t [2]. Higher relative content of saturated fatty acids (1.26 ppm corresponding to the methylene of the saturated acyl group) were observed for Carolea cultivar, whereas higher level of polyunsaturated acyl groups (PUFA), (signals at 1.34, 2.38, 2.78 ppm) were observed for Cima di Mola cultivar ( Figure S3).   1 "0.65" is the fixed value to assign observations to all classes above the limit.

2016/2017 Harvesting Year
A total of 38 100% Italian commercial samples were classified by prediction in the PLS-DA model (Figure 3a). From the predicted scores plot the commercial blends confirmed their consistency with cultivars from the specific geographical origins declared by the supplier. Predicted samples of

2016/2017 Harvesting Year
A total of 38 100% Italian commercial samples were classified by prediction in the PLS-DA model (Figure 3a). From the predicted scores plot the commercial blends confirmed their consistency with cultivars from the specific geographical origins declared by the supplier. Predicted samples of mixed cultivars oils set were clustered in the middle of the scores plot slightly closer to the Coratina class, except for a small subset of samples clearly placed close to the "sweeteners" cultivar. The analysis of the classification scores reported in the Classification list (Table S1) and summarized in the misclassification table (Table 2), revealed as most of the predicted samples (33) were assigned to Coratina class (classification scores for Coratina > 0.65). One of the samples resulted assigned to Carolea class, and four of the total were not assigned to any specific class. Nevertheless, for all the samples assigned to cultivars different from Coratina or not assigned to a specific cultivar a classification score for Coratina higher than 0.35 was in any case observed (Table S1).

2017/2018 Harvesting Year
A total of 74 100% Italian commercial samples were classified by prediction in the PLS-DA model (Figure 3b). The bi-dimensional plot for the model revealed a clear compact clustering of the commercial samples in the middle of the predicted scores plot. From the predicted scores plot the placement of the commercial blends confirmed the consistency with cultivars from the specific geographical origin declared by the supplier. From the classification score analysis, the prediction of the commercial samples was evaluated. Most of the samples (63) were assigned to Coratina class (classification scores for Coratina > 0.65) while, 11 samples were not assigned to any specific class, being the predicted values borderline between classes and below the fixed 0.65 limit ( Table 2, Table S1). Moreover, in this case, for most of the samples assigned to cultivars different from Coratina or not assigned to a specific cultivar, a classification score for Coratina higher than 0.35 was observed. Only three samples were characterized by a relatively smaller classification score for Coratina (0.20 < 0.35) (Table S1).

2018/2019 Harvesting Year
A total of 80 100% Italian commercial samples were classified by prediction in the PLS-DA model. As observed in the bi-dimensional predicted scores plot for the model (Figure 3c) the predicted set is consistent with cultivars from the specific geographical origin declared by the supplier. In particular, predicted samples of mixed cultivars oil set were clustered in the middle of the scores plot confirming their blended composition. The misclassification table reported the assignment according to the prediction classification scores ( Table 2). A relevant number of the test samples (24) were directly assigned to Coratina class, confirming that the Coratina is the predominant cultivar in the provided blend samples. A total of 20 samples were also assigned to Cellina class, also accordingly to their position in the predicted scores plot. Some samples (17) overlapping between Cellina-Coratina classes were assigned to a combined Cellina Coratina class. A further subset of test samples (19) was not assigned to any specific class, being their predicted classification score values borderline between classes and below the fixed 0.65 limit (Table S1). Anyhow, for all the samples assigned to cultivars different from Coratina or not assigned to a specific cultivar a classification score for Coratina higher than 0.35 was observed (Table S1).

2019/2020 Harvesting Year
A total of 49 100% Italian commercial samples were classified by prediction in the PLS-DA model. As observed in the bidimensional scores plot for the model (Figure 3d) the predicted set is consistent with cultivars from the specific geographical origin declared by the supplier. In this harvesting year, the predicted test sample which constitute the central cluster of the scoreplot appears somehow more evenly distributed along the first component tPS [1]. The misclassification table (Table 2) confirmed the visual inspection of the scores plot. A total of 39 samples were directly classified as Coratina or combined Cellina Coratina class, nine samples were assigned to Cellina class and one blend was not assigned to any specific class. Again, all the samples not directly assigned to Coratina or combined Coratina classes were in any case characterized by a classification score for Coratina higher than 0.35 (Table S1).
As known, blended EVOOs are produced by combining monocultivar oils with different flavor and taste profiles in order to meet the demands of national and international market. In this study a 126 monocultivar samples database was used to build a PLS-DA model. A total of 241 Coratina based commercial blends samples obtained using selected cultivars from specific geographical origins produced over four consecutive harvesting years were classified by prediction using the PLS-DA model. The classification scores obtained for the test samples of the considered harvesting years reflected their expected blend composition and could provide to the producers useful information on the organoleptic aspects of the product (e.g., bitter and spicy). As already reported a minimal harvesting year effect is expected for the Coratina based blends [13]. Therefore, the observed slight variability of the blend samples according to the production year could be possibly ascribed to a variation in the EVOOs suppliers. Thus, the here reported tailor-made dataset could be used to assess the quality of the commercial blend samples with similar declared composition and highlight possible deviations with respect to the expected standard product features. Our assignment purpose was limited to assess, with statistical models, the samples compliance with the expected blend characteristics defined by Coratina and a range of sweeteners monocultivar reference oils from specific geographical origins. This assignment was simply performed according to the provider declaration of the used cultivars, without considering the percentage composition. On the other hand, a multivariate analysis (MVA)-based NMR approach-such as PLS regression (PLSR)-proved a very useful tool to assess quantitative blend composition although limited to simplified binary system [24].

Coratina vs. Sweetners Cultivars: OPLS-DA Pairwise Analysis
In order to obtain a possible classification of the examined blends according to their expected bitterness and/or pungency characteristics, a pairwise OPLS-DA analysis was further performed by considering Coratina cultivar class and the sweeteners cultivars grouped together as a single class. In this respect the resulting OPLS-DA models could be considered as a simplified classification tools with respect to the PLS-DA models described above. In the first instance the default bucket table (zg acquisition) was used for this purpose. As expected, the obtained model was characterized by a marked separation between the two class and described by excellent descriptive and predictive parameters: one orthogonal and one predictive components gave R 2 X = 0.673; R 2 Y = 0.951; Q 2 = 0.945 (Figure 4a). By examining the S line plot of the loadings for the model (Figure 4b) the molecular components discriminant for the two classes were defined. As already described for the signals, discriminating the classes along t [1] in the zg NMR PLS-DA model, the variables, indicating signals with chemical shift (δ H ) at 1.3, 2.02, and 5.34 ppm, ascribable to oleic acid are higher for Coratina oil. Moreover, resonance at δ H 1.26 ppm corresponding to the saturated acyl groups showed higher values for the sweetener cultivars than for the Coratina class.  Moreover, because the phenolic compounds characteristic of the Coratina cultivar are responsible for the bitterness and/or pungency sensory attributes of the Italian EVOO blends, a further refined model was also considered. As known, the organoleptic attributes of pungency and bitterness in olive oil are attributed to phenolic compounds [44,45]. Furthermore, phenolic compounds are also the basis of oxidative stability and of the main nutritional properties of oils and this makes the analysis of the contribution of polyphenols essential for the extra virgin olive oil research [44]. In order to enhance the polyphenol contribution, another pairwise OPLS-DA analysis, using combined bucket reduced NMR spectra was performed. The combined bucket table was generated by combining in one matrix the 1 H noesygpps NMR spectra ( Figure S4) (within the range 10.0-5.6 ppm) and the 1 H zg NMR spectra (within the range 5.6-0.5 ppm) as previously reported [10]. Further in this case, the obtained OPLS-DA model for the two classes (Coratina and sweeteners cultivars) showed a clear separation between the groups and very good statistical parameters (one predictive and one orthogonal component, R 2 X = 0.579; R 2 Y = 0.945; Q 2 = 0.936) (Figure 5a). The S line plot of the loadings for the model showed the molecular components of triglycerides and p(ctr) [  Moreover, because the phenolic compounds characteristic of the Coratina cultivar are responsible for the bitterness and/or pungency sensory attributes of the Italian EVOO blends, a further refined model was also considered. As known, the organoleptic attributes of pungency and bitterness in olive oil are attributed to phenolic compounds [44,45]. Furthermore, phenolic compounds are also the basis of oxidative stability and of the main nutritional properties of oils and this makes the analysis of the contribution of polyphenols essential for the extra virgin olive oil research [44]. In order to enhance the polyphenol contribution, another pairwise OPLS-DA analysis, using combined bucket reduced NMR spectra was performed. The combined bucket table was generated by combining in one matrix the 1 H noesygpps NMR spectra ( Figure S4) (within the range 10.0-5.6 ppm) and the 1 H zg NMR spectra (within the range 5.6-0.5 ppm) as previously reported [10]. Further in this case, the obtained OPLS-DA model for the two classes (Coratina and sweeteners cultivars) showed a clear separation between the groups and very good statistical parameters (one predictive and one orthogonal component, R 2 X = 0.579; R 2 Y = 0.945; Q 2 = 0.936) (Figure 5a). The S line plot of the loadings for the model showed the molecular components of triglycerides and unsaponifiable fractions as discriminant for the two groups (Figure 5b). Higher oleic acid could be observed for Coratina (1.3, 2.02, and 5.34 ppm), as already found in the t [1] line and S line plots of the loadings for the PLS-DA (Figure 1b) and OPLS-DA (Figure 4b) models built with the standard zg NMR spectra. Moreover, Coratina class was also characterized by higher content of phenolic moieties of oleuropein and ligstroside aglycones such as tyrosol and hydroxytyrosol derivatives (6.78 ppm), as well as for higher secoiridois derivatives oleocanthal and oleacein (9.22 ppm). These secoiridoic phenolics are known for their antioxidant and anti-inflammatory properties and organoleptic association with bitterness and pungency [45]. Therefore, these signals could be also related to a possible classification of the examined Coratina based blends according to their expected bitterness and/or pungency characteristics. On the other hand, only a specific correlation study with organoleptic analyses could buttress this features assignment to the oils classified by the model. In the case of the sweeteners class, again, the variables (buckets) ascribable to saturated fatty acids (1.26 ppm) were observed as discriminating together with signals of compounds associated to degradation processes (5.98, 5.58 ppm) [46]. The predictive capability of the two models described by the high Q 2 values was then tested by classifying the whole Italian 100% EVOOs blend test set (Table 3 and Table S2). The predictive capability of the two models described by the high Q 2 values was then tested by classifying the whole Italian 100% EVOOs blend test set (Table 3 and Table S2).
For the OPLS-DA model built with the zg NMR spectra, the classification scores analysis revealed as more of the 98% out of the considered 233 observations showed classification scores values for Coratina higher than 0.35. In particular, 56% and 43% showed classification scores for Coratina above 0.65 and between 0.35 and 0.65, respectively. Only 1.2% of the classification scores showed values lower than 0.35 (although above 0.25) with the predicted samples assigned to Sweeteners class. A slightly lower percentage (96%) of the considered 233 observations showed classification scores values for Coratina higher than 0.35 predicted on the OPLS-DA model built with bucket reduced combined zg-noesy NMR spectra. Among them, 34% and 62% showed classification scores for Coratina above 0.65 and between 0.35-0.65, respectively. Furthermore, in this case, a small number of the predicted samples (4%) were assigned to sweeteners cultivars, showing a classification scores values for Coratina lower than 0.35 (although above 0.26). The model built with the combined bucket table, seems to be characterized by a greater selectivity with respect to prediction scores values for Coratina when compared to the "standard" major components model (zg spectra bucket table). Indeed, the combined buckets model, better accounts for the presence of specific components such as polyphenols and other molecules related to the expected bitterness and/or pungency characteristics of Coratina based blend oils. Therefore, the use of the combined buckets model, restricting the requirements associated to the predictions scores for Coratina, results in an increased percentage of samples not assigned to any specific class or assigned to the sweeteners class (classification scores values for Coratina between 0.35 and 0.65 or below 0.35, respectively). The relation between the classification scores for the two models ( Figure S5) was also investigated. The regression was significant with acceptable R 2 value (0.7029) and slope (0.86) revealing a good correlation between the predictive OPLS-DA models built with zg and combined zg-noesy bucket reduced spectra. Therefore, both models could be profitably used to generally classify the studied Coratina based blends, from specific geographical origins, taking into account only the major lipid fraction (standard zg spectra) or also the minor phenolic component (combined zg-noesy spectra). This classification may also constitute an indirect method to rank commercial Coratina based blend samples according to their expected bitterness and/or pungency characteristics. 1 "0.65" is the fixed value to assign observations to all classes above the limit.

Conclusions
In this work, based on 1 H NMR data, a total of 241 of commercial 100% Italian blend olive oil samples from four different harvesting years, were classified by prediction using a reference olive oils database built with 126 monocultivar EVOOs. This dataset includes Carolea, Cellina, Cima di Mola, Coratina, Ogliarola, and Rossanese monocultivar olive oil samples, from specific geographical origins. In particular, a supervised PLS-DA model, was built and used for classification purposes of the commercial blend samples. All the classified commercial blend samples resulted essentially mixed composition blend based on Coratina (with smoother cultivars) according to their resulting classification score for Coratina higher than 0.35.
In order to obtain a simple classification tool for ranking the examined blends according to their expected bitterness and/or pungency characteristics, an OPLS-DA analysis was also used in a pairwise comparisons between Coratina and Sweetener cultivars considered as a single class. For this purpose, two different OPLS-DA models were obtained by using both the default (zg acquisition) and the combined (zg 0.5-5.6 and noesygpps 5.6-10.0 ppm) bucket reduced 1 H-NMR spectra datasets.
The predictive capability of the two models, both characterized by good Q 2 values, was tested by evaluating the classification scores for Coratina of the Italian 100% EVOOs blend test set. The analysis revealed as most of the considered 233 observations (98% and 96% for the OPLS-DA models obtained using the default and the combined bucket table, respectively) showed a classification scores values for Coratina higher than 0.35. The combined buckets model, better accounts for the presence of polyphenols and other molecules related to the organoleptic characteristics of Coratina based oils. Accordingly, its higher discrimination power results in an increased percentage of samples (4%) characterized by classification scores values for Coratina below 0.35. Nevertheless, the correlation analysis for the obtained classification scores, showed that both OPLS-DA models could be profitably used to generally classify the studied Coratina based blends, from specific geographical origins, taking into account only the major lipid fraction (standard zg spectra) or also the minor phenolic component (combined zg-noesy spectra). Therefore, tailor-made databases based on 1 H NMR data of monocultivar oils from specific geographical origins could be profitably used to build a gate around high quality blend EVOOs and define their characteristics with respect to a specific monocultivar reference oils dataset. The described models may also offer an indirect method to classify commercial samples according to their expected bitterness and/or pungency characteristics, although a further specific correlation study with organoleptic analysis is required to buttress this result.  Figure S1 Representative zg 1 H NMR spectra of EVOO sample. Main metabolites are indicated; Figure S2. Permutation test performed with 40 cycles of random permutation of Y variables on PLS-DA models for Coratina, Ogliarola, Cima di Mola, Cellina, Carolea and Rossanese cultivars; Figure S3. Line plot for the Figure 1a model, indicating the 1 H NMR chemical shifts of the signals, characteristic of specific metabolites, discriminating the classes along t [2] and colored according to the correlation-scaled loading (* p(corr) ≥ |0.5|). w*c [1] axis represented the weighted correlation vector; Figure S4 Representative noesygpps 1 H NMR spectra of EVOO sample. Main metabolites are indicated; Figure S5. Relationship between classification scores of the commercial blend samples predicted on the bucket reduced combined zg-noesy NMR spectra.
Funding: This research received no external funding.