Fatty Acids-Based Quality Index to Differentiate Worldwide Commercial Pistachio Cultivars

The fatty acid profiles of five main commercial pistachio cultivars, including Ahmad-Aghaei, Akbari, Chrok, Kalle-Ghouchi, and Ohadi, were determined by gas chromatography: palmitic (C16:0), palmitoleic (C16:1), stearic (C18:0), oleic (C18:1), linoleic (C18:2), linolenic (C18:3), arachidic (C20:0), and gondoic (C20:1) acid. Based on the oleic to linoleic acid (O/L) ratio, a quality index was determined for these five cultivars: Ohadi (2.40) < Ahmad-Aghaei (2.60) < Kalle-Ghouchi (2.94) < Chrok (3.05) < Akbari (3.66). Principal component analysis (PCA) of the fatty acid data yielded three significant PCs, which together account for 80.0% of the total variance in the dataset. A linear discriminant analysis (LDA) model that was evaluated with cross-validation correctly classified almost all of the samples: the average percent accuracy for the prediction set was 98.0%. The high predictive power for the prediction set shows the ability to indicate the cultivar of an unknown sample based on its fatty acid chromatographic fingerprint.


Introduction
Nuts are nutrient-dense foods that can be part of a healthy diet. The potential of nut consumption in the secondary prevention of heart diseases has been related to unsaturated fatty acids, vitamins, mineral constituents, and secondary metabolites, such as alkaloids, flavonoids, tannins, and anthraquinones [1]. Most epidemiologic studies have found that diets containing a high monounsaturated to saturated fat ratio as well as a high polyunsaturated to saturated fat ratio may reduce serum cholesterol levels and consequently the risk of coronary artery disease [2]. The pistachio (Pistacia vera L.) is a nut with peculiar organoleptic characteristics. It is widely consumed as a fresh product, snack food, or ingredient of confectionery and some sausages. The pistachio consists of a shell (a hard layer) surrounding the edible kernel of the nut, which also has a papery coat (skin). Oleic acid, followed by linoleic and palmitic acid, dominates the glyceride composition of nuts [3]. In comparison to other edible nuts, pistachio has a higher content of monounsaturated fatty acids (MUFA) and a lower ratio of polyunsaturated to saturated fatty acids, which indicates a cholesterol-reducing potential, as well as a lower glycemic index (GI), which reduces the risk of diabetes [4][5][6][7].
Since the pistachio is a dry-climate tree, most pistachio production comes from countries with a warm, arid climate. Iran, the United States (USA), Turkey, Syria, Italy, Tunisia, and Greece [...] the fingerprint approach in multivariate classification is that it does not require the identification of the individual components in a chromatographic profile. Peak areas or the entire profile can be used without the identification of the corresponding substances [31,32]. Nowadays, modern separation devices, such as high-performance liquid chromatography (HPLC), gas chromatography (GC), and capillary electrophoresis (CE), are able to produce fingerprints of compounds in highly complex samples [33]. These chromatographic fingerprints have already been recognized as powerful tools for identifying, classifying, and recognizing samples. This approach has been used in several application fields, such as classification, the detection of adulteration [15][16][17][18][19][20][21][34], food authentication, and the identification of the geographical origin of food products [15,35].
Assessing the sensory stability of pistachio nut varieties during storage using descriptive analysis combined with chemometrics can help producers manage storage length and, more particularly, export conditions [36]. In another work, chemometric analysis of fatty acid and crude fat data was used to characterize varieties of coffee [37]. In this context, this paper reports on the use of gas chromatographic fatty acid fingerprints in combination with multivariate data analysis to classify pistachios from different cultivars in Iran. Our findings concern the differences in fatty acid composition among the studied cultivars, together with the proposal of a quality index based on the oleic/linoleic acid (O/L) ratio, since oleic acid is monounsaturated, and higher levels of it contribute to higher oxidative stability and a longer shelf life.

Sample Collection
Pistachio samples were obtained from the Pistachio Research Center in Kerman, Iran. Five pistachio cultivars, which are commercially important worldwide and widely grown in Iran, were selected for this research. The cultivars are Akbari, Ahmad-Aghaei, Kalle-Ghouchi, Chrok, and Ohadi. The samples were obtained from Rafsanjan, which is located in the Kerman province, during the harvesting period of 2016. Two kilograms of pistachios were selected from each cultivar. This was followed by a random selection of 30 pistachios from each cultivar. The pistachios were oven-dried at 30 °C for at least five days, and then stored in a refrigerator at 4 °C until further analysis.

Preparation of Pistachio Samples
The sample preparation process includes two steps: crude oil extraction and methyl esterification. Fatty acids were transformed into their methyl esters using methanolic sodium hydroxide solution in order to be analyzed by gas chromatography. The details of the process have been described elsewhere [16].

Chromatographic Conditions
A gas chromatograph (7890N series, Agilent Technologies, Santa Clara, CA, USA) equipped with a flame ionization detector (FID) and a split/splitless injector was used for the analysis of the fatty acid methyl esters (FAMEs). Separation was performed on a DB-WAX fused silica capillary column (30 m × 0.25 mm, 0.25-µm film thickness; Agilent J and W Scientific, Folsom, CA, USA). Hydrogen gas for the FID was generated with a hydrogen generator (OPGU-1500S, Shimadzu, Kyoto, Japan) at a flow rate of 30 mL min⁻¹, the flow rate of air for the FID was 350 mL min⁻¹, and the carrier gas was nitrogen, with a flow rate of 1.0 mL min⁻¹. The FID and injector temperatures were set at 220 °C and 250 °C, respectively. A volume of 1.0 µL of FAMEs, dissolved in petroleum ether, was injected directly into the gas chromatograph for analysis using a split ratio of 30:1. The "hot needle injection" technique was used in order to improve the repeatability. The oven temperature was maintained at 50 °C for one minute, then programmed to 200 °C at a rate of 25 °C/min, and then further increased at 3 °C/min to reach 230 °C, which was maintained for 13 min. Thus, the total time of one GC run was about 30 min. The peak areas of the FAMEs were determined by the ChemStation software, and then used for multivariate data analysis.
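The stated run time can be checked directly from the oven program. The short sketch below is only an arithmetic illustration; the function name and the tuple layout for the ramps are assumptions made for this example and are not part of the instrument software.

```python
def run_time_minutes(initial_temp, initial_hold, ramps):
    """Total run time (min) of a GC oven temperature program.

    ramps is a list of (rate_C_per_min, target_temp_C, hold_min) tuples,
    applied in order starting from initial_temp.
    """
    total = initial_hold
    temp = initial_temp
    for rate, target, hold in ramps:
        total += (target - temp) / rate + hold  # ramp duration plus hold
        temp = target
    return total


# Program described in the text: 50 °C for 1 min, 25 °C/min to 200 °C,
# then 3 °C/min to 230 °C, held for 13 min.
program = [(25.0, 200.0, 0.0), (3.0, 230.0, 13.0)]
total = run_time_minutes(50.0, 1.0, program)
print(f"{total:.1f} min")  # 1 + 6 + 10 + 13 = 30.0 min
```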

Multivariate Data Analysis
The dataset of the chromatograms of pistachio nuts was split into two parts: a calibration set (100 samples) and a test set (50 samples). The 30 samples from each cultivar were randomly divided into two groups: 20 samples for the calibration set and 10 samples for the prediction (test) set. The calibration set was used for unsupervised pattern recognition (PCA) and the development of the LDA calibration models, while the test set was then used to establish the sensitivity and specificity of the models. In order to give the different fatty acids comparable weight, the peak areas were auto-scaled before data analysis by subtracting the mean of each variable and then dividing the resulting difference by the corresponding standard deviation. PCA and LDA were implemented with an in-house program that was written in MATLAB (version 6.5; Mathworks, Natick, MA, USA).
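The splitting and auto-scaling steps described above can be sketched as follows. This is a minimal illustration on synthetic placeholder data, not the in-house MATLAB program; the function names are hypothetical, and computing the scaling parameters on the calibration set only (and reusing them for the test set) is an assumption, since the text does not specify this detail.

```python
import random
import statistics


def stratified_split(labels, n_cal, seed=0):
    """Draw n_cal samples per class at random; the rest form the test set."""
    rng = random.Random(seed)
    by_class = {}
    for i, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(i)
    cal, test = [], []
    for idx in by_class.values():
        shuffled = idx[:]
        rng.shuffle(shuffled)
        cal.extend(shuffled[:n_cal])
        test.extend(shuffled[n_cal:])
    return sorted(cal), sorted(test)


def autoscale_params(rows):
    """Column means and sample standard deviations of equal-length rows."""
    cols = list(zip(*rows))
    return [statistics.mean(c) for c in cols], [statistics.stdev(c) for c in cols]


def autoscale(rows, means, sds):
    """Subtract each column mean and divide by the column standard deviation."""
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


# Synthetic placeholder data: 5 cultivars x 30 samples, 8 peak areas each.
cultivars = ["Ahmad-Aghaei", "Akbari", "Chrok", "Kalle-Ghouchi", "Ohadi"]
labels = [c for c in cultivars for _ in range(30)]
rng = random.Random(1)
X = [[rng.gauss(10.0, 2.0) for _ in range(8)] for _ in labels]

cal_idx, test_idx = stratified_split(labels, n_cal=20)
# Assumption: scaling parameters are estimated on the calibration set only.
means, sds = autoscale_params([X[i] for i in cal_idx])
X_cal = autoscale([X[i] for i in cal_idx], means, sds)
X_test = autoscale([X[i] for i in test_idx], means, sds)
print(len(X_cal), len(X_test))  # 100 50
```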

Principal Component Analysis (PCA)
PCA is a commonly applied linear transformation method that is used for dimension reduction, visualization, and the exploration of multivariate data. PCA is a systematic method for analyzing multivariate data by producing new orthogonal variables, which are called principal components, and are obtained as linear combinations of the original variables. PCA is usually able to describe most of the variation in the data in the first few components. This method is also an unsupervised classification technique, projecting multidimensional data into lower dimensions with a minimal loss of information. It is employed for understanding data patterns and anomaly detection.
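As an illustration of how principal components arise as orthogonal linear combinations of the original variables, the following sketch computes a PCA through the singular value decomposition with NumPy. It is not the authors' MATLAB code; the data are random placeholders, and the function name pca is chosen for this example only.

```python
import numpy as np


def pca(X):
    """PCA of an already auto-scaled matrix X (samples x variables) via SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U * s                      # coordinates of the samples on the PCs
    loadings = Vt.T                     # each column is one PC direction
    explained = s**2 / np.sum(s**2)     # fraction of total variance per PC
    return scores, loadings, explained


# Random placeholder data: 100 samples, 8 variables, auto-scaled column-wise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

scores, loadings, explained = pca(X)
cumulative = np.cumsum(explained)
print(np.round(cumulative[:3], 3))      # variance captured by the first three PCs
```

The cumulative sum of the explained-variance fractions is the quantity usually reported when stating how much of the total variance the first few components describe.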

Linear Discriminant Analysis (LDA)
LDA is a well-known supervised pattern recognition method that uses linear combinations of the original variables to build a classifier model. The main aim of LDA is to find vectors onto which the projection of points from the original space leads to maximum separation between the classes. The eigenvectors are obtained by maximizing the ratio of the between-group variance to the within-group variance. The LDA method can thus be used simultaneously for feature extraction, dimension reduction, and discrimination purposes.
Figure 1 shows a typical GC chromatogram of FAMEs obtained from an Iranian pistachio sample. Gas chromatography mass spectrometry (GC-MS) was carried out to identify the fatty acid composition of pistachio kernel oil. Palmitic acid (C16:0) as a saturated fatty acid, and oleic acid (C18:1) and linoleic acid (C18:2) as unsaturated fatty acids, are the major compounds in the chromatogram. Other fatty acids, including palmitoleic acid (C16:1), stearic acid (C18:0), linolenic acid (C18:3), arachidic acid (C20:0), and gondoic acid (C20:1), are found as minor compounds. Myristic acid (C14:0), margaric acid (C17:0), and heptadecenoic acid (C17:1) were present in all of the samples in trace amounts, and they were not considered for further statistical analysis. According to Satil et al. [38], these minor acids do not exceed 0.5%. Table 1 shows the fatty acid composition of the 150 samples.
The principal fatty acids were palmitic (8.35-10.53%), palmitoleic (0.45-0.88%), stearic (0.98-1.8%), oleic (60.71-69.84%), linoleic (18.37-27.39%), linolenic (0.26-0.39%), arachidic (0.12-0.19%), and gondoic acid (0.41-0.59%). The fatty acid percentages that were obtained in this work were close to other reports about Iranian pistachio [3,39]. Table 2 shows the range and the mean of fatty acids for the five pistachio cultivars. The results show that the sum of oleic and linoleic acids accounts for almost 80% of the total fatty acids detected in pistachio samples, which is also seen for other nuts, such as peanut [40]. As can be seen in Table 2, in the varieties Akbari, Kalle-Ghouchi, and Ohadi, the oleic acid content follows the order Ohadi < Kalle-Ghouchi < Akbari, while for linoleic acid, the order is Akbari < Kalle-Ghouchi < Ohadi. Our results are in agreement with those obtained by Roozban et al. [41], who determined oleic and linoleic acid in the Akbari, Kalle-Ghouchi, and Ohadi varieties and reported the same trend in oleic and linoleic acid content. It is well-known that cultivar and environmental factors affect the composition, and consequently the price, of food from plants. For nuts, oil is one of the main derived products, and therefore, its quality and the characteristics of its fatty acid profile are very important. The storage quality of nuts depends on the relative ratio of their saturated and unsaturated fatty acids [42]. The oxidative rancidity of most nut oils increases with increasing levels of polyunsaturated fatty acids. Therefore, the higher the unsaturation, the lower the quality of the oil. The ratio of oleic to linoleic acid (O/L), which is called the "quality index", is commonly used as a measure to predict the shelf life and stability of the oil. A higher O/L value represents greater chemical stability and a longer shelf life. The quality index of the cultivars that were assayed in this work is reported in Table 2.
The highest quality index of 3.66 corresponds to Akbari, and the lowest of 2.35 was recorded for Ohadi.
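The quality index is a simple ratio, as the sketch below illustrates. The percentages used here are hypothetical placeholders and are not the values of Table 2; the cultivar names and the function name are likewise invented for the example.

```python
def quality_index(oleic_pct, linoleic_pct):
    """Oleic-to-linoleic (O/L) ratio from the two fatty acid percentages."""
    if linoleic_pct <= 0:
        raise ValueError("linoleic acid percentage must be positive")
    return oleic_pct / linoleic_pct


# Hypothetical mean percentages (oleic, linoleic); NOT taken from Table 2.
profiles = {
    "cultivar_A": (68.0, 19.0),
    "cultivar_B": (61.0, 26.0),
}

# Rank from lowest to highest O/L, i.e. from least to most oxidatively stable.
ranking = sorted(profiles, key=lambda name: quality_index(*profiles[name]))
for name in ranking:
    print(name, round(quality_index(*profiles[name]), 2))
```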

Results and Discussion
In the next step, the areas of the specified peaks in the chromatograms were used as input parameters for fingerprint analysis. The resulting two-dimensional matrix (150 samples and eight peak areas, 150 × 8) was used for subsequent supervised and unsupervised pattern recognition analysis. Auto-scaling was used as data pretreatment for all applied methods.

Unsupervised Pattern Recognition Analysis Using PCA
Principal component analysis was applied to the data matrix of the eight major fatty acids in five pistachio cultivars. PCA was performed on the standardized peak area matrix to generate PCs comprising a new set of eight orthogonal variables. The analysis showed that 47.2% of the total variation was explained by the first principal component, while 64.9% was explained by the first two, and 80.0% was explained by the first three. Figure 2a shows the scores of the first three principal components, illustrating the distribution of the different pistachio cultivars. As can be seen, the different cultivars are discriminated rather clearly. These preliminary results demonstrate the feasibility of discriminating the Iranian pistachio varieties based on their fatty acid compositions.
In order to evaluate the relationship between parameters, the PC1-PC2 loading plot (Figure 2b) was also examined. A loading plot visualizes the weight of the variables on a given score. Variables that are far from the origin have the highest weight (loading). The correlation between two variables can be regarded simply as the cosine of the angle enclosed between their vectors. Positive correlations correspond to angles below 90 degrees, and negative correlations correspond to angles above 90 degrees, while uncorrelated variables show right angles (angles equal to 90 degrees).
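The link between angles and correlations can be illustrated numerically. For mean-centered variables, the Pearson correlation is exactly the cosine of the angle between the two data vectors; in a two-component loading plot this relationship holds only approximately, and is most reliable when the plotted PCs capture most of the variance. The sketch below uses synthetic variables and a hypothetical helper function.

```python
import numpy as np


def cosine(u, v):
    """Cosine of the angle between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Synthetic variables: b is built to correlate positively with a, c negatively.
rng = np.random.default_rng(0)
a = rng.normal(size=50)
b = 0.8 * a + 0.6 * rng.normal(size=50)
c = -a + 0.5 * rng.normal(size=50)

# Mean-center each variable before taking the angle.
a0, b0, c0 = (v - v.mean() for v in (a, b, c))
cos_ab, cos_ac = cosine(a0, b0), cosine(a0, c0)
angle_ab = np.degrees(np.arccos(cos_ab))
angle_ac = np.degrees(np.arccos(cos_ac))

print(f"a vs b: r = {np.corrcoef(a, b)[0, 1]:.3f}, angle = {angle_ab:.1f} deg")
print(f"a vs c: r = {np.corrcoef(a, c)[0, 1]:.3f}, angle = {angle_ac:.1f} deg")
```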
Figure 2. PC1-PC2-PC3 score plot from the matrix of the gas chromatographic peak areas of the fatty acid fingerprints from the pistachio samples (A), and PC1-PC2 loading plot from the same matrix (B).

Supervised Pattern Recognition Analysis Using LDA
Linear discriminant analysis was applied to the above-mentioned peak areas in the chromatograms of the pistachio samples from five cultivars, in order to develop a mathematical model for their classification and identification. The LDA model was created using the training set consisting of 100 samples, while 50 samples were used as the test set to validate the predictive properties of the model. The calibration data matrix was obtained by recording the chromatograms of 100 oils (pistachio samples), each described by eight variables (the peak areas in each chromatogram). The pistachio cultivar is the categorical dependent variable, while the independent variables are the peak areas of the fatty acids in the chromatograms.
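A minimal sketch of such a model is given below, assuming a textbook Fisher LDA: the discriminant vectors are eigenvectors of the matrix formed from the inverse within-class scatter times the between-class scatter, and samples are assigned to the nearest class centroid in the projected space. This is not the authors' in-house program; the data are synthetic, and the names fit_lda and predict_lda are hypothetical.

```python
import numpy as np


def fit_lda(X, y, n_components):
    """Fisher LDA: directions maximizing between/within-class variance."""
    classes = np.unique(y)
    p = X.shape[1]
    grand_mean = X.mean(axis=0)
    Sw = np.zeros((p, p))               # within-class scatter
    Sb = np.zeros((p, p))               # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        d = (mc - grand_mean)[:, None]
        Sb += len(Xc) * (d @ d.T)
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(evals.real)[::-1][:n_components]
    W = evecs[:, order].real
    # Rescale so the pooled within-class variance is 1 along each direction.
    W = W / np.sqrt(np.sum(W * (Sw @ W), axis=0) / (len(X) - len(classes)))
    centroids = np.array([(X[y == c] @ W).mean(axis=0) for c in classes])
    return classes, W, centroids


def predict_lda(model, X):
    """Assign each sample to the nearest class centroid after projection."""
    classes, W, centroids = model
    Z = X @ W
    dist = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dist, axis=1)]


# Synthetic data: 5 classes, 8 variables, 20 training and 10 test samples each.
rng = np.random.default_rng(0)
means = np.zeros((5, 8))
means[1, 0] = means[2, 1] = means[3, 2] = 8.0
means[4, :3] = 8.0
y_train = np.repeat(np.arange(5), 20)
y_test = np.repeat(np.arange(5), 10)
X_train = means[y_train] + rng.normal(size=(100, 8))
X_test = means[y_test] + rng.normal(size=(50, 8))

model = fit_lda(X_train, y_train, n_components=3)
accuracy = float(np.mean(predict_lda(model, X_test) == y_test))
print(f"test accuracy: {accuracy:.2f}")
```

With equal class sizes, nearest-centroid assignment in the rescaled discriminant space is one common way of turning the projection into a classifier; other decision rules are possible.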
A probability density function of the projected points [15,43], using the first, second, and third projection vectors, is shown in Figure 3A. As seen in Figure 3A (a), Akbari, Kalle-Ghouchi, and Ohadi are separated based on the first LDA eigenvector. Figure 3A (b) illustrates that Akbari, Kalle-Ghouchi, and Ohadi are again discriminated based on the second eigenvector. Figure 3A (c) also shows that Ahmad-Aghaei and Chrok are separated. Therefore, it seems that the discrimination of the cultivars is feasible using the first three eigenvectors. With this model, an acceptable distinction between all of the classes was obtained (Figure 3B). The cultivars of the calibration samples were assigned correctly for almost all of the samples based on leave-one-out cross-validation (99.0%). Furthermore, bootstrapping as a resampling method was also used to generate different distributions of predicted classification results and subsequently assess classification accuracies. Bootstrapping was performed by randomly selecting samples for individual subpopulations 50 times, including five samples for prediction in each step; the average accuracy of the corresponding method was 98.75%. The developed LDA model was subsequently applied to the data of the 50 test set samples in order to predict their cultivars. The detailed LDA classification results are listed in Table 3B. The classification accuracy, sensitivity, and specificity values were reported in order to evaluate the classification performance of the model. The classification sensitivity for each class, which was calculated as TP/(TP + FN) (TP: True Positive, FN: False Negative), and the specificity, which was calculated as TN/(TN + FP) (TN: True Negative, FP: False Positive), are tabulated in Table 3B. The model provided an average accuracy of 98.0%, while the average sensitivity and specificity were 98.0% and 99.6%, respectively.
The high classification accuracy for the training data indicates a reasonable relationship between the fatty acid chromatographic data and the cultivars, while the high classification accuracy for the prediction set demonstrates the ability of the developed model to indicate the cultivar of an unknown pistachio sample based on its fatty acid profile.
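The per-class sensitivity TP/(TP + FN) and specificity TN/(TN + FP) can be derived from a confusion matrix in a one-versus-rest fashion, as sketched below. The confusion matrix is hypothetical (five classes, ten test samples each, one misclassification) and does not reproduce Table 3B; the function name is an assumption for this example.

```python
def per_class_metrics(cm):
    """Return [(sensitivity, specificity), ...] for each class.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    metrics = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                       # class k samples missed
        fp = sum(cm[i][k] for i in range(n)) - tp  # others assigned to class k
        tn = total - tp - fn - fp
        metrics.append((tp / (tp + fn), tn / (tn + fp)))
    return metrics


# Hypothetical confusion matrix; NOT the data of Table 3B.
cm = [
    [10, 0, 0, 0, 0],
    [0, 10, 0, 0, 0],
    [0, 0, 9, 1, 0],
    [0, 0, 0, 10, 0],
    [0, 0, 0, 0, 10],
]

metrics = per_class_metrics(cm)
n_samples = sum(sum(row) for row in cm)
accuracy = sum(cm[k][k] for k in range(len(cm))) / n_samples
mean_sens = sum(s for s, _ in metrics) / len(metrics)
mean_spec = sum(sp for _, sp in metrics) / len(metrics)
print(f"accuracy={accuracy:.3f} sensitivity={mean_sens:.3f} specificity={mean_spec:.3f}")
```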

Conclusions
In this study, the potential of GC coupled to supervised and unsupervised pattern recognition methods for the traceability and classification of Iranian pistachio nuts based on their fatty acid composition was demonstrated. A classification model with high sensitivity was constructed to predict the cultivars of the samples, and was then evaluated by cross-validation and additionally by an external test set. To the best of our knowledge, the classification of these cultivars has not yet been reported. The outcome of this classification approach is very satisfying: the cultivars of the validation samples were classified correctly by LDA modeling in 98% of cases. The models that were built were very sensitive and highly specific for authenticating the provenance of pistachio nuts.