Tunisian Table Olive Oil Traceability and Quality Using SNP Genotyping and Bioinformatics Tools

To improve the authentication and traceability of table olive oil, we analyzed 11 Tunisian table olive cultivars using seven SNP molecular markers (SOD, CALC, FAD2.1, FAD2.3, PAL70, ANTHO3, and SAD.1) located in six different genes. We assessed the potential genotype-phenotype links between the seven SNPs, on the one hand, and the quantitative and qualitative parameters, on the other. The genotypes obtained were analyzed with computational biology tools based on bivariate analysis, multinomial logistic regression, and Bayesian network modeling. The results showed that the PAL70 SNP marker was negatively correlated with the phenol rate (r = -0.886; p < 0.001) and the oxidative stability (r = -0.884; p < 0.001), reflecting a direct effect of PAL70 genotype variation on the proportion of total phenols in each variety. Additionally, we found a significant association of the SAD.1 marker with the content of the unsaturated linolenic fatty acid (C18:3; p = 0.046). Moreover, SAD.1 was positively correlated with the saturated stearic acid C18:0 (r = 0.644; p = 0.032). These two results were obtained with multinomial logistic regression and Bayesian network modeling, respectively. This work provides a better understanding and characterization of the quality of Tunisian table olives and supplies useful knowledge and data for table olive traceability and breeding.


Introduction
In the countries of the Mediterranean basin, the olive is one of the most important agricultural products. It is used for olive oil extraction or processed as table olives. The latter are chosen from cultivated olive trees (Olea europaea L.) according to their size, volume, taste, and other organoleptic properties that make them suitable for table consumption.
According to the data provided by the International Olive Oil Council [1], world production of table olives was estimated at 2,953,500 tons in the 2017/2018 season, an increase of 211 % in global production between 1991 and 2018. The largest increases were recorded in Egypt, Turkey, Spain, Algeria, Greece, Argentina, Iran, and Morocco.
Initially, table olive production was restricted to the main producing regions, namely, the European Union, Egypt, Turkey, Algeria, Morocco, Argentina, and Syria. Nowadays, however, table olive production and exports have extended to other countries such as the USA and Jordan, with 6 and 5 tons, respectively [1]. In recent years, several countries such as Tunisia, Argentina, Jordan, and Morocco have increased their production of table olives compared to the previous season, unlike some producer countries whose production remained constant or declined, such as Syria (by 47 %) and Peru (by 1 %) [1].
According to our previous results from studies performed on the main world table olive varieties [2], the genetic diversity and distribution of table olive varieties are related to several qualitative and quantitative parameters. Additionally, biological and organoleptic markers, together with computational biology tools, could help determine the characteristics of table olives and hence begin to resolve their authenticity. Moreover, that study highlighted that some varieties could be more suitable as olive oil cultivars than for table olive consumption, given their high yield and consistent fruit oil content (22 %) [2]. For these reasons, it is crucial to develop traceability and authentication strategies and procedures that allow rapid and relevant identification, and then valorisation, of cultivars. Generally, traceability, authenticity, and fraud detection in olive oil are assessed by analytical techniques. However, biochemical analyses alone are not sufficient to assess olive oil authenticity because environmental conditions influence oil components [3][4][5].
More recently, DNA-based analyses of olive oil have attracted great interest as a way to meet the needs of consumers, and they will be essential for studying the traceability of olive oil because of their advantages, particularly the reliability and reproducibility of their results.
Seven SNPs located in six different genes were studied: fatty acid desaturase (FAD), anthocyanidin synthase (ANS), calcium-binding protein, stearoyl-acyl carrier protein desaturase (SAD), Cu-Zn superoxide dismutase (SOD), and L-phenylalanine ammonia lyase (PAL). The FAD2.1 and FAD2.3 SNPs are both harboured by the FAD gene, which is involved in the biosynthesis of highly unsaturated fatty acids (HUFA) from the precursor polyunsaturated fatty acids (PUFA) [6]. The third SNP, named ANTHO3, is located in the anthocyanidin synthase gene, which encodes a 2-oxoglutarate iron-dependent oxygenase that catalyzes the penultimate step in the biosynthesis of the anthocyanin class of flavonoids [7]. The CALC SNP is carried by the calcium-binding protein gene, which is involved in the response to abiotic constraints (salinity, cold, and drought) [8]. The fifth SNP is located in the stearoyl-acyl carrier protein desaturase gene, which is involved in the desaturation of C18:0 to C18:1, the monounsaturated oleic acid [9]. The sixth SNP, named SOD, is an insertion/deletion polymorphism located in the Cu-Zn superoxide dismutase gene, which is associated with the oxidative stress response [10,11]. The last SNP is PAL70, located in the L-phenylalanine ammonia lyase gene, which is implicated in phenolic biosynthesis, including the formation of flavonoids, lignin, and hydroxycinnamic acids [12].
Our study aims to assess the correlations between the seven SNPs and the quality parameters of table olive oils, and the efficiency of these SNPs in the authentication and traceability of Tunisian table olive oil.
Olive Oil Extraction. Fully ripened fruits from different dual-purpose and table Tunisian olive varieties were used for olive oil extraction. Immediately after harvest, the olive fruit samples were transported to the laboratory and stored for oil extraction. To obtain olive oil, 2.5 kg of stoned olives was ground, and the oil was extracted by mechanical press. Standard methods commonly used in oil factories were followed for monovarietal oil extraction, including milling, mixing at 25 °C for 30 min, and centrifugation for 3 min at 2000 g; the final step was natural decantation. Samples were stored at 4 °C in dark glass bottles until analysis.
DNA Isolation. DNA was extracted from olive oil using the QIAamp DNA Stool Mini Kit (Qiagen) according to the protocol described by Ben Ayed et al. (2012) [12] with slight modifications. DNA quantification was carried out by spectrophotometry (Tecan GENIOS Plus spectrofluorometer) with Hoechst H33258 dye incorporation. Dilution series of Lambda DNA (D150A, Promega) were used for standard calibration. Finally, genomic DNA was diluted in TE buffer (10 mM Tris-HCl pH 8, 1 mM EDTA pH 8) and stored at -20 °C.
SNP Genotyping. We considered seven SNPs in our study (FAD2.1, FAD2.3, ANTHO3, CALC, SAD.1 (also denoted ACP1), SOD, and PAL70). All SNPs were selected in the coding regions of the FAD, ANTHO, CALC, SAD, SOD, and PAL genes, all of which are involved in fruit pomology and associated with olive oil composition and are therefore readily correlated with phenotypic characters.
The SOD SNP (insertion/deletion type) was genotyped by a simple polymerase chain reaction followed by agarose gel electrophoresis, whereas the other six SNPs (FAD2.1, FAD2.3, ANTHO3, CALC, SAD.1 (ACP1), and PAL70) were genotyped by the polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) method (Table 1). The PCR product (171 bp) of the ANTHO3 SNP was digested with the MspI restriction enzyme (Fermentas, Life Sciences) at 37 °C overnight. This restriction enzyme recognizes the sequence AA/GG. The PCR product carrying the G allele was cleaved once by the enzyme, generating two fragments (64 and 107 bp). The PCR product (476 bp) of the CALC SNP was digested with the BstZI restriction enzyme (Promega) at 50 °C overnight. This restriction enzyme recognizes the sequence CC/GG. The PCR product carrying the C allele was cleaved once by the enzyme, producing two fragments (316 and 160 bp). The FAD2.1 and FAD2.3 SNPs were also analyzed by PCR-RFLP: the 241 bp PCR product of FAD2.1 and the 240 bp PCR product of FAD2.3 were digested with BamHI (Fermentas, Life Sciences) and AlwI, respectively, at 37 °C overnight. The restriction fragments were 224 and 17 bp for the CC genotype of FAD2.1 and 130 and 110 bp for the CC genotype of FAD2.3. The PCR product (330 bp) of the SAD.1 SNP was digested with the TaqI restriction enzyme (Vivantis) at 65 °C for 16 hours. This restriction enzyme recognizes the sequence CC/TT. The PCR product carrying the C allele was cleaved twice by the enzyme, producing four fragments (263, 158, 105, and 67 bp). The PCR product (400 bp) of the PAL70 SNP was digested with the HinfI restriction enzyme (Fermentas, Life Sciences). The restriction fragments were 308, 52, and 40 bp for the AA genotype of the PAL70 SNP.
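The way a PCR-RFLP band pattern maps to a genotype can be sketched as follows for the two markers whose uncut and cut fragment sizes are both stated above (ANTHO3 and CALC). This is only an illustrative sketch: the fragment sizes come from the text, while the dictionary layout, the function name `expected_bands`, and the assumption that a heterozygote shows the union of both patterns are ours.

```python
# Fragment sizes (bp) are taken from the text; the data layout and the
# function name are illustrative assumptions, not part of the protocol.
RFLP_MARKERS = {
    # marker: (uncut PCR product, fragments after digestion, allele that is cut)
    "ANTHO3": (171, (107, 64), "G"),
    "CALC": (476, (316, 160), "C"),
}


def expected_bands(marker, genotype):
    """Return the band sizes expected on the gel for a diploid genotype.

    Assumes each allele contributes its own pattern, so a heterozygote
    shows the uncut product plus the two digestion fragments.
    """
    uncut, cut_fragments, cut_allele = RFLP_MARKERS[marker]
    bands = set()
    for allele in genotype:
        if allele == cut_allele:
            bands.update(cut_fragments)  # digested allele: two fragments
        else:
            bands.add(uncut)  # undigested allele: full-length product
    return sorted(bands, reverse=True)


print(expected_bands("ANTHO3", "AA"))  # homozygous, uncut
print(expected_bands("ANTHO3", "AG"))  # heterozygous
print(expected_bands("CALC", "CC"))    # homozygous, cut
```

Reading a gel is the inverse operation: the observed set of bands is compared with the expected sets to assign the genotype.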
All digestion products were separated by electrophoresis on 3% NuSieve agarose gels stained with ethidium bromide and visualized under UV light.
Statistical Analysis. The correlation between SNP markers and the studied parameters was analyzed in several steps using different statistical methods. First, the chi-square test was used to evaluate the differences in allele and genotype frequencies between the classes of qualitative traits. Subsequently, Student's t-test was used for quantitative traits to assess whether the means of the genotype groups differed significantly for each SNP. R software packages were used to study the association between SNP markers and quantitative and qualitative parameters. All tests were declared statistically significant when P values were <0.05. Thereafter, to study the relationship of the seven SNPs with quantitative traits, a multiway analysis of variance was carried out. In addition, multinomial logistic regression was applied to test the associations of the seven SNPs with qualitative traits independently.
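As a rough illustration of the Student test applied to a quantitative trait, the sketch below computes the pooled-variance t statistic for two genotype groups. It is a minimal standard-library sketch and not the R code used in the study; the function name `student_t` and the trait values are hypothetical.

```python
import math
from statistics import mean, variance


def student_t(group_a, group_b):
    """Pooled-variance (equal-variance) Student t statistic and its
    degrees of freedom. Each group needs at least two observations."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err, df


# Hypothetical trait values for two genotype groups (not data from the study).
homozygous = [412.0, 398.5, 430.2, 405.1]
heterozygous = [310.4, 295.8, 322.0, 301.7, 315.3]
t_stat, df = student_t(homozygous, heterozygous)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

The p value is then read from the t distribution with `df` degrees of freedom; in R, `t.test(x, y, var.equal = TRUE)` performs the same test and reports it directly.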
To draw the directed acyclic graph (DAG), we used the R language and the 'grow-shrink' algorithm. The algorithm efficiently filters links out of a full skeletal DAG in which all nodes are initially connected (excluding those having no relationships with others), based on tests of conditional independence between a pair of nodes given all possible subsets of the rest. Logical rules are then applied to orient the links (conditional dependence between variables), so that cycles are not introduced and the patterns of conditional independence found in the data match the generated DAG. We estimated the strength of association in the final DAG by computing the beta-coefficient of a regression for each potential causal effect, in which the variable at the base of the arrow ('cause') was considered a covariate and the variable at the head of the arrow ('effect') was considered the outcome or dependent variable [14].
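The last step, estimating the strength of one arrow of the DAG, can be sketched as a simple least-squares regression of the 'effect' on the 'cause'. This is a minimal sketch under our own assumptions: the genotype is coded numerically (0 for one genotype, 1 for the other), the function name `regression_beta` is hypothetical, and the values are invented for illustration. The structure learning itself is not reproduced here.

```python
from statistics import mean


def regression_beta(cause, effect):
    """Ordinary least-squares slope of `effect` regressed on `cause`.

    `cause` is the variable at the base of the arrow (covariate) and
    `effect` the variable at the head of the arrow (outcome).
    """
    mean_x, mean_y = mean(cause), mean(effect)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cause, effect))
    s_xx = sum((x - mean_x) ** 2 for x in cause)
    return s_xy / s_xx


# Hypothetical coding: 0 = homozygous, 1 = heterozygous (assumption),
# with invented trait values for six cultivars.
genotype = [0, 0, 0, 1, 1, 1]
trait = [420.0, 405.0, 431.0, 310.0, 298.0, 322.0]
print(f"beta = {regression_beta(genotype, trait):.2f}")
```

A negative slope under this coding means that the trait is lower, on average, in the group coded 1. If both variables are standardized first, the slope equals the Pearson correlation coefficient.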

Results and Discussion
Genotyping and Characteristics of the SNP Markers. The observed heterozygosity for the studied SNP markers ranged from 0.363 (SAD.1) to 0.909 (FAD2.1), with an average of 0.545, whereas the expected heterozygosity ranged from 0.297 to 0.495, with an average of 0.492, indicating a high level of heterozygosity for all markers (Table 1).
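For readers who want to see how these two quantities are obtained, the sketch below computes the observed heterozygosity (fraction of heterozygous cultivars) and the expected heterozygosity (one minus the sum of squared allele frequencies) from a list of genotypes. The function name is hypothetical, and the SAD.1 genotype counts (7 TT and 4 CT among 11 cultivars) are our inference from the proportion of TT homozygotes reported later in this section, not counts taken from Table 1.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for diploid genotypes
    given as two-letter strings such as "TT" or "CT"."""
    n = len(genotypes)
    observed = sum(1 for g in genotypes if g[0] != g[1]) / n
    allele_counts = Counter(allele for g in genotypes for allele in g)
    total = sum(allele_counts.values())
    expected = 1 - sum((count / total) ** 2 for count in allele_counts.values())
    return observed, expected


# Assumed SAD.1 genotype counts: 7 TT and 4 CT among 11 cultivars.
sad1 = ["TT"] * 7 + ["CT"] * 4
ho, he = heterozygosity(sad1)
print(f"Ho = {ho:.4f}, He = {he:.4f}")
```

Under these assumed counts the function gives Ho = 4/11 and He = 36/121, that is, about 0.364 and 0.298, which is close to the lower bounds of 0.363 and 0.297 reported above.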
Polymorphism Information Content (PIC) value was determined for the studied table olive cultivars. [15]. The SOD gene, for superoxide dismutase, encodes an antioxidant enzyme that plays a pivotal role in protecting cells against the accumulation of superoxide radicals [16]. Phenylalanine ammonia lyase (PAL70) catalyzes the formation of trans-cinnamic acid via L-phenylalanine deamination and is therefore associated with the content of phenolic compounds in olives [17].
The highest discriminating power (DP) value, 0.528, was obtained for the FAD2.3 marker, and the mean value was 0.464. Using SSR markers, Reale et al. (2006) [18] and Muzzalupo et al. (2009) [19] obtained similar values, respectively, with DNA samples extracted from 65 olive cultivars and 39 Italian cultivars (0.38). However, the average values are lower than those found by Cipriani et al. (2002) [20] in 12 Italian cultivars (0.44) using SSR markers (0.71). The fatty acid desaturase gene, which is involved in the biosynthesis of HUFA from PUFA precursors, has been shown to be associated with the oleic/linoleic acid ratio of olive oils from Tunisian olive oil cultivars [21].
The allele frequencies of the seven studied SNPs revealed that one allele was dominant for all markers except PAL70, whose two alleles displayed similar frequencies.
Association between SNP Polymorphisms and Olive Oil Quality Parameters. To illustrate the association between the quality of the table olive cultivars and gene information, we applied the likelihood ratio test (LRT). Thus, an association analysis was carried out to identify alleles related to table olive fruit quality. We studied 7 SNPs located in 6 genes for 11 table olive samples. Then, we evaluated the values of the LRT and chi-square tests. The results are summarized in Table 2 and showed no significant association between the genotypes of the seven SNPs (SOD, FAD2.1, FAD2.3, ANTHO3, CALC, SAD.1, and PAL70) and any of the qualitative traits considered in this work. As shown in Table 3, a significant association was found between the CALC SNP and one parameter, palmitic acid (C16:0). Specifically, the average rate of C16:0 differed significantly between the heterozygous varieties with the CG-CALC genotype and the varieties with the GG-CALC genotype (p = . ). This positive association between CALC polymorphisms and the C16:0 parameter suggests that the heterozygous varieties with the CG genotype produce, on average, higher levels of C16:0 than the varieties with the GG genotype. As shown in Table 5, this significant correlation is supported by multinomial logistic regression modeling ( = 0.039).
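One common form of the likelihood ratio test for a genotype-by-trait contingency table is the G-test, sketched below for a 2 x 2 table. We do not know the exact formulation used to produce Table 2, so this is an assumption on our part; the function name and the counts are hypothetical. The p value uses the chi-square distribution with one degree of freedom, which is only an approximation with as few as 11 samples.

```python
import math


def g_test_2x2(table):
    """Likelihood ratio (G) statistic and asymptotic p value for a 2 x 2
    table of counts [[a, b], [c, d]] (1 degree of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            expected = row_totals[i] * col_totals[j] / n
            if observed > 0:  # empty cells contribute nothing to G
                g += 2 * observed * math.log(observed / expected)
    g = max(g, 0.0)  # guard against tiny negative rounding errors
    # Chi-square survival function for 1 df: P(X > g) = erfc(sqrt(g / 2)).
    return g, math.erfc(math.sqrt(g / 2))


# Hypothetical counts: rows = two genotypes, columns = two trait classes.
g_stat, p_value = g_test_2x2([[5, 1], [1, 4]])
print(f"G = {g_stat:.3f}, p = {p_value:.3f}")
```

A p value below 0.05 would be declared significant under the threshold used in this study.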
In addition, the FAD2.3 SNP was found to be highly associated with three quantitative parameters, namely, acidity ( =0.024), rate of carotene (p= . ), and cholesterol content (p= . ) (Table 3). The homozygous varieties (CC-FAD2.3) were the main genotypes concerned by these positive associations. The FAD gene is known to be involved in the synthesis pathway of unsaturated fatty acids [6], suggesting a direct effect of FAD2.3 genotypic variation on the rate of PUFA (such as C18:2 and C18:3) in each variety and hence on the acidity parameter. In fact, the homozygous CC varieties (Toffehi, Fakhari, and Fougi cultivars) have a higher acidity than the heterozygous CG varieties (the other cultivars). Likewise, the cholesterol rate was significantly higher in the varieties carrying the homozygous genotype CC-FAD2.3 (particularly Toffehi, with a cholesterol rate of 1.98, and Fakhari, with 2.17) than in the heterozygous genotypes (Table 3). The CC varieties also contain more carotene pigment than the heterozygous genotypes. Moreover, the FAD2.3 SNP marker is significantly associated with the acidity and the cholesterol rate according to the analysis of variance ( =0.006, p< . ) (Table 4) and the multinomial logistic regression modeling ( =0.026, p< . ) (Table 5), respectively.
The study of the ANTHO3 SNP led to the identification of two genotypes, AA and AG, with a level of AG-ANTHO3 heterozygosity of 63.63 %. A significant association was established with the rate of total sterols (p = . ), which has a higher average for the heterozygous cultivars (AG) (representing 63.63 % of all samples) (Table 3). This significant correlation is supported by multinomial logistic regression modeling ( = 0.04) (Table 5).
Moreover, the PAL70 SNP is clearly associated with three parameters, namely, chlorophyll content, total phenol content, and oxidative stability (Table 3). The rate of chlorophyll pigment varied between the heterozygous varieties with the AG-PAL70 genotype and the varieties with the AA-PAL70 genotype (p= . ). A positive correlation between the total phenol content and the genotype variation for this marker (p< . ) could also be observed, with the varieties of AA genotype displaying the highest total phenolic content. Regarding the relationship with oxidative stability (p< . ), the homozygous AA varieties showed better oil stability than the heterozygous AG varieties. Moreover, the PAL70 SNP marker is significantly associated with the chlorophyll rate (p= . ) according to the analysis of variance (Table 4).
Additionally, multivariate analyses were used to study the association between olive oil parameters and the PAL70 SNP marker, showing a significant association between this SNP marker and the acidity parameter (Tables 4 and 5). The values of this association were =0.004 and = 0.036 using the analysis of variance and the multinomial logistic regression modeling, respectively.
The association between total phenol rate and the PAL70 SNP marker is biologically relevant since the PAL70 marker is located in the L-phenylalanine ammonia lyase gene that is involved in the biosynthesis of phenylpropanoid compounds [12].
The relationship between the PAL70 SNP and the phenol level was assessed by Bayesian network modeling. The derived DAG (directed acyclic graph) is shown in Figure 1, where directed arrows indicate the direction of 'causal' influence between variables. Three direct influences are identified: the effects of the PAL70 marker on the phenol rate, the oxidative stability, and the chlorophyll content. In fact, Figure 1 shows that the PAL70 SNP was negatively associated with the phenol rate (r=-0.886; p<0.001), the oxidative stability (r=-0.884; p<0.001), and the chlorophyll content (r=-0.814; p=0.002). Furthermore, the PAL70 node was not associated with the sterol level (r=0.223; p=0.510).
The oxidative stability of the olive oil samples is directly influenced by the total phenol level, and the total phenol amount is in turn directly influenced by the PAL70 marker. The latter therefore plays a key role in the total phenol level of each of the olive oil varieties. This finding could be explained by the fact that the PAL70 SNP is located within a gene involved in phenolic biosynthesis (Balsa et al. 1979), suggesting a direct effect of PAL70 genotype variation on the percentage of total phenols in each variety.
For the SAD.1 SNP, two genotypes were identified: TT and CT. About 64 % of the varieties were homozygous TT-SAD.1, including two dual-use cultivars (Chemchali and Oueslati) and two table olive cultivars (Toffehi and Fakhari). Two significant associations of this marker were found, with the accumulation of the unsaturated linolenic fatty acid (C18:3; p= . ) and with the rate of carotene (p= . ) (Table 3).
The relationship between the molecular marker SAD.1 and fatty acid composition was also analyzed by Bayesian networks modeling.
First, three nodes were considered, as represented in Figure 2. Pearson correlation coefficients among the fatty acid compositions of the olive oil varieties are presented in Table 6. Moreover, SAD.1 was positively correlated with the saturated stearic acid C18:0 (r = 0.644; p = 0.032).
The SAD gene is known to be associated with the transformation of the saturated stearic fatty acid C18:0 into the monounsaturated oleic fatty acid C18:1, which suggests a direct effect of SAD.1 genotype variation on the fatty acid content [6].
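The link between a Pearson correlation coefficient and its p value can be sketched as follows: r is converted into a t statistic with n - 2 degrees of freedom, and the two-sided p value is taken from the t distribution. This is a standard-library sketch in which the t distribution is integrated numerically with Simpson's rule; the function names are ours, and the use of n = 11 (the number of cultivars) as the sample size behind each coefficient is an assumption.

```python
import math


def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def pearson_p_value(r, n, steps=20000):
    """Return (t, two-sided p) for a Pearson coefficient r from n pairs.

    Requires -1 < r < 1 and n > 2; `steps` must be even (Simpson's rule).
    """
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    central_area = total * h / 3  # integral of the density over [0, t]
    return t, max(0.0, 1 - 2 * central_area)


t_stat, p_value = pearson_p_value(0.644, 11)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

Under the assumption n = 11, r = 0.644 gives a p value close to the reported 0.032, and the coefficients of magnitude 0.88 reported for PAL70 give p values below 0.001, consistent with the text.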

Conclusions
SNP genotyping is a valuable approach for marker-assisted selection in crops. For this reason, we studied in the present work the correlations between the seven SNPs and the quality parameters of table olive oils, and their usefulness in the traceability of Tunisian table olive oil. We found that the PAL70 SNP marker was negatively correlated with the phenol rate (r=-0.886; p<0.001) and the oxidative stability (r=-0.884; p<0.001). In addition, we reported a significant association of the SAD.1 marker with the accumulation of the unsaturated linolenic fatty acid (C18:3; p= . ) and showed that SAD.1 was positively correlated with the saturated stearic acid C18:0 (r=0.644; p=0.032), based on multinomial logistic regression and Bayesian network modeling, respectively. To the best of our knowledge, this is the first work to analyze the relationship between SNP markers and oil quality in Tunisian table olive oil.

Data Availability
All data generated or analyzed during this study are included in this published article.

Ethical Approval
This paper complies with ethical standards.

Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this article.