Combined Metabolome and Transcriptome Profiling Reveal Optimal Harvest Strategy Model Based on Different Production Purposes in Olive

Olive oil has been favored as a high-quality edible oil because it contains balanced fatty acids (FAs) and high levels of minor components. The contents of FAs and minor components vary among olive fruits of different colors at harvest time, which makes it difficult to determine the optimal harvest strategy for olive oil production. Here, we combined metabolome, PacBio Iso-Seq, and Illumina RNA-seq transcriptome analyses to investigate the association between metabolites and gene expression in olive fruits at harvest time. A total of 34 FAs, 12 minor components, and 181 other metabolites (including organic acids, polyols, amino acids, and sugars) were identified in this study. Based on these data, we proposed optimal olive harvesting strategy models for different production purposes. In addition, we used the combined PacBio Iso-Seq and Illumina RNA-seq gene expression data to identify genes related to the biosynthetic pathways of hydroxytyrosol and oleuropein. These data lay the foundation for future investigations of olive fruit metabolism and gene expression patterns, and provide a method for deriving olive harvesting strategies for different production purposes.


Introduction
Olive oil originated in the Mediterranean and is a widely recognized, high-quality edible vegetable oil [1]. Olive oil consumption has gradually increased over the past 10 years, even in areas that do not produce olive oil, such as Japan and Canada [2]. This increase in popularity is due in large part to its nutritional and health-promoting effects. Olive oil contains high levels of mono- and polyunsaturated fatty acids that are easily absorbed by humans [1,3]. Olive oil also contains many minor components that are beneficial to human health, including chlorophyll, polyphenols, and tocopherols. In particular, hydroxytyrosol, oleuropein, and squalene are present in olive oil but are otherwise very rare in plant oils, and they have substantial health benefits owing to their antioxidative capacity [4][5][6].
The quality of olive oils depends on many characteristics, including the specific cultivar, growth environments, and harvest times. Traditional harvesting of green fruits during the maturation period will reduce oil yields and increase the abundance of minor components, including polyphenols, while harvesting fully mature purple fruits will increase oil yields, but reduce polyphenol contents. Therefore, harvesting is typically conducted by mixing green fruits with semi-ripe or ripe fruits to ensure specific oil contents and polyphenol content characteristics [7,8]. Studies have examined the effects of harvest time on oil yields and qualities; however, studies on the effect of harvesting time on olive oil

Plant Material
Mature olive fruits at three color stages, namely green, turning (semi-green/half-purple), and purple (designated F1, F2, and F3, respectively), were collected during the harvest season (15 October 2018) from 15-year-old olive trees (O. europaea L. cv. Frantoio) planted in the experimental olive orchards of the Research Institute of Forestry, Chinese Academy of Forestry, in Longnan, Gansu Province. Samples were quickly frozen in liquid nitrogen and stored at −80 °C until further processing. Fruit pulp from six biological replicates per sample was used for the targeted metabolomics assays. Full-length Iso-Seq transcriptomics was conducted using a mixture of fruits at different maturity levels. Three biological replicates per sample were used for Illumina RNA-seq transcriptomics.

GC/MS- and LC/MS-Based Targeted and Untargeted Metabolomic Analysis
GC/MS-based fatty acid analysis and GC/MS-based targeted metabolomics analysis were conducted as previously described [13]. For LC/MS-based polyphenol analysis, standards were used to quantify the 12 minor nutrients. First, 100 mg of olive pulp was placed in a 5 mL centrifuge tube with five steel balls. The samples were frozen in liquid nitrogen for 5 min and then pulverized for 1 min using a sample preparation system (TISSUELYSER, Shanghai Jingxin, China). One milliliter of methanol (−20 °C) was added, followed by vortexing for 30 s and sonication for 30 min at room temperature. Chloroform (750 µL) and 800 µL of hydrogen peroxide (4 °C) were then added, followed by vortexing for 1 min. The mixture was centrifuged at 12,000 rpm for 10 min at 4 °C, and 1 mL of the supernatant was transferred to a new 1.5 mL centrifuge tube. The supernatant was concentrated in a vacuum centrifuge concentrator and redissolved in 250 µL of a 1:1 aqueous methanol solution containing 4 ppm 2-chlorophenylalanine (4 °C). The solution was then filtered through a 0.22 µm membrane filter to obtain the sample for LC-MS analysis. Chromatographic separation was performed on a Thermo Ultimate 3000 system equipped with an ACQUITY UPLC® HSS T3 column (150 × 2.1 mm, 1.8 µm, Waters) maintained at 40 °C. The autosampler temperature was set at 8 °C. Analytes were eluted with 0.1% formic acid in water (C) and 0.1% formic acid in acetonitrile (D) in positive ion mode, or with 5 mM ammonium formate in water (A) and acetonitrile (B) in negative ion mode, at a flow rate of 0.25 mL/min. Each sample (2 µL) was injected after equilibration. The following linear gradient of organic solvent (B or D, v/v) was used: 0-1 min, 2%; 1-9 min, 2-50%; 9-12 min, 50-98%; 12-13.5 min, 98%; 13.5-14 min, 98-2%; and 14-20 min, 2% D in positive mode (14-17 min, 2% B in negative mode).
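To make the elution program easier to check, the gradient can be encoded as a table of time points and interpolated linearly between them. The sketch below is an illustrative reading aid only: the names `POSITIVE_MODE`, `NEGATIVE_MODE`, and `percent_organic` are hypothetical, and it assumes, from the parenthetical note in the gradient description, that the negative-mode program differs only in ending its final 2% hold at 17 min.

```python
from bisect import bisect_right

# (time in min, % organic solvent) breakpoints transcribed from the gradient above.
# Solvent D is the organic phase in positive mode and B in negative mode.
POSITIVE_MODE = [(0.0, 2.0), (1.0, 2.0), (9.0, 50.0), (12.0, 98.0),
                 (13.5, 98.0), (14.0, 2.0), (20.0, 2.0)]
# Assumption: the negative-mode program is identical except that the final
# 2% hold ends at 17 min instead of 20 min.
NEGATIVE_MODE = POSITIVE_MODE[:-1] + [(17.0, 2.0)]


def percent_organic(t, program=POSITIVE_MODE):
    """Return the % organic solvent at time t (min) by linear interpolation."""
    times = [point[0] for point in program]
    if not times[0] <= t <= times[-1]:
        raise ValueError(f"t={t} min lies outside the {times[-1]} min program")
    i = bisect_right(times, t)
    if i == len(times):  # t coincides with the last breakpoint
        return program[-1][1]
    (t0, b0), (t1, b1) = program[i - 1], program[i]
    return b0 + (b1 - b0) * (t - t0) / (t1 - t0)


if __name__ == "__main__":
    for minute in (0.5, 5.0, 10.5, 13.75, 16.0):
        print(f"{minute:>5} min: {percent_organic(minute):5.1f}% organic")
```

For example, halfway through the 1-9 min ramp (5 min) the program delivers 26% organic solvent, and halfway through the 13.5-14 min return step it delivers 50%.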
LC/MS-based polyphenol measurements were conducted according to a previous study [14].
The minor component abundances within the F1-F3 fruits were also investigated using targeted LC/MS metabolomics, yielding quantification of a total of 12 minor components, including polyphenols (Figure 1B) (Table 1). Greener, less mature fruits have previously been shown to contain higher polyphenol contents [4]. However, our analyses did not reveal a uniform trend among the polyphenols across the samples analyzed. Maslinic acid was generally the dominant minor component, with contents exceeding 1000 µg/g in all samples. Squalene, rutin, and luteolin exhibited moderate contents, with concentrations ranging between 50 and 300 µg/g, but these three metabolites varied among the F1 to F3 samples. In particular, squalene contents were ~270 µg/g higher in F2 than in F1, but ~100 µg/g lower in F3 than in F2. In contrast, rutin concentrations were relatively consistent across all three samples, while luteolin first decreased with increasing maturity and then increased. Notably, tyrosol and hydroxytyrosol contents did not decrease with increasing fruit maturity, but rather increased significantly. Oleuropein exhibits antioxidant, anti-inflammatory, anti-atherogenic, anti-cancer, antimicrobial, antiviral, hypolipidemic, and hypoglycemic effects [16]. Oleuropein concentrations first increased with maturity and then decreased (Figure 1B). Summing all of the detected FAs or polyphenols provides an approximation of the total content of these two metabolite classes in fruits at different maturity levels. The total fatty acid content increased notably during fruit maturation, whereas the total polyphenol content in F2 was lower than in F1 and F3 (Figure 2A,B).
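The summation used to approximate total FA and polyphenol contents can be sketched as follows. This is a minimal illustration, assuming a long-format table of (compound, class, stage, content) rows; the function name `class_totals` and the example rows are hypothetical placeholders, not data from this study.

```python
from collections import defaultdict


def class_totals(measurements, compound_class):
    """Sum individual compound contents into an approximate total per stage.

    measurements: iterable of (compound, class_name, stage, content_ug_per_g).
    Only compounds that were actually detected contribute to the sum.
    """
    totals = defaultdict(float)
    for _compound, class_name, stage, content in measurements:
        if class_name == compound_class:
            totals[stage] += content
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical placeholder rows, NOT measurements from this study.
    rows = [
        ("compound_a", "fatty acid", "F1", 10.0),
        ("compound_b", "fatty acid", "F1", 5.0),
        ("compound_a", "fatty acid", "F3", 20.0),
        ("compound_c", "polyphenol", "F1", 3.0),
    ]
    fa_totals = class_totals(rows, "fatty acid")
    print(fa_totals)                          # {'F1': 15.0, 'F3': 20.0}
    print(max(fa_totals, key=fa_totals.get))  # F3
```

Because undetected compounds contribute nothing and contents are non-negative, such a sum can only underestimate the true class total, which is why it serves as an approximation.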
GC/MS- and LC/MS-based untargeted metabolomics analyses were used to obtain a more comprehensive evaluation of olive fruit metabolites at different maturity levels. A total of 181 metabolites were identified and quantified, primarily organic acids, polyols, amino acids, FAs, and sugars. Cluster and heat map analyses of the metabolites and their abundances were then used to evaluate differences among samples (Figure 2C,D, Supplementary Figure S1). Eight phosphorylated compounds were identified and classified into two groups. One group comprised glucose-6-phosphate, fructose-6-phosphate, and sorbitol-6-phosphate, with the highest contents in the F2 samples. The second group comprised fructose-1,6-bisphosphate, pyrophosphate, phosphoric acid, monomethyl phosphate, and myo-inositol-1-phosphate, with the highest contents in the F3 samples (Figure 2D). These phosphorylated compounds could also be produced by the methanol extraction procedure [17]. A total of 13 sugars were quantified across all samples and exhibited two distribution patterns. One group had the highest contents in the least mature (green) fruits and decreased gradually with increasing maturity, suggesting that these sugars are consumed during maturation and may serve as an energy source for fruit development. The other group increased with ripening and had the highest contents in the mature purple fruits, suggesting that these sugars may be essential metabolites in fruit ripening (Figure 2C). Polyol (Supplementary Figure S2a) and amino acid (Supplementary Figure S2b) contents showed similar distributions across maturity stages: purple fruits (F3) had the highest contents, while contents did not change between green and semi-green fruits (F1 and F2, respectively).
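Heat maps of metabolite abundance are commonly drawn after scaling each metabolite to zero mean and unit variance across samples, so that patterns rather than absolute levels are compared. The sketch below shows that row scaling together with a simple rule for sorting stage-mean profiles into the decreasing and increasing patterns described for the sugars. It is a stand-in under stated assumptions: the study used cluster analysis, and the names `row_zscores` and `trend_group` and the monotonic-trend rule are hypothetical.

```python
from statistics import mean, pstdev


def row_zscores(values):
    """Scale one metabolite's values to zero mean and unit SD across samples."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:  # constant profile: nothing to contrast
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def trend_group(stage_means):
    """Classify an (F1, F2, F3) profile of stage means by its monotonic trend.

    Hypothetical rule: 'decreasing' = highest in green fruits and falling with
    maturity; 'increasing' = rising toward purple fruits; otherwise 'other'
    (e.g. a peak or dip at the turning stage F2).
    """
    f1, f2, f3 = stage_means
    if f1 >= f2 >= f3 and f1 > f3:
        return "decreasing"
    if f1 <= f2 <= f3 and f1 < f3:
        return "increasing"
    return "other"


if __name__ == "__main__":
    # Hypothetical stage means (arbitrary units), for illustration only.
    profiles = {"sugar_x": [9.0, 6.0, 2.0], "sugar_y": [1.0, 3.0, 8.0]}
    for name, means in profiles.items():
        print(name, trend_group(means), [round(z, 2) for z in row_zscores(means)])
```

Hierarchical clustering on the z-scored rows would group such profiles automatically; the explicit rule is used here only because it makes the two sugar patterns easy to read.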

Transcriptomic Profiling with PacBio Full-Length and Illumina RNA-Seq
Olive fruits at different maturity stages were first mixed and then subjected to PacBio Iso-Seq sequencing. A total of 23.66 Gbp of clean data and 476,672 circular consensus sequence (CCS) reads were obtained, of which 422,444 were full-length non-chimeric (FLNC) reads. The FLNC reads were clustered to obtain consensus sequences, which were then polished to yield a total of 43,748 high-quality consensus sequences. In addition, a total of 63.66 Gbp of clean Illumina RNA-seq data were generated for nine samples (F1-F3, each with three biological replicates), with more than 6.44 Gbp of clean data per sample and Q30 values greater than 91.09% (Supplementary Table S1). The second-generation transcriptomic data were then used to correct the Iso-Seq consensus sequences. After correction, combination with the high-quality consensus sequences, and removal of redundancy, a total of 19,958 transcript sequences were obtained.
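The Q30 value reported for the Illumina data is the proportion of base calls with a Phred quality score of at least 30, that is, an estimated error probability of at most 0.1% (Q = −10 log10 P). A minimal sketch of that calculation from FASTQ quality strings follows, assuming the Phred+33 encoding used by current Illumina pipelines and unwrapped four-line FASTQ records; the function names are hypothetical.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string; offset 33 assumes Phred+33 encoding."""
    return [ord(symbol) - offset for symbol in quality_line]


def q30_fraction(quality_lines, offset=33):
    """Fraction of base calls with Phred quality >= 30 (error rate <= 0.1%)."""
    total = high = 0
    for line in quality_lines:
        scores = phred_scores(line.strip(), offset)
        total += len(scores)
        high += sum(1 for q in scores if q >= 30)
    return high / total if total else 0.0


def read_quality_lines(fastq_lines):
    """Yield the quality line (4th line) of each 4-line FASTQ record."""
    for index, line in enumerate(fastq_lines):
        if index % 4 == 3:
            yield line


if __name__ == "__main__":
    # Two toy reads: 'I' encodes Q40, '?' encodes Q30 and '5' encodes Q20.
    toy_fastq = ["@read1", "ACGT", "+", "II55", "@read2", "AC", "+", "??"]
    print(f"Q30 = {q30_fraction(read_quality_lines(toy_fastq)):.2%}")  # Q30 = 66.67%
```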
The RNA-seq clean data from each sample were mapped to the reference olive genome [10], with mapping rates exceeding 91.56% for all samples (Supplementary Table S2). After the Iso-Seq and RNA-seq data were combined, alternative splicing (AS) analysis was conducted on the whole gene dataset, evaluating five major types of AS events [18]. A total of 202 AS events were observed; intron retention was the most prevalent (39.06%), followed by alternative 3′ splice sites (31.88%), alternative 5′ splice sites (14.26%), exon skipping (13.96%), and mutually exclusive exons (0.84%) (Figure 3A).
Long non-coding RNA (lncRNA) sequences were predicted using four widely used approaches: Coding Potential Calculator (CPC), Coding-Non-Coding Index (CNCI), Pfam protein domain analysis, and Coding Potential Assessment Tool (CPAT) [19][20][21][22]. A total of 281, 713, 267, and 999 lncRNAs were identified using CNCI, CPAT, CPC, and Pfam, respectively, and 129 lncRNA transcripts were predicted by all four methods (Figure 3B). The target genes of these 129 lncRNAs were then predicted, indicating that each lncRNA has several target genes (Supplementary Table S3).
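Retaining only transcripts flagged as non-coding by all four tools amounts to a set intersection. The sketch below assumes each tool's output has already been reduced to a set of transcript IDs; the function name `consensus_lncrnas` and the IDs in the usage example are hypothetical.

```python
def consensus_lncrnas(predictions):
    """Return transcript IDs predicted as lncRNAs by every tool.

    predictions: dict mapping tool name -> iterable of transcript IDs that the
    tool classified as non-coding.
    """
    id_sets = [set(ids) for ids in predictions.values()]
    if not id_sets:
        return set()
    return set.intersection(*id_sets)


if __name__ == "__main__":
    # Hypothetical transcript IDs, for illustration only.
    calls = {
        "CNCI": {"tx1", "tx2", "tx3"},
        "CPAT": {"tx1", "tx2", "tx4"},
        "CPC": {"tx1", "tx2"},
        "Pfam": {"tx1", "tx2", "tx5"},
    }
    shared = consensus_lncrnas(calls)
    print(sorted(shared))  # ['tx1', 'tx2']
```

An intersection can never be larger than its smallest input, which is consistent with the 129 consensus transcripts being fewer than the 267 called by CPC, the smallest of the four sets.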
The mapped reads were compared against the original annotation of the reference olive genome to identify previously unannotated transcribed regions and new transcripts and genes in olive fruits, thereby supplementing the original genome annotation. Together with new transcripts derived from AS events, a total of 7998 new genes were identified. These new genes were annotated using the COG, GO, Kyoto Encyclopedia of Genes and Genomes (KEGG), KOG, Pfam, Swiss-Prot, eggNOG, and Nr databases, resulting in the annotation of 7962 new genes (Supplementary Table S4). Most annotated genes shared homology with known sequences from other plant species, most frequently Sesamum indicum (5073 transcripts, 63.87%), followed by Erythranthe guttata (599, 7.54%) and Coffea canephora (405, 5.01%) (Figure 3C).
To investigate differential gene expression in fruits at different maturity stages, gene expression was normalized as fragments per kilobase of transcript per million mapped fragments (FPKM). Differentially expressed genes (DEGs) were identified using a fold change of ≥2 and a false discovery rate (FDR) of <0.01 as criteria. The comparison between green (F1) and purple (F3) fruits yielded the most DEGs, totaling 2820, with 1164 up-regulated and 1656 down-regulated. The F2_F3 comparison yielded the fewest DEGs (780), with 557 up-regulated and 223 down-regulated, while the F1_F2 comparison yielded an intermediate number of 1204 DEGs, comprising 417 up-regulated and 787 down-regulated genes (Figure 4A,B). The DEGs were annotated using the public databases described above, with 1189, 2782, and 773 DEGs annotated in the Nr database for the F1_F2, F1_F3, and F2_F3 comparisons, respectively (Supplementary Table S5).
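The normalization and DEG criteria above can be sketched as follows. FPKM divides a gene's fragment count by its length in kilobases and by the total number of mapped fragments in millions. The FDR correction method is not stated in the text; Benjamini-Hochberg adjustment is assumed here because it is a common choice, and a fold change of ≥2 is read as |log2 fold change| ≥ 1 in either direction. All function names are hypothetical.

```python
import math


def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragment_count * 1e9 / (gene_length_bp * total_mapped_fragments)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest p-value downwards
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def is_deg(fold_change, fdr, min_fold=2.0, max_fdr=0.01):
    """DEG call: |log2(fold change)| >= log2(min_fold) and FDR < max_fdr.

    fold_change must be positive (e.g. a ratio of FPKM values after adding a
    small pseudocount to avoid division by zero).
    """
    return abs(math.log2(fold_change)) >= math.log2(min_fold) and fdr < max_fdr


if __name__ == "__main__":
    print(fpkm(500, 2000, 10_000_000))  # 25.0
    p = [0.001, 0.004, 0.03, 0.2]
    fdr = benjamini_hochberg(p)
    fold_changes = [3.0, 0.4, 2.5, 4.0]
    print([is_deg(fc, q) for fc, q in zip(fold_changes, fdr)])  # [True, True, False, False]
```

Testing the absolute log2 fold change lets one threshold cover both up-regulated (fold change ≥ 2) and down-regulated (fold change ≤ 0.5) genes.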
KEGG mapping analysis indicated that the DEGs belonged primarily to five pathways (Figure 4C). Most DEGs were assigned to carbon metabolism-related pathways, including photosynthesis, carbon fixation in photosynthetic organisms, carbon metabolism, and porphyrin and chlorophyll metabolism. The F1_F2 and F1_F3 comparisons had the most DEGs in these pathways, while the F2_F3 comparison had the fewest. These results suggest that green fruits (F1) may require higher expression of genes involved in carbon biosynthesis for fruit ripening. A similar distribution of DEGs was observed in sugar metabolism-related pathways, including the pentose phosphate pathway, pentose and glucuronate interconversions, glycolysis/gluconeogenesis, galactose metabolism, fructose and mannose metabolism, and starch and sucrose metabolism. Thus, green fruits may require higher expression of genes involved in energy metabolism to drive maturation. DEGs associated with fatty acid metabolism were likewise most numerous in the F1_F2 and F1_F3 comparisons and least numerous in the F2_F3 comparison, indicating that the expression of genes associated with fatty acid biosynthesis had stabilized by the half-green (F2) and purple (F3) stages. DEGs associated with flavonoid metabolism were mainly enriched in the F1_F2 comparison, which involves the green, unripe fruits with high polyphenol levels.
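One common way to decide whether DEGs are enriched in a KEGG pathway is a one-sided hypergeometric test, which asks how likely it is to observe at least k pathway genes among n DEGs drawn from a background of N annotated genes, K of which belong to the pathway. The text does not state which enrichment statistic was used, so the sketch below is an assumption rather than the study's method; the function name is hypothetical, and in practice the resulting p-values would also be corrected for multiple testing across pathways.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k) for pathway enrichment.

    N: annotated background genes; K: background genes in the pathway;
    n: annotated DEGs; k: DEGs that fall in the pathway.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("counts must satisfy 0 <= k <= n <= N and 0 <= K <= N")
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    # Hypothetical counts: 40 of 1000 DEGs in a pathway holding 300 of 20000 genes.
    print(f"{hypergeom_enrichment_p(40, 1000, 300, 20000):.2e}")
```

In the hypothetical example, only about 15 pathway genes would be expected among the DEGs by chance (1000 × 300 / 20000), so observing 40 yields a very small p-value.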

Discussion
The optimal harvest time of olives is related to the cost of olive oil (oil yield) and oil quality (e.g., the distribution of unique metabolites such as polyphenols in fruits). To obtain higher oil yields and quality, the empirical harvesting strategy generally used is to mix green and semi-green fruits. Few producers use fully ripened (purple) fruits to produce olive oil because such fruits are thought to contain very low polyphenol levels, which is detrimental to olive oil quality [8,27]. The metabolomics data in this study indicated that total polyphenol contents did not differ significantly between green and purple fruits, and that the polyphenol contents of both fruit types were higher than those of semi-green fruits (Figure 2B). Harvesting green and semi-green fruits together can indeed achieve a high polyphenol level. However, the total fatty acid content in purple fruits is significantly higher than in fruits that are not fully mature. Therefore, by excluding purple fruits and forgoing their higher oil yield, the empirical harvesting strategy indirectly increases the cost of producing olive oil.
The metabolomics data in this study indicated that the contents of the eight main fatty acids increased as olive fruit color darkened (Figure 1A), and that purple fruits had the highest total FA contents (Figure 2A). This suggests that producers seeking to increase fatty acid output should include more purple fruits, which is also consistent with traditional, experience-based harvesting. The primary minor compounds hydroxytyrosol and oleuropein give extra-virgin olive oil its bitter, pungent taste. In addition, oleuropein exhibits antioxidant, anti-inflammatory, anti-atherogenic, anti-cancer, antimicrobial, antiviral, hypolipidemic, and hypoglycemic pharmacological activities [16,28]. Oleuropein content increased from 15.18 µg/g in F1 to 22.32 µg/g in F2 and then decreased to 6.76 µg/g in F3, values similar to or lower than those reported in previous studies [29][30][31] (Table 1). In terms of total minor component content, green and purple (ripe) fruits have the same effect on olive oil quality. However, because oleuropein is most abundant in semi-green fruits and least abundant in ripe fruits, a good strategy to achieve the popular bitter and pungent tastes of olive oil is to extract oil from green and semi-green fruits rather than from ripe fruits. Squalene is a natural triterpene and one of the main components of skin surface lipids; it is a natural antioxidant and functions in skin hydration [32]. Squalene is also important in determining olive oil quality and can play a role in cancer prevention in humans [6,33,34]. The squalene contents of green (F1), semi-green (F2), and purple (F3) fruits were 262.49, 518.22, and 425.14 µg/g, respectively. Thus, to obtain olive oil with higher squalene abundance, an optimal harvesting strategy is to mix and harvest half-ripe and ripe fruits.
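The strategy models above can be illustrated with a simple mixing calculation using the squalene and oleuropein contents reported here. The sketch assumes that a blend's content is the mass-weighted mean of its components' contents (µg/g fruit) and ignores differences in oil yield and extraction efficiency between stages; the function names are hypothetical. Under these assumptions, the best 1:1 blend for squalene is F2 + F3 and the best for oleuropein is F1 + F2, matching the recommendations above.

```python
from itertools import combinations

# Contents (µg/g) reported in this study for green (F1), semi-green (F2),
# and purple (F3) fruits.
CONTENTS = {
    "squalene": {"F1": 262.49, "F2": 518.22, "F3": 425.14},
    "oleuropein": {"F1": 15.18, "F2": 22.32, "F3": 6.76},
}


def blend_content(per_stage, weights):
    """Mass-weighted mean content of a fruit blend (assumes linear mixing)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("blend weights must sum to a positive value")
    return sum(per_stage[stage] * w for stage, w in weights.items()) / total


def best_pairwise_blend(per_stage):
    """Return ((stage_a, stage_b), content) for the best 1:1 two-stage blend."""
    best = None
    for a, b in combinations(sorted(per_stage), 2):
        value = blend_content(per_stage, {a: 1.0, b: 1.0})
        if best is None or value > best[1]:
            best = ((a, b), value)
    return best


if __name__ == "__main__":
    for compound, per_stage in CONTENTS.items():
        (a, b), value = best_pairwise_blend(per_stage)
        print(f"{compound}: {a} + {b} -> {value:.2f} µg/g")
```

Unequal weights (e.g. `{"F2": 2.0, "F3": 1.0}`) can be passed to `blend_content` to explore other harvest ratios under the same linear-mixing assumption.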
Different olive cultivars are often planted in the same orchard so that their fruits mature at different times, which relieves time pressure during harvest. Our approach could help producers profile the metabolites of the different cultivars in an orchard and, in light of actual production needs, determine the best harvesting strategy. For example, fruits of cultivars at different maturity levels could be mixed to achieve the best harvest ratio. Manual sorting of fruits by color increases costs, but producers can use color-sorting machines, which can greatly reduce labor costs. In addition to the pungency and bitterness mentioned above, olive oil also has pleasant fruity, vegetable, and grassy flavors, which this study did not address. In the future, the sources of these flavors could be determined by measuring aromatic or volatile metabolites, allowing further optimization of cultivars and harvest ratios. Although this study focused on a single olive cultivar, the harvesting strategy models proposed here may guide future research on harvesting strategies for other olive cultivars.
Hydroxytyrosol and oleuropein are considered the most representative key nutrients in olive oil. In addition to giving olive oil its characteristic fragrance and taste, they can also promote human health. However, the biosynthetic pathways of hydroxytyrosol and oleuropein remain unknown, so their biosynthesis cannot yet be analyzed at the molecular level, which hinders the transition of the olive industry from traditional breeding to molecular breeding. In this study, we used the combined PacBio Iso-Seq and Illumina RNA-seq gene expression data to identify genes related to the biosynthetic pathways of hydroxytyrosol and oleuropein. A total of 178 and 80 genes were identified in the hydroxytyrosol and oleuropein biosynthesis pathways, respectively. Although these biosynthetic pathways have not yet been fully resolved, our data provide a richer resource for subsequent studies of them.

Conclusions
Here, we combined metabolomics, full-length Iso-Seq, and second-generation RNA-seq transcriptome approaches to investigate the relationships between metabolites and gene expression in olive fruits of different colors collected during the same harvest period. From these data, changes in fatty acid and polyphenol contents were measured in fruits at different maturity stages, and harvesting strategies optimized for different production purposes were proposed. Moreover, we identified putative genes related to the biosynthesis pathways of characteristic metabolites (i.e., hydroxytyrosol and oleuropein) in olive fruits. These comprehensive data and analyses lay the foundation for subsequent investigations of the relationship between metabolites and gene expression in olive fruits.
Supplementary Materials: The following are available online at https://www.mdpi.com/2304-8158/10/2/360/s1, Figure S1: Heat map visualization of organic acids among olive fruits at different maturity stages, Figure S2: Heat map visualization of (a) amino acids and (b) polyols among olive fruits at different maturity stages, Figure S3: Heat map visualization of ALDH family genes among olive fruits at different maturity stages, Table S1: Data statistics for each RNA-seq sample, Table S2: Genome mapping of each RNA-seq sample, Table S3: LncRNA target prediction, Table S4: New gene annotation statistics, Table S5: DEG annotation statistics.
Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: Sequence data generated in this study were deposited in GenBank under the following accession numbers: PacBio Iso-Seq data: SRR8692278; Illumina RNA-seq data: F1 biological replicates, SRR8690139, SRR8690138, and SRR8690141; F2 biological replicates, SRR8690140, SRR8690135, and SRR8690134; F3 biological replicates, SRR8690137, SRR8690136, and SRR8690133.