The Genetic Basis of Tomato Aroma

Tomato (Solanum lycopersicum L.) aroma is determined by the interaction of volatile compounds (VOCs) released by the tomato fruits with receptors in the nose, leading to a sensorial impression, such as “sweet”, “smoky”, or “fruity” aroma. Of the more than 400 VOCs released by tomato fruits, 21 have been reported as main contributors to the perceived tomato aroma. These VOCs can be grouped in five clusters, according to their biosynthetic origins. In the last decades, a vast array of scientific studies has investigated the genetic component of tomato aroma in modern tomato cultivars and their relatives. In this paper we aim to collect, compare, integrate and summarize the available literature on flavour-related QTLs in tomato. Three hundred and fifty nine (359) QTLs associated with tomato fruit VOCs were physically mapped on the genome and investigated for the presence of potential candidate genes. This review makes it possible to (i) pinpoint potential donors described in literature for specific traits, (ii) highlight important QTL regions by combining information from different populations, and (iii) pinpoint potential candidate genes. This overview aims to be a valuable resource for researchers aiming to elucidate the genetics underlying tomato flavour and for breeders who aim to improve tomato aroma.


Introduction
Tomato (Solanum lycopersicum L.) is one of the most important crops on the market, used worldwide as a basis for many national traditional dishes [1]. Conventional wisdom suggests that breeding tends to reduce the genetic basis of a cultivated species, but tomato genetic diversity appears to have actually been enhanced in the last fifty years. A recent study [2] investigated the genetic diversity of cultivated tomato varieties in The Netherlands (NL) from the 1950s to the 2010s, observing that commercial tomato varieties from the 1950s and 1960s were mainly homozygous, with narrow genetic variation among them. From the 1970s onwards, genetic diversity in tomato has increased, thanks to the application of introgression breeding programs using wild relatives of tomato. The first genetic diversity boost appeared to take place with the introgression of tomato mosaic virus (ToMV), southern root-knot nematode (Meloidogyne incognita) and leaf mold disease (Cladosporium fulvum) resistance from Solanum peruvianum and Solanum pimpinellifolium [3][4][5]. These introgressions varied in size from ~5% of the chromosome (introgression of Cf-4 and Cf-9 on top of Chr.01 for resistance to Cladosporium fulvum) up to half of chromosome 9 (ToMV), and together led to a significant increase in overall genetic diversity [2]. A second diversity boost, starting in the late 1980s, affected both fruit size and quality traits: in particular, the introgression of parts of Chr. 4, 5 and 12 from S. pimpinellifolium led to fruit size variation among cultivated varieties and the introduction of cherry, cocktail and large fruited varieties to the NL market [3].
Not only disease resistance and fruit size, but also flavour has been targeted by breeding programs in the last thirty years [2]. The main components of tomato fruit phenotypic data. As outlined in more detail below, advances in the development of high-throughput (HTP) molecular marker platforms, the availability of genomic information and progress in phenotyping methodologies, such as metabolomics, have led to an increasing importance and use of marker-assisted breeding strategies, such as QTL analysis, association mapping and genomic prediction, in current plant breeding practice and have transformed plant breeding into a high-tech industry.
Marker-assisted introgression breeding: Tomato is highly autogamous, a characteristic that, together with the loss of many genes and alleles during domestication and crop improvement, led to a narrow genetic basis of cultivated tomato, compared with its 12 wild relative species [26]. For this reason, wild relatives have been used as potential sources of lost alleles in the development of new cultivated tomato varieties [27,28]. Marker-assisted selection in plant breeding programs relies on genetic linkage analyses, which are based on the principle of genetic recombination during meiosis. This allows the construction of linkage maps, composed of genetic markers linked to genes or Quantitative Trait Loci (QTLs) affecting traits of interest for a specific population. QTL analysis is mostly done using biparental segregating populations based on a cross of two contrasting genotypes. In order to discover and elucidate the genetic basis of agricultural traits, segregating populations have been made not only from intraspecific crosses, but also from interspecific crosses with various tomato wild relatives (S. pimpinellifolium, S. pennellii, S. lycopersicoides and S. habrochaites), in which genomic regions of the wild donor have been introgressed into the cultivated tomato genetic background, allowing the identification of potential new alleles for traits of interest [29][30][31][32][33]. Interestingly, researchers investigated not only Solanum lycopersicum wild relatives, but also Solanum lycopersicum var. cerasiforme L, the expected ancestor of the domesticated tomato [34].
The impact of genomics: In the last decades, the advances in genomics have provided new tools for discovering and tagging novel alleles and genes. The advent of next generation sequencing (NGS) techniques has considerably accelerated and simplified the genome-wide detection of single-nucleotide polymorphisms (SNPs), which have become the most popular molecular markers. The development of the reference tomato genome from the inbred cultivar "Heinz 1706" [35] represented a milestone in the genomic era. The comparison of the cultivated tomato genome with the genome of a wild relative, S. pimpinellifolium, revealed the potential of high-throughput sequencing in comparative genetics, confirming the previously reported introgression of S. pimpinellifolium in the "Heinz 1706" genome [36] and identifying thousands of SNPs between the two relatives. Further genome and transcriptome resequencing aimed at detecting genetic variation in tomato paved the way for the development of relatively universal genotyping platforms (i.e., SNP arrays) that can be applied for the genetic analyses of different populations: the SolCap SNP array [37,38] and the CBSG array [3]. Further progress in the application of NGS is represented by the genotyping-by-sequencing (GBS) approaches, based on the use of restriction enzymes to decrease genome complexity before sequencing. These techniques include over a dozen reduced-representation sequencing (RRS) approaches [39] and have been recently applied for high-resolution QTL mapping in tomato, especially in interspecific crosses [40][41][42][43]. These protocols have been applied in the development of high-density genetic maps in many different species [44][45][46][47][48][49][50][51][52][53][54][55], making it possible to perform comparative and quantitative genetics in virtually any genomic background.
The current advances in genome sequencing technologies, such as a revolutionary increase in sequencing throughput and a concomitant reduction in costs per sequenced nucleotide, have made it possible to unravel the genetic variation in tomato to its full extent [25,56,57] and have helped to show that the concept of one or a few reference genomes is not sufficient to fully understand the genetic control of traits. For this reason, the pangenome concept has been introduced in plant genomics, investigating the sum of genes that can be found in a specific species [58][59][60][61][62][63]. Specifically, the pangenome is defined as "the full complement of genes of a species, which can be partitioned into a set of core genes that are shared by all individuals and a set of dispensable genes that are partially shared or individual specific" [64]. In the Solanaceae family, the pangenomes of tomato and pepper (Capsicum spp.) have been recently released, identifying missing genes involved in resistance mechanisms and quality traits [26,65,66], but eggplant (Solanum melongena L.) and potato (Solanum tuberosum L.) are still pangenome-orphan species. The availability of the tomato pangenome allowed the identification of presence/absence variations (PAVs) and structural variants (SVs) of functionally important genes [26,65]. Interestingly, [26] identified a rare promoter allele for the TomLoxC gene, a lipoxygenase that has been reported to be crucial in the biosynthesis of C5 and C6 lipid-derived volatiles and in apocarotenoid production [67,68]. Moreover, the newly developed tomato pangenome [65] resolved the genomic region flanking NSGT1, a functional gene associated with the production of guaiacol, methylsalicylate and eugenol, three phenylalanine-derived volatiles. Five different haplotypes were identified in the analyzed tomato germplasm, providing new insights into the genetic variation of NSGT1.
The pangenome appears to be the most novel tool that breeders and researchers have at their disposal: it may facilitate the mining of natural genetic variation and could contribute to crop improvement by supporting molecular breeding programs and gene function studies.
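The core/dispensable partition quoted above from [64] can be illustrated with a minimal Python sketch. The accession names and gene labels below are hypothetical placeholders, not data from the cited pangenome studies, and real analyses work on gene presence/absence calls derived from assemblies rather than on hand-written sets.

```python
from typing import Dict, Set, Tuple


def partition_pangenome(gene_sets: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split a pangenome into core genes (present in every accession)
    and dispensable genes (missing from at least one accession)."""
    if not gene_sets:
        return set(), set()
    # The pangenome is the union of all genes seen in any accession.
    pangenome = set().union(*gene_sets.values())
    # Core genes are those shared by all accessions.
    core = set.intersection(*gene_sets.values())
    return core, pangenome - core


# Hypothetical accessions and gene labels, for illustration only.
accessions = {
    "accession_A": {"gene1", "gene2", "gene3"},
    "accession_B": {"gene1", "gene2", "gene4"},
    "accession_C": {"gene1", "gene2"},
}

core, dispensable = partition_pangenome(accessions)
print(sorted(core))         # genes shared by all three accessions
print(sorted(dispensable))  # genes absent from at least one accession
```

In this toy case a presence/absence variation (PAV) corresponds to any gene that ends up in the dispensable set.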
Linking genetic markers to traits: In the genomics era, novel genotyping techniques and the availability of multitudes of molecular markers, in combination with new high-throughput phenotyping technologies (i.e., phenotyping platforms), supported the development of new methodologies to link genetic markers to phenotypic traits. Not only has the above-mentioned QTL mapping become more precise, but association mapping and genomic prediction are now also widely used in breeding [69][70][71][72]. Genome-Wide Association Studies (GWASs) are based on genotyping of a set of accessions representing the variability in a given species and rely on the linkage disequilibrium (LD) between a marker and its associated trait [73]. This technique has been applied in tomato, identifying interesting associations for many fruit quality traits [74][75][76][77][78][79]. Zhao et al. performed a meta-GWAS analysis by combining datasets of several GWAS panels. This analysis led not only to the confirmation of existing QTLs, but also to the discovery of novel QTLs and candidate genes for several flavour-related traits [77]. Genomic Prediction (GP) is a selection tool that makes use of genetic markers to predict the genetic potential of untested lines in breeding [80]. While QTL mapping and GWAS rely on the (statistically significant) association between phenotypic variation and specific molecular markers, GP calculates the genetic potential of breeding candidates by applying Bayesian or mixed statistical models that take all the genome-wide marker information into account to predict the phenotype. Genomic prediction is a selection tool rather than a research tool and performs better with traits that are controlled by a large number of small-effect QTLs, which are hard to detect by QTL analysis of mapping populations or GWAS [81,82].
Genomic prediction is particularly useful for traits for which phenotyping is expensive, difficult or time consuming, since no phenotyping is needed for selection, once a good prediction model based on data of a representative training population is available. This technique has been widely applied in animal selection [83][84][85], while its practical application in plant breeding is still limited to major crops, such as maize and wheat [82,86,87], in which QTLs for important traits, such as yield, have already been fixed in the elite germplasm [88], or to tree crops, where early selection is very useful and cost-effective [89].
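The idea of using all markers jointly, instead of testing them one by one, can be sketched with ridge regression, which is one common way to fit all marker effects at once and is used here only as a stand-in for the Bayesian or mixed models mentioned above. Everything in the example is an assumption made for illustration: the genotypes and phenotypes are simulated, and the population sizes, effect sizes and penalty `lam` are arbitrary choices rather than values from the cited studies.

```python
import numpy as np


def ridge_marker_effects(X, y, lam):
    """Estimate the effects of all markers jointly by ridge regression."""
    X_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - X_mean  # centre marker scores
    n_markers = X.shape[1]
    # Solve (Xc'Xc + lam * I) beta = Xc'(y - mean(y)) for the marker effects.
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(n_markers), Xc.T @ (y - y_mean))
    return beta, X_mean, y_mean


def predict(X_new, beta, X_mean, y_mean):
    """Predict phenotypes of genotyped but unphenotyped lines."""
    return y_mean + (X_new - X_mean) @ beta


# Simulated data (assumption): 300 lines, 50 biallelic markers coded 0/1/2,
# each with a small additive effect, plus environmental noise.
rng = np.random.default_rng(0)
n_train, n_test, n_markers = 200, 100, 50
genotypes = rng.integers(0, 3, size=(n_train + n_test, n_markers)).astype(float)
true_effects = rng.normal(0.0, 0.2, size=n_markers)
phenotypes = genotypes @ true_effects + rng.normal(0.0, 0.3, size=n_train + n_test)

# Train on the phenotyped training population, then predict the remaining lines.
beta, X_mean, y_mean = ridge_marker_effects(genotypes[:n_train], phenotypes[:n_train], lam=1.0)
predicted = predict(genotypes[n_train:], beta, X_mean, y_mean)

# Prediction accuracy: correlation between predicted and observed phenotypes.
accuracy = float(np.corrcoef(predicted, phenotypes[n_train:])[0, 1])
print(round(accuracy, 2))
```

Once the model has been trained, selection candidates only need to be genotyped, which is why the approach suits traits that are expensive or slow to phenotype.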
The combination of all the above-mentioned approaches, except GP, led to identification of a multitude of QTLs for many agronomic and quality traits. In this review we aim to collect, compare, integrate and summarize the available literature on flavour-related QTLs in tomato. We selected 16 scientific papers and supplemental data focusing on QTLs for tomato aroma and fruit quality. This not only provides an overview of the known flavour QTLs, but the combined and integrated information also makes it possible to (i) pinpoint potential donors described in literature for specific traits, (ii) highlight important QTL regions by combining information from different populations, and (iii) pinpoint potential candidate genes. This overview aims to be a valuable resource for researchers aiming to elucidate the genetics underlying tomato flavour and for breeders who aim to improve tomato aroma.

Construction of a Unified QTL Map of Tomato Aroma
A literature search was performed with the aim of collecting articles reporting QTLs for tomato aroma. In order to compare and integrate QTLs from different studies, the availability of marker information was an essential requirement for inclusion in this review. The identified QTLs have been organized and are available as Supplemental Material (Table S5), reporting the biosynthetic pathway, the QTLs Genomic Regions (QGR), the QTL's original name, the related compound, the chromosome, the correlated markers, their position in cM and bp, their p-value, their LOD score, the SolycID of the gene in which the markers have been found, the percentage of explained variation by the QTL, the effect, the donor parent, the crossing population or association panel used and the reference of the primary resource.
We identified 16 articles reporting QTLs for tomato aroma including marker information (Table 2). In the pre-genomics era only genetic linkage positions (cM) of QTLs could be reported, since a reference genome sequence was not available at the time. This made it difficult to align QTLs of that period with the more recent studies utilizing modern genomics technologies. To circumvent this, the physical position of genetic markers was retrieved from the tomato genome (https://solgenomics.net/search/markers (accessed on 20 November 2020)) whenever marker sequence information was available, using the SL2.50 genome version. Two criteria were used to cluster QTL information from the different studies into QTLs Genomic Regions (QGRs) in a unified physical map (Figure 1): (i) a biochemical relationship between aroma volatiles and their possible precursors, such as similar chemical structure or common biosynthetic pathway, and (ii) overlapping of the QTLs. Since most of the studies only reported the most significant marker(s) of an identified QTL, while information on genetic confidence intervals or (average) linkage disequilibrium (LD) decay was lacking, it was virtually impossible to determine the size of QTL regions. For this reason, we standardized the potential positional error across the reported studies by setting an empirically defined window of ±2.5 Mb around each identified QTL. This window was derived from the average inter-study standard deviation of QTL positions of the three most functionally explored tomato aroma loci: floral aroma on chromosome 4 [25,90,99,102], smoky aroma on chromosome 9 [25,90,102] and the malodorous locus on chromosome 8 [92,93,99]. The major genetic factors underlying these QTLs were identified and, therefore, the dispersion of QTL positions in these different studies could most likely be attributed to non-genetic sources of variation.
This window as well as resulting QGRs only serve as means to classify QTLs and indicate the possibility that the individual QTLs they harbor may be affected by one or a few co-localized genetic factors.
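The clustering of QTLs into QGRs can be sketched as an interval-merging procedure. The Python example below is a simplified reading of the two criteria, not the procedure actually used for Table S5: criterion (i) is reduced to an identical pathway label, criterion (ii) to overlap of the ±2.5 Mb windows on the same chromosome, and the QTL names and positions are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

WINDOW_BP = 2_500_000  # the +/-2.5 Mb window described in the text


@dataclass(frozen=True)
class QTL:
    name: str
    pathway: str      # simplified stand-in for "biochemical relationship"
    chromosome: int
    position_bp: int  # physical position of the most significant marker


def cluster_into_qgrs(qtls: List[QTL], window_bp: int = WINDOW_BP) -> List[List[QTL]]:
    """Group QTLs of the same pathway and chromosome whose windows overlap."""
    groups: Dict[Tuple[str, int], List[QTL]] = defaultdict(list)
    for qtl in qtls:
        groups[(qtl.pathway, qtl.chromosome)].append(qtl)

    qgrs: List[List[QTL]] = []
    for key in sorted(groups):
        members = sorted(groups[key], key=lambda q: q.position_bp)
        current = [members[0]]
        current_end = members[0].position_bp + window_bp
        for qtl in members[1:]:
            if qtl.position_bp - window_bp <= current_end:
                # Window overlaps the growing region: same QGR.
                current.append(qtl)
                current_end = max(current_end, qtl.position_bp + window_bp)
            else:
                # Gap between windows: close this QGR and start a new one.
                qgrs.append(current)
                current = [qtl]
                current_end = qtl.position_bp + window_bp
        qgrs.append(current)
    return qgrs


# Hypothetical QTLs, for illustration only.
example_qtls = [
    QTL("fa_qtl_1", "FA", 1, 1_000_000),
    QTL("fa_qtl_2", "FA", 1, 4_000_000),   # 3 Mb from fa_qtl_1: windows overlap
    QTL("fa_qtl_3", "FA", 1, 12_000_000),  # 8 Mb from fa_qtl_2: separate region
    QTL("bcaa_qtl_1", "BCAA", 1, 1_200_000),  # other pathway: separate region
]

qgrs = cluster_into_qgrs(example_qtls)
for region in qgrs:
    print([q.name for q in region])
```

Because overlap is evaluated against the growing region, a chain of neighbouring QTLs can merge into one QGR that spans more than 5 Mb, which fits the point that QGRs serve as a classification aid rather than as estimates of QTL size.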
The identified QGRs were mined for the presence of candidate genes based on their annotation (ITAG2.40) and on their expression in tomato fruit, using publicly available dataset and tools [103]. Candidate genes were defined by two criteria: (1) genes belonging to families known from literature to be involved in VOCs biosynthesis and expressed in tomato fruit and (2) genes reported in literature with a demonstrated function in VOCs biosynthesis, irrespective of their expression in tomato fruit. A complete list of the potential candidate genes (with and without expression in the fruit tissues) can be found as Supplemental Material (Tables S1-S4). The genes which have been demonstrated to functionally underlie aroma QTLs in tomato were highlighted in bold.
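The two candidate-gene criteria amount to a simple filter over the genes annotated within a QGR. The sketch below shows that logic in Python; the record fields, family labels and gene identifiers are hypothetical and do not correspond to ITAG2.40 entries or to the contents of Tables S1-S4.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Gene:
    gene_id: str
    family: str
    expressed_in_fruit: bool
    demonstrated_voc_function: bool


def select_candidates(genes: List[Gene], voc_families: Set[str]) -> List[Gene]:
    """Keep genes that satisfy criterion (1) or criterion (2)."""
    selected = []
    for gene in genes:
        # (1) family linked to VOC biosynthesis AND expressed in fruit
        criterion_1 = gene.family in voc_families and gene.expressed_in_fruit
        # (2) demonstrated function in VOC biosynthesis, regardless of expression
        criterion_2 = gene.demonstrated_voc_function
        if criterion_1 or criterion_2:
            selected.append(gene)
    return selected


# Hypothetical genes located in one QGR, for illustration only.
genes_in_qgr = [
    Gene("geneA", "lipoxygenase", True, False),   # passes criterion (1)
    Gene("geneB", "lipoxygenase", False, False),  # right family, not expressed in fruit
    Gene("geneC", "aminotransferase", False, True),  # passes criterion (2)
    Gene("geneD", "kinase", True, False),         # family not linked to VOCs
]

candidates = select_candidates(genes_in_qgr, voc_families={"lipoxygenase", "aminotransferase"})
print([g.gene_id for g in candidates])
```

Criterion (2) is what allows functionally characterized genes with little or no fruit expression to remain on the candidate list.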

Fatty Acids Derived Volatiles (FA VOCs)
The volatile compounds originating from the degradation of linolenic and linoleic acid accumulate during tomato ripening and can also markedly increase their emission upon fruit tissue disruption. They provide a note of freshly cut grass to the aroma bouquet [104]. These compounds are the most abundant volatiles in tomato fruit and are mainly represented by the C5 volatile 1-penten-3-one, a few C6 volatiles, such as 1-hexanol, (Z)-3-hexenal, (E)-2-hexenal and hexanal, the C7 volatile (E)-2-heptenal and the C10 VOC (E,E)-2,4-decadienal [8,11,16,17]. Although their high accumulation in ripe fruits may suggest that these compounds are very important determinants of tomato flavour, some studies provide evidence that the impact of their quantitative variation on consumer liking may be limited [8,67], likely because of their generally high abundance and low odor thresholds; for (Z)-3-hexenal, for example, these values are 12,000 nL·L⁻¹ and 0.25 nL·L⁻¹, respectively [21].
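The relation between abundance and odor threshold can be made concrete with a short calculation. The ratio of concentration to threshold is commonly called the odor activity value; that term is not used in the text above and is introduced here as an assumption, while the two numbers are the (Z)-3-hexenal values quoted from [21].

```python
def odor_activity_value(concentration: float, odor_threshold: float) -> float:
    """Ratio of a volatile's concentration to its odor threshold (same units)."""
    if odor_threshold <= 0:
        raise ValueError("odor threshold must be positive")
    return concentration / odor_threshold


# (Z)-3-hexenal values quoted in the text, both in nL per L.
z3_hexenal_ratio = odor_activity_value(12_000, 0.25)
print(z3_hexenal_ratio)  # 48000.0
```

With the concentration some 48,000 times above the threshold, even sizeable quantitative variation leaves the compound far above its detection limit, which is in line with the limited impact on consumer liking reported above.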

Biosynthesis of FA VOCs
During tomato fruit ripening, free fatty acids, mainly linolenic and linoleic acid, are derived from the catabolism of acylglycerides from disintegrating cellular membranes, by the action of lipases [87,88]. Linolenic and linoleic acid can be further catabolized by means of β-oxidation, α-oxidation, or the lipoxygenase pathway [68,[105][106][107]. In tomato fruit the latter is the most important for the production of volatiles, which occurs through two steps: (i) fatty acids are dioxygenated by lipoxygenases (LOX), which are classified as 13-LOX and 9-LOX and lead to 13-hydroperoxides and 9-hydroperoxides, respectively [67,108]; (ii) hydroperoxides are catabolized by hydroperoxide lyases (HPL), also classified as 13-HPL and 9-HPL, leading to an oxoacid and a volatile aldehyde. Volatile aldehydes can be converted into alcohols by alcohol dehydrogenases (ADH; [109][110][111][112][113][114]). According to the literature [68], 13-LOX enzymes are mainly involved in the synthesis of (Z)-3-hexenal from linolenic acid and hexanal from linoleic acid. Among these enzymes, TomloxC has shown significant correlation with the production of hexanal, together with LeHPL, a 13-HPL [67,68,115]. Another gene, ADH2, has been reported as positively related to the production of hexanol and (Z)-3-hexenol [116], while ADH1 showed in vitro activity in the conversion of hexanal into hexanol [117]. Figure 2 summarizes the complete biosynthetic pathway of the lipid-derived VOCs, according to the available literature.

QTLs for FA VOCs
Data collection identified a total of 108 QTLs reported in 8 different studies and correlated with lipid volatiles biosynthesis (Table S1). Comparing these regions (see "Construction of a Unified QTL Map of Tomato Aroma" for methodology), we identified 24 distinct QGRs (Figure 1, Table S1).

FA VOCs' Candidate Genes
The collected data allowed the identification of 112 genes potentially involved in lipid VOCs biosynthesis (Table S1). Among them, 28 genes have been reported to be expressed in at least one fruit tissue (Table 3; [103]), or have been functionally characterized for their role in fatty acids metabolism. Eight lipoxygenases (LOXs) were identified as candidate genes expressed in tomato fruit. Among the identified LOXs, three have been reported for their association with lipid VOCs: LoxC (Solyc01g006540) has a major role in the biosynthesis of the most quantitatively prominent C5 and C6 lipid-derived VOCs, LoxF (Solyc01g006560) is involved in the production of fatty acid VOCs derived from 13-hydroperoxides [3,121].
Eleven alcohol dehydrogenases (ADH) were identified. ADHs are a family of enzymes associated with the interconversion of the aldehyde and alcohol forms of lipid volatiles in tomato and they have been reported to accumulate in the fruit during ripening [6,116]. Among them, Solyc06g059740 has been characterized in tomato fruit as ADH2 [116,122,123]. Solyc11g071290 was identified in the LIP21 QGR, a QTL for the earthy/mushroom odor type volatile 1-octen-3-one. Unlike the major C6 VOCs, this compound has rarely been suggested as an important fresh tomato fruit odorant and occurs at a much lower concentration than the C6 VOCs, but it has an extremely low odor threshold of 0.005 nL·L⁻¹ [124]. A structural variation in the promoter of this gene was reported in [26]. The wild allele was present in S. pimpinellifolium and S. cerasiforme, but was not found in heirloom tomatoes, suggesting selection against the wild allele during domestication. Although this gene showed a significant expression level in one fruit sample only (S. pimpinellifolium fruit at 4 DPA), we cannot exclude that allelic differences in this gene may be responsible for the 1-octen-3-one QTL identified by GWAS [99].
Table 3. List of the genes identified as potentially responsible for the lipid VOCs QTLs reported in literature. Genes that have been functionally characterized are highlighted in bold in the table. The last column reports whether the gene has been reported to be expressed in fruit (Y/N; [103]); for a complete overview of the gene expression see Table S1.

Biosynthesis of BCAA VOCs
Even though the relationship between BCAA VOCs and the tomato aroma is clear, the exact molecular mechanism underlying their quantitative variation in tomato fruit is not fully understood. In general, BCAA VOCs originate from the branched-chain amino acid (BCAA) pathway in many organisms including plants. BCAA biosynthesis has been studied well in plants and takes place in the chloroplast, where leucine and valine are synthesized from pyruvate and isoleucine from threonine via several enzymatic reactions (Figure 3). Catabolism of BCAAs, which occurs in mitochondria, is believed to be the source of BCAA VOCs in tomato fruit. It has been suggested that the first step in the catabolic pathway leading to BCAA VOCs is the reversible conversion of branched-chain amino acids (leucine, isoleucine) into their corresponding α-ketoacids by means of branched-chain amino acid aminotransferases (BCATs; [125]). The importance of BCATs in the degradation of BCAAs has been demonstrated in Arabidopsis [126]. In tomato, different members of the BCAT family have been shown to mediate either synthetic or catabolic reactions of BCAAs [94,[127][128][129]. The products of the reversible BCAT-mediated BCAA deamination, the α-ketoacids, have been suggested to be the likely precursors for BCAA VOCs [128], which then could be produced through the combined action of various classes of candidate enzymes (Figure 3), such as α-ketoacid dehydrogenases, decarboxylases and alcohol/aldehyde dehydrogenases [130].

QTLs for BCAA VOCs
Data collection identified a total number of 129 QTLs reported by seven different authors and correlated with BCAA volatiles biosynthesis (Figure 1; Table S2). These QTLs were classified into 26 distinct QGRs (Table S2).

BCAA VOC Candidate Genes
Table S2 reveals 75 genes that were identified as potentially involved in the BCAA VOCs metabolism, 28 of which have been reported to be expressed in at least one fruit tissue [103]. An overview of the 30 selected candidate genes is shown in Table 4, including genes expressed in fruits plus two functionally characterized genes (BCAT2 and BCAT4; see below).
Among the genes identified as expressed in tomato fruit, ten belong to enzymatic families that have been associated with BCAA biosynthesis: four genes were annotated as pyruvate dehydrogenases (PDH), three as 3-isopropylmalate dehydratases (IPMD), two as ketol-acid reductoisomerase (KARI) and one as 2-isopropylmalate synthase (IPMS). Reverse genetics analysis of Solyc06g060790 (IPMD) revealed that this gene influences the BCAA content in tomato fruit, while a similar analysis failed to support such a conclusion in case of Solyc07g053280 (KARI) [94]. Branched chain amino acid aminotransferases (BCATs) can be involved in the last step of BCAA anabolism and/or in the first step of BCAA catabolism depending on their subcellular localization (chloroplast or mitochondrion, respectively). Six BCAT genes were reported as candidate genes in Table 4, of which five (BCAT1, 2, 3, 4 and 7) have been functionally characterized [112,114]. Although BCAT2, 4 and 7 were located outside the QTL intervals, they were included in this review for completeness. BCAT1 and BCAT2 were shown to be located in mitochondria and involved in the catabolism of BCAAs, while BCAT3 and BCAT4 were shown to be located in chloroplasts and involved in BCAA biosynthesis. In turning and ripe fruits of cv. M82 BCAT1 expression was up to 10-fold higher compared to BCAT2, 3 and 4. Although BCAT2, 3 and 4 showed low, but detectable expression in fruits, their expression was much higher in leaves (BCAT2 and 3) or inflorescence (BCAT4) in this tomato background [127]. According to the RNAseq data present in the TomExpress database [103], these three genes are hardly (BCAT3) or not at all expressed in tomato fruits (BCAT2 and 4). Although its subcellular location is unclear, BCAT7 was proposed to play a role in BCAA degradation [128]. Finally, we identified eleven alcohol dehydrogenases (ADH) expressed in tomato fruit in the BCAA QGRs. 
These may play a role in BCAA catabolism, although their functional characteristics remain to be demonstrated.
The available tomato pangenome [26] was investigated for the presence of non-reference (non cv. Heinz) promoter regions for the abovementioned genes. Interestingly, a non-reference allele was reported for Solyc04g063350, annotated as 3-methyl-2-oxobutanoate dehydrogenase, an enzymatic class that has been described as involved in BCAA catabolism, which involves the decarboxylation of branched-chain amino acids [94,129,131]. This gene has recently been named FLORAL4 after both genetic and functional studies revealed that its involvement was not restricted to BCAA catabolism, but that it also controlled the quantitative variation of floral phenolic-derived VOCs derived from catabolism of the aromatic amino acid phenylalanine (see below; [102]). Among the investigated tomato varieties, the cultivars harboring the "wild" allele showed significantly lower gene expression than the ones presenting the domesticated promoter, suggesting a positive selection for FLORAL4 expression during tomato domestication [26].
Table 4. List of the genes identified as potentially responsible for the BCAA VOCs QTL reported in literature. Genes that have been functionally characterized are highlighted in bold in the table. The last column reports whether the gene has been reported to be expressed in fruit (Y/N; [103]); for a complete overview of the gene expression see Table S2.

Biosynthesis
Apocarotenoid volatiles are produced in plastids [104] through the cleavage of carotenoids by carotenoid cleavage dioxygenases, like LeCCD1A and LeCCD1B [133], which are particularly expressed during fruit ripening [132]. This family of genes has been reported in other species to act both on cyclic carotenoids (at the 9′,10′ position) and on open-chain carotenoids, at the (5′,6′), (7′,8′), or (9′,10′) positions, leading to the production of the carotenoid VOCs (Figure 4; [133][134][135][136][137][138][139]). Although many structural genes in the carotenoid biosynthetic and cleavage pathway are known, the regulation of the carotenoid pathway is still unclear. In this respect it has been proposed that the loss of membrane integrity during the ripening-dependent conversion of chloroplasts into chromoplasts may be a key mechanism in their regulation, as this process may lead to the release of the carotenoids into the cytoplasm, where they react with cytoplasmic cleavage enzymes [15,140]. The available knowledge of the carotenoid biosynthetic and cleavage pathway and the underlying biosynthetic genes (Figure 4) is very helpful for the identification of candidate genes within QTLs for carotenoid-derived VOCs in tomato fruit.

QTLs for Apocarotenoid VOCs
Data collection identified a total number of 42 QTLs reported in 8 different articles and correlated with apocarotenoid volatiles biosynthesis (Figure 1; Table S3). Comparing these regions (see "Data Acquisition" paragraph for methodology), we identified 19 distinct QGRs (Table S3).
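The grouping of overlapping QTLs into QGRs can be illustrated with a minimal Python sketch. It assumes that a QGR is simply the union of QTL intervals that overlap physically on the same chromosome; the actual procedure is the one described in the "Data Acquisition" paragraph and may differ. The `QTL` class, the function name and all coordinates below are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QTL:
    """A physically mapped QTL (all field names are illustrative assumptions)."""
    name: str
    chrom: int
    start_bp: int
    end_bp: int
    study: str


def merge_into_qgrs(qtls):
    """Merge QTLs whose intervals overlap on the same chromosome.

    Assumption: a QGR is the union of overlapping QTL intervals.
    Returns a list of dicts, one per merged region.
    """
    qgrs = []
    # Sorting by chromosome and start lets us merge in a single pass.
    for q in sorted(qtls, key=lambda q: (q.chrom, q.start_bp, q.end_bp)):
        last = qgrs[-1] if qgrs else None
        if last and last["chrom"] == q.chrom and q.start_bp <= last["end_bp"]:
            # Overlap with the current region: extend it.
            last["end_bp"] = max(last["end_bp"], q.end_bp)
            last["members"].append(q.name)
            last["studies"].add(q.study)
        else:
            # No overlap: start a new region.
            qgrs.append({
                "chrom": q.chrom,
                "start_bp": q.start_bp,
                "end_bp": q.end_bp,
                "members": [q.name],
                "studies": {q.study},
            })
    return qgrs


# Hypothetical input, not data from this review.
example_qtls = [
    QTL("qA", 3, 1_000_000, 5_000_000, "study1"),
    QTL("qB", 3, 4_000_000, 7_000_000, "study2"),
    QTL("qC", 3, 9_000_000, 9_500_000, "study1"),
    QTL("qD", 6, 2_000_000, 3_000_000, "study3"),
]
example_qgrs = merge_into_qgrs(example_qtls)
for region in example_qgrs:
    print(region["chrom"], region["start_bp"], region["end_bp"],
          region["members"], len(region["studies"]))
```

In this toy input, qA and qB overlap and collapse into one region supported by two studies, while qC and qD remain separate regions.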
Phytoene synthase 2 (PSY2; Solyc02g081330) represents an exception to our candidate gene mining approach. This gene has been annotated at 45 Mb on chromosome 2 and is located between QGRs APO3 and APO4. Although this gene does not fall within any of the regions identified by this review, its activity has been associated with carotenoid production in fruit in other Solanaceae [170]. For this reason, it is proposed here as a potential candidate gene for tomato apocarotenoid VOCs.
Table 5. List of the genes identified as potentially responsible for the carotenoid VOCs QTL reported in literature. Genes that have been functionally characterized are highlighted in bold in the table. The last column reports whether the gene has been reported to be expressed in fruit (Y/N; [103]); for a complete overview of the gene expression see Table S3.
A search for reported wild promoter regions [26] was performed for the abovementioned candidate genes, identifying Solyc03g114340 (DXR; 1-deoxy-D-xylulose-5-phosphate reductoisomerase) as the only candidate gene showing a wild allele for its promoter region in the available tomato pangenome. This non-reference allele consists of a 645 bp promoter located 582 bp upstream of DXR, and has been associated with an occurrence frequency of 0.58 in the Solanum pimpinellifolium L. accessions investigated by [26]. In contrast, its presence in Solanum lycopersicum var. cerasiforme and in S. lycopersicum heirlooms has been reported to be rare, with occurrence frequencies of 0.05 and 0.02, respectively. The cultivars presenting the wild allele showed a significantly higher expression of Solyc03g114340 compared to the ones harboring the common allele, suggesting a selection against a higher expression of DXR during tomato domestication.

Phenylalanine-Derived Volatiles (Phe VOCs)
Phenolic and phenylpropanoid volatiles originate from the catabolism of phenylalanine. Phenolic VOCs include compounds such as phenylacetaldehyde, 2-phenylethanol, 1-nitro-2-phenylethane and 2-phenylacetonitrile (benzylnitrile; all floral odor type), which have been reported to affect consumer liking of tomato fruit [8,12,16,17]. However, the effect of these phenolic compounds on flavour and consumer preference is not easy to predict, since some studies show positive and others negative effects of these compounds on consumer liking [13,92,172]. This apparent inconsistency may be caused by differences in the concentrations of these compounds in the tomato materials studied. The main phenylpropanoid VOCs in tomato are guaiacol, methyl salicylate and eugenol. They are associated with a smoky, pharmaceutical aroma and are generally considered off-flavours.

Biosynthesis of Phe VOCs
Phenolic volatiles (C6-C2) have been reported to have a high impact on tomato aroma [11]. Their biochemical pathway starts with the decarboxylation of phenylalanine, leading to the production of phenylethylamine [93]. According to the proposed phenolic volatile biosynthetic pathway in tomato, phenylethylamine is the precursor for the synthesis of the two nitrogen-containing volatiles nitrophenylethane and benzylnitrile, as well as for the production of phenylacetaldehyde. Extremely high phenylacetaldehyde levels were found in tomato introgression lines carrying the malodorous locus from S. pennellii on chromosome 8 [92]. The high phenylacetaldehyde production in this line was associated with the expression of AADC1A, AADC1B, and AADC2, located in the malodorous QTL region [93,173,174]. Transgenic approaches revealed that this family of genes was capable of decarboxylating phenylalanine, leading to phenylethylamine, the direct precursor of phenylacetaldehyde. The subsequent deamination of phenethylamine to produce phenylacetaldehyde has been reported to be related to an amine oxidase. Finally, 2-phenylethanol is produced from phenylacetaldehyde by means of two reductases, PAR1 and PAR2 [175][176][177]. These enzymes not only reduce phenylacetaldehyde, but are also able to catalyze the reduction of benzaldehyde and cinnamaldehyde to their respective alcohols [175].
More recently, FLORAL4 (Solyc04g063350; 3-methyl-2-oxobutanoate dehydrogenase) has been fine mapped in a diversity panel of cultivated contemporary tomato varieties and tomato RIL populations, and associated with the accumulation of floral phenolic volatiles in tomato fruit [102]. Based on its protein sequence, FLORAL4 belongs to the mitochondrial 2-oxoisovalerate dehydrogenase/decarboxylase enzyme family, which is involved in the catabolism of BCAAs and constitutes the E1 subunit of the BCKDC complex, catalyzing the decarboxylation of the BCAA deamination products in plants [131]. A complete knock-out of the FLORAL4 gene by CRISPR-Cas9-mediated gene editing in tomato plants led to a major depletion of the phenylalanine-derived volatiles as well as a notable depletion of BCAA VOCs. This suggests involvement of FLORAL4 in both BCAA and Phe VOC metabolism, possibly via decarboxylation of the corresponding amino or keto acids.
As mentioned above, phenylpropanoid volatiles (C6-C3) are the second group of phenylalanine-derived VOCs. Although their biosynthesis in tomato has not been fully characterized, results from other species suggest that phenylpropanoid volatiles are derived from intermediates of the lignin pathway [178][179][180][181]. For example, eugenol has been reported in other species to be produced from coniferyl acetate by means of a eugenol synthase [182]. The mechanisms of methyl salicylate and guaiacol biosynthesis have been investigated in tomato [11]. Methyl salicylate can be produced from salicylic acid by means of SlSAMT1, an O-methyltransferase [183], while guaiacol may be produced from catechol by means of the catechol-O-methyltransferase CTOMT1 [184]. Although CTOMT1 could not be connected to any of the QTL regions found, it has been included for completeness. Furthermore, the conjugation of these three compounds after their biosynthesis has been reported to be linked to the activity of two classes of enzymes: glycosyltransferases and glycosyl hydrolases. Among these classes of enzymes, NSGT1 was shown to prevent the wound-induced release of the smoky aroma associated with phenylpropanoid VOCs in ripening tomato fruit [185]. Figure 5 summarizes the metabolic reactions described above.
Figure 5. Overview of the phenolic and phenylpropanoid VOCs pathways, adapted from [11].

QTLs for Phe VOCs
Data collection identified a total of 81 QTLs reported by eight different authors and correlated with phenylalanine-derived volatiles biosynthesis (Figure 2; Table S4). Comparing these regions (see "Data Acquisition" paragraph for methodology), we identified 24 distinct QGRs (Table S3).

Phe VOCs' Candidate Genes
Seventy-five genes were identified as potentially associated with phenylalanine-derived VOCs, including all candidate genes present in the QGRs plus the above-mentioned CTOMT1 (Solyc10g005060; Table S4). Among them, twenty-one have been reported to be expressed in at least one fruit tissue or were characterized for their role in VOC metabolism (Table 6 and Table S4; [103]). In addition, the above-mentioned NSGT1, present in QGR PHEN17, was also included in Table 6. NSGT1 expression in tomato fruit was reported by [185], but this gene was not predicted in the tomato genome annotation and its genomic organization has only recently been resolved [65]; hence, gene expression data for this gene could not be retrieved from the TomExpress database.
Table 6. List of the genes identified as potentially responsible for the phenylalanine-derived VOCs QTL reported in literature. Genes that have been functionally characterized are highlighted in bold in the table. The last column reports whether the gene has been reported to be expressed in fruit (Y/N; [103]); for a complete overview of the gene expression see Table S4.
Among the genes identified as expressed in tomato fruit, one was annotated as 3-methyl-2-oxobutanoate dehydrogenase (Solyc04g063350). This is the above-mentioned FLORAL4 gene, which was demonstrated to be the causal gene for the variation in phenolic VOCs in the PHEN7 QGR on chromosome 4.
One gene (SlFMO1; Solyc12g013690) was annotated as a flavin-dependent monooxygenase, a class of enzymes that was shown to catalyze the hydroxylation of aromatic compounds in prokaryotes and plants [174][175][176]. This gene was suggested to play a role in the synthesis of nitrogenous phenolic volatiles, such as 2-phenylacetonitrile and 1-nitro-2-phenylethane, in tomato [186], although this needs experimental confirmation. Furthermore, we identified one primary amine oxidase (Solyc08g079430) that may be involved in the production of 2-phenylacetaldehyde from phenethylamine [93].
We identified five phenylalanine ammonia-lyase genes (Solyc09g007890, Solyc09g007900, Solyc09g007910, Solyc09g007920, and Solyc10g086180) expressed in tomato fruit. These enzymes have been reported to be responsible for the first key step in phenylpropanoid metabolism, catalyzing the conversion of L-phenylalanine into trans-cinnamate [187].
Finally, six decarboxylase genes were reported to be expressed in tomato fruit during ripening [103,188]. Among them, Solyc08g066250 has been reported as SlHDC11 and has been associated with fruit ripening in tomato [189]. Moreover, LeAADC1A (Solyc08g068680), together with LeAADC1B (Solyc08g068610) and LeAADC2 (Solyc08g006740), has been linked to the conversion of phenylalanine to phenylethylamine in a S. lycopersicum var. M82 × S. pennellii tomato IL population [93].
The tomato pangenome [26] was investigated for potential non-reference alleles in the promoter region of the identified candidate genes. A presence-absence variant was detected in the promoter of FLORAL4 (Solyc04g063350). This variant has an occurrence frequency of 0.74 in cultivated heirloom tomato and 0.64 in S. lycopersicum var. cerasiforme, while it occurs at a frequency of only 0.02 in S. pimpinellifolium WL. Gene expression analyses pinpointed a significant difference (p-value: 5.48 × 10⁻⁶) between accessions harboring the domesticated versus the wild allele, suggesting that the domesticated allele of FLORAL4 has a lower expression compared to the wild allele.

Concluding Remarks
In the last decades, a vast array of scientific studies has investigated the genetic components of tomato aroma in modern tomato cultivars and their relatives. However, the methodological differences between studies, such as the source materials, the type of mapping population, the number of markers, the reference map used, the influence of environmental factors on trait performance and analytical variation in the determination of the aroma phenotype, make it difficult for breeders and researchers to use the available data efficiently in their research. This review summarizes the state of the art in the understanding of tomato aroma genetics and provides a tool that breeders and researchers can use to collect additional evidence for the robustness of their QTL data. The identified QGRs, especially when defined by the overlap of QTLs from different sources, represent the regions that most likely contain genetic elements that regulate tomato aroma. The QGRs differ from each other in two major parameters: their size and the number of studies contributing to them. The smaller a QGR and the higher the number of studies referring to it, the more robust that QGR is likely to be in a breeding program. Some of the QGRs, however, appeared to be quite large. This most likely indicates the presence of multiple genetic factors controlling the levels of (related) aromatic compounds in such a region, although we cannot exclude the possibility that, in some cases, a large QGR is caused by inaccurate prediction of a QTL due to experimental and/or methodological differences, as mentioned above. For a breeder, such large QGRs could indicate a need for further inspection and dissection of the respective QGR in selected donor material, using high-resolution (fine) mapping approaches.
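The two robustness parameters discussed above (QGR size and number of supporting studies) can be combined into a simple prioritization. The sketch below is only one hypothetical way to do so and is not a scoring scheme used in this review: it ranks QGRs first by the number of supporting studies (more is better) and then by physical size (smaller is better). All names and values are invented for illustration.

```python
def rank_qgrs(qgrs):
    """Order QGRs by assumed robustness.

    Assumption: more supporting studies come first; ties are broken
    by smaller physical size (end_bp - start_bp).
    """
    return sorted(
        qgrs,
        key=lambda r: (-r["n_studies"], r["end_bp"] - r["start_bp"]),
    )


# Hypothetical QGRs, not taken from Tables S1-S5.
toy_qgrs = [
    {"name": "QGR_A", "start_bp": 10_000_000, "end_bp": 12_000_000, "n_studies": 4},
    {"name": "QGR_B", "start_bp": 1_000_000, "end_bp": 31_000_000, "n_studies": 1},
    {"name": "QGR_C", "start_bp": 50_000_000, "end_bp": 50_500_000, "n_studies": 4},
    {"name": "QGR_D", "start_bp": 20_000_000, "end_bp": 21_000_000, "n_studies": 2},
]

ranked = rank_qgrs(toy_qgrs)
for r in ranked:
    size_mb = (r["end_bp"] - r["start_bp"]) / 1e6
    print(f'{r["name"]}: {r["n_studies"]} studies, {size_mb:.1f} Mb')
```

A large region supported by a single study, such as the invented QGR_B, ends up at the bottom of the list, which matches the suggestion that such regions first need dissection by fine mapping.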
There are various potential applications of these data in practical breeding and breeding research:
(i) The QTL information presented in this review can be directly used to support marker-assisted breeding programs aimed at introgressing large-effect aroma QTLs into elite germplasm, for example using the donors indicated in Table S5.
(ii) The current development of pangenome projects paves the way for a new step in tomato breeding research. Advances in computational genomics and long-read sequencing allow an easier and more comprehensive investigation of the genetic variation in tomato collections worldwide. This makes it possible to identify genetic elements that are missing in the reference genome and to discover and use novel markers, such as structural variants (SVs) and presence/absence variants (PAVs) [190]. Thanks to the ongoing reduction in sequencing costs, large sets of genotypes can nowadays be re-sequenced, allowing the application of SV markers in GWAS projects [191,192]. Furthermore, SV identification in large sets of genotypes can lead to downstream breeding approaches. For example, SV-based linkage mapping can be applied by genotyping mapping populations using SV markers that showed polymorphism in the parental lines [193]. SV studies may also give more insight into the mechanisms leading to a certain phenotype [26,65,194,195] and help to pinpoint the best donors of a given allele for crossing. By providing a comprehensive set of candidate genes for tomato aroma, our review may guide researchers and breeders in the selection of the most interesting genes to investigate for structural variation.
(iii) The data in this review can be used to support the identification and use of the key genes underlying these QTLs. In combination with available (pan)genomic and transcriptomic information, candidate genes present in the QTL regions can be selected and tested for their effects in vivo using either stable transgenic approaches, such as CRISPR-Cas9-mediated gene editing, or quicker transient overexpression or silencing in tomato fruit. This may lead not only to the identification of the causal genes controlling a trait, but also to the detection of the causal genetic variants underlying trait variation. Such variants, also called functional markers, are the best possible molecular markers for MAS, since they are functionally rather than genetically linked to the trait and their use as markers does not need validation in other populations, which is always required with genetically linked markers [196].
(iv) The information on large-effect aroma QTLs provided in this review can also be used to improve the performance of genomic prediction models, since both genetically linked markers and, in particular, functional markers have been shown to significantly improve the prediction power of GP models compared to the use of random neutral markers [197].
Last but not least, we hope that this review may facilitate the development of tastier tomatoes.
Supplementary Materials: The following are available online at https://www.mdpi.com/2073-4425/12/2/226/s1, Figures S1-S4: Gene expression (Euclidean heatmap) of the candidate genes identified in this review; Tables S1-S4: QGRs, Candidate Genes and Candidate Genes expressed in tomato fruit for the four classes of VOCs; Table S5: QTLs reported in literature (biosynthetic pathway, QGR, the QTL's original name, related compound, chromosome, correlated markers, their position in cM and bp, their p-value, their LOD score, the SolycID of the gene in which the markers have been found, the percentage of variation explained by the QTL, the effect, donor parent, the crossing population or association panel used, and the reference of the primary resource).

Conflicts of Interest:
The authors declare no conflict of interest.