Integration of GWAS, metabolomics, and sensorial analyses to reveal novel metabolic pathways involved in cocoa fruity aroma
GWAS of fruity aroma in Theobroma cacao

Nacional is a variety of cocoa tree known for its "Arriba" aroma, characterised mainly by fruity, floral, and spicy aromatic notes. In this study, the genetic basis of the fruity aroma of modern Nacional cocoa was investigated. Genome-wide association studies (GWAS) were conducted on biochemical and sensory fruity traits and identified a large number of association zones. These zones are linked both to the volatile compounds known to provide fruity flavours, present in the beans before and after roasting, and to the fruity notes detected by sensory analysis. Five main metabolic pathways were identified as involved in the fruity traits of the Nacional population: the protein degradation pathway, the sugar degradation pathway, the fatty acid degradation pathway, the monoterpene pathway, and the L-phenylalanine pathway. For a large number of associations, candidate genes involved in the biosynthetic pathways of volatile compounds were detected in the association zones.


Introduction
Theobroma cacao is a tree species belonging to the Malvaceae family (Bayer and Kubitzki, 2003). Native to the tropical rainforests of northern South America, it is the world's only source of cocoa products, which are obtained after fermentation, drying, and roasting of cocoa beans. T. cacao is a diploid plant (2n = 2x = 20). Its small genome has been sequenced, with 96.7% of the assembly anchored on the 10 chromosomes (Argout et al., 2011, 2017; Motamayor et al., 2013).
Cocoa is classified into two types of products: so-called standard or bulk cocoa, which has a pronounced cocoa taste, and so-called fine aromatic cocoa, which is characterised by floral and fruity notes (Sukha et al., 2008). Fine aromatic cocoa represents only about 5% of global production, but it is no less important. Some Latin American countries produce almost exclusively fine cocoa, which is a significant source of income for them.
T. cacao shows high genetic diversity. Currently, ten genetic groups have been identified (Motamayor et al., 2008). The most widely cultivated varieties providing fine aromatic cocoa are Nacional, Criollo, and Trinitario. Trinitarios are hybrids between Criollo and Amelonado. The Amelonado variety produces mainly "standard" cocoa. The Criollo variety produces cocoa beans with mainly fruity aromas (Lachenaud and Motamayor, 2017) but is not widely cultivated because of its low vigour and greater susceptibility to disease (Cheesman, 1944).
The Nacional variety is native to Ecuador. The Nacional variety trees currently cultivated (called modern Nacional in this paper) are the result of several generations of crossbreeding between the ancestral Nacional and the Trinitarios (Loor S. et al., 2009).
The Nacional variety is well known for its floral and spicy flavour, called the "Arriba" flavour, and is sought after by chocolate makers for this reason. It is characterised by floral and woody notes (Luna et al., 2002). In addition, Nacional is known for its low astringency and low bitterness (International Cocoa Organization, 2017). The floral aroma of Nacional has been studied, and two biosynthetic pathways were highlighted as being mainly responsible for it: the terpene biosynthesis pathway and the L-phenylalanine degradation pathway (Colonges et al., 2021; Ziegleder, 1990). The aroma of the modern Nacional probably contains floral and fruity aromas that could be the legacy of crossbreeding with Trinitarios (hybrid trees between Criollo and Amelonado) (Loor S. et al., 2009; Rottiers et al., 2019). Loor S. et al. (2009) demonstrated this hybrid nature using molecular markers. This genetic mixing has led to a dilution of the Arriba flavour (Beckett et al., 2017; Boza et al., 2014; Loor S. et al., 2009). From 1940 onwards, surveys were carried out on the Ecuadorian coast to collect Nacional-type cocoa trees and preserve their genetic resources. These collections were placed in two main experimental stations: the Experimental Tropical Station of Pichilingue (EET-P) of INIAP (Instituto Nacional de Investigaciones Agropecuarias) and the Cocoa Flavour Centre of Tenguel (CCAT). The fine aromas of the modern Nacional are therefore a blend of the aromas of the ancestral Nacional and those of the Criollo and Amelonado ancestors.
This study focuses on fruity aromas, which comprise two main classes: dried fruit aromas and fresh fruit aromas. Dried fruit aromas are mainly due to pyrazines, which mainly appear after roasting. During this stage of processing, heating induces a large number of chemical reactions, including the Maillard reaction. This reaction leads to the synthesis of pyrazines from amino acids and reducing sugars (Arnoldi et al., 1988).
Fresh fruit aromas are carried by different families of volatile compounds, such as alcohols, esters, and ketones. During alcoholic fermentation, yeasts synthesise short-chain alcohols (e.g., propan-1-ol, butan-1-ol, 2-methylpropan-1-ol, pentan-1-ol, hexan-1-ol, heptan-1-ol, octan-1-ol) from simple sugars. These alcohols can also be synthesised from pyruvate, a molecule present in all organisms (Sun et al., 2015). Aldehydes are formed through the oxidation of alcohols. This is the case, for example, in the lilac flower (Syringa vulgaris L.), where the "lilac alcohols" are oxidised to give "lilac aldehydes". Esters result from esterification, a reversible chemical reaction that produces an ester molecule and a water molecule from an alcohol molecule and an acid molecule (Aranda et al., 2008).
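The esterification equilibrium described above can be written schematically as follows. R and R' are generic placeholders for the side chains of the acid and the alcohol; this notation is assumed here for illustration and does not come from the cited sources.

```latex
\[
\mathrm{R{-}COOH} \;+\; \mathrm{R'{-}OH}
\;\rightleftharpoons\;
\mathrm{R{-}COO{-}R'} \;+\; \mathrm{H_2O}
\]
```

Read from left to right, an acid and an alcohol give an ester and water; the double arrow marks the reversibility of the reaction.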
The objective of this study is to decipher the genetic, genomic, and biochemical bases of the fruity aromas of the Nacional cocoa variety. A genome-wide association study (GWAS) was carried out to discover the areas of the genome responsible for the fruity flavour. It involved phenotyping data for volatile compounds potentially linked to the fruity taste, as well as sensory analysis data. To refine the determinants of the fruity aroma of the modern Nacional variety, candidate genes potentially involved in the biosynthesis pathways of fruity compounds were sought.

Plant material
The plant material used for these experiments was composed of a collection of 152 cocoa trees from Ecuador conserved in the Pichilingue experimental station of the "Instituto Nacional de Investigaciones Agropecuarias" (INIAP) and the "Coleccion de Cacao de Aroma Tenguel" (CCAT) of Tenguel. This population represents the Nacional variety currently grown in Ecuador and has been described by Loor S. (2007).

Fermentation processes
Micro-fermentations of cocoa beans were carried out in a wooden box, as homogeneously as possible, within a homogeneous cocoa mass. The process lasted 4 days, with two turns at 24 and 72 h after the beginning of the fermentation. Each of the 152 clone samples was placed in a protective laundry bag and micro-fermented in the cocoa mass. After fermentation, the samples were put in a dry place. They were considered dried when their moisture content was below 8%.
The base roasting conditions were 120 °C for 22 min. For each bean sample, this base roast was adjusted for moisture and bean size (bean weight) following the validated ISCQF protocol (ISCQF, 2020). Roasting times were measured from the moment the temperature reached 2 °C below the set point.

Sensorial analysis
One hundred and forty-six individuals were characterised by sensory analysis, based on blind tastings carried out by Edward Seguine with three repetitions per sample. Samples were prepared and evaluated, including flavour evaluation, using the protocols of the ISCQF standards (ISCQF, 2020). In all cases, flavour evaluation sessions were conducted using Cocoa of Excellence Reference Liquors (chocolate liquors: dried, fermented, roasted, crushed beans) whose attributes had been defined previously by the Technical Committee of the Cocoa of Excellence program. Because the tasting was done by a single person, all samples were kept blind using four-digit codes and were evaluated in random order in all three blind evaluations. Thirteen fruity notes were scored from zero (no fruity note detected) to ten (intense note detected) according to the ISCQF protocol (ISCQF, 2020) (table A.1).

Volatile compound analysis by GC-MS
The analysis of volatile compounds was carried out on dried fermented beans (UR) and roasted beans (R). The volatile compounds of the cocoa samples were extracted by solid-phase microextraction in the headspace (SPME-HS). GC-MS analyses were conducted according to the conditions described by Assi-Clair et al.

Statistical analysis
Principal component analysis (PCA) and its visualisation were performed with the "mixOmics" R package. Correlations were calculated with the "agricolae" R package, and the correlation matrix was visualised with the "corrplot" R package.

Genotyping by sequencing (GBS)
DNA samples were genotyped by sequencing (GBS) using DArTseq (Diversity Arrays Technology Sequencing) technology (Kilian et al., 2012). Reads were aligned to the V2 sequence of the Criollo genome (Argout et al., 2017). Markers with unknown locations were discarded from the analysis.

Table 1
List of biochemical compounds related to fruity traits and used for the GWAS analysis of unroasted (UR) and roasted (R) beans.

Association mapping
We performed a GWAS with SNP markers for biochemical (146 accessions × 5195 markers) and sensory (144 accessions × 5195 markers) traits using TASSEL v5.
For all traits, we used both a mixed model (MLM) and a fixed-effect model (GLM); details are given in Colonges et al. (2021).
The threshold was determined using the Bonferroni correction formula as proposed by Gao et al. (2008), with the effective number of independent tests (Meff) used as the denominator and calculated with the SimpleM R package (Gao et al., 2010). Meff was 2796, which corresponds to a P-value of approximately 1.79e-05. The significance of all markers was plotted as Manhattan plots with the qqman R package.
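As a rough check of the arithmetic, the corrected threshold is the family-wise significance level divided by the effective number of independent tests. The sketch below assumes a family-wise level of 0.05, which is not stated in the text but reproduces the reported value; the function name is hypothetical.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold: alpha divided by the number of tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


ALPHA = 0.05  # assumed family-wise level (not given in the text)
MEFF = 2796   # effective number of independent tests reported for the SNP set

snp_threshold = bonferroni_threshold(ALPHA, MEFF)
# Height of the corresponding threshold line on a -log10(p) Manhattan plot
snp_line = -math.log10(snp_threshold)

print(f"SNP threshold: {snp_threshold:.3g}")
print(f"-log10(threshold): {snp_line:.2f}")
```

Using Meff rather than the raw marker count (5195) gives a less conservative threshold, since linked markers do not count as fully independent tests.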
We also performed an analysis with SSR markers for biochemical (180 accessions × 180 markers) and sensory (197 accessions × 180 markers) traits using TASSEL v3. We used a fixed-effect model (GLM) with a structure matrix and the option of 500 permutations; details are given in Colonges et al. (2021). The threshold was determined using the Bonferroni correction, corresponding to a p-value of about 2.78e-04.
The choice of markers for the genotyping data and the determination of the confidence interval followed the method described by Colonges et al. (2021).
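The idea behind a permutation option can be illustrated with a generic single-marker permutation test. This is a minimal sketch and not TASSEL's implementation: the test statistic (absolute Pearson correlation between genotype codes and trait values), the function names, and the toy data are all assumptions made for illustration, and no structure matrix is included.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no association signal
    return sxy / math.sqrt(sxx * syy)


def permutation_pvalue(genotypes, trait, n_perm=500, seed=0):
    """Empirical p-value: share of shuffled traits at least as associated as observed."""
    rng = random.Random(seed)
    observed = abs(pearson_r(genotypes, trait))
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-trait link
        if abs(pearson_r(genotypes, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive
    return (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): allele counts 0/1/2 and a trait that increases with them
genotypes = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
trait = [1.0, 1.1, 0.9, 1.2, 2.0, 2.1, 1.9, 2.2, 3.0, 3.1, 2.9, 3.2]

p_value = permutation_pvalue(genotypes, trait, n_perm=500)
print(f"permutation p-value: {p_value:.4f}")
```

In this sketch, 500 permutations cannot yield a p-value below 1/501, so the number of permutations bounds the resolution of the empirical p-value.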

Sensory trait analysis
Thirteen fruity notes, together with notes associated with pyrazines (roasted degree, cocoa, browned flavour, caramel browned sugar), were evaluated during the sensory analysis carried out on the cocoa liquors.
All of these thirteen sensory traits were used to carry out this study (table A.1).
The PCA of the fruity notes (sensory analysis traits) showed continuous variation within the population (Fig. 1A). The brown flavour, fruity browned dried fruit, and nutty notes mainly define axis 1. The fruity acidity, Fruity-Citrus, and Fruity-Dark tree fruit notes mainly define axis 2.
Correlation analyses between the different sensory traits related to the fruity taste did not show strong negative correlations. Three strong positive correlations were detected: a correlation of 0.8-1 between the hazelnut note and the browned flavour note, and two correlations of 0.6-0.8 between the browned dried fruit note and, on the one hand, the nutty note and, on the other hand, the browned flavour note (Fig. 2A).

Biochemical traits analysis
Volatile compounds were identified in cocoa beans before and after roasting. A total of 161 volatile compounds were identified, 35 of which are known to have fresh or dried fruity notes (Table 1). These compounds are mainly synthesised through five pathways: the protein degradation pathway (required for the production of pyrazines via the Maillard reaction), the fatty acid degradation pathway, the sugar degradation pathway, the L-phenylalanine degradation pathway, and the monoterpene biosynthesis pathway.
The PCA of the volatile compounds involved in fruity notes in unroasted beans shows continuous variation within the population (Fig. 1B). The compounds 1-methylbutyl acetate, meso-butan-2,3-diyl acetate, and ethyl acetate mainly define axis 1. The compounds 3-methylbutyl acetate, heptan-2-one, and ethanol mainly define axis 2.
Table note: (Arn and Acree, 1998; Ito et al., 2002). UR: unroasted beans; R: roasted beans; Path.: pathway; FA: fatty acid degradation pathway; SS: sugar degradation pathway; M: monoterpene biosynthesis; P: pyrazine production; L-phe: L-phenylalanine degradation pathway.
Figure note: Green numbers represent the two individuals closest to Amelonado's ancestor. Red numbers represent the two individuals closest to the Criollo ancestor. Purple numbers represent SNA604 (Nacional ancestor) and two individuals closest to the Nacional ancestor.
Correlation analyses were carried out between the different volatile compounds related to fruity notes. Twenty-seven strong positive correlations (greater than or equal to 0.8) were detected between several volatile compounds from unroasted or roasted beans. Sixteen strong positive correlations were between compounds involved in fatty acid and sugar degradation. Only one strong positive correlation was between the two pyrazines. Four strong positive correlations were between compounds involved in the fatty acid and sugar degradation pathways and pyrazines. Seven strong negative correlations were also detected (Fig. 2B), of which five were between compounds involved in fatty acid and sugar degradation. The two other negative correlations are between pyrazine and pentan-2-ol (R) or 1,2,5-trimethylbenzene (R). No strong correlation was detected between biochemical compounds and sensory analysis data (see figure A.1).
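The screening for strong correlations among volatile compounds can be sketched as follows. The 0.8 cut-off comes from the text; the abundance values are invented toy data, the helper names are hypothetical, and the use of Pearson correlation is an assumption. The original analysis used the "agricolae" R package, so this only illustrates the idea.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a constant vector
    return sxy / math.sqrt(sxx * syy)


def strong_pairs(table, cutoff=0.8):
    """Return (name_a, name_b, r) for every pair of traits with |r| >= cutoff."""
    pairs = []
    for a, b in combinations(sorted(table), 2):
        r = pearson_r(table[a], table[b])
        if abs(r) >= cutoff:
            pairs.append((a, b, r))
    return pairs


# Toy abundances per accession (invented values, for illustration only)
toy = {
    "ethyl acetate": [1.0, 2.0, 3.0, 4.0, 5.0],
    "ethanol": [2.1, 3.9, 6.2, 8.0, 9.9],
    "heptan-2-one": [5.0, 4.1, 2.9, 2.2, 1.0],
    "benzaldehyde": [3.0, 1.0, 4.0, 1.0, 5.0],
}

for name_a, name_b, r in strong_pairs(toy):
    sign = "positive" if r > 0 else "negative"
    print(f"{name_a} / {name_b}: r = {r:.2f} ({sign})")
```

Filtering on the absolute value captures both strong positive and strong negative correlations in a single pass; the sign is then read from r itself.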

Genome wide association study (GWAS)
The genetic diversity of the population and its structure have been described by Loor S. et al. (2009) using SSR markers and by Colonges et al. (2021) using SNP markers. The population has a high rate of heterozygosity.
All detected association areas are available in table A.2.

Identification of significant associations for sensorial traits
Out of all the associations, only 22 relate to fruity sensory data. Of the thirteen sensory descriptors related to fruity traits, associations could be detected for only three: "Fruity-Dark Tree Fruit", "Fruity-Dried fruit" and "Fruity-Berries". Only one association could be detected for three of the four characters (Table 2). Nineteen association areas were detected for the note "Fruity-Dark Tree Fruit", on chromosomes 1, 2, 4, 7, 8, and 10. The strongest association detected for this note is also the strongest association for all fruity notes and explains 24% of the variation in this trait.

Identification of significant associations for biochemical traits
The GWAS analysis revealed 480 areas of association for biochemical compounds related to the fruity taste. All the associations found are reported in table A.2.
Among the compounds for which significant associations were detected, three major biosynthesis pathways emerge: the protein degradation pathway (required for the production of pyrazines via the Maillard reaction), the sugar degradation pathway, and the fatty acid degradation pathway. The two degradation pathways (sugars and fatty acids) are often linked (Fig. 3) and are therefore presented jointly.
The areas of significant association were mapped to visualise their locations and co-locations. Two maps were produced: one showing the associations of traits related to pyrazine production (figure A.2), and one showing the association areas related to compounds known to be involved in the degradation pathways of sugars or fatty acids (figure A.3). These biosynthesis pathways often have compounds in common. Some results differ according to the method (GLM or MLM), the marker filtering (MAF5 or G7), or the type of marker (SNP or SSR).
Repeatable results between methods appear to be the most conclusive.

Significant associations were identified for the biochemical compounds involved in the pyrazine production pathway
Of the 74 volatile compounds related to the fruity notes, eight pyrazines were identified by GC-MS in the roasted beans and only one in the unroasted beans (2,3,5,6-tetramethylpyrazine). Two hundred and sixty-eight association zones were detected in relation to pyrazines (Table 2). Association zones were detected on chromosomes 1, 2, 3, 9, and 10. The most significant association zone for 2,3,5-trimethylpyrazine (R) co-locates with the most significant association zone detected for methylpyrazine (R). Six of the most significant association zones for these compounds are located outside the calculated haplotypic blocks.
The most significant association found for pyrazines is the zone associated with 2-ethyl-5-methylpyrazine (R). The genetic variation located in this association zone explains 49% of the variation of this trait.

Significant associations were identified for the biochemical compounds involved in the fatty acid and sugar degradation pathways
Four hundred and eighty associations were detected in connection with the compounds involved in the degradation pathways of sugars and fatty acids (table A.3). Some of the most significant associations are located in the same haplotypic block but with a different position of the peak of the associations (Table 3). Some peaks of the most significant associations co-locate (Table 3).

Co-locations between biochemical compounds and sensorial traits
Nine co-locations between associations linked to fruity notes and associations linked to volatile compounds were detected (Table 3): seven involve the fruity note "Dark tree fruit", one the dried fruit note, and one the berries note.

Involvement of monoterpenes and L-phenylalanine pathway compounds in the fruity notes of cocoa
As shown by Colonges et al. (2021), the monoterpene biosynthesis and L-phenylalanine degradation pathways are involved in the synthesis of fruity notes. Two volatile compounds with fruity notes belonging to the monoterpene biosynthesis pathway have been identified in this population of cocoa: linalool cis-pyranic oxide, known for its citrus note, and linalool trans-furanic oxide, known for its floral and citrus notes. Twenty-seven and twenty-nine association areas, respectively, were detected for these two compounds.
Five volatile compounds known for fruity notes and belonging to the L-phenylalanine degradation pathway have been identified: benzaldehyde, benzyl alcohol, 1-phenylethyl acetate, ethyl benzoate, and acetophenone. Two association zones were identified for benzaldehyde (UR) and seventy-two for benzaldehyde (R), known for its cherry and bitter almond notes. Forty-two association zones were identified for benzyl alcohol (UR), seventy-three for 1-phenylethyl acetate (R), two for ethyl benzoate (UR), thirty-six for acetophenone (UR), and forty for acetophenone (R), all known for their fruity notes.

Candidate genes are potentially involved in the formation of the dried fruity aroma
Across the pyrazine association zones, 100 candidate genes were identified (table A.4). Ninety-four of them are identified as being involved in the synthesis of precursors of the Maillard reaction (amino acids and reducing sugars). These genes are therefore involved in either protein degradation or sugar degradation (figure A.3).
Of the 100 genes, 30 could be involved in the degradation of proteins, sugars, and fatty acids (Table 4). They include 25 genes coding for proteins of the alpha/beta hydrolase family, known to have various functions, including peptidase function (Holmquist, 2000; Mindrebo et al., 2016), and five genes coding for enzymes with oxidase/reductase or hydrolase functions.
Of the 100 genes, 61 are involved in protein degradation (Table 4). These genes could therefore be at the origin of the synthesis of amino acids, necessary for the Maillard reaction, by degrading existing proteins.
Among the 100 genes, three are involved in the degradation of sugars, and six are involved in fatty acid degradation (Table 4). These genes could be at the origin of the synthesis of sugars also necessary for the Maillard reaction.

Fatty acid and sugar degradation pathways
In the set of association zones linked to volatile compounds (acids, alcohols, aldehydes, esters, ketones) from the degradation pathways of sugars and fatty acids, 227 candidate genes were identified (table A.4).
Of the 227 genes identified, 30 encode an "alpha-beta hydrolase" with a probable lipase function (Holmquist, 2000; Mindrebo et al., 2016) and 8 encode enzymes with hydrolase functions. As their names indicate, the enzymes encoded by these genes would participate in lipid or sugar degradation.
Of the 227 genes identified, 121 have functions in fatty acid degradation (Table 4), 11 are involved in fatty acid synthesis, 47 in sugar degradation, six in amino acid degradation, and four in carboxylic acid degradation. The degradation of sugars, amino acids, and carboxylic acids is a necessary step for the synthesis of compounds such as acids, alcohols, or esters.

Monoterpene biosynthesis pathway
For linalool cis-pyranic oxide, four genes coding for "Probable terpene synthase 9" were identified (three on chromosome 7 and one on chromosome 10) (table A.4). This gene is known to be responsible for the transformation of geranyl diphosphate into linalool, the precursor of linalool cis-pyranic oxide (Cseke et al., 1998). For linalool trans-furanic oxide, six candidate genes were identified on chromosome 5: five coding for "Cytochrome P450 89A2" and one for "Cytochrome P450 89A9". At least two studies have shown the involvement of cytochrome P450 in the transformation of linalool into 6,7-epoxylinalool (Chen et al., 2010; Meesters et al., 2007).

Table 3
Co-localisations between compounds involved in the fatty acid and sugar degradations pathways or/and in the L-phenylalanine degradation pathway.

L-phenylalanine degradation pathway
In total, twenty-nine candidate genes involved in the biosynthesis of these compounds were identified in these association zones. No candidate gene was detected in the associations linked to ethyl benzoate (UR). Forty-six candidate genes were identified in the areas of association with the other compounds. They were chosen on the basis of their predicted functions, which indicate a possible involvement in the degradation pathway of L-phenylalanine (table A.4).

Candidate genes potentially involved in general plant defence identified in association areas linked with biochemical compounds
As suggested earlier (Sabau et al., 2006), the detection of fermentative micro-organisms seems to induce defensive responses that may be responsible for the synthesis of certain aromatic compounds.
In the association zones linked to compounds involved in fatty acid and sugar degradation, twenty-three genes are involved in the general defences of the plant (Table 5). Jasmonic acid and salicylic acid are two phytohormones with an important role in the defence mechanisms.
In the association zones linked to pyrazine compounds, sixteen genes coding for enzymes with functions involved in general plant defences were identified (Table 5). The functioning of the proteasome plays an essential role in the activation of the jasmonic acid response pathway (Song et al., 2014). The fumaryl-acetoacetate hydrolase was identified in Arabidopsis thaliana as affecting cell death in response to jasmonic acid (Zhou et al., 2020). Ethylene is also a phytohormone that plays a role in the response to biotic stresses (Song et al., 2014).

Discussion
This is the first time that an integrative analysis of genetic, biochemical, and sensory fruity traits has been conducted, leading to the identification of a small number of metabolic pathways involved in fruity traits.

Insights into the genetic architecture of fruity aromas in cocoa
A large number of associations could be detected for the different analysed traits related to fruity taste. Out of the thirteen sensory descriptors relating to fruity notes, associations were detected for only three. This is far fewer than for floral notes: Colonges et al. (2021) detected associations for eleven of the sixteen floral descriptors obtained by sensory analysis. These results show that there is greater variability in the presence of floral notes than of fruity notes in this population of modern Nacional.
The statistical power of GWAS may be somewhat limited due to the population size (152 clones). These clones come from genetic diversity conservation orchards and therefore do not have an optimal design for association genetics analyses. In order to improve future analyses, it would be interesting to plan several years of harvesting and analysis to confirm the results obtained. In addition, meta-analyses with other populations could also provide more statistical power.
However, a greater number of associations were detected with volatile compounds known to have a fruity taste. Possibly, these compounds are not present in sufficient concentration to be statistically detected in the sensory analysis. Each compound would thus have a weak effect on the fruity note, and such effects are not detectable by the GWAS method. In addition, some volatile compounds with a fruity taste can be synthesised by the microorganisms present during fermentation (Schwan and Wheals, 2004). This is the case, for example, of esters, which are largely synthesised by fermentative yeasts (Ho et al., 2014; Soles et al., 1982). In our case, it is possible that these microorganisms played a role in the production of volatile aromatic compounds, but they are not solely responsible, since a large number of associations related to volatile compounds known to have a fruity taste were detected. This suggests that the cocoa tree itself is also involved in the synthesis of the compounds for which associations were found.
Of the 74 volatile compounds related to the fruity notes, association zones could not be detected for 13. These compounds belong to different chemical families (two acids, three alcohols, one aldehyde, five esters, and two ketones). As these compounds do not appear to be linked to the genetics of the cocoa trees in this population, they were likely synthesised by the microorganisms present during fermentation (Abbas, 2006; Ho et al., 2014; Soles et al., 1982).

Metabolomic pathways involved in fruity traits
Based on the previously presented results, five biosynthesis pathways appear to be involved in the synthesis of fruity compounds in cocoa. Two have been identified as being involved in the synthesis of floral compounds (Colonges et al., 2021): the monoterpene biosynthesis pathway and the L-phenylalanine degradation pathway. The three other biosynthesis pathways appear to be the pyrazine production pathway, the sugar degradation pathway and the fatty acid degradation pathway.
Pyrazine production seems to occur through the Maillard reaction during roasting. Indeed, ten of the eleven pyrazines were detected in cocoa beans after roasting. The eleventh pyrazine, tetramethylpyrazine, was most likely synthesised during drying, also through the Maillard reaction (Starowicz and Zieliński, 2019). For this reaction to take place, free amino acids as well as reducing sugars are needed. This is why genes coding for proteinases or sugar degradation enzymes were found in the pyrazine association zones. As was observed for floral notes, co-locations between pyrazines known to have a fruity note and other pyrazines with no known notes suggest that a gene coding for a key enzyme involved in their biosynthesis could be responsible for the presence or absence of the fruity note. This is the case, for example, of the co-locations between methylpyrazine, known to have a nutty note, and 2,3-dimethyl-5-ethylpyrazine, known to have burnt notes, on chromosomes 1, 4, and 6. It is possible that in these areas of the genome, the same enzyme is at the origin of their precursors. When the enzyme is active, the precursors of pyrazine A would be synthesised; when it is inactive, the precursors of pyrazine B would be present instead. If pyrazine A is the one bringing a dried fruit note, that note would be present when the enzyme is active and absent otherwise.
The pathways of sugar degradation and fatty acid degradation are linked and have several biochemical compounds in common. A large number of association areas have been detected for these many traits. Given the large number of results and the multiple possible routes of compound synthesis, it is difficult to establish a hypothetical biosynthetic pathway in cocoa from these results. Furthermore, some intermediate compounds may be synthesised by the microorganisms present during fermentation. The biosynthesis of these compounds could result from a synergy between the enzymatic actions of the microorganisms and those of the cocoa beans. Further studies on the presence of these volatile compounds with or without certain microorganisms (yeasts or bacteria) could identify the biosynthetic mechanisms more precisely. However, the co-locations observed between certain compounds suggest enzymatic actions initiated in the cocoa beans. This is the case, for example, on chromosome 1, where a co-location between the associations linked to pentan-1-ol (R) and pentanoic acid (R) is observed in haplotypic block number 18 (5 356 899 bp-5 820 933 bp). In this case, a cocoa enzyme could be responsible for the oxidation of the alcohol pentan-1-ol into the carboxylic acid pentanoic acid. A candidate gene, "Acyl-coenzyme A oxidase 2, peroxisomal", encoding an enzyme with an oxidative function, was found at position 5 520 690 on chromosome 1, next to the haplotypic block of the associations (figure A.3).
The fruity aroma of Nacional could also include compounds belonging to the L-phenylalanine degradation pathway, such as benzaldehyde, benzyl alcohol, and 1-phenylethyl acetate, known to have fruity flavours. Associations were detected for these three compounds and reported in a previous study (Colonges et al., 2021). These compounds are also precursors of compounds known to have floral tastes. The enzymatic activity allowing the degradation of benzaldehyde or benzyl alcohol is one of the keys to the increased presence of compounds with fruity or floral tastes.

Conclusions and perspectives
The study of the determinism of aromas is complex and involves several disciplines. No biosynthesis pathway produces only compounds with a single type of taste. It is the balance between all the compounds that shapes an aromatic profile, and that of Nacional cocoa is complex. The presence of an aromatic compound is not necessarily synonymous with the perception of its aromatic note. The volatile compounds with a perceived fruity note could be identified by gas chromatography coupled with olfactometry. With this approach, it would be possible to identify the molecules whose fruity notes are perceived and thus to select favourable alleles of the candidate genes associated with these compounds, providing new tools for marker-assisted selection.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.