The genetic architecture of the pepper metabolome and the biosynthesis of its signature capsianoside metabolites

metabolite presence/ absence and/or accumulation remain largely unknown. In this study, we carried out a genome-wide association study as well as generating and characterizing a novel backcross inbred line mapping population to determine the genetic architecture of the pepper metabolome. This genetic analysis provided over 1,000 metabolic quantitative trait loci (mQTL) for over 250 annotated metabolites. We identiﬁed 92 candidate genes involved in various mQTLs. Among the identiﬁed loci, we described and validated a gene cluster of eleven UDP-glycosyltransferases (UGTs) involved in monomeric capsianoside biosynthesis. We additionally constructed the gene-by-gene-based biosynthetic pathway of pepper capsianoside biosynthesis, including both core and decorative reactions. Given that one of these decorative pathways, namely the glycosylation of acyclic diterpenoid glycosides, contributes to plant resistance, these data provide new insights and breeding resources for pepper. They additionally provide a blueprint for the better understanding of the biosynthesis of species-speciﬁc natural compounds in general


INTRODUCTION
Pepper (Capsicum spp.) is one of the most important cultivated crops of the Solanaceae family, with a global gross production value of $37.6 billion in 2021 (FAOSTAT, 2021).7][8] The most widely cultivated species, C. annuum, is said to have originated in Mexico, spreading via Mesoamerica to Africa and Asia before ultimately reaching Europe. 9,105][16] Capsaicinoids, furthermore, serve as protectants against plant diseases and are known to promote cell death of malignant cells in humans. 17,18Next to essential vitamins, minerals, and nutrients, peppers are also rich in several additional phytochemicals, including capsaicinoids and flavonoids. 19,20][13] Compounds exclusively produced in the genus Capsicum are capsaicinoid's sweet analogs capsinoids, 19 capsidiols, 20 capsanthin, 21 capsicosides, 22 and the acyclic diterpenoid glycosides known as capsianosides. 23][32] Although a wide range of secondary metabolites, including the capsianosides, have been reported in pepper, 8,[33][34][35][36] the genetic basis underlying their biosynthesis is still poorly understood.][39][40][41][42] This increasingly important research field moreover facilitates phenotypic screening without the construction of transgenic plants. 43n this study, we used two mapping populations, a genomewide association study (GWAS) panel alongside an as-yet undescribed biparental backcross inbred line (BIL) population and explored the fruit metabolic diversity in multiple experiments (Figure S2A).Metabolite GWAS (mGWAS) and metabolite quantitative trait locus (mQTL) mapping revealed genetic architectures responsible for the natural variations in the composition of secondary metabolites in the pepper populations.We identified 32 QTL and 92 novel genes associated with the abundance of 20 different compound classes.To demonstrate the power of our approach and resources provided here, we focused on a fruit-specific metabolic hotspot located on the chromosome 9 associated with monomeric capsianoside, combining the QTL analysis from both GWAS and biparental BIL populations.We identified a gene cluster of 11 UDP-glycosyltransferases (UGTs) involved in the metabolic diversity of the monomeric capsianosides in pepper.The function of the UGTs was verified by transient overexpression in pepper and tobacco.Given the importance of these metabolites both in terms of plant disease resistance and the nutritional importance of these compounds in the human diet, these findings may contribute to the development of elite pepper varieties that possess increased disease resistance and enhanced nutritional value.The experimental procedure described in this study could, furthermore, serve as a blueprint to identify key regulatory genes involved in metabolic pathways in non-model crop species.
In parallel to the BIL population, we explored the genetic diversity of 162 C. annuum accessions collected from 62 diverse locations across 6 Balkan countries (Data S1).Using 77,401 filtered SNPs throughout the whole genome, we analyzed the population structure analysis and the genetic relations among the 162 C. annuum accessions (Figures 1A-1E).The estimation of the delta K value showed the highest peak at K = 3 (Figure 1D), indicating that the 162 accessions in the panel generally could be grouped into 3 subpopulations, which could be primarily attributed either to their pungency or their respective kapia and pumpkin fruit shape (Figures 1B and 1C).Although the grouping of these subpopulations is not fully separated, the principal-component analysis (PCA) analysis showed that the pumpkin fruit-shaped accessions were clearly separated from other groups (Figure 1E).On the other hand, the population structure was not associated with production of capsianosides across the GWAS panel (Figure 1C), indicating that the GWAS will likely identify true causative genetic factors behind such compound class.

Metabolic diversity in the GWAS and BIL populations
In order to explore the chemical diversity in pepper fruit, we grew both the GWAS and BIL populations for multiple experiments, collected fruit materials, and subjected them to ultra-high performance liquid chromatography coupled to mass spectrometry (UHPLC-MS) for metabolite profiling (see STAR Methods; Figure S1B).Using UHPLC-MS, we were able to detect and quantify about 1,100 metabolites across both populations, including 456 flavonoids, 422 acyclic diterpenoids, 153 other phenolics, and 59 other compounds, including lipids and fatty acids (Data S2, S3, S4, S5, S6, and S7).
To investigate the metabolic diversity further, PCA was performed for all metabolic traits (Figures S3A-S3F).In the case of data from the GWAS panel, PC1 and PC2 explained 39.8% and 6.5% of the trait variance, respectively (Figure S3D).Figures S3B and S3D show the metabolic variation of the BILs from the experiments one and two.In this case, PC1 could explain 28.7% and 29.7% of the variation, while PC2 could explain 13.3% and 13.9% of the variation in both experiments, respectively (Figures S3E and S3F).PC1 distinguished the parental lines, while PC2 distinguished lines with unique metabolic variation.

Genetic architecture of the pepper fruit metabolome
We next performed mGWAS for the GWAS panel and mQTL mapping for the BILs to reveal the genetic basis of the pepper fruit metabolome (GWAS SNPs Data S8, BIL SNPs Data S9).We used a mixed linear model (MLM) and R-QTL to detect significant associations in GWAS and the linkage mapping populations, respectively.In total, we identified 1,941 lead SNPs for 156 annotated metabolites and 13,792 lead SNPs for 1,278 non-annotated compounds in the GWAS panel (Data S10).Linkage mapping in the BIL population identified 551 SNPs with a (E) PCA of pepper accessions using the SNP data.Accessions were classified as pungent (red), pumpkin (orange), kapia (blue), and green-pungent (purple).See also Data S1 and S8.logarithm of odds (LOD) score > 15 for 265 annotated metabolites and 4,764 significant SNPs for 2,376 non-annotated compounds.Pleiotropic analysis identified five QTLs with overlapping associations in a ±50 kb window across the BIL and GWAS populations (Figure 2).Increasing the window size led to 45 candidate genes for several overlapping significant loci in both BIL and GWAS populations (Data S11).The highest sum of associations was observed at the bottom of chromosome 9 (Figure 2).
In order to demonstrate the power of both GWAS and linkage mapping populations to reveal candidate genes controlling metabolic traits, we first mapped the capsaicinoid content (metabolites that contribute to pepper fruit pungency) of both populations (Figures S3G-S3J).Both the GWAS and QTL mapping panels revealed significant associations for capsaicinoids (e.g., dihydrocapsaicin, capsaicin, and nonivamide; Figure S3G).One of the QTL regions spanning 205.2 kb (149.95-150.15Mb, LOD = 15.15) on chromosome 2 contained 13 genes, including the acyltransferase 3 AT3/PUN1.This gene has previously been described as the causal enzyme catalyzing the final condensation reaction of the biosynthesis of capsaicin. 44The same QTL was additionally validated by mapping the fruit pungency scores (see STAR Methods) in the GWAS panel (Figure S3H).This result demonstrated the power of both populations to identify key genes involved in pepper metabolism.As a further example, we have identified a new locus on chromosome 2, 1.6 Mb downstream, controlling the abundance of capsaicinoids.This QTL region overlapped in both populations and harbored 3 genes (151.68-151.86Mb, LOD = 16.39),including CA02g20190

Pleiotropy analysis of GWAS and BIL panel
Genome-wide association study (GWAS; orange; 173 semi-polar compounds) and linkage mapping (blue; 190 semi-polar compounds) were performed on fruit pericarp samples.The pleiotropic map identifies regions of the genome that have common significant single-nucleotide polymorphisms (SNPs) in a ±50 kb window across populations.Each dot denotes a significant GWAS or linkage mapping hit of a secondary metabolite that has been mapped to a specific genomic region.The inner gray circle shows the total number of significant SNPs mapped to the genomic area, and the outer ring represents the number of chromosomes.The UDP-glycosyltransferase cluster (arrow) is mapped by a total of 132 and 84 semi-polar compounds from the GWAS and BIL populations, respectively.The orange line indicates common QTL in both populations.The detailed results are given in Data S2, S4, S5, and S7.See also Figure S3.
coding for an MYB transcription factor that could be used for further validation and follow-up studies (Data S11 and S13).Furthermore, 29 promising QTL were manually identified after going through the gene list within each significant QTL: On chromosome 8 at 114.48 Mb, a tocopherol cyclase (CA08g05970) was associated with g-tocopherol (415.19 m/z, 7.88 min) in the GWAS panel and found to be highly expressed in rip fruit (Data S11).Caffeic acid 3-glucoside was linked to a phenylalanine ammonia-lyase (CA05g20790) on chromosome 5 and showed higher expression at fruit breaker stage.Dimyristoylcapsanthin (1,021.823m/z, 15.1 min) was identified in the BIL population and mapped to a phytoene synthase 1 (CA04g04080) on chromosome 4 at 9.8 Mb, next to luteolin 6,8-di-C-hexoside and an acyclic diterpenoid derivative and ten phenolics and two capsianoside, and expression data showed that phytoene synthase 1 is very highly expressed in ripe fruit.A capsanthin/capsorbin synthase (CA06g22860) was identified by an association of three acyclic diterpenoids and quercetin 3,7dirhamnoside, quercetin-trihexoside in the BIL population and showed be highly expressed in ripe fruit (Data S11).Eight unannotated compounds next to an acyclic diterpenoid derivative and a sinapoyl derivative mapped to a region on chromosome 2 containing a cluster of 23 1-aminocyclopropane-1carboxylate oxidase (ACO) 1 at position 133.7-134.3Mb.

Genetic variation of the capsianosides biosynthesis pathway
Capsianosides are a diverse class of acyclic diterpene glycosides, water-soluble constituents present in pepper.While the chemical identity, anti-herbivore effect, and dietary health benefits have been reported, 29,30 the genetic basis underlying their biosynthesis is yet to be fully elucidated (Figure S2A).In our study, we detected over 400 capsianoside derivatives (Data S2 and S7) and were able to identify eight genetic loci controlling their abundance using either association or linkage QTL mapping approaches (Figure 3; Data S11).

Identification of a gene cluster responsible for capsianoside decoration
The above mQTL analysis identified genetic loci and candidate genes involved in the upstream capsianoside biosynthesis pathway.Intriguingly, the GWAS analysis identified a hotspot mQTL at the bottom of chromosome 9 that influenced a wide range of capsianosides derivatives, specifically the glycosylated derivatives of diterpene backbones (Figure 4A).
To cross-validate our findings, we also checked the QTL presence in the BIL population (Figures 4B, 4C, and 4E).The QTL mapping in the BIL population indeed confirmed the hotspot in two independent experiments (Figures 4B and 4C).Further, to investigate the specificity of the mQTL to capsianoside derivatives in this study, we performed PCA analysis using capsianoside derivatives in both GWAS and BIL populations (Figures 4D, 4E, and S4A-S4I).Then, PC1 to PC3 covariant were used to perform the GWAS and QTL mapping (Figures 4, S4C, S4F, and S4I; Data S14).In GWAS, PC1 explained 41.8% of capsianosides variance (Figure S4B).A similar observation was obtained for the BIL population, with 44.4% and 49.6% variation in capsianoside explained by PC1 in experiments one and two, respectively (Figures S3E and S3H).PCA analysis further indicated that >85% of the variation is explained by genetic variance.Interestingly, in both the GWAS panel and BIL population, the main peak (Àlog 10 (p) of 13.4 and LOD 67.1 and 59) mapped to the bottom of chromosome 9.Further, using linkage disequilibrium (LD) analysis of GWAS (Figure 5C) and recombination breakpoints in the BIL population (Figure 4), we were able to narrow down the QTL-hotspot region to 286.3 kb (270.1-270.29 Mb), which contains 25 genes (Data S12), including a cluster of 11 UGT genes that pinpoint the most likely causative genes.
However, it should be noted that 41 disease-resistance genes are located upstream of the UGT cluster from position 267.7 to 269.4 Mb, and 33 disease-resistance genes are located within or downstream of the UGT cluster from position 270.1 to 271.1 Mb.In total, 25 genes were assigned to one orthogroup (OG0000008, Data S12 and S13) and are annotated as homologs of the disease resistance gene BS2 that is associated with black spot disease in pepper, which is caused by Xanthomonas campestris pv.vesicatoria. 45n addition to the major hotspot on chromosome 9 controlling a large number of capsianoside derivatives in pepper fruit, the QTL analysis identified additional significant loci specific to the BIL population (Data S12).The QTL mapped to the bottom of chromosome 2 harboring an introgressed segment from C. chinense (LOD = 73.5)contains seven genes, two of which are predicted to be CRS1-like group II intron splicing factors (CA02g23580, CA02g29200) orthologous to the tomato Sol-yc02g087250 and are promising for further investigation of transgressive segregation among the BIL population.Also, in this region, CA02g29190 encodes a protein of unknown function, whereas the orthologue in S. lycopersicum (Sol-yc02g087260) is predicted to be a tobacco mosaic virus (TMV) response-related protein, implying a defense-related role.However, the causality between these genes and the observed phenotype still needs to be investigated.

Functional validation of a UGT cluster associated with capsianoside variation in pepper fruit
We concentrated our efforts on the UGT cluster at the bottom of chromosome 9 because it had the greatest number of associations.To determine whether the above mQTL was specific to fruits or also detected in leaves, we measured the levels of 102 compounds, including capsianosides, in the leaves of 156 pepper accessions and performed mGWAS (Figure 5B; Data S15).None of the capsianoside derivatives measured in leaf tissue were mapped to the bottom of chromosome 9 (Figure 5B), indicating that it is a fruit-specific mQTL.
The genetic linkage analysis revealed four linkage blocks (Figure 5C); UGT1 is linked to block ''A''; UGT2, UGT3, and UGT4 are linked to ''B''; UGT5 and UGT6 are linked to ''C''; and UGT7, UGT8, UGT9, UGT10, and UGT11 are linked linked to block ''D'' (Figures 5C and 5D).Furthermore, the LD analysis indicated that all of the presented linkage blocks (except B) are inherited in LD.Multiple sequence alignment analyses of the UGTs revealed four and two subclusters using the nucleotide and amino acid sequences, respectively (Figures 5D and 5E).Interestingly, genes with high sequence similarity are distributed across different linkage blocks.
We cloned the coding sequences of four UGT genes that representing different linkage blocks to test the hypothesis that the UGT gene cluster controls the glycosylation steps in the capsianoside biosynthesis pathway.To investigate the role of UGTs in capsianoside biosynthesis, independent transient overexpression was performed in N. benthamiana, C. annuum (CM334), and A. thaliana (Figures 6 and S5A-S5C).UGT 5, UGT6, and UGT 7/9 (7/9 with 100% sequence identity) were successfully cloned from both C. chinense and C. annuum, whereas UGT1 was only cloned from C. chinense.The localization was determined using a GFP epitope tag driven by cauliflower mosaic virus (CaMV) 35S and revealed that UGT1 and 6 are cytosolic, whereas UGT5 and 7/9 are both cytosolic and chloroplastic (Figures S5A-S5C).Metabolic profiling of transformed leaves revealed significant differences in glycosylated capsianoside content between transformed pepper leaves over-expressing UGT genes and empty vector (Figures 6 and S3D-S3F).A similar result was obtained in N. benthamiana leaves over-expressing the UGT genes, with noticeable changes reported in nicotianoside derivative content specifically (diterpene glycosides in Nicotiana, Figure 6).These findings suggest that UGTs are responsible for the decoration of capsianoside derivatives in pepper.

DISCUSSION
Here, we provide the first comprehensive analysis of the genetic architecture of pepper metabolism by utilizing two distinct populations.First, we conducted a natural variation-based GWAS panel of Balkan pepper, which is widely used in commercial European breeding programs. 58Secondly, we complemented the GWAS with a metabolite QTL analysis on a novel previously undescribed biparental BIL population (BC 2 S 4 ) derived from C. annuum cv.Marconi giallo and C. chinense cv.CGN22793 (Figure S2).
Our studies revealed that Capsicum harbors a diverse metabolic landscape with metabolite variation spanning the range of absent (below the threshold of detection) to 150-fold more abundant with this span being similar to that recorded in previous more targeted metabolite-based research. 38,59Results from the mGWAS were not affected by confounding effects of population structure (Figure 1), and additionally, a subset of the identified loci from the mGWAS and mQTL analyses Phylogenetic tree based on amino acid sequences of 139 UGTs across different plant species in addition to the eleven CaUGT.Arabidopsis thaliana (At), Nicotiana benthamiana (Nb), and N. attenuata (Na), as well as the crop species Vitis vinifera (Vv), Punica granatum (Pg), Camellia sinensis (Cs), Glycine max (Gm), Sorghum bicolor (Sb), and Oryza sativa (Os).Multiple sequence analysis was performed using Clustal omega (https://www.ebi.ac.uk/Tools/msa/clustalo/) and tree built with interactive tree of life (iTOL, https://itol.embl.de/).Color code represents different UGT groups.The novel UGT cluster of Capsicum is highlighted in pink.OG, outgroup.See also Figure S6.co-map either to loci previously associated with pungency (Figure S7) or with earlier targeted metabolite analyses that identified transgressive segregation of metabolite production (Figure S4; Data S8).These findings suggest that detailed characterization of these loci could reveal the molecular basis of both fruit pungency and metabolite heterosis.Indeed, in the case of the former, we were able to provide considerable insight into our study by being able to narrow down the previously mapped region QTL for capsaicinoids and fruit pungency on chromosome 2 10,60 to a 28 gene interval, including the gene CA02g19260 encoding AT3/PUN1 44 and MYB transcription factor (CA02g20190).
Beyond the above example, the combined panels uncovered a further 93 candidate genes with potential involvement in pepper metabolism.These included a major cluster of 23 ACOs involved in ethylene (ET) biosynthesis, which are known regulators of stress responses and developmental processes. 61To highlight some additional genes: CA06g22860 is predicted to encode a capsanthin/capsorbin synthase and could be identified on chromosome 6 using the BIL population, while a phytoene synthase (CA04g04080) on chromosome 4 could be identified that is associated with capsanthin and capsorbin derivatives.Both genes are differentially expressed under differential irrigation 62 and exposure to biotic stress (e.g., Phytophtora capsici, P. infestans, Xanthomonas campestris pv.Vesicatoria race 1 [Xcv1]) and phytohormones (e.g., salicylic acid [SA]), 63 and are highly expressed during the fruit ripening 64 (Data S9).Furthermore, CA04g04080 is differentially expressed under abiotic stress (cold, heat, salinity, and osmotic stress). 63,65In addition, a carotenoid cleavage dioxygenase (CA11g03240) on chromosome 11 could be identified, which reported to be differentially expressed under different irrigation conditions, 62 and infection with P. capsici. 63Further, seven QTL involved in the capsianosides pathway were identified (Figure 3).A farnesyl diphosphate synthase, which is differentially expressed under different irrigation conditions, 62 and wide range of a biotic stresses, such as heat, cold, osmotic, and salt stress, 63,65 was expressed during the fruit ripening as well as in leaves and roots. 64In addition, the expression of farnesyl diphosphate synthase was reported to be associated with the infection of tobacco mosaic virus, P. capsici, P. infestans, Xcv1, Xanthomonas axonopodis pv.glycines 8ra (Xag8ra), and following application of SA, abscisic acid (ABA), and ET. 63These observation indicate the key role of this enzyme in capsianosides biosynthesis pathway and support the important role acyclic diterpenoids play in plant resistance to a biotic stresses.Similarly, we found a GGR that is involved in production of photosynthesis-related isoprenoids by utilizing geranylgeranyl diphosphate (GGPP) 66 and a geranyllinalool synthase that is involved in geranyllinalool biosynthesis.Both were differentially expressed under different conditions, 62 of heat, cold, and salinity stress, 63,65 as well as a wide range of biotic stresses such as infection by P. capsici, P. infestans. 63Finally, two TPS found on chromosomes 3 and 11 are involved in for mono-/tri-/sesqui-/di-and tetraterpenes, 67 have been reported to be differentially expressed under heat and salinity 63,65 as well as in fruits and roots. 64n addition to the above-mentioned QTL, 11 UGTs form a significant hotspot region on chromosome 9 that is primarily responsible for the chemical diversity of the capsianosides.These compounds possess anticancer and defense-related potential and are unique to the genus Capsicum. 23,29,31,32This major QTL is specific to pepper fruit pericarp (Figure 2) and is additionally evidenced by mapping PC1 of capsianoside derivatives in all three panels (Figures S4C, S4F, and S4I).Because it is unknown whether differential glycosylation of capsianosides affects fruit morphology or taste, it is possible that breeders selected primarily for disease resistance (as evidenced by the presence of 74 disease-resistance-related genes in close proximity) and that the UGT cluster hitchhiked to support hypersensitive resistance against pathogens.This is further supported by expression of these enzymes in pepper fruit (Figure S7D) and the differential expression of UGT 1 if infected with P. capsici and Xcv1. 63ahyuni et al. previously demonstrated a clear separation among 32 accessions from four Annuum clade members based on the differential glycosylation levels of capsianosides, 34 which is partially reflected in PC1 (Figure S4C).In a subsequent study, they used 113 F 2 plants to map a small subset of metabolic QTL, including a hotspot region containing hundreds of QTL of which some are associated with capsianoside V, IX, VII, X-2, a capsicoside, and several flavonoids. 33Here, we report a QTL with a mere 25 genes that is primarily responsible for differential glycosylation of capsianosides.Although wellcorrelated compounds such as oxylipins and capsicosides (Figures S7A and S7B), in total 267 mass features also mapped to this region as well.The reliance of these classes on the same chromosomal interval may be due to metabolic network compensation. 33Sterols are derived from isopentenyl diphosphate (IPP), and oxylipins are derived from fatty acids that share acetyl-coenzyme A (CoA) as a precursor with the mevalonate (MVA) pathway for IPP biosynthesis. 68,69As such, their abundance is likely to be influenced by capsianoside abundance that said, the mechanism by which flux to these three compound classes is linked remains elusive.Intriguingly, abrogation of the nicotianoside pathway in Nicotiana results in nonspecific hydroxylation, which leads to autotoxicity via inhibition of the sphingolipid pathway. 70,71Therefore, we believe that glycosylation solves the autotoxicity problem and that digestion by herbivores promotes hydroxylations. 71It appears that the intermediate compounds from 17-hydroxygernayllinalool to capsianosides, known as lyciumosides, 44,72,73 have mapped to the same hotspot region on chromosome 9 as capsianosides in this study.These intermediate structures have not been described in Capsicum species hitherto.
However, pleiotropy analysis clearly demonstrated that a subset of metabolic features from both the GWAS and the biparental panel mapped to a variety of small effect QTL.This finding reinforces the power of natural variation and BILs while highlighting a major limitation of segregating F 2 populations in analyzing only a limited genetic repertoire at low resolution.
UGTs play an important role in the formation of fruit quality, pigmentation, and resistance to many antimicrobials and diseases. 74The UGTs studied here could be characterized as potential di-/triglycoside-forming GTs and displayed similarity to group O family UGTs (Figures 7, S6A, and S6B).Duplication and subsequent diversification could be one explanation for the two clusters based on amino acid sequence similarity.Finally, molecular validation procedures confirmed the role of the Capsicum UGT cluster in the decoration of monomeric capsianosides.
In conclusion, our findings suggest that Capsicum annuum accessions from the Balkans have been divided into three genetically homogeneous groups resulting from morphological selection during domestication and/or through historical trading routes.Metabolic profiling of 162 pepper accessions, as well as 100-odd BILs grown under two conditions revealed a wide range of metabolic variations some of which correlated with plant morphometric descriptors (Figure S6).Moreover, a combination of GWAS and QTL mapping revealed hundreds of QTLregulating specialized secondary metabolites, with more than 37 QTL discovered in both panels and, in total, yielding 93 candidate genes.These studies highlight the utility of both the GWAS collection and the novel BIL population that we introduce here for understanding fundamental and applied aspects of pepper biology.Perhaps most important among these is a hotspot region on chromosome 9 that regulates monomeric capsianoside glycosylation.We were able to resolve the underlying genetics to a narrow region containing a cluster of eleven UGTs.The identification of this cluster adds to the burgeoning list of clusters in plant secondary metabolism. 75,76Its importance to underline the important of these genes in diterpenoid glycosides biosynthesis and as the anticancer potential of capsianosides. 29As such, we believe that we have identified an important tool that will enable breeders to breed highly resistant and nutritionally enhanced biofortified peppers.This study will additionally provide a useful blueprint for the identification of the genetic architecture of nutraceuticals in all our crops.

Materials availability
The genetic materials that support the findings of this study are available from the lead contact upon request.

Transient overexpression
The total RNA from C. annuum cv.Marconi giallo and C. chinense cv.CGN22793 was isolated using a NucleoSpinÒ RNA plant kit (Macherey-Nagel) according to the manufacturer's instructions.First-strand cDNA was synthesized using 1.5 mg RNA and Prime ScriptÔ RT reagent Kit with gDNA eraser (Takara) according to the manufacturer's instructions.Full-length cDNA of fruit pericarp samples of the founder lines was amplified by using 250 ng (primers see Data S16).The entry clone was obtained through recombination of the PCR product with pDONR207 (Invitrogen).By LR recombination error-free clones were introduced into pK7FWG2. 77ransformation of leaves of C. annuum cv.CM334, N. benthamiana, and A. thaliana were performed following, 93 and Agrobacterium tumefaciens (AGL1) containing vector pBin61-p19 infiltrated with an OD 600 0.5.DM6000B/SP5 confocal laser scanning microscope (Leica Microsystems, Wetzlar, Germany) was used for verification of expression.Metabolic shifts were analyzed after 1 and 3 days using 50 mg (C.annuum, N. benthamiana) and 20 mg (A.thaliana), respectively, of agro-injected leaves and subjected to LC-MS analysis as described above using chloroform:methanol:water extraction. 94

QUANTIFICATION AND STATISTICAL ANALYSIS Population structure analyses and genetic mapping
To analyze the population structure underlying the GWAS population the software Structures 2.3.4. 81was utilized following the admixture model with correlated alleles with K 2 -8.Six independent runs with 500 burn-in generations and 5000 Markov Chain Monte Carlo generations were used for estimating K.The optimal K value was determined by DK 95 estimated with PopHelper. 96ive categories imply breeders selection (green, green-pungent, kapia, pumpkin, pungent) were chosen to analyze the population structure.
Principal component analysis, dendrogram, correlation matrix, and heatmaps were either designed by Genedata ExpressionistÒ Analyst software (Figure S7) or MetaboAnalyst 5.0. 82For GWAS the R package rMVP 78 was used employing a genomic relationship matrix, efficient mixed-model association, a-threshold of 0.05, and mixed linear model.Manhattan plots were employed by the package qqman 97 with a genome-wide threshold calculated with Bonferroni correction (-log 10 (0.05/77401)) and a suggestive threshold approximating false-discovery rate (-log 10 (1/77401)) mapped against the reference genome sequence C. annuum cv.CM334 version 1.6 from PLAZA 4.0. 98For QTL mapping R/qtl 83 was used to construct a linkage map using a permutation rate of 1000 and alpha = 0.01, and a logarithm of odds (LOD) threshold of 15 was defined for the relative intensities of 2551 mass features.

Statistical analyses
Broad-sense heritability was calculated using the following equation by treating independent accessions and BILs as a random effect and the biological replication as a replication effect: H 2 = var (G) /(var (E) +var (G) ), where var (G) and var (E) is the variance derived from genetic and environmental effects, respectively.The variation across accessions and BILs was determined by the first and third quartiles.Tajima's D in 1 Mb non-overlapping bins was calculated using VCFtools 99 and was visualized by CMplot (https://github.com/YinLiLin/CMplot).Linkage disequilibrium was estimated by squared allele-frequency correlation (r 2 ) according to the R package LDheatmap 100 in a ± 50 kb window.Orthologous genes and orthogroups were discovered by OrthoFinder. 84,101Protein sequences were obtained from genome databases: PLAZA 4.0 (https://bioinformatics.psb.ugent.be/plaza),peppergenome (peppergenome.snu.ac.kr),SolGenomics (solgenomics.net)and TAIR (arabidopsis.org).Pleiotropy analysis was conducted by the circos representation Fuji plot 102,103 using 150 and 136 specialized metabolic features and 1,780 and 1,177 SNPs of association and biparental panel based on their putative annotation and H 2 higher 95 %.Locus-IDs were assigned in a ± 50 kb window to identify with overlapping associations (QTLs) across the BIL and GWAS populations.

Figure 1 .
Figure 1.Genetic architecture of the GWAS population (A) SNP density within 1 Mb window size generated by genotyping-by-sequencing.(B) Phylogenetic analysis of 162 C. annuum accessions using 77,401 SNPs, and the table shows averaged SNPs in 1 Mb non-overlapping bins.(C) Hierarchical population structure analysis of 162 accessions of C. annuum from the Balkans.Using 77,401 single-nucleotide polymorphisms, the number of subpopulations was most likely inferred at K = 3. Population structure at K = 3 based on Q matrix assigning accessions either to primary morphological groups (upper) or the relative abundance of capsianoside IX (lower).Colors represent subpopulations.(D) Delta K showing the number of estimate populations.(E)PCA of pepper accessions using the SNP data.Accessions were classified as pungent (red), pumpkin (orange), kapia (blue), and green-pungent (purple).See also Data S1 and S8.

Figure 5 .
Figure 5. Eleven UDP-glycosyltransferases are causal for differential glycosylation of capsianosides in pepper fruit pericarp (A) Pleiotropic analysis of GWAS of fruit pericarp (orange) and leaves (70 semi-polar compounds; green; for further description, see Figure 2).(B) Genetic linkage analysis as a pairwise linkage disequilibrium heatmap in R2 in a ±50 kb window around candidate genes.(C) The cladogram comparing nucleotide sequences.(D) Amino acid sequences of the UGT cluster.Letters indicate the chromosomal location (linkage blocks see [a]) of the genes, and asterisks indicate genes cloned for transient overexpression.Letters indicate the chromosomal location (linkage blocks see [a] of the genes), and asterisks indicate genes cloned for transient overexpression.See Data S6, S12, and S15 and Figure S6.