Multi-omics analysis reveals the genetic basis of rice fragrance mediated by betaine aldehyde dehydrogenase 2

Graphical abstract


h i g h l i g h t s
A set of novel functional haplotypes were detected in the BADH2 coding region Tajima's D index suggested balancing selection in japonica rice for BADH2 316 expression quantitative trait loci (eQTLs) regulate BADH2 expression 13 trans-protein quantitative trait loci (pQTLs) were mapped on different chromosomes 15 volatiles discriminated fragrant haplotypes in PLS-DA model and VIP score g r a p h i c a l a b s t r a c t

Introduction
Rice taste is directly related to its fragrance [1] and therefore fragrant rice is widely preferred among consumers. Fragrant rice accounts for approximately 15-18% of the world's rice trade, reflecting high demand [2,3]. Rice breeders have paid particular attention to the molecular and biochemical basis of fragrance and its improvement [4,5].
Various BADH2 mutations associated with fragrance in rice have been reported [11]. In particular, an 8 bp deletion in exon 7 and a 7 bp deletion in exon 2 are frequently detected in many fragrant rice accessions [15,16]. Moreover, the strength of the fragrance differs even in presence of mutant alleles and variation in 2AP accumulation has been recorded in several fragrant varieties from South and South-East Asia, as well as the USA [7,[17][18][19][20]. This underlines the possibility that regulation at the transcriptional and translational levels influences rice fragrance along with BADH2 allelic variation.
With advances in high-throughput sequencing technologies, -omics-based studies of rice have progressed considerably, including analyses of genomes (genomics), RNA transcripts (transcriptomics), and metabolites (metabolomics) aimed at unraveling the functional variants and molecular mechanisms underlying traits for crop improvement [21]. Expression quantitative trait loci (eQTLs) and protein quantitative trait loci (pQTL) provide insight into the genetic basis of transcriptional variation and protein abundance, respectively [22,23]. These strategies are used extensively in human genetics to identify functionally relevant genetic variants associated with phenotypic variation and to reduce the gap between genomic variation, expression levels, and phenotype variation. However, integrative studies of rice fragrance linking transcript levels to phenotypic variation are still lacking. Here, we evaluate fragrance polymorphism in a Korean rice collection at the population, gene, transcript, and proteomic levels.
We used multiple -omics approaches to study BADH2 variation, and fragrance regulation. We conducted whole-genome resequencing of 475 accessions, including 54 wild rice accessions. Furthermore, we quantified BADH2 expression in a subset of 279 accessions by RNA-Seq, protein levels in a subset of 64 accessions by liquid chromatography-tandem mass spectrometry (LC-MS/ MS), and the volatile profiles in a subset of 421 accessions by HSsolid-phase microextraction (SPME)/GC-MS. By this gene-tometabolite approach, we identified four novel alleles in the BADH2 coding region, 316 eQTLs, and 13 pQTLs. Considering the importance of fragrance in the rice market, our data are expected to provide important information for rice breeding.

Material and methods
Plant materials for whole-genome resequencing A set of 137 accessions were previously selected from 25,604 rice accessions of the Korean Genebank of the Rural Development Administration (RDA) using PowerCore [24,25]. In addition, 284 varieties from RDA genebank and 54 wild rice accessions procured from the International Rice Research Institute (IRRI) were added to the collection to build a core set of 475 accessions (Table S1). This Korean World Rice Collection (475 accessions) was comprised of 421 cultivated accessions, including 305 japonica, 102 indica, 9 aus, 2 aromatic, and 3 admixture varieties.

Resequencing of 475 rice accessions
Genomic DNA of each accession was extracted from the leaf using the DNeasy Plant Mini Kit (Qiagen, Germantown, MD, USA) and subjected to whole-genome resequencing using the Illumina HiSeq 2500 sequencing platform (Illumina Inc., San Diego, CA, USA), with an average coverage of approximately 15Â. The raw sequences were aligned to the Nipponbare rice reference genome (International Rice Genome Sequencing Project IRGSP-1.0; http:// rapdb.dna.affrc.go.jp/download/irgsp1.html) using Burrows-Wheeler Aligner (BWA) version 0.7.8 with default parameters [26]. Sequence quality was checked based on the alignments using Integrative Genomics Viewer (IGV) with default parameters. The duplicate reads in the raw resequencing data were removed using PICARD version 2.14 and Samtools version 1.8. The Genome Analysis Toolkit (GATK) version 3.6 pipeline was used for SNP calling [27]. Whole-genome sequencing data from 3,000 rice accessions [28] were also evaluated using PowerCore [24].

Population structure analysis
SNPs (minor allele frequency [MAF] ! 5% and missing rate 20%) detected in the whole-genome resequencing data were used to a construct phylogenetic tree and were evaluated by a principal component analysis (PCA). FigTree version 1.4.3 (http://tree. bio.ed.ac.uk/software/figtree/) was used to construct a dendrogram based on the neighbor-joining method with 1,000 bootstrap replications. The PCA was conducted using the high-quality SNPs in TASSEL 5.0 and plots were drawn using the R package ggplot2 version 3.6.2 [29]. Population structure was evaluated using the fastStructure [30] with K values ranging from 2 to 10. Nucleotide diversity (p), population differentiation (F ST ), and Tajima's D values were calculated using VCFtools 0.1.13 with 500 bp sliding windows and 500 steps.

Analysis of BADH2 genetic variation
Genetic variation in the BADH2 gene was evaluated using the DNA variant VCF file containing the whole-genome resequencing data. Plots of SNPs and InDels in the BADH2 gene were generated using Circos version 0.67 (http://circos.ca/software/download/circos/).

Haplotype analysis of BADH2
Genetic diversity of the BADH2 gene (chromosome 8, Chr08_20379823-20385975) obtained from the DNA variant VCF file was imported into TASSEL 5.0 [31]. IRGSP-1.0 was used as the reference genome for variant calling. The sequences were aligned using MEGA7 version 7.0 [32]. Haplotype diversity was analyzed using DnaSP version 6 [33]. The aligned DNA sequences were imported into DnaSP version 6 software to calculate p, the number of polymorphic or segregating sites, ThetaW (Watterson estimator, h w ), and Tajima's D. Haplotypes were constructed by statistical parsimony using the TCS network with PopART (Population Analysis with Reticulate Trees) version 1.7 [34].
Variant positioning on the 3D protein structure A 3D model of dimeric BADH2 (where each monomer consisted of 504 amino acids) was built by homology modeling using MOD-ELLER version 9.21 and by searching for structures related to a novel gene encoding Trichomonas vaginalis lactate dehydrogenase (TvLDH) using a Python script.

RNA-Seq analysis
Total RNA was extracted from 297 rice accessions grown in the paddy field at the Kongju National University, Korea. Panicles were collected at the milky stage (after 15 days of heading) and immediately transferred to liquid nitrogen. Total RNA was extracted from milky stage panicles using the RNeasy Ò Plant Mini Kit (Qiagen) as per the manufacturer's instructions. The RNA quality was confirmed by 1.0% agarose gel electrophoresis and quantified using the NanoDrop ND-1000 (NanoDrop Technologies, Wilmington, DE, USA). In addition, the RNA 6000 Pico Kit (Agilent Technologies, Santa Clara, CA, USA) was used to evaluate RNA integrity. Five micrograms of total RNA were used as input material for RNA sample preparation and sequencing libraries were generated using the TruSeq RNA Sample Preparation Kit (Illumina). Finally, the constructed cDNA libraries were sequenced on the Illumina HiSeq 2500 sequencing platform (Illumina). Trimmomatic was used to remove adapters and low-quality reads with default parameters. The clean reads were mapped to IRGSP 1.0 using the R Trinity package. Expression levels quantified in FPKM (fragments per kilobase of transcript per million mapped reads) were standardized using z-scores (zero-mean normalization) [35] and 18 accessions with absolute values > 2 were defined as outliers. Then, the FPKM values of 279 rice accessions were normalized.

Protein extraction and digestion
Proteins were extracted following the method described by Lee et al. [36] with some modifications. In brief, rice milky stage grains were homogenized and suspended in an extraction buffer composed of Tris-HCl (100 mM, pH 8. The proteins (500 lg) were reduced with 5 mM Tris (2carboxyethyl) phosphine hydrochloride (TCEP) at room temperature for 30 min. Then, alkylation was performed with 10 mM iodoacetamide at room temperature in the dark for 30 min. Subsequently, samples were diluted with Tris-HCl (100 mM, pH 8.5) to reduce the urea concentration from 8 M to 2 M and digested with trypsin (5 lg) overnight at 37°C. Protein digestion was terminated with formic acid (5%). Then, digested samples were desalted the using SPEC PLUS PT C18 column (Agilent Technologies). SpeedVac was used to dry the solvent.

LC-MS/MS
Samples were analyzed by A Nano LC connected to a Finnigan LTQ mass spectrometer (Thermo Scientific, Waltham, MA, USA). Biphasic columns were organized using 365-lm o.d. Â 100-mm i. d. fused-silica capillaries (Polymicro Technologies, Phoenix, AZ, USA). The desalted proteins were loaded onto the column. A binary buffer system with 0.1% formic acid (buffer A) and acetonitrile in 0.1% formic acid (buffer B) was used. A linear gradient from 3% buffer B to 50% buffer B at a flow rate of 0.200 ml/min was used for separation. An 11-step program with increasing concentrations of salt solution was used for peptide elution. The run time was 120 min for each step, and approximately 22 h for the 11 steps. Peptide eluent ionization was performed by electrospraying directly into the MS/MS system, and parent-ions were scanned in the range of 400-1600 m/z. MS/MS-ion scanning of the top five most intense parent ions was performed by collusion-induced dissociation.

Association studies
The high-quality SNPs (MAF ! 5% and missing rate 20%) obtained by whole-genome resequencing were used in a genome wide analysis of associations with BADH2 transcript levels and protein abundance. To detect genetic variants associated with BADH2 transcript levels, 279 accessions were subjected to an eQTL analysis using the mixed linear model implemented in the GAPIT R package [37]. The same package was used for pQTL mapping considering SNPs from 64 accessions as markers, with the BADH2 protein abundance as the phenotype. The R package qqman (https://cran.r-project.org/web/packages/qqman/index.html) was used to draw Manhattan plots. A SNP with a significant association with BADH2 transcript expression and protein abundance was defined as an eQTL and pQTL, respectively, a cut-off À log 10 (pvalue) of < 4. A SNP mapped 1 Mb upstream or downstream of the target gene was considered as a cis-QTL, while the remaining SNPs were referred as trans-QTLs.

Analysis of volatile compounds by HS-SPME/GC-MS
The volatile profiles of rice panicles from the milky stage were analyzed according to Lee et al. [38], with slight modifications. Briefly, sample powders (500 mg) from 421 accessions (Table S1) were placed in a headspace vial and spiked with 2,4,6trimethylpyridine as an internal standard. The preheated volatiles were adsorbed onto a SPME fiber (divinylbenzene/carboxen/polydi methylsiloxane StableFlex fiber; Supelco, Bellefonte, PA, USA) and injected into a GC/MS instrument (QP2010 Ultra, Shimadzu, Japan) equipped with an Rxi-5Sil MS capillary column (30 m Â 0.25 mm ID; Restek, Bellefonte, PA, USA). Volatile compounds were identified based on the n-alkane (C8 to C20)-based retention index for each peak in comparison with the NIST08 (Shimadzu, Japan) and FFNSC 2 (Flavor and Fragrance Natural and Synthetic Compounds, version 2.0, Japan) mass spectral libraries. The peak area of the total ion chromatogram of each compound, including 2AP (m/z 83), was defined as the relative amount of compound and used for statistical analyses (SPSS, version 24). A partial least squares discriminant analysis (PLS-DA) and variable importance in projection (VIP) were performed using MetaboAnalyst 4.0 (http://www. metaboanalyst.ca/) [39] after log transformation of the data followed by Pareto scaling. We also performed the sensory test of grain aroma by using a 1.7% potassium hydroxide (KOH) solution (Supplementary methods).

Data availability
The whole-genome resequencing data for rice accessions are available in NCBI under the BioProjects PRJNA664261 and PRJNA564458. Population genetic structure of a 475-accession core set A phylogenetic tree was derived from the SNPs called from whole-genome resequencing data for the 475 accessions (Fig. 1). This core set was classified into six groups, including two major groups (indica and japonica), three small varietal groups (aus, admixture, and aromatic), and a small group of wild rice accessions. A PCA (Fig. 1b) confirmed the results of the phylogenetic analysis. The indica accessions clustered with aus, aromatic, and admixture (Fig. 1a). The japonica group was closely related to the aromatic and wild rice groups in the PCA (Fig. 1b). The wild rice group showed overlap with the japonica group in both analysis types. Furthermore, we analyzed the population structure using K values ranging from 2 to 10 and found that a K value of 6 was optimal, with clear separation among groups (Fig. 1c).

Genome-wide SNPs identified in 475 accessions
Genetic variation in BADH2 of the 475-accession core set A total of 321 variants, including 209 SNPs, 79 deletions, and 33 insertions, were identified in the seven groups for BADH2 region ( Table 2). The transition/transversion (Ts/Tv) ratio was 1.789. Moreover, the average nonsynonymous/synonymous site diversity (p non / p syn ) ratio was higher than 1, indicating a high level of nucleotide polymorphism (Table 2).
Nucleotide diversity in the BADH2 region was highest in the wild group, followed by the admixture, aus, indica, tropical-japonica, and temperate-japonica groups (Fig. 2a). The Tajima's D value was the more negative in the wild group than in temperatejaponica, indica, and tropical-japonica (Fig. 2d). The fixation index (F ST ) revealed a very high level of gene flow (low F ST ) between the wild and admixture groups, with F ST = À0.060, compared with higher estimates between the wild and aromatic groups (0.059), wild and aus groups (0.117), wild and tropical-japonica groups (0.245), wild and indica groups (0.327), and wild and temperatejaponica groups (0.580) (Fig. 2e). In addition, an analysis of novel alleles revealed that breeding between the temperate-japonica group and the aromatic group was markedly more frequent than that between the temperate-japonica and aus groups (Fig. 2f).

No signature of selection during domestication
Likelihood ratio tests showed that BADH2 did not experience significant positive selection between O. sativa and O. rufipogon or between O. sativa or O. rufipogon and other plants (Supplementary methods and Fig. S2). These results suggest that BADH2 did not show adaptive evolution after divergence from the most recent common ancestor of O. sativa and O. rufipogon, implying that variation in BADH2 was shaped by recent selection, such as domestication and recent selective pressure.
Haplotypes of BADH2 in the 475-accession core set We found 33 polymorphic sites in the BADH2 transcript region, including 26 SNPs and InDel alleles in the coding region, five alleles in the 5 0 UTR, and two alleles in the 3 0 UTR (Figs. 3, S3). Thirty-eight haplotypes were identified based on 26 SNPs within the coding region of BADH2 (Tables S2, S3). The most common haplotype (Hap_1) was found in 384 rice accessions (Fig. S4). Eight haplotypes (Hap_2 to Hap_9) belonged to cultivated rice accessions and seven (Hap_2 to Hap_8) were identified as by previously reported functional alleles (Tables S1, S2). One of the newly identified haplotypes, Hap_9, was distinguished by a substitution (C/A) at nucleotide position 476 at exon 2 (BADH2-E2-476C > A) in a cultivated accession (RWG-431). Interestingly, all of the 29 other newly identified haplotypes (Hap_10 to Hap_38) were found in wild rice accessions ( Table S3) Tables S2, S3). Of note, more haplotypes were identified in wild rice accessions than in cultivated rice accessions (Table S3). Next, we performed cloning and sequencing of the coding region of BADH2 from accessions with the four novel SNPs and confirmed the SNPs by a sequence alignment against a reference (Fig. S5).

Effects of variants on the 3D protein structure of BADH2
We analyzed the three-dimensional structures of proteins with the four nonsynonymous SNPs and found a substitution in the a-helix active site of accessions, indicating a loss of BADH2 activity (Fig. 4). In Hap_9, the SNP (C/A) resulted in a substitution from tyrosine (TAC) to a stop codon (TAA), resulting in premature termination. In Hap_16, the nonpolar alanine became the polar  threonine at 307 th amino acid. In Hap_17, leucine became phenylalanine (both nonpolar) at 436 th amino acid. In Hap_26, aspartic acid became glutamic acid (both polar) at 65 th amino acid ( Fig. 4a-d).

Expression quantitative trait loci associated with BADH2
Next, we quantified the expression levels of the BADH2 gene and performed a genome-wise association study of BADH2 expression levels based on RNA-Seq data from a subset of 279 rice accessions. We found that the expression level of Os08g0424500 (BADH2) was regulated by 316 eQTLs with -log 10 P > 4 (Fig. 5a, Table S4). Among these, 201 eQTLs were located on chromosome 8 and 185 eQTLs were within 1 Mbp of the BADH2 region at positions 20379823-20385975, suggesting that cis-SNPs had a greater contribution than trans-SNPS to BADH2 expression ( Table S4). The flanking region about 20 kb upstream and about 12 kb downstream of BADH2 had loci with the most significant effects. A total of 63 eQTLs were located on 22 genes, such as dynamin family protein (Os08g0425100), eight hypothetical conserved proteins, cytokinin oxidase/dehydrogenase 10 (Os06g0572300), and carbonic anhydrase endoglucanase 21 (Os08g0424100). The remaining 253 eQTLs were in intergenic regions (Table S4). An eQTL (9_19892312 bp) from chromosome 9 located in intergenic region of two b-glucosidase genes (Os09g0511700 and Os09g0511600) had significant effects on BADH2 expression. Glucosidase enzymes hydrolyze glycosides and release aromatic components with a natural flavor [40].

Protein quantitative trait loci for BADH2
We further quantified the level of BADH2 protein expression and performed a genome-wide association study of BADH2 protein levels. We quantified the BADH2 protein level by LC-MS/MS in 64 accessions selected based on weak to strong fragrance for the pQTL analysis. We identified 13 significant pQTLs (Àlog 10 P > 4) on chromosomes 2, 7, 8, 10, and 12 (Fig. 5b, Table S5). All five pQTLs from chromosome 8 were not within 1 Mbp of the BADH2 region. Chromosome 10:19402186 pQTL had the most significant P-value (4.57), while two pQTLs on chromosome 2 were located within Os02g0184800 and Os02g0184900, of which Os02g0184800 is a conserved hypothetical protein and Os02g0184900 is a cytochrome P450, CYP71D8L (Table S5).
We assessed evidence for an association between QTLs and 2AP accumulation using the unpaired two-samples Wilcoxon rank sum test on 2AP-producing accessions (accessions with mutant BADH2 alleles). We considered sample size ! 3, removed 'N' genotypes or missing values and selected five most significant SNP markers from eQTL and pQTL analysis. The Wilcoxon test showed that the four markers Chr08_20382857, Chr08_20374951, Chr08_20370280 and Chr08_20396752 had the most highly significant effect on the 2AP phenotype with p < 0.06 (Fig. 5c).

Volatile profiling and discriminant analysis
We carried out an aroma test with the 421 cultivated accessions from the core set. Based the panel assessment, samples were divided into four categories: nonfragrant, slightly fragrant, moderately fragrant, and strongly fragrant. In total, 50 accessions were categorized as fragrant, of which 25 accessions were slightly fragrant, 22 were moderately fragrant, and 3 were classified as strongly fragrant ( Table S1). The aceession RWG-431 with a novel allele (badh2-E2-476C > A) showed a 2AP peak and was classified as moderately fragrant (Table S1). Five accessions of wild rice (RWG-459, RWG-460, RWG-461, RWG-475 and RWG-476) with nonsynonymous substitutions were rated as slightly fragrant (Table S1).
To examine the volatile profile variation within BADH2 haplotypes, we performed PLS-DA, estimated VIP scores, and determined the indicator variables that maximize the separation between haplotypes (Fig. 6). Only haplotypes 2, 4, 6, and 7 were used for the PLS-DA, because this analysis requires a sample size of more than three. In the PLS-DA, principal component 1 (PC1) of the score-plot explained 25.6% of the total variance and PC2 explained 15.2% of the variance. Most of the accessions showed overlap on the score-plot, while some accessions from haplotypes 2, 6, and 7 were separated (Fig. 6a). We then identified the compounds that contribute to the variation observed in PLS-DA by extracting volatiles Table 2 Summary of the distribution of nucleotide polymorphisms detected in the coding region of the BADH2 gene in different groups of rice accessions and results of polymorphism, statistical, and haplotype analyses using japonica as the reference. with a high VIP score (>1.0). The VIP score plots showed that 15 compounds were the main discriminants in the PLS-DA model. Among them, 2AP, 5,9-undecadien-2-one 6,10-dimethyl-(Z), benzaldehyde, n-decanal, hexanal, octanal, phenylacetaldehyde, gamma-nonalactone, 1-heptanol, n-hexanol, and oxepane-2,7dione were significantly more abundant in haplotypes 7 and 6 than in other haplotypes. Haplotypes 7 and 6 were characterized by badh2-E13-5390C > T and an 8 bp deletion in exon 7, respectively. In addition to 2AP, benzaldehyde, n-decanal, hexanal, octanal, phenylacetaldehyde, and n-hexanol are known to contribute to rice aroma [38]. In haplotypes 2 and 4, four volatiles (tridecane-3-methylene, 3-decene-2,2-dimethyl-(3E), tridecane-5-methyl-, and toluene) were detected at significantly higher levels than those in other haplotypes (Fig. 6b); among these four volatiles, toluene is known to confer a sweet and pungent aroma [13]. Therefore, based on the multivariate analysis, 15 volatile compounds could be considered potential biomarkers for establishing the relationship between volatile compounds and fragrance.

Discussion
High demand for fragrant rice has prompted breeders to identify and characterize badh2 alleles from different genetic backgrounds and to provide genetic resources for fragrance breeding. In this study, a multi-omics analysis of a Korean collection of 475 rice accessions produced a comprehensive view of this commercially important trait. We performed whole-genome resequencing of 475 rice accessions and identified 26 alleles in the BADH2 coding region. Among these, eight alleles have been reported previously, including a 7 bp deletion in exon 2 [41,42], A/T SNP in exon 7 [43], 8 bp deletion in exon 7 [10,41,42], C/A SNP in exon 10 [42], G/A SNP in exon 10 [15,42], 3 bp deletion in exon 12 [44], C/T SNP in exon 13 [42,45], and 1-bp insertion in exon 14 [42,45,46]. Furthermore, we analyzed 3,000 rice genome project (3 K-RGP) data to assess the BADH2 polymorphisms in a large set of accessions (Supplementary Results; Tables S6-S8 (Table S1). These results indicate that the aromatic, indica, admixture, and aus groups were under positive selection or experienced selective sweeps, while japonica (with a positive value) experienced population bottlenecks or balancing selection. A total of 37 synonymous and nonsynonymous substitutions, including 21 functional SNPs (18 nonsynonymous, one insertion, and two deletions) were found in the BADH2 coding region of 543 accessions (Fig. S7, Table S8). Functional SNPs, badh2-E7-3035A > T, and 8 bp deletion in exon 7 characterized for the fragrant phenotype [12,41,47,48] were present in 156 and 110 accessions and represented two major haplotypes (Hap_14 and Hap_16) ( Tables S6,  S8). Fourteen haplotypes were distinguished by 15 novel functional SNPs (14 nonsynonymous SNPs and one deletion) detected in 77 accessions ( Table S8). The functional haplotypes identified from the Korean rice collection and 3 K-RGP data will provide important genetic resources for the development of new fragrant rice varieties.
We also tested the effect of the betaine aldehyde dehydrogenase homolog BADH1 (Os04g0464200) on Korean rice fragrance by association analyses (Supplementary Results; Tables S9-10, Fig. S10). There were no associations between 2AP accumulation and major BADH1 haplotypes (Fig. S10). Consistent with our results, He et al. [42] also recorded no association between BADH1 and fragrance in 205 rice accessions, while Singh et al. [49] reported a significant association between BADH1 protein haplotypes and aroma score in eighty rice accessions. These contradictory findings may be due to the differences in genotypes, fragrance analysis stage, and classification approaches. A transcript level analysis revealed the constitutive expression of BADH2 under normal and stress conditions and the upregulation in BADH1 expression by salt and drought stresses [12,50,51]. Various studies have revealed the involvement of BADH1 in salt stress tolerance [52][53][54][55]. Furthermore, BADH1 has a much lower affinity for GABAld and higher affinities for other aldehyde substrates than those of BADH2 [12,53,56,57]. The absence of aroma in BADH1-RNAi rice lines [52] suggests physiologically discrete roles of BADH homologs. Hence, BADH2 may substantially diminish the GABAld pool available for 2AP biosynthesis, and BADH1 cannot compensate for abrogated BADH2 activity.
Fragrant rice varieties with the same mutant allele or haplotype showed considerable variation in 2AP accumulation, consistent with previous reports [18,[44][45][46]56]. The accessions with Hap_2 carrying badh2-E10-4488C > A causing an alanine to glutamic acid substitution [42] and Hap_4 characterized by a synonymous SNP (badh2-E10-4528G > A) [15,44] did not show 2AP accumulation. Such variation in 2AP accumulation in fragrant rice can be attributed to environmental factors and variation in cultivation practices [58,59], however differences in the levels of transcripts with similar coding sequences also significantly influence plant phenotypes [60,61]. Despite studies of associations between the BADH2 region and fragrance, genetic variation at the whole-genome level associated with BADH2 gene expression levels, contributing to phenotypic variation, are lacking. In plants, most eQTL and pQTL studies are aimed at building a regulatory network to link genetic networks with phenotypic variation [62][63][64][65]. Recently, Anacleto et al. [66] reported eQTLs associated with the expression of a candidate gene, granule-bound starch synthase I (GBSSI), and revealed cis-acting functionally relevant genetic variants influencing the glycaemic index and texture in rice. Most eQTLs detected in our study were located in intergenic regions. These regions function via transcription factor binding interactions and regulatory modules and as barriers in nucleosome positioning and organization, determining DNA accessibility [64,67,68]. Intergenic regions have been identified in genome-wide association studies of various traits in human and a few plants [62,69]. Our pQTL analysis revealed the presence of trans-pQTLs, as 13 pQTLs were mapped on a different chromosome, 1 Mbp from the location of BADH2. The trans-pQTLs could be explained by many factors, like post-translational modifications, in addition to gene transcription in the regulation of protein expression [70,71]. Interestingly, we did not detect any genetic variation in 2AP precursor genes (GAPDH, TPI and P5CS) in eQTL or pQTL analysis, indicating that the correlations between these genes and 2AP accumulation reported earlier [13,14] are not directly related and rather might be due to the availability and utilization of these gene products leading to 2AP biosynthesis. We detected large number of eQTLs significantly associated with gene expression. However, associations were weaker for pQTLs, possibly because the genetic basis underlying protein expression involves more complex regulation than mRNA abundance [71][72][73]. Our study provides evidence supporting the roles of SNPs associated with BADH2 expression in 2AP variation in fragrant rice.

Conclusions
We performed whole-genome resequencing, transcriptomics, and metabolomics analyses of 475 accessions in the Korean World Rice Collection. The novel functional haplotypes identified from the Korean rice collection and 3 K-RGP data provide important genetic resources for the development of new fragrant rice varieties. High-quality SNPs from whole-genome resequencing were used in association analyses with BADH2 transcript levels and protein abundance. An array of eQTLs and pQTLs associated with BADH2 expression and protein accumulation are likely regulators mediating 2AP variation in fragrant rice. Indeed, further studies are needed to validate the candidate eQTLs and their interactions. A multivariate analysis identified 15 volatile compounds along with AP as potential biomarkers for rice fragrance. Our study demonstrates the power of integrating genetic, gene expression, and phenotype variation to gain insights into rice fragrance.