The gap-free genome and multi-omics analysis of Citrus reticulata ‘Chachi’ reveal the dynamics of fruit flavonoid biosynthesis

Abstract Citrus reticulata ‘Chachi’ (CRC) has long been recognized for its nutritional benefits, health-promoting properties, and pharmacological potential. Despite its importance, the bioactive components of CRC and their biosynthetic pathways have remained largely unexplored. In this study, we introduce a gap-free genome assembly for CRC, which has a size of 312.97 Mb and a contig N50 size of 32.18 Mb. We identified key structural genes, transcription factors, and metabolites crucial to flavonoid biosynthesis through genomic, transcriptomic, and metabolomic analyses. Our analyses reveal that 409 flavonoid metabolites, accounting for 83.30% of the total identified, are highly concentrated in the early stage of fruit development. This concentration decreases as the fruit develops, with a notable decline in compounds such as hesperetin, naringin, and most polymethoxyflavones observed in later fruit development stages. Additionally, we have examined the expression of 21 structural genes within the flavonoid biosynthetic pathway, and found a significant reduction in the expression levels of key genes including 4CL, CHS, CHI, FLS, F3H, and 4′OMT during fruit development, aligning with the trend of flavonoid metabolite accumulation. In conclusion, this study offers deep insights into the genomic evolution, biosynthesis processes, and the nutritional and medicinal properties of CRC, which lay a solid foundation for further gene function studies and germplasm improvement in citrus.


Introduction
Citrus, cultivated primarily in tropical and subtropical regions, is among the most significant fruit crops, renowned for its delightful taste.While the peel of most citrus varieties is commonly discarded as inedible byproduct, the dried pericarp of Citrus reticulata 'Chachi' (CRC), notably sourced from Xinhui County in the Guangdong province of China, is widely used in cuisine and traditional Chinese medicine due to its unique f lavor and healthpromoting effects [1].The medicinal history of CRC dates back over 2000 years to the Han Dynasty, documented in the ancient Chinese materia medica, 'Shennong's Classic of Materia Medica' (also known as 'Shen Nong Ben Cao Jing') [2].Beyond its historical roots, CRC continues to play a significant role in modern clinical practice, addressing conditions such as anepithymia, indigestion, abdominal fullness, distention, vomiting, and cough with expectoration of phlegm [3].The most recent 2020 edition of the Chinese Pharmacopoeia lists CRC in 176 different prescriptions, constituting over 10% of Chinese patent medicines.Contemporary pharmaceutical investigations underscore the diverse pharmacological effects of CRC extract, including impacts on the gastrointestinal and respiratory systems, as well as exhibiting antioxidant, anti-inf lammatory, and anticancer activities [4,5].
Flavonoids and volatile oils constitute the principle bioactive constituents within CRC.The intricate repertoire of chemical compounds in CRC peel encompasses over 200 f lavonoid metabolites, including f lavonoid glycosides and polymethoxyf lavones (PMFs) [6,7].Hesperidin, nobiletin, and tangeretin, which are recognized as index compounds within CRC, have been listed in the Chinese Pharmacopoeia.The composition and distribution of f lavonoids exhibit notable variability, both temporally and spatially.For instance, studies have shown that f lavonoid Oglycosides and C-glycosides, exemplified by naringenin, naringin, and prunin, reach peak concentrations at early stages of fruit development and diminish with fruit ripening [7,8].Conversely, certain PMFs, including tangeretin and nobiletin, exhibit higher concentrations in the immediate proximity to maturation, and mature samples display elevated levels of hesperetin and apigenin [9].An association study between f lavonoid content and antioxidant efficacy corroborates the premise that CRC harvested at different times is endowed with disparate therapeutic potentialities, which highlights the necessity to understand the biosynthesis and accumulation of bioactive components during fruit development of CRC.
Genes associated with f lavonoid biosynthesis have continuously been identified, including transcription factors (TFs) that function as either positive or negative regulators, such as MBW complex components [10], AaYABBY5 [11], and MYBs [12].For Citrus grandis, a comprehensive analyses of fruit transcriptome and metabolome data suggest the involvement of CgCHS, CgCHI, Cg4CL2, and Cg4CL3 in naringenin biosynthesis.Additionally, Cg1, 2RhaT and CgFNS are found to participate in biosynthesis of naringin and rhoifolin, respectively [13].In sour orange (Citrus aurantium), CHS, CHI, CYP75B1, CYP81Q32, and nine other components exhibit a high correlation with f lavonoid accumulation during fruit development [14].Moreover, several uridine diphosphate (UDP)-sugar dependent glycosyltransferase (UGT) genes catalyzing f lavonoid glycosylation have been reported in different citrus species, which include Cm1,2RhaT, Cs1,6RhaT, and Crc1,6RhaT [15][16][17].Although recent studies have shown that MYB, C2C2-Dof, and Alfin-like TFs are likely involved in regulation of f lavonoid biosynthesis in CRC [18], a comprehensive view of f lavonoid composition and its regulation during fruit development is still lacking.
Since the genome sequencing of the first Citrus plant was completed in 2013 [19], high-quality genome assemblies have been achieved for ∼20 cultivated and wild Citrus species [20].Notably, a previous study produced a draft genome of the Mangshan wild mandarin (C.reticulata) from the Nanling Mountains of South China.This wild mandarin represents a primitive form of C. reticulata, which has been domesticated independently with substantial introgression from pummelos [21].In this study, we present a high-quality gap-free genome assembly of a cultivated line of C. reticulata 'Chachi', namely Dazhongyoushen, which is widely cultivated in Guangdong province of China.This genome assembly, together with the transcriptomic and metabolomic profiles from four major stages of fruit development, provides new insights into f lavonoid biosynthesis and nutritional properties of CRC fruit.The dataset provided in this study will also facilitate gene function analysis as well as germplasm improvement in future breeding programs.

Genome assembly and annotation of CRC
A total of 23.1 Gb nanopore long reads (N50 >40 kb) and 21.1 Gb nanopore ultralong reads (N50 >100 kb) were generated for the CRC genome (Supplementary Data Table S1).We compared the metrics of assemblies generated from each dataset, as well as a combination of both long and ultralong reads and found that the assembly derived from long reads alone showed overall higher contiguity than the others.This long read assembly was further polished by 29.4 Gb MGI short reads, followed by the removal of haplotigs.The final assembly consisted of 16 contigs, with a total size of 307.7 Mb and an N50 size of 30.2 Mb.To further improve the genome, we employed the ultralong read assembly from the same pipeline to scaffold the assembly, which yielded nine supercontigs, corresponding to the nine chromosomes of the genome.Consequently, the gap-free genome assembly reached a size of 312 974 977 bp and an N50 size of 32 187 089 bp, which closely approximates the estimated genome size of 315.6 Mb as determined by the K-mer analysis (Fig. 1a; Supplementary Data Fig.S1).This assembly exhibited significantly higher contiguity compared with the genome of the wild C. reticulata, which had a contig N50 size of 24 761 bp [21].A genome-wide search for centromeric satellite repeats and telomeric repeats (CCCTAAA/TTTAGGG) uncovered the positions of centromeres on nine chromosomes, as well as telomeres at both ends of chromosomes 2, 5, and 6, and at one end of chromosomes 1, 4, 7, and 9 (Supplementary Data Fig.S2).
BUSCO [22] evolution showed that 98.4% of the broadly conserved genes were captured by the CRC assembly.The LTR assembly index (LAI) for this genome was 24.95, exceeding the benchmark proposed for a gold-standard reference genome [23].Collinearity analysis indicated a high degree of synteny among the genomes of CRC, Citrus sinensis and Citrus clementina (Fig. 1b).In addition, the Merqury consensus quality value [24] for the assembly was 33, and the RNA read mapping rates were 95.33%.These metrics together indicated the high quality of the CRC genome assembly.

Whole-genome duplication and phylogenetic evolution
To unravel the evolutionary history of the CRC, we analyzed the synonymous substitution rates (K s ) among paralogs within this genome and compared them with those in other previously studied citrus species.We detected a clear peak of Ks values between 1 and 2, with a summit around 1.6 for all citrus genomes, suggesting that an ancient whole-genome duplication took place in the common ancestor of citrus species (Fig. 1c and d).Phylogenetic analysis involving CRC and an additional 14 plant species based on 411 single-copy orthogroups positioned CRC in close relation to C. clementina, both of which belong to modern mandarins (Fig. 1e).Gene family evolution analysis revealed that 132 gene families (comprising 1157 genes) have significantly expanded (P < 0.05) in the CRC genome compared with its last common ancestor with C. clementina, including those encoding UDP-glycosyltransferase, shikimate O-hydroxycinnamoyltransferase, and phenylalanine ammonia-lyase, which are critical components of f lavonoid biosynthesis (Supplementary Data Table S2).

Dynamics of flavonoid metabolism during fruit development of CRC
Flavonoids constitute the primary active components in CRC peels.To study when and how these metabolites were synthesized and accumulated, we sampled the fruit peels at four distinct developmental stages, including the small immature stage (SIM), immature stage (IM), near-mature stage (NM), and mature stage (M) (Fig. 2a).A comprehensive analysis using UPLC-MS-based untargeted metabolomics identified 491 f lavonoid metabolites in the peel of CRC fruit at four stages, among which f lavones were the most diverse, comprising 243 compounds (49.49%), followed by f lavonols with 115 compounds (23.42%) (Supplementary Data Table S3; Supplementary Data Fig.S3).The metabolic profiles of f lavonoids were highly reproducible between biological replicates but were remarkably dynamic among fruits at different stages (Supplementary Data Fig.S4a).In general, the majority of f lavonoid metabolites (83.3%; n = 409) exhibited high concentrations in the early developmental stages of fruit and declined during the course of fruit maturation (Supplementary Data Table S3; Supplementary Data Fig.S4b).These included key metabolites of the f lavonoid biosynthesis pathway, such as chalcone and naringenin, both of which showed a high concentration in small immature fruit but were undetectable in the subsequent three stages, as well as four f lavanones (i.e.prunin, naringin, hesperetin, and hesperetin-7-O-glucoside) that were progressively reduced by 65.49-98.96%during fruit development (Fig. 2b; Supplementary Data Table S3).In contrast, the terminal products of the pathway, including three f lavones (apigenin, luteolin, and chrysoeriol) and three f lavonols (quercetin, myricetin, and kaempferide), showed the opposite trend and accumulated in mature fruit.Moreover, the concentration of hesperidin, a primary bioactive compound in CRC fruit, was increased by 1.73-, 1.96-, and 2.05-fold in IM, NM, and M fruits, respectively, compared with that in the SIM fruit (Fig. 2b).
In addition, we identified 76 PMFs in the peel of CRC fruit, including 14 dimethoxyf lavones, 20 trimethoxyf lavones, 17 tetramethoxyf lavones, 16 pentamethoxyf lavones, 8 hexamethoxyf lavones, and 1 heptamethoxyf lavone.Among these, 45 (59.21%)PMF compounds were present in SIM fruit but were almost undetectable in the subsequent developmental stages (Supplementary Data Fig.S5).Likewise, sinensetin was most abundant in SIM fruit, followed by a gradual decrease during fruit maturation, reaching a minimum at the mature stage with a 76.97% reduction.The contents of nobiletin, tangeretin, and eight other PMF compounds also declined markedly with fruit development, while being almost stable in NM and M fruit (Supplementary Data Table S3).Conversely, retusin was the most abundant PMF compound in the IM, NM, and M fruit, but barely detectable in the SIM fruit (Supplementary Data Table S3; Supplementary Data Fig.S5).

Transcriptional regulation of flavonoid biosynthesis
We performed transcriptome sequencing using the same samples as the metabolomics analysis.PCA analysis separated the samples in a manner similar to the pattern observed in metabolomics, suggesting that transcriptional regulation dominated metabolic f lux in CRC fruit (Fig. 3a; Supplementary Data Fig.S4a).We identified 1621 genes upregulated and 1235 genes downregulated in the SIM versus IM comparison.The number of differentially expressed genes (DEGs) decreased in subsequent stages, with 754 upregulated and 1016 downregulated genes in the NM versus M comparison, and 659 upregulated and 348 downregulated genes in the IM versus NM comparison (Fig. 3b).K-Means clustering of samples based on gene expression revealed eight significant gene groups (Fig. 3c), among which the largest group (C1) was enriched with genes associated with chloroplast functions.Genes within this group were expressed at higher levels in SIM fruit than in other fruits.Conversely, groups showing gene expression positively correlated with fruit development (C4 and C5) were predominantly associated with mitochondrial functions (Fig. 3c).These data were consistent with the fact that fruit ripening was accompanied by the breakdown of chloroplasts and an increase in respiration rate.Furthermore, 189 (8.31%) of DEGs were identified as TFs (Fig. 3d), and most of them were downregulated (n = 141).
We identified 21 genes in the CRC genome that potentially participate in f lavonoid biosynthesis, based on homology searches with genes from other species (Supplementary Data Fig.S6a).The biosynthesis begins with the phenylpropanoid pathway, which involves three major enzymes: phenylalanine ammonia lyase (PAL), cinnamic acid 4-hydroxylase (C4H), and 4-coumarate:CoA ligase (4CL).Subsequently, p-coumaroyl-CoA is converted to multiple intermediates and end-products through a series of enzyme activities, including chalcone synthase (CHS), chalcone isomerase (CHI), f lavone synthase (FNS), f lavanone 3-hydroxylase (F3H), flavonoid 3 -monooxygenase (F3 H), f lavonoid-3 ,5 -hydroxylase (F3 5 H), f lavonol synthase (FLS), 4 -O-methyltransferase (4 OMT), and 5 -O-methyltransferase (5 OMT).Evolutionary analysis indicated that most of these 21 genes were under purifying selection (Supplementary Data Table S4), while expression analysis revealed distinct expression patterns for these genes during fruit development (Supplementary Data Fig.S6a).Four putative PAL genes were detected in all four stages of fruit development.Among these four genes, the CP7g002665 gene was highly expressed in SIM, IM, and M stages and, interestingly, it was significantly lower by 40.53% in the NM stage compared with its highest expression level at the M stage.The expression trend of the CP6g002191 gene was similar to that of the CP7g002665 gene.The expression of CP8g002821 was significantly lower than the other three putative PAL genes.For the two putative C4H genes, the expression of the CP1g000830 gene increased with fruit development, reaching its highest expression at the M stage.CHS genes, encoding an important rate-limiting enzyme in f lavonoid biosynthesis, converted p-coumaroyl-CoA to naringenin chalcone.Two CHS genes (CP2g002894 and CP1g000339), identified in the CRC genome, exhibited distinct expression patterns.The highest expression of CP2g002894 was observed at the SIM stage, and it decreased by 71, 66, and 80% at the IM stage, NM stage, and M stage, respectively.However, CP1g000339 showed a rising-declining trend during fruit development, with the lowest expression level at the SIM stage and the highest at the IM stage.Additionally, the expression levels of FLS1, F3H, and 7-Glct genes were similar to that of CP2g002894. 4 OMT is a critical  Moreover, we conducted a correlation analysis using a dataset comprising 21 genes and 18 metabolites across the four stages of fruit development (Supplementary Data Fig.S6b).Within the phenylpropanoid pathway, it was observed that four PAL genes lacked a consistent and obvious correlation with the majority of f lavonoids.However, two 4CL genes, CP5g001726 and CP4g000284, demonstrated a positive correlation with several f lavonoids, including naringenin chalcone, naringenin, hesperetin, diosmin, and neohesperidin.Interestingly, these genes were negatively correlated with other f lavonoids, such as hesperidin, kaempferide, quercetin, myricetin, and chrysoeriol.Naringenin chalcone, the product catalyzed by the CHS enzyme, was positively correlated with CP2g002894 (r = 0.96) and negatively with CP1g000339 (r = −0.85).The gene CP7g002886, which encodes a CHI enzyme, showed a strong positive correlation with both its substrate (naringenin chalcone) and product (naringenin), with correlations exceeding 0.90.Moreover, a significant positive correlation (r = 0.92) was found between the expression of 4 OMT and its product, hesperetin.

Identification of gene modules related to flavonoid biosynthesis
To explore the gene regulatory network governing f lavonoid synthesis in the peel of CRC fruit, we utilized a weighted gene co-expression network analysis (WGCNA) with transcriptome and metabolome data.This yielded 20 distinct co-expression modules (Fig. 4a), each assigned a unique color and correlated with f lavonoid content.Notably, two module eigengenes (MEs) were found to significantly correlate (P < 0.05) with 15 f lavonoids.The turquoise module, encompassing 3324 genes, exhibited strong positive correlations with hesperidin (r = 0.97), kaempferide (r = 0.96), luteolin (r = 0.98), quercetin (r = 0.97), apigenin (r = 0.98), and chrysoeriol (r = 0.97).Conversely, significant negative correlations were observed with hesperetin-7-O-glucoside (r = −0.99),genistein (r = −0.97),naringin (r = −0.96),and others.The blue module, containing 2271 genes, showed an opposite pattern of correlation to the turquoise module, with the highest positive correlations with f lavonoids such as hesperetin-7-O-glucoside and genistein, and negative correlations with hesperidin, kaempferide, and others.Structural genes of the f lavonoid biosynthetic pathway, including CHS, FLS, and 4 OMT, were highly expressed in the blue module during the SIM stage, aligning with f lavonoid content variations (Fig. 4b).GO enrichment analysis linked the blue module to photosynthesis and plant development processes (Fig. 4d), while the turquoise module was associated with cellular respiration and energy generation (Fig. 4c and e).
TFs were integral to the network, with 399 TFs distributed across the f lavonoid co-expression modules, predominantly from gene families like NAC, WRKY, bHLH, bZIP, and others, with nearly equal distribution between the turquoise and blue modules (Fig. 4f).Key TFs in the blue module were identified as positively related to several f lavonoids, with PIF8 and AGL25 showing extensive connectivity.In the turquoise module, TFs such as AGL5 and NAC2 exhibited strong positive correlations with specific f lavonoids, while 14 TFs from various families were identified as negatively associated with f lavonoid biosynthesis.Structural genes involved in f lavonoid biosynthesis, like C4H, CHS, CHI, FLS, and 4 OMT, showed co-expression with multiple TFs, highlighting a complex network of interactions facilitating f lavonoid biosynthesis.This comprehensive analysis underscores the intricate regulatory mechanisms inf luencing f lavonoid synthesis in CRC, critical for fruit development and quality.

Characterization of UDP-glycosyltransferase in CRC
Flavonoids in plants are often glycosylated.We identified 129 UGTs in the CRC genome, containing a conserved plant secondary product glycosyltransferase (PSPG) domain at the C-terminus of the protein (Fig. 5a).The number of UGT genes in CRC was significantly higher than that in model plants such as Arabidopsis thaliana (n = 107), but lower than in apple (n = 241) and grape (n = 181).The UGT genes were unevenly distributed across the genome, with chromosomes 2, 3, and 8 harboring nearly half of the UGT genes (Fig. 5b).Intron mapping of the UGT genes revealed that 55 UGTs (43%) lacked introns, 48 UGTs (38%) contained one intron, and 11 UGTs (9%) contained two introns, while the remaining 14 UGTs contained more than two introns.To evaluate the evolutionary relationship of the UGT gene family, a phylogenetic tree was constructed based on A. thaliana UGTs and previously reported UGT genes in other species.The UGTs in CRC were divided into 16 groups, named Group A to Group P following the common naming rules of UGT genes.In CRC, Group E had the highest number of UGTs (n = 19), followed by Groups L, I, A, and H, each containing more than 12 members.
Gene expression of UGTs in the peel of CRC at the four stages was analyzed.Out of the total UGTs, 82 (63.5%) genes were expressed at one or more stages of fruit development (TPM [transcripts per million] >1) (Fig. 5c).Among these, more than 34 UGTs showed the highest transcript levels in SIM fruit, with 10, 6, and 5 UGT genes belonging to Groups E, D, and I, respectively.

Discussion
This study reports a gap-free genome assembly for the 'Chachi' variety of C. reticulata (CRC), featuring a contig N50 size of 32 187 089 bp and highlighting its genetic complexity and evolutionary significance.CRC, recognized for its dual role as a food and medicinal plant, contains a plethora of biologically active substances.Prior research has largely focused on a limited range of compounds within CRC.Advanced techniques, such as HPLC fingerprinting, rapid-resolution LC-ESIMS and reverse-phase HPLC, have been used to identify and quantify characteristic components, including various f lavonoids, in CRC and related species [25][26][27].Despite these advancements, traditional methods often provide limited insights into a small number of bioactive compounds of CRC.
Metabolomics has emerged as a pivotal technology for identifying and quantifying various metabolites in medicinal herbs.Leveraging a state-of-the-art, widely targeted metabolomics approach, this study employed UPLC-ESI-MS/MS in conjunction with multiple reaction monitoring (MRM) scan mode for an expansive analysis of the metabolic profiles of CRC fruit peels.This comprehensive approach facilitated the identification of numerous metabolites, including various f lavonoids, offering a more extensive investigation of CRC metabolites than previously reported.Our analysis revealed distinct accumulation patterns of specific metabolites across four developmental stages of the fruit.Notably, naringenin chalcone and several f lavanones exhibited the highest concentrations during the early, SIM stage, with a progressive decrease as the fruit developed.This pattern suggests the SIM stage as critical for f lavonoid biosynthesis, with subsequent fruit growth and ripening leading to a reduction in f lavonoid content due to dilution, cell division, and increased catabolism over synthesis.Contrastingly, hesperidin, a bioactive compound extensively studied for its health benefits, showed a pattern of increasing concentration reaching its peak at the M stage.Similarly, the content trends of three f lavonoids (apigenin, luteolin, and chrysoeriol) and four f lavonols (quercetin, myricetin, kaempferide, and dihydrokaempferide) mirrored that of hesperidin.PMFs, unique to citrus, including tangeretin, nobiletin, and sinensetin, exhibited a gradual decrease from the SIM stage to fruit ripening.Retusin remained consistent in later developmental stages but was absent  The diverse structures of f lavonoids correlate with their varied pharmacological properties.For instance, hesperidin is recognized for its neuroprotective and antidiabetic effects, apigenin for its anti-adipogenesis activity, quercetin for its anti-inf lammatory, immunomodulatory, and anti-oxidative properties, and myricetin for its broad spectrum of health benefits, including immunomodulatory, analgesic, antimicrobial, anti-oxidant, antidiabetic, anticancer, and antihypertensive effects.PMFs like nobiletin and tangeretin, which decrease through the stages of CRC fruit development, have shown potential in anti-oxidative, anticardiovascular disease, and anticancer therapies.Moreover, specific PMFs present at the SIM stage demonstrated significant anti-inf lammatory and anti-obesity activities.This study underscores the importance of harvesting CRC fruit at various stages to maximize the medicinal benefits of its metabolites, providing a scientific foundation for their informed and rational use.
The biosynthetic pathway of f lavonoids has been well documented, with the identification of key structural genes encoding major enzymes across various plant species.This study integrates genomic, transcriptomic, and metabonomic analyses to shed light on the biosynthesis of f lavonoids in CRC, and identifies several critical genes.Notably, the genes encoding CHS, CHI, F3H, and FLS displayed the highest expression levels at the SIM stage, with a subsequent decline in expression.CHS, the initial enzyme in f lavonoid biosynthesis, directs the phenylpropanoid pathway towards f lavonoid production.Two CHS genes were identified in CRC fruit, exhibiting distinct expression patterns; one maintained low expression throughout fruit development, whereas the other was significantly upregulated at the SIM stage, suggesting a predominant role in f lavonoid biosynthesis.In citrus, the biosynthesis of f lavor-determining non-bitter f lavanone-7-O-rutinosides and bitter f lavanone-7-O-neohesperidosides is catalyzed by 1,6-rhamnosytransferase (1,6RhaT) and 1,2rhamnosytransferase (1,2RhaT), respectively.Our findings reveal a homolog to the 1,6RhaT gene in CRC, with its highest expression at the SIM stage, aligning with previous studies on its expression in other citrus fruits.The integrative analysis of metabolome and transcriptome provided interesting insights into the f lavonoid biosynthetic pathway in CRC.In the upstream portion, the changes in gene expression and metabolite content were consistent.For example, the expression of the CHS and CHI genes positively correlated with their products, naringenin chalcone and naringenin, respectively.However, discrepancies between gene expression and metabolite levels were primarily observed in the downstream pathway.A notable example was the FLS gene and its direct product, quercetin.Despite the highest expression level of the FLS gene occurring at the SIM stage -3.60-, 2.66-, and 4.61-fold greater than the other three fruit development stages -the quercetin content at the SIM stage was significantly lower than at the other stages.Conversely, the content of major quercetin glycosides (e.g.quercetin-3-O-robinobioside, quercetin-3-O-glucoside, and quercetin-7-O-rutinoside) at the SIM stage was ∼10-fold higher than at the other stages.Thus, when considering the derivatives of f lavonoids, such as glycosylates, the changes in synthase gene expression and metabolite production in f lavonoid biosynthesis were largely consistent.Moreover, regulation of f lavonoid biosynthesis involves the interplay of MYB and bHLH TFs, which were differentially expressed during fruit development.WGCNA analysis identified specific modules associated with hesperidin accumulation, indicating the regulatory roles of these TFs in f lavonoid biosynthesis.
While previous research has illuminated the roles of numerous TFs in f lavonoid biosynthesis, the intricacies of regulatory mechanisms within this pathway are still not fully comprehended in CRC.The WRKY TFs, one of the most expansive TF families, play critical roles in various plant processes, including growth, development, metabolism, responses to environmental stresses, and senescence.Studies in apples have shown that overexpressing MdWRKY11 and MdWRKY75 leads to increased accumulation of f lavonoids and anthocyanins.Specifically, MdWRKY11 upregulates genes such as F3H, FLS, DFR, ANS, and UFGT, contributing to f lavonoid production, while MdWRKY75 interacts with the promoters of MYB TFs, which are known not only for regulating anthocyanin synthesis but also for controlling anthocyanin transport.In our investigation, the bHLH, AP2/ERF-ERF, and MYB TF families were identified as the most prominently differentially expressed TFs across all DEGs when comparing the SIM stage versus the M stage.Subsequent analysis through WGCNA helped establish a gene-to-metabolite correlation network, pinpointing WRKY28 and PIF8 as significantly associated with several key structural genes in f lavonoid biosynthesis, including C4H, CHS, FLS, and FSH.The temporal expression patterns of these two newly identified TFs suggest their potential involvement in regulating f lavonoid biosynthesis in CRC, marking a step forward in unraveling the complex regulatory network governing this critical metabolic pathway.

Sample collection
Leaf samples of CRC were obtained from the citrus germplasm resource nursery in Guangzhou city, Guangdong province, located at a longitude of 113.3712 • and a latitude of 23.1504 • .Fruit samples of CRC were collected from Sanjiang town in the Xinhui district of Jiangmen city, with coordinates of longitude 113.1061 • and latitude 22.4679 • .Fruits were harvested from three distinct trees in late July (SIM stage), early November (IM stage), late November (NM stage), and mid-December (M stage) of year 2020.Each sampling contained at least six fruits, and all samples were immediately frozen in liquid nitrogen and stored at −80 • C until usage.

Genome sequencing, assembly, and functional annotation
The genome was sequenced using the Oxford Nanopore longread sequencing platform.NextDenovo software [28] (v2.5.2) was employed with default parameters to assemble the long reads.Subsequently, the assembly underwent polishing with NextPolish [29] (v1.4.1), utilizing both nanopore long reads and MGI short reads as inputs.The polished assembly was processed with purge_dups [30] to remove haplotigs and then scaffolded with a second assembly derived from ultralong reads.Transposable elements were identified using the EDTA package [31].Gene prediction was performed with BRAKER2 [32].The coverage of gene space was assessed using BUSCO [22].Functional annotation of predicted genes was performed by comparing with the KEGG, TrEMBL, and GenBank databases under the E-value threshold of 1E−5.GO annotation was carried out by InterProScan [33].The TFs were identified by iTAK [34] with default parameters.Telomere and centromere analyses were performed using the quarTeT pipeline [35].

Phylogeny and whole-genome duplication analysis
For phylogenetic analysis, protein sequences were clustered into orthogroups using OrthoFinder [36] with default parameters.MAFFT [37] was employed to align protein sequences, and trimAl [38] was used to trim the alignments.The resulting aligned sequences were concatenated, and a maximum likelihood phylogeny among species was constructed using IQ-TREE2 [39] with 1000 bootstraps.The divergence times were estimated using the MCMCtree module of the PAML package [40], and fossil ages used for tree calibration were obtained from a previous publication [41].Gene family expansion and contraction were analyzed using CAFE5 [42] with a P-value threshold of 0.05.For whole-genome duplication analysis, an all-against-all BLASTp search was conducted using Diamond [43] with default settings.These alignments were then utilized to identify syntenic gene pairs in collinear blocks with JCVI (https://github.com/tanghaibao/jcvi).The Yn00 module of the PAML package was utilized to calculate synonymous substitutions per synonymous site (K s ).

Identification of flavonoid biosynthetic genes
Protein sequences of CRC were used to identify candidate genes encoding key enzymes involved in the f lavonoid biosynthetic pathway under the cutoff of E-value ≤1e−5 and identity ≥40%.Query sequences for the BLAST search comprised all known CHS and CHI genes in C. sinensis, as well as those encoding potential f lavonol synthase, f lavanone 3-hydroxylase, f lavonoid 3 -hydroxylase, O-methyltransferase, and glycosyltransferase, from diverse sources such as Glycine max [44], Citrus unshiu Marc [45], Oryza sativa [46], Citrus grandis [47], and Cajanus cajan [48].

Quantification of flavonoids
Samples were prepared and extracted for metabolomic analysis following an established protocol [49].A composite sample, made up of aliquots from each fruit, was used to evaluate the stability and repeatability of the untargeted metabolomics method.The extracted samples were analyzed using a UPLC-ESI-MS/MS system comprising a Shim-pack UFLC Shimadzu CBM30A system for UPLC and an Applied Biosystems 6500 QTRAP for MS.The UPLC conditions included using an Agilent SB-C18 column (1.8 μm, 2.1 mm × 100 mm), an injection volume of 4 μl, a column temperature of 40 • C, a f low rate of 0.35 ml/min, and a solvent system of acetonitrile and water.The gradient program was set to start at 95:5 v/v at 0 min, shifting to 5:95 v/v at 9.0 min, maintained at 5:95 v/v through 10.0 min, and returning to 95:5 v/v at 11.1 min, holding until 14.0 min.The UHPLC eff luent was then directly introduced into an ESI-triple QTRAP-MS system.Optimized ESI settings included a turbo spray ion source at a temperature of 550 • C, ion spray voltages of 5.5/−4.5 kV, and gas pressures of 50 psi for gas I, 60 psi for gas II, and 25 psi for the curtain gas.QQQ scans were performed in multiple reaction monitoring (MRM) mode, with each scan optimized for de-clustering potential (DP) and collision energy (CE).
For f lavonoid quantification, all MS data were utilized to annotate metabolites based on the in-house database and widely recognized public metabolite databases such as MassBank, HMDB, ChemBank, PubChem, and METLIN.Chromatographic peak areas of corresponding metabolites across different samples were corrected and integrated for quantitative analysis using MultiQuant software.Statistical analyses, including PCA, correlation coefficient, and hierarchical clustering, were conducted using R software.The ropls module in R was employed for partial least squares-discriminant analysis (PLS-DA).Significant differences in metabolite levels between groups were identified based on a variable importance in projection (VIP) of at least 1, an absolute log 2 fold change (FC) of ≥1, and a P-value of <0.05.

Transcriptome sequencing and analysis
Total RNA was extracted from samples using TRIzol reagent, adhering to the manufacturer's instructions.RNA-Seq library construction and sequencing were entrusted to a specialized transcriptome sequencing company (Majorbio, Shanghai, China).Subsequently, 12 libraries were sequenced (NovaSeq 6000, Illumina) under paired-end sequencing mode (2 × 150 bp).The raw reads were cleaned and then aligned to the CRC genome using HISAT2 software [50].Transcript expression levels were quantified using RSEM software [51], measured as transcripts per million (TPM).DEG analysis between two samples was conducted using DESeq2 [52], with expression differences defined as |log 2 FC| > 1 and adjusted P value ≤0.05.K-means clustering analysis was carried out using the clusterGVis module in the R package.

Phylogenetic analysis of UGT family
UGTs were predicted using hmmsearch against the PF00201 domain with default parameters.Subsequently, these predicted genes were verified in the SMART database to confirm the completeness of the conserved 44-amino-acid PSPG box.Protein sequences of UGTs were aligned using MAFFT software and then trimmed with trimAl.The maximum likelihood phylogeny was constructed by IQ-TREE2 with 1000 bootstrap replicates.

Gene network analysis
The WGCNA R package was used for network analysis.A dataset comprising 16 932 genes, each with >15 counts in 75% of the samples, was employed for the co-expression network analysis.The settings for the analysis included a soft threshold power of 16, a minimum module size of 30, and a cutHeight of 0.25.Consequently, 11 844 genes were categorized into 15 tissue-specific modules, while 5088 genes were identified as outliers and grouped into the gray module.Eigengene values for each module were computed and tested for associations with metabolic data.GO enrichment analysis for the module genes was visually represented using TreeMap and REVIGO [53].Each rectangle in the visualization represented a single cluster representative; these were combined into superclusters of loosely related terms, differentiated by color.The size of each rectangle was proportional to the significance of the P-value.The resulting co-expression network was visualized using Cytoscape [54].

Figure 1 .
Figure 1.Comparative genomics of CRC and closely related species.a Genomic features of CRC.Each feature is summarized based on a 100-kb window.Tracks VII-X represent gene expression of fruit at SIM, IM, NM, and M stages, respectively.b Genome collinearity analysis of CRC, Citrus sinensis, and Citrus clementina.c Syntenic analysis within the genome of CRC.d Synonymous substitution rate (K s ) of paralogs in CRC and other species.The peak indicates that an ancient whole-genome duplication (WGD) occurred in the common ancestor of Citrus species.e Gene family expansion (red) and contraction (blue) during the evolution of representative land plant species, including CRC.PZ, MZ, and CZ indicate the Paleozoic, Mesozoic, and Cenozoic eras, respectively.The orange background indicates Citrus-related species, while green indicates other seed plants, and purple indicates bryophytes.Numbers on branches are the sizes of expanded (blue) and contracted (red) gene families at each node at P < 0.05.

Figure 2 .
Figure 2. Metabolic profiles of CRC fruits at four representative developmental stages.a Fruit of CRC at small immature stage (SIM), immature stage (IM), near mature stage (NM), and mature stage (M).b Flavonoid biosynthetic pathways and the concentration of the metabolites during fruit development.The heat map shows the normalized average concentration of each metabolite in the fruit at different developmental stages.Metabolites undetected or absent in CRC fruit are indicated with gray names.methyltransferase that catalyzes the transfer of a methyl group to eriodictyol at the 4 position of the B ring, converting it to hesperetin.The expression level of the gene encoding 4 OMT (CP9g001415) decreased gradually during fruit development, with

Figure 3 .
Figure 3. Transcriptional dynamics of CRC fruit at four stages.a Principal component analysis of fruit samples based on gene expression data.b Upset diagram showing the intersection of DEGs in different contexts of comparison.c K-means clustering analysis of genes based on their expression data.d Classification and the of numbers of differentially expressed transcription factors in the comparison between SIM and M.

Figure 4 .
Figure 4. WGCNA analysis of CRC transcriptomes.a Relationship between gene co-expression modules and contents of selected metabolites.Numbers in each cell represent a correlation coefficient and a P value (within parentheses).b, c Expression pattern of the blue and turquoise modules.The x-axis shows replicated transcriptome samples from four developmental stages; the y-axis indicates the eigengene value.Gene number of each module is shown on the top.d, e Enriched GO terms of the blue (d) and turquoise (e) module genes.f Regulatory network of TFs and f lavonoid biosynthetic genes.Dot sizes and colors represent the numbers of correlated genes.at the early stage.These findings align with previous research indicating rapid changes in f lavonoid content during CRC fruit ripening.It should be noted that although the metabolic changes between SIM and IM are significant, our sampling may have

Figure 5 .
Figure 5. Analysis of UGT gene families in the CRC genome.a Maximum likelihood phylogeny of UGT genes in CRC and other selected plant species.b Chromosomal location of UGT genes in the CRC genome.c Expression profile of UGT genes in CRC fruit.