Characterization of UDP-glycosyltransferase family members reveals how major flavonoid glycoside accumulates in the roots of Scutellaria baicalensis

Flavonoid glycosides extracted from roots of Scutellaria baicalensis exhibit strong antitumor, antioxidative, anti-inflammatory, and antiviral activities. UDP glycosyltransferase (UGT) family members are responsible for the transfer of a glycosyl moiety from UDP sugars to a wide range of acceptor flavonoids. Baicalin is the major flavonoid glycoside found in S. baicalensis roots, and its aglycone baicalein is synthesized via a specially evolved pathway that has been elucidated. However, it is necessary to carry out a genome-wide study of genes involved in 7-O-glucuronidation, the final biosynthesis step of baicalin, which might elucidate the relationship between the enzymes and the metabolic accumulation patterns in this medicinal plant. We report the phylogenetic analysis, tissue-specific expression, biochemical characterization and evolutionary analysis of glucosyltransferase (SbUGT) and glucuronosyltransferase (SbUGAT) genes based on the recently released genome of S. baicalensis. A total of 124 UGTs were identified, and over one third of them were highly expressed in roots. In vitro enzyme assays showed that 6 SbUGTs could use UDP-glucose as a sugar donor and convert baicalein to oroxin A (baicalein 7-O-glucoside), while 4 SbUGATs used only UDP-glucuronic acid as the sugar donor and catalyzed the conversion of baicalein to baicalin. SbUGAT4 and SbUGT2 are the most highly expressed SbUGAT and SbUGT genes in root tissues, respectively. Kinetic measurements revealed that SbUGAT4 had a lower Km value and a higher Vmax/Km ratio for baicalein than SbUGT2. Furthermore, tandem duplication events were detected in SbUGTs and SbUGATs. This study demonstrated that glucosylation and glucuronidation are two major glycosylation modifications in the roots of S. baicalensis. The higher expression level and substrate affinity of SbUGAT4, together with the expansion of this gene family, contribute to the high accumulation of baicalin in the roots of S. baicalensis.

reported to inhibit the growth of a range of cancer cells [2]. Flavones in the roots of S. baicalensis, such as baicalein, wogonin and their glycosides, are the major bioactive compounds responsible for these bioactivities. These root-specific flavones lack a 4′-OH group on their B-rings and are synthesized via a new, specially evolved pathway [3]. These compounds account for most of the specific health benefits of S. baicalensis and specifically promote the apoptosis of tumor cells without toxicity to healthy cells [4,5]. Baicalein is also reported to repress the replication of COVID-19 virus by inhibiting its 3C-like protease [6,7].
Approximately 100 flavones have been reported in S. baicalensis, and glycosylation contributes dramatically to the diversity of flavone structures [2]. Glycosylation is a common decoration of flavones that often occurs at the end of their biosynthetic pathway. This modification plays an important role in the stabilization and enhancement of the water solubility of flavones, leading to the regulation of bioactivity, and in the storage and detoxification of xenobiotics in plants [8]. Compared to their aglycones, glycosides can be easily absorbed by the human body and have the potential to improve pharmacokinetic and pharmacodynamic profiles, which makes glycosylation a promising technology for drug discovery [9]. Due to the multiple hydroxyl groups on most parent flavone backbones, regiospecific biocatalytic reactions mediated by enzymatic synthesis may be a more suitable strategy than chemical synthesis [10].
Baicalin is the 7-O-glucuronidated product of baicalein. The activity of a glucuronosyltransferase in S. baicalensis was first described in 2000 [21], and a glucosyltransferase from S. baicalensis was also characterized at the molecular level that converts baicalein to oroxin A [22]. However, it is still unclear whether other glucosyltransferase or glucuronosyltransferase members participate in flavone biosynthesis pathways and whether they play different roles in different compounds. With the availability of the S. baicalensis genome [23], it is necessary to carry out a genome-wide study of glucosyltransferases and glucuronosyltransferases in this medicinal plant and to elucidate the relationship between the enzymes and the metabolic accumulation patterns in the plant. Here, we identified and characterized the UGT family members of S. baicalensis by phylogenetic analysis, expression profiles and in vitro enzyme assays. Our results provide important clues for the biosynthesis of baicalin by using synthetic biology.

Identification of UGT genes in S. baicalensis
Based on an HMMER search of the S. baicalensis genome sequence, we identified 130 UGT candidates using 122 UGT sequences from A. thaliana as queries. The relevant sequences and gene loci are provided in Additional file 1 Table S2. After the removal of redundant sequences, a total of 124 UGT genes were obtained. The protein length of the predicted UGTs ranged from 205 to 663 residues.

Chromosomal location of UGT genes in S. baicalensis
To examine the UGT genes in the S. baicalensis genome in detail, we mapped the chromosomal localization of UGTs according to gene annotation files. As shown in Fig. 2, UGTs were unevenly distributed on all 9 pseudochromosomes (Chr01-Chr09) of S. baicalensis. Chr01 contained the most UGTs (40 genes), while Chr04 contained only one UGT gene. In addition, gene clusters were used to describe the relative positions of UGT genes. According to the definition of a gene cluster as two or more genes falling within eight open reading frames (ORFs) on the same chromosome [32], 23 gene clusters covering two to twelve UGT genes were detected (Fig. 2). These gene clusters contained a total of 73 UGT genes, which represented 58.87% of all UGTs located on chromosomes.
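The cluster definition above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the study: the input format (chromosome, ordinal ORF index, gene ID), the function name `find_clusters`, and the reading of "within eight ORFs" as "neighbouring UGTs whose ORF indices differ by at most eight" are all assumptions, and the toy gene names are invented.

```python
from collections import defaultdict


def find_clusters(ugt_positions, max_gap=8):
    """Group UGT genes into clusters of two or more neighbouring genes.

    ugt_positions: iterable of (chromosome, orf_index, gene_id), where
    orf_index is the ordinal position of the gene among all ORFs on that
    chromosome (hypothetical input format).
    Assumption: adjacent UGTs are chained into one cluster when their ORF
    indices differ by at most max_gap.
    """
    by_chrom = defaultdict(list)
    for chrom, idx, gene in ugt_positions:
        by_chrom[chrom].append((idx, gene))

    clusters = []
    for chrom in sorted(by_chrom):
        current, last_idx = [], None
        for idx, gene in sorted(by_chrom[chrom]):
            # A gap larger than max_gap closes the current group.
            if current and idx - last_idx > max_gap:
                if len(current) >= 2:
                    clusters.append((chrom, current))
                current = []
            current.append(gene)
            last_idx = idx
        if len(current) >= 2:
            clusters.append((chrom, current))
    return clusters


# Toy data, invented for illustration only.
toy = [
    ("Chr01", 10, "ugtA"), ("Chr01", 12, "ugtB"), ("Chr01", 19, "ugtC"),
    ("Chr01", 60, "ugtD"), ("Chr02", 5, "ugtE"), ("Chr02", 30, "ugtF"),
]
clusters = find_clusters(toy)
clustered = sum(len(genes) for _, genes in clusters)
share = 100 * clustered / len(toy)  # percentage of UGTs that sit in clusters
```

The percentage reported in the text follows the same arithmetic: 73 clustered genes out of 124 gives 58.87%.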

Tissue-specific expression patterns of UGT genes in S. baicalensis
Expression patterns of the UGT genes were analyzed using FPKM values from RNA-seq data of S. baicalensis [23]. Based on the expression patterns in different tissues, UGTs could be clustered into four main groups (Fig. 3). A total of 44 UGTs from group A, accounting for 35.5% of identified UGTs, had relatively high expression levels in roots, and most of them were induced by MeJA treatment, which indicated that these UGT family members were involved in the glycosylation of root-specific flavonoids in S. baicalensis. The expression levels of UGTs from group B were higher in the aerial parts (stems, leaves, flower buds and flowers) than in the roots, while members of group C were highly expressed in flower buds and flowers. The transcripts of UGTs from group D were evenly distributed across all tissues examined.

Gene isolation of 7-O-glycosyltransferases
Baicalin is the 7-O-glucuronidated product of baicalein and is the most abundant flavonoid in S. baicalensis roots. To illustrate how this major flavone glycoside accumulates, we identified 10 glucosyltransferase (SbUGT1-10) and 7 glucuronosyltransferase (SbUGAT1-6) candidate genes in subfamilies A and I (Fig. 1 and Additional file 1 Table S3), which might be involved in flavonoid 7-O-glycosylation. SbUGAT1.1 and SbUGAT1.2 were two distinct ORFs predicted for the same gene locus. The full-length cDNAs of all the SbUGTs and SbUGATs were successfully isolated using specific primers (Additional file 1 Table S4). The genes were then cloned into a prokaryotic expression vector. Amino acid sequence alignment showed that the enzymes all possessed the conserved PSPG motif, and differences in two amino acids (Trp and Arg in SbUGAT) accounted for the functional divergence of UGT and UGAT (Additional file 2 Fig. S2) [28].
The SbUGTs and SbUGATs were clearly separated in the phylogenetic tree (Fig. 4A). SbUGT1 clustered with SbUGT2 and 3, forming the sister group of the SbUGAT clade. SbUGT4, 5 and 6 comprised a subgroup, while SbUGT7, 8, 9 and 10 clustered together. As shown in Fig. 4B, SbUGT1, 2, 3, 7, 8, and 9 and SbUGAT1, 2, and 4 had relatively high expression in both roots and MeJA-induced roots, suggesting that these genes could be involved in the biosynthesis of flavonoid glycosides in roots. Transcripts of SbUGT4, 5, 6, 10, and SbUGAT3, 5 were highly accumulated in stems, leaves and flowers, while SbUGAT6 seemed to be a flower bud-specific gene, which was probably involved in the decoration of flower pigments.

Functional characterization of SbUGTs and SbUGATs
For in vitro enzyme assays, crude proteins of the candidate SbUGTs and SbUGATs were extracted from Escherichia coli carrying the corresponding genes. Compared with the empty vector (EV) control, new peaks (Peak I) with a retention time identical to that of the oroxin A standard were detected by HPLC for 6 SbUGT proteins (SbUGT1, SbUGT2, SbUGT3, SbUGT7, SbUGT8 and SbUGT9) incubated with baicalein as a substrate and UDP-Glc as a sugar donor (Fig. 5A). These products had the same mass-to-charge ratio (m/z) and MS/MS patterns as the oroxin A standard (Fig. 5B and Additional file 2 Fig. S4A). Correspondingly, new peaks (Peak II) were found by HPLC for 4 SbUGAT proteins (SbUGAT3, SbUGAT4, SbUGAT5 and SbUGAT6) supplemented with baicalein as a substrate and UDP-GluA as a sugar donor (Fig. 5C).

The products were determined by comparing their retention time, m/z ratio and MS/MS patterns with baicalin standards (Fig. 5D and Additional file 2 Fig. S4B). These results indicated that SbUGTs could use UDP-Glc but not UDP-GluA as a sugar donor and convert baicalein to oroxin A, while SbUGATs used only UDP-GluA as the sugar donor and catalyzed baicalein to baicalin (Fig. 5E).

Fig. 1
Phylogenetic tree of UGTs proteins. The neighbor-joining method was used to construct the tree with bootstrap (n = 1000). Circles before the labels represent candidate genes from S. baicalensis, triangles before the labels represent functional UGTs that have been reported. The postfix of the gene ID (.p2, .p3, .p4) represented different ORFs predicted for the same gene locus. UGT1A1 from Homo sapiens was used as an outgroup.

Kinetic measurement of SbUGTs and SbUGATs
The recombinant enzymes that exhibited GT activity toward baicalein were purified from crude proteins for kinetic analysis (Additional file 2 Fig. S4). As shown in Table 1 and Additional file 2 Fig. S5, SbUGT3 had the lowest Km (2.67 μM) among all the SbUGTs, but the highest Vmax value was detected for SbUGT2 (44,503 pkat mg⁻¹ protein); overall, Vmax/Km was 3.36-fold higher for SbUGT3 than for SbUGT2. However, SbUGT2 had the most abundant transcripts in roots compared with other SbUGTs, with an FPKM value 10.27 times that of SbUGT3. The lowest SbUGAT Km value was found for SbUGAT3, and SbUGAT6 had the highest Vmax/Km ratio. Furthermore, SbUGAT4 was the most highly expressed SbUGAT gene in root tissues. Although the Vmax/Km of SbUGAT6 is slightly higher than that of SbUGAT4, SbUGAT6 has few transcripts in roots and JA-treated roots, as its FPKM is 0.36 compared to 222.29 for SbUGAT4. In S. baicalensis roots, SbUGT2 and SbUGAT4 had comparable expression levels, which were significantly higher than those of the other SbUGT or SbUGAT genes. Therefore, competition between SbUGT2 and SbUGAT4 would determine the metabolic patterns in S. baicalensis roots. As SbUGAT4 had a higher Vmax/Km value for baicalein, being 1.4 times higher than that of SbUGT2, this explains the large accumulation of baicalin, rather than oroxin A, in S. baicalensis roots. SbUGT2 and the other SbUGTs might be involved in the biosynthesis of 4′-hydroxylated flavone 7-O-glucosides, such as luteolin 7-O-glucoside and apigenin 7-O-glucoside.
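The competition argument above can be made concrete with a small numerical sketch. When the substrate concentration is well below Km, the Michaelis-Menten rate is approximately (Vmax/Km)·[S], so the ratio of the two product formation rates approaches the ratio of the Vmax/Km values weighted by enzyme abundance. The kinetic parameters below are hypothetical and chosen only so that Vmax/Km differs 1.4-fold, as stated in the text; they are not the values from Table 1. The sketch also assumes that comparable transcript levels translate into comparable enzyme amounts, which is an assumption rather than something measured here.

```python
def mm_rate(vmax, km, s):
    """Michaelis-Menten rate of one enzyme at substrate concentration s."""
    return vmax * s / (km + s)


def product_ratio(enz_a, enz_b, s, amount_a=1.0, amount_b=1.0):
    """Ratio of product formation rates for two enzymes sharing one substrate.

    enz_a, enz_b: (vmax, km) tuples; amount_a, amount_b: relative enzyme
    abundance. Each enzyme is assumed to follow simple Michaelis-Menten
    kinetics independently of the other.
    """
    return (amount_a * mm_rate(*enz_a, s)) / (amount_b * mm_rate(*enz_b, s))


# Hypothetical parameters (not from Table 1): Vmax/Km = 140 versus 100.
ugat_like = (1400.0, 10.0)
ugt_like = (2000.0, 20.0)

low_s = product_ratio(ugat_like, ugt_like, s=0.01)      # substrate far below Km
high_s = product_ratio(ugat_like, ugt_like, s=10000.0)  # near saturation
```

At low substrate the ratio is close to 1.4 in favour of the glucuronide-forming enzyme, whereas near saturation it approaches the ratio of the Vmax values instead, so the argument depends on baicalein being limiting in vivo.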

Evolutionary analysis of SbUGTs and SbUGATs
Comparative genome analysis showed that 77 gene families expanded in S. baicalensis, including the UGT gene family, which indicated the importance of glycosylation in S. baicalensis [23]. To identify the evolutionary path of SbUGTs and SbUGATs, gene syntenic relationships were analyzed between S. baicalensis, Scutellaria barbata (another medicinal plant from the genus Scutellaria accumulating baicalin) and Sesamum indicum. As shown in Fig. 6A, SbUGT1 (Sb01g31800), 2 (Sb01g31810), and 3 (Sb01g31820) were located adjacently on pseudochromosome 1 and derived from one common ancestor. In S. barbata and S. indicum there is a single gene corresponding to SbUGT1, 2 and 3 in the region syntenic to S. baicalensis, indicating that the tandem duplication of SbUGTs followed the divergence of S. baicalensis from other species of the genus Scutellaria (< 13.28 Mya) [33]. Another tandem duplication event was detected in SbUGT7 (Sb03g36130), 8 (Sb03g36140) and 9 (Sb03g36150), which were located close together on pseudochromosome 3 (Fig. 6A). In S. barbata, there were three genes corresponding to SbUGT7, 8 and 9, and only a single syntenic gene was found in S. indicum, indicating that UGT gene expansion occurred with the divergence of the Lamiaceae and Pedaliaceae families (< 46.9 Mya) [23].
SbUGAT3 (Sb01g51711) and 6 (Sb09g13460) were located on pseudochromosomes 1 and 9, respectively, although we did not find homologs of these two genes in the isogenic regions of S. indicum (Fig. 6B). In S. barbata there is a single gene corresponding to SbUGAT3 and 6 in the region syntenic to S. baicalensis, suggesting that SbUGAT3 and 6 arose following the differentiation of the Lamiaceae and Pedaliaceae families (< 46.9 Mya) [23]. SbUGAT4 (Sb01g56811) and 5 (Sb01g56821) were located closely on pseudochromosome 1 and had syntenic genes in the isogenic regions of S. barbata, which were derived from one common ancestor gene in S. indicum, indicating that a tandem duplication event occurred in SbUGAT4 and 5 with the divergence of the Lamiaceae and Pedaliaceae families (< 46.9 Mya) [23]. These results revealed the importance of glycosylation in S. baicalensis, and indeed most of the flavonoids found in S. baicalensis can be glycosylated [2].

Discussion
The glycosylation catalyzed by UGTs is important for the stabilization and enhancement of the water solubility of natural products. UGT genes are also involved in the regulation of metabolic homeostasis, deactivation/detoxification of xenobiotics, and biosynthesis, storage and transport properties of specialized metabolites [34]. UGTs occur as gene families in plant genomes. Surveys of UGT genes from lower to higher plants showed that at least one UGT gene exists in Chlamydomonas reinhardtii, increasing to 21 and 142 UGT genes in Physcomitrella patens and Selaginella moellendorffii, respectively, and that 56 to 242 UGT genes have been identified in many vascular plants, suggesting that the expansion of the UGT family occurred early in the land plant lineage and continued at various rates among vascular plant lineages [34,35]. In our study, we identified 124 nonredundant UGTs by searching the S. baicalensis genome. The encoded sequences could be divided into 18 subfamilies (Fig. 1), consistent with the phylogeny of a collection of 246 biochemically characterized UGT protein sequences [36]. Different types of flavone UGTs were located in five distinct subfamilies (D, J, L, O and R) that correlated with their respective substrate specificities and regiospecificities. UGTs from other subfamilies may be involved in the sugar decoration of various specialized metabolites, such as terpenoids, phenolics and cytokinins [36], which needs to be further clarified. Extracts from S. baicalensis have strong antitumor, antiviral, anti-inflammatory, neuroprotective, and hepatoprotective activities owing to their rich flavonoid content [1]. Approximately 100 flavonoids were identified in S. baicalensis, and most of them were found mainly in the roots and were glycosylated [2].
Baicalin is the most abundant flavonoid glycoside accumulating in the roots, and its aglycone, baicalein, is a 4′-deoxyflavone that is synthesized from a root-specific pathway [3]. Newly developed genome sequencing technologies helped us elucidate the specially evolved pathway for baicalein [3,23,37]. In contrast to the general flavonoid biosynthetic pathway, cinnamic acid is converted by cinnamate-CoA ligase-like 7 (CLL-7), chalcone synthase 2 (CHS-2), and chalcone isomerase (CHI) to form pinocembrin, a flavanone without a 4′-OH group. Pinocembrin is then converted by a specialized isoform of flavone synthase II-2 (FNSII-2) to form chrysin, which serves as the founding precursor of 4′-deoxyflavones. Chrysin is then decorated by flavone 6-hydroxylase (F6H) to produce baicalein. For the final biosynthetic step of baicalin, more than one-third of UGTs were found to be highly expressed in roots and MeJA-treated roots (Fig. 3), consistent with the accumulation patterns of root-specific flavonoid glycosides, which were also induced by JA [3]. These UGT family members might contribute to the diversification of flavonoid glycosides in S. baicalensis roots, such as baicalin and oroxin A, which possess the same aglycone but different sugar moieties; wogonoside and scutellarin, which are both 7-O-glucuronides; and linarin and iridin, which have different aglycones and sugar groups (Additional file 1 Table S1). SbUGAT4 has a higher expression level and higher affinity for baicalein, meaning that this enzyme should be able to compete effectively with SbUGTs for substrate in roots (Table 1), which leads to the high accumulation of baicalin, rather than oroxin A, in the roots of S. baicalensis.
UGT and UGAT have a broad range of substrate selectivity but narrow sugar donor adaptability. In addition to baicalein, SbUGT uses UDP-Glc as its sole sugar donor to convert wogonin, another 4′-deoxyflavone that accumulates specifically in S. baicalensis roots together with its glycosides, to its 7-O-glucoside [22]. SbUGT was also found to catalyze the 7-O-glucosylation of 4′-OH flavones, such as apigenin, scutellarein and kaempferol [22]. A UGAT protein purified from cultured cells of S. baicalensis exhibited 7-O-glucuronidation activity toward baicalein, wogonin and scutellarein with UDP-GluA as a sugar donor, and showed the lowest Km value and highest Vmax/Km ratio for baicalein [21]. Noguchi et al. found that the Trp residue (W) in the PSPG box of SbUGTs might be responsible for the better selectivity for the UDP-Glc donor, while the corresponding Arg residue (R) of SbUGATs plays a critical role in the interaction with the UDP-GluA sugar donor [28]. Homology modeling and site-directed mutagenesis analysis showed that these two key amino acid residues within the PSPG motif were vital for the substrate selectivity of UGT and UGAT because the cationic guanidinium moiety of R can be in close proximity to the anionic carboxylate of the glucuronic acid moiety of UDP-GluA. The UGAT gene is ubiquitous among Lamiales but cannot be found in Arabidopsis (Brassicaceae), suggesting that the functional differentiation of UGT and UGAT might have occurred locally in specific plant lineages [38]. However, a flavonoid 3-O-glycosyltransferase (Sb3GT1) from S. baicalensis was reported to accept five sugar donors (UDP-Glc, UDP-Gal, UDP-N-acetylglucosamine, UDP-Xyl and UDP-arabinose) to catalyze 3-O-glycosylation of 17 flavonols [15]. Molecular modeling revealed that the smaller side chains of G15 and P187 within Sb3GT1 offered a broader interspace, which was critical for the sugar donor and substrate promiscuity. Sb3GT1 was clustered into subfamily D in our phylogenetic tree (Fig. 1), indicating that functional diversification of UGTs occurred during species evolution.
Gene duplications are one of the primary driving forces in the evolution of genomes and have contributed to the formation of specialized metabolites [39,40]. For example, the second gene involved in the synthesis of baicalein, SbCHS2, likely underwent several duplications to produce five gene copies encoding identical or near-identical proteins. Similar gene amplifications were also detected in SbFNSII-2, SbPFOMT and SbF8H, suggesting that increasing the gene and protein dosages resulted in greater flux along the 4′-deoxyflavone biosynthetic pathway [23]. Tandem duplication events were found in SbUGTs and SbUGATs (Fig. 6). SbUGT1, 2 and 3 were likely the products of gene duplications after the divergence of S. baicalensis and S. barbata (< 13.28 Mya), while the expansion of SbUGT7, 8 and 9, as well as SbUGAT4 and 5, occurred earlier when the Lamiaceae and Pedaliaceae families diverged (< 46.9 Mya). SbUGAT3 and 6 were likely produced following the emergence of the Scutellaria genus due to the absence of any homologs of these two genes in the isogenic regions of S. indicum. The expansions of SbUGTs and SbUGATs are consistent with the abundant flavonoid glycosides accumulated in S. baicalensis, especially baicalin, which might reflect our ancestors' selection for species in the Scutellaria genus with higher levels of 4′-deoxyflavones in their roots for use in traditional Chinese medicine (TCM).

Table 1 Kinetic parameters and average FPKM values of roots and MeJA-induced roots of SbUGTs and SbUGATs
Columns: Enzymes; Km (μM); Vmax (pkat mg⁻¹ protein); Vmax/Km
One asterisk (*) indicates a significant difference (0.01 < P < 0.05) between the root and MeJA-induced root under t-test. The products of SbUGT7, SbUGT8 and SbUGT9 were very low or not detected at the linear reaction stage.

Conclusions
Specialized metabolites from plants are powerful weapons for humans when challenged by a pandemic [41], such as that caused by the COVID-19 virus, which had infected 240 million people and killed over 4.5 million people as we prepared this manuscript (https://covid19.who.int/). Baicalein from the roots of S. baicalensis exhibited excellent performance in suppressing the replication of COVID-19 virus [6,7]. Baicalin is the 7-O-glucuronidated product of baicalein converted by UGATs. The sugar moiety contributes to stronger absorptivity in the human intestine, and the absorbed baicalein can be released from baicalin by hydrolase in humans [42]. For the final biosynthetic step of baicalin, 6 SbUGTs and 4 SbUGATs cloned in this study showed 7-O-glucosylation and 7-O-glucuronidation activities toward baicalein, respectively. The high expression of SbUGAT4 in roots and its preference for baicalein as a substrate favor the glucuronidation of baicalein in the roots of S. baicalensis. Furthermore, the biosynthesis of baicalein directly from glucose has been achieved by E. coli fed-batch fermentation, and the production reached 214.1 mg/L [43]. Our results provide a toolkit for the biosynthesis of baicalin by using synthetic biology.

Plant materials
Root samples for metabolomic analysis were collected from 2-month-old and 2-year-old S. baicalensis plants maintained in Shanghai Chenshan Botanical Garden. Samples were ground into powder in liquid nitrogen and then freeze-dried. Twenty milligrams of each sample was suspended in 2 ml of 70% methanol and then extracted in an ultrasonic water bath for 2 h. After centrifugation at 12,000 g for 10 min, the supernatant was filtered through a 0.2 μm Millipore filter before metabolite analysis. Three biological replicates for each sample were collected.
Different tissues for RNA-seq were collected from 3-month-old S. baicalensis plants, including roots, stems, leaves, flower buds, flowers, and MeJA-induced roots (100 μM MeJA treatment for 24 h). All tissues were collected with three biological replicates.

Widely targeted metabolomic analysis
Samples were analyzed using a UPLC-ESI-MS/MS system (UPLC, SHIMADZU Nexera X2; MS, Applied Biosystems 4500 Q TRAP) as described previously [24]. Chromatographic separation was carried out on a Waters ACQUITY UPLC HSS T3 C18 column (1.8 μm, 2.1 mm × 100 mm). The flow rate of the mobile phase, consisting of 0.04% (v/v) acetic acid in water (A) and 0.04% (v/v) acetic acid in acetonitrile (B), was set to 0.35 ml/min. The gradient program started at 5% B. Within 10 min, a linear gradient to 95% B was programmed, and a composition of 95% B was held for 1 min. Subsequently, the composition was returned to 5% B within 0.10 min and maintained for 2.9 min. The column oven was set to 40 °C; the injection volume was 2 μl.
Linear ion trap (LIT) and triple quadrupole (QQQ) scans were acquired on a QQQ-LIT mass spectrometer (Q TRAP), API 6500 Q TRAP UPLC/MS/MS System, equipped with an ESI Turbo Ion-Spray interface, operating in positive and negative ion modes and controlled by Analyst 1.6.3 software (AB Sciex). The ESI source operation parameters were as follows: ion source, turbo spray; source temperature, 550 °C; ion spray (IS) voltage, 5500 V (positive ion mode)/− 4500 V (negative ion mode); ion source gas I (GSI), gas II (GSII), and curtain gas (CUR), 50, 60, and 30.0 psi, respectively; and collision gas (CAD) level, high. Instrument tuning and mass calibration were performed with 10 and 100 μmol/L polypropylene glycol solutions in QQQ and LIT modes, respectively. QQQ scans were acquired as multiple reaction monitoring (MRM) experiments with collision gas (nitrogen) set to 5 psi. The declustering potential (DP) and collision energy (CE) for individual MRM transitions were optimized. A specific set of MRM transitions was monitored for each period according to the metabolites eluted within this period.

Genome-wide identification of SbUGT and SbUGAT genes
A total of 122 UGT protein sequences of A. thaliana were downloaded from http://www.p450.kvl.dk/UGT.shtml#seqs and aligned using CLUSTAL X (V2.0) [44]. Then, the output alignment file was used to construct a hidden Markov model (HMM) profile by using the "hmmbuild" module within HMMER v3.0 [45]. The HMM profile was used to extract full-length UGT candidates from the S. baicalensis genome by the HMM algorithm (HMMER), filtering by a length between 200 and 600 amino acids. Finally, the UDPGT domain retrieved from Pfam PF00201 (http://pfam.xfam.org/) was used to validate the final UGT genes.
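The length filter applied to the HMMER candidates can be illustrated with a short, dependency-free Python sketch. It is only an illustration under stated assumptions: the helper names `read_fasta` and `filter_by_length`, the toy sequences, and the choice to treat the 200 and 600 residue bounds as inclusive are not taken from the original pipeline.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a {sequence_id: sequence} dict."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the identifier.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def filter_by_length(records, min_len=200, max_len=600):
    """Keep proteins whose length lies within [min_len, max_len].

    Inclusive bounds are an assumption; a trailing '*' stop symbol, if
    present, is not counted as a residue.
    """
    return {
        name: seq
        for name, seq in records.items()
        if min_len <= len(seq.rstrip("*")) <= max_len
    }


# Toy candidates, invented for illustration (150, 350 and 700 residues).
toy_fasta = (
    ">cand1 too short\n" + "M" * 150 + "\n"
    ">cand2 plausible UGT length\n" + "M" * 350 + "\n"
    ">cand3 too long\n" + "M" * 700 + "\n"
)
kept = filter_by_length(read_fasta(toy_fasta))
```

In practice the retained candidates would still need the Pfam PF00201 domain check described above before being accepted as UGT genes.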
Multiple sequence alignments and phylogenetic tree construction were performed using MEGA X [46]. The neighbor-joining tree was constructed under the default parameters from the candidates, UGT sequences from A. thaliana, and UGTs with known functions under the following accession numbers: BpUGAT (AB190262), SlUGT1 (AB362989), AmUGTcg10 (AB362988), PfUGT50 (AB362991), SiUGT23 (AB362990), VvGT5 (AB499074), and Sb3GT1 (MK577650). SbUGT and SbUGAT candidates were screened according to their annotated functions and subfamily classification. A maximum-likelihood tree was constructed under the default parameters with sequences of the 7-O SbUGT and SbUGAT candidates.

Gene location visualization
The chromosomal location of the SbUGT and SbUGAT genes was determined using TBtools v1.098652 [47].

Gene cloning
The complete ORFs of the SbUGT and SbUGAT genes were amplified by RT-PCR using the primers listed in Additional file 1 Table S4. cDNA templates were chosen according to the tissue-specific expression patterns of SbUGT and SbUGAT genes. The ORFs of SbUGT1 and SbUGT10 were obtained by de novo synthesis (GenScript, Nanjing, China). According to the manufacturer's instructions, fragments were cloned into the entry vector pDONR207 and prokaryotic expression vector pYes-dest17 using the Gateway BP Clonase II Enzyme Kit and LR Clonase II Enzyme Kit (Invitrogen, MA, USA), respectively.

Crude enzyme extraction and protein purification
The successfully constructed vectors were transformed into E. coli Rosetta (DE3) competent cells (Weidi Biotech, Shanghai, China). After growing at 37 °C for 12 h, transformant colonies were initially grown in 10 ml of LB liquid medium with 100 μg/ml ampicillin at 37 °C and 180 rpm for approximately 12 h and then transferred to 200 ml of LB liquid medium with 100 μg/ml ampicillin at 37 °C in a shaking incubator until the OD600 reached 0.6-0.8. Isopropyl β-D-thiogalactopyranoside (IPTG) was added to a final concentration of 1 mM, and the cells were cultured at 16 °C and 120 rpm for 16 h. pET28a (similar backbone to pYes-dest17) was used as an empty vector control.
For crude enzyme extraction, E. coli cells were harvested by centrifugation at 12,000 rpm and then resuspended in 50 mM phosphate buffer (pH 8.0) that contained 0.5 mM phenylmethanesulfonylfluoride (PMSF), 300 mM NaCl, and 2 mM β-mercaptoethanol. High-pressure cell disruption equipment (Constant Systems, Northants, UK) was used to crush the E. coli cells. After centrifugation at 4 °C and 12,000 rpm for 20 min, approximately 10 ml of supernatant (crude protein) was collected. An equal volume of 60% glycerin was added to the supernatant for storage at −80 °C.
For protein purification, E. coli cells were harvested by centrifugation at 12,000 rpm and then resuspended in 10 ml of buffer A [50 mM phosphate buffer (pH 8.0), 0.5 mM PMSF, 300 mM NaCl, 2 mM β-mercaptoethanol and 10 mM imidazole]. High-pressure cell disruption equipment (Constant Systems, Northants, UK) was used to crush the E. coli cells. After centrifugation at 4 °C and 12,000 rpm for 20 min, the supernatant was mixed with 1 ml of Ni-nitrilotriacetic acid (NTA) agarose (Qiagen, Germany) and stirred at 4 °C for 1 h. The mixture was packed into a column and washed three times at 4 °C with 5 ml of buffer B [50 mM phosphate buffer (pH 8.0), 0.5 mM PMSF, 300 mM NaCl, 2 mM β-mercaptoethanol and 20 mM imidazole]. The protein was eluted by 1 ml of buffer C [50 mM phosphate buffer (pH 8.0), 0.5 mM PMSF, 300 mM NaCl, 2 mM β-mercaptoethanol and 250 mM imidazole], and the imidazole was removed by Ultracel regenerated cellulose membrane (Millipore, MA, USA). Protein concentrations were determined using the Bradford method [48] and analyzed by 10% SDS-polyacrylamide gel electrophoresis.

In vitro enzyme assays and kinetic studies
Crude enzyme assays were performed in a 100 μl reaction volume, which contained 100 mM Tris-HCl buffer (pH 7.0), 0.5 mM sugar donor (UDP-Glc or UDP-GluA), 5 μl of extracted protein and 100 μM substrate. The reaction was incubated for 2 h at 37 °C. Methanol was then added at a final concentration of 70% to quench the reaction. The reaction mixture was filtered with a 0.2 μm Millipore filter and analyzed by LC-MS.
For kinetics measurements, baicalein was used at concentrations ranging from 0.5 to 200 μM. The reaction time was reduced to 10 min. Km and Vmax values were calculated from the Eadie-Hofstee plot.
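The Eadie-Hofstee estimate mentioned above rests on the linearized Michaelis-Menten equation v = Vmax − Km·(v/[S]): plotting v against v/[S] gives a line with slope −Km and intercept Vmax. The sketch below fits that line by ordinary least squares. It is an illustration only; the function name `eadie_hofstee` and the synthetic, noise-free data are assumptions and do not reproduce the measurements in Table 1.

```python
def eadie_hofstee(substrate, rates):
    """Estimate (Km, Vmax) from an Eadie-Hofstee plot.

    Fits v = Vmax - Km * (v / [S]) by ordinary least squares, with v/[S] on
    the x-axis and v on the y-axis.
    """
    x = [v / s for s, v in zip(substrate, rates)]
    y = list(rates)
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, intercept  # Km is the negative slope, Vmax the intercept


# Synthetic data generated from known parameters (hypothetical values),
# over the same 0.5 to 200 uM substrate range used in the assays.
true_km, true_vmax = 5.0, 100.0
conc = [0.5, 1, 2, 5, 10, 20, 50, 100, 200]
rates = [true_vmax * s / (true_km + s) for s in conc]

km, vmax = eadie_hofstee(conc, rates)
efficiency = vmax / km  # the Vmax/Km ratio compared between enzymes
```

With real measurements the points scatter around the line, and because v appears on both axes the Eadie-Hofstee transform propagates error into both coordinates, so a direct nonlinear fit is a common cross-check.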

Standard compounds
Baicalein and baicalin were purchased from Sigma-Aldrich (St. Louis, MO, USA), and oroxin A was purchased from Yuanye-Biotech (Shanghai, China). Baicalein was dissolved in dimethyl sulfoxide (DMSO), while baicalin and oroxin A were dissolved in methanol.

Metabolite analyses
Metabolites were analyzed using an Agilent 1260 Infinity II HPLC (high-performance liquid chromatography) system. Chromatographic separation was carried out on a Phenomenex Luna C18 (2) column (100 mm × 2 mm, 3 μm) with a guard column. The flow rate of the mobile phase, consisting of 0.1% (v/v) formic acid in water (A) and 1:1 acetonitrile/MeOH + 0.1% formic acid (B), was set to 0.26 ml/min. The gradient program was as follows: 0-3 min, 20% B; 20 min, 50% B; 20-30 min, 50% B; 36 min, 30% B; 37 min, 20% B; and 37-43 min, 20% B. The detection wavelength was 280 nm. The injection volume was 20 μl and the column temperature was 35 °C. The products of enzyme assays were quantified by comparing the areas of the individual peaks with standard curves obtained from standard compounds.
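Quantification against a standard curve can be sketched as a linear calibration: peak areas of standards at known concentrations are fitted to a line, and the line is inverted for the unknown. The example below is a minimal sketch assuming a linear detector response over the calibrated range; the function names and all concentrations and peak areas are invented for illustration.

```python
def fit_standard_curve(concentrations, areas):
    """Least-squares line area = slope * concentration + intercept."""
    n = len(concentrations)
    mean_c = sum(concentrations) / n
    mean_a = sum(areas) / n
    scc = sum((c - mean_c) ** 2 for c in concentrations)
    sca = sum((c - mean_c) * (a - mean_a) for c, a in zip(concentrations, areas))
    slope = sca / scc
    intercept = mean_a - slope * mean_c
    return slope, intercept


def quantify(area, slope, intercept):
    """Convert a measured peak area back to a concentration."""
    return (area - intercept) / slope


# Hypothetical standard series (concentration in uM, area in arbitrary units).
std_conc = [1.0, 5.0, 10.0, 50.0, 100.0]
std_area = [12.0 * c + 3.0 for c in std_conc]

slope, intercept = fit_standard_curve(std_conc, std_area)
product_conc = quantify(603.0, slope, intercept)  # unknown sample peak area
```

Values that fall outside the range spanned by the standards should be treated with caution, since the linearity assumption is only supported within that range.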
LC-MS/MS was carried out on a Thermo Q Exactive Plus. Chromatographic separation was carried out on a Phenomenex Luna C18 (2) column (100 mm × 2 mm, 3 μm) using the same gradient described above. Mass spectra were acquired in negative ion mode with a heated ESI source, and the parameters were as follows: aux. gas flow, 10 l/min; aux. gas heater, 350 °C; sheath gas flow, 40 l/min; spray voltage, 3.5 kV; capillary temperature, 320 °C. For full-scan MS/data-dependent (ddMS2) analysis, spectra were recorded in the m/z range of 50-750 at a resolution of 17,500 with automatic gain control (AGC) targets of 1 × 10⁶ and 2 × 10⁵.

Syntenic analysis
The genomes of S. baicalensis, S. barbata and S. indicum were compared using MCScan Toolkit v1.1 [49] implemented in Python. The genomes of S. baicalensis and S. barbata were downloaded from the National Genomics Data Center (https://bigd.big.ac.cn/gwh) with accession numbers GWHAOTC00000000 and GWHAOTP00000000, respectively, and the genome of S. indicum v1.0 was downloaded from the National Center for Biotechnology Information (NCBI) under the BioProject PRJNA186669. Syntenic gene pairs were identified by an all-vs-all search using LAST [50], filtered to remove pairs with scores below 0.7, and clustered into syntenic blocks in MCScan. Microsynteny plots were constructed using MCScan.
Additional file 1: Table S1. Flavonoid glycosides detected in the root metabolome. Table S2. Sequences of UGT genes identified from S. baicalensis genome. Table S3. The list of enzyme names, gene locus, and their subfamilies of predicted 7-O glycosyltransferases in S. baicalensis. Table S4. Primers used for the cloning of SbUGT and SbUGAT genes.
Additional file 2: Figure S1. Representative 7-O flavonoid glycosides detected from roots of S. baicalensis. Red boxes indicate the groups that differ between sugar moieties. Figure S2. Alignment of SbUGT and SbUGAT protein sequences. The consensus sequences are highlighted in red. The arrows indicate the amino acid residues that differ between SbUGTs and SbUGATs, which are responsible for the functional divergence between these two types of glycosyltransferases.