Integrated transcriptome and metabolome analyses revealed regulatory mechanisms of flavonoid biosynthesis in Radix Ardisia

Background Radix Ardisia (Jab Bik Lik Jib) is a common Miao medicine and is widely distributed in the Guizhou region of southern China. The botanical origin of Radix Ardisia includes the dry root and rhizome of Ardisia Crenata Sims (ACS) or Ardisia Crispa (Thunb.) A.DC. (AC), which are closely related species morphologically. However, the secondary metabolites in their roots are different from one another, especially the flavonoids, and these differences have not been thoroughly explored at the molecular level. This project preliminarily identified regulatory molecular mechanisms in the biosynthetic pathways of the flavonoids between ACS and AC using a multi-omics association analysis. Methods In this study, we determined the total levels of saponin, flavonoid, and phenolic in Radix Ardisia from different origins. Integrated transcriptome and metabolome analyses were used to identify the differentially expressed genes (DEGs) and differentially expressed metabolites (DEM). We also performed conjoint analyses on DEGs and DEMs to ascertain the degree pathways, and explore the regulation of flavonoid biosynthesis. Results The total flavonoid and phenolic levels in ACS were significantly higher than in AC (P < 0.05). There were 17,685 DEGs between ACS vs. AC, 8,854 were upregulated and 8,831 were downregulated. Based on this, we continued to study the gene changes in the flavonoid biosynthesis pathway, and 100 DEGs involving flavonoid biosynthesis were differentially expressed in ACS and AC. We validated the accuracy of the RNA-seq data using qRT-PCR. Metabolomic analyses showed that 11 metabolites were involved in flavonoid biosynthesis including: Naringenin, Luteolin, Catechin, and Quercetin. A conjoint analysis of the genome-wide connection network revealed the differences in the types and levels of flavonoid compounds between ACS and AC. The correlation analysis showed that Naringenin, Luteolin, Catechin, and Quercetin were more likely to be key compounds in the flavonoid biosynthesis pathway also including 4CL, AOMT, CHS, CHI, DFR, F3’5’H, FLS, and LAR. Conclusions This study provides useful information for revealing the regulation of flavonoid biosynthesis and the regulatory relationship between metabolites and genes in the flavonoid biosynthesis pathway in Radix Ardisia from different origins.


INTRODUCTION
Radix Ardisia (hmong: Jab Bik Lik Jib, also called Ba Zhua Jin Long) is a common Miao herbal medicine in Guizhou, and is recorded in ''Tujing Materia Medica'' ''Tianbao Materia Medica'' and ''Guizhou Herbal Medicine.'' The meridian distribution of Miao medicine is determined by property and flavor; hot drugs with sweet, hot, fragrant and spicy taste are classified as cold meridian, while cold drugs with sour, bitter and wet taste are classified as hot meridian. Radix Ardisia is cold in nature and belongs to the heat meridian category. It clears heat, detoxifies, and is regarded as a laryngological medicine by the Miao. Radix Ardisia is widely used in many prescription formulas including Kaihou Jian Spray and Yangyin Kouxiang Mixture, and has a high medicinal value. The botanical origin of Radix Ardisia includes both the dry root and rhizome of Ardisia Crenata Sims, Ardisia Crispa (Thunb.) A.DC. and Ardisia Crenata Sims var. bicolor (Walker) C. Y. Wu et C. Chen, which are closely related species morphologically, but their chemical compositions and content are not exactly the same. The multiple origins leads to mixed varieties of medicinal materials, which affects both the quality and medicinal safety of Radix Ardisia.
The compounds reported in Radix Ardisia include triterpene saponins, flavonoids, and coumarins (Kobayashi & de Mejia, 2005;Liu et al., 2007;Zheng et al., 2008;Liu et al., 2016;Li et al., 2021;Jansakul, Herbert & Lennart, 1987;Yoshida, Koma & Kikuchi, 1987;Kang et al., 2001;Ma et al., 2015), with the content of the main component (bergenin) different in ACS and AC . The 17 flavonoids and 10 coumarins were separated from different origins in Radix Ardisia. There were five common components, 22 different chemical components, and the contents of the common components were significantly different (Li et al., 2021); the secondary metabolites in their roots were different, especially the flavonoids. Differences in the types and contents of chemical components in medicinal materials may lead to differences in the efficacy of different botanical origins.
Multi-omics association analysis has become a widely used biological method for genome analysis (Szymanski et al., 2014;Chen et al., 2012), so RNA-seq was performed to investigate the differentially expressed genes in the two origins of Radix Ardisia (ACS and AC). This RNA-seq analysis provided key insights into the regulatory mechanisms of flavonoid biosynthesis. We then used UHPLC-QTOF-MS techniques to scan the metabolites, and to investigate the regulatory relationship between genes and flavonoid biosynthesis, as well as screen the biomarkers of ACS and AC. The results have great significance in our understanding of the metabolic pathway of flavonoid biosynthesis and may help establish effective quality classification and evaluation methods.

Determination of total saponin, flavonoid, and phenolic content
Ardisia Crenata Sims (ACS) and Ardisia Crispa (Thunb.) A.DC. (AC) plants were collected in Guiyang County, Guizhou Province, China (N.109.437569, E.19.19680). The samples were authenticated by Professor Sheng Hua Wei and stored at the Laboratory of Traditional Chinese Medicine and Ethnic Medicine, Guizhou University of Traditional Chinese Medicine, Guiyang, Guizhou, China. The specimen accession number was GUTCM0059.
Total saponin content (TSC) was measured according to the method described by Medina-Meza et al. (2016) with minor modifications. 1 mL of ACS or AC extract was added to a 10 mL test tube, and then 0.2 mL 5% Vanillin acetic acid solution and 0.8 mL HCLO 4 were added after 6 min, vortexed vigorously, and heated at 60 • C for 10 min; after the mixture was cooled in ice water, five mL glacial acetic acid was added. The absorbance was measured at 545 nm. Glacial acetic acid was used as the blank. The TSC was obtained by comparison with the standard curve of oleanolic acid; the analysis was done in triplicate.
Total flavonoid content (TFC) was determined according to the method described by Xu et al. (2014) with some modifications. 1 mL Ardisia Crenata Sims (ACS) or Ardisia Crispa (Thunb.) DC (AC) extract was added to a 10 mL test tube, and then 2.4 mL 70% ethyl alcohol and 5% NaNO2 (0.4 mL) were added after 6 min. Then, 0.4 mL 10% Al (NO 3 ) 3 was added. After 6 min, 4 mL 4% NaOH and 70% ethyl alcohol were added at room temperature for 15 min. It was then measured against ethyl alcohol as the blank at 510 nm, a calibration curve of rutin was plotted to calculate the TFC, and the analysis was done in triplicate.
The analysis of total phenolic content (TPC) was carried out according to the Folin-Ciocalteau spectrophotometric method, with some modifications (Vieira de Morais et al., 2021). The 1 mL gallic acid (standard solution) or ACS and AC extract and 3 mL Folin-Ciocalteau solution (10% in water) were pipetted into a 25 mL test tube. After 5 min, 6 mL 10% sodium carbonate aqueous solution was added to each tube. A control was prepared by replacing the sample with distilled water. The absorbance was measured at 750 nm after 50 min. The TPC was calculated by linear regression using gallic acid as the standard; all samples were analyzed in triplicate.

RNA extraction, library preparation and sequencing
In this study, 12 root samples produced from Ardisia Crenata Sims (ACS) and Ardisia Crispa (Thunb.) DC (AC) (six biological replicates) were tested using RNA-seq. Total RNA was extracted with the RNAprep Pure Plant Kit (Tiangen, China) and detected on 1% agarose gel. The total RNA concentration was determined using an Agilent 2100 Bioanalyser, and the purity of the samples was measured using a NanoDrop TM ultraviolet spectrophotometer. The purity, concentration, and integrity of the total RNA samples were assessed prior to further analysis. After the isolation and fragmentation of the total RNA, mRNA was enriched using poly-T oligo-attached magnetic beads. The enriched mRNA was fragmented into short fragments using M-MuLV Reverse Transcriptase and reverse transcribed into cDNA using random hexamer primer. Second strand cDNA was synthesized using DNA Polymerase I and RNase H. Then, 3 µl USER Enzyme (NEB, USA) was used with size-selected, adaptor-ligated cDNA at 37 • C for 15 min followed by 5 min at 95 • C before PCR. Finally, PCR products were purified (AMPure XP system) and library quality was assessed on the Agilent Bioanalyzer 2100 system. After cluster generation, the library preparations were sequenced using an Illumina HiSeqTM 4000 platform by Biomarker Technologies (Beijing, China). In this work, twelve cDNA libraries produced from ACS and AC were sequenced using the Illumina HiSeq TM 4000 platform and 150bp paired-end reads were generated. Clean reads were obtained by removing lower quality reads as well as raw reads containing an adapter or ploy-N. Transcriptome assemblies for the twelve libraries were performed separately using Trinity (Garber et al., 2011) with min_kmer_cov set to two by default and all other parameters set to default.

DEGs related to secondary metabolism pathways
Gene expression levels were calculated using RSEM (Li & Dewey, 2011) for each sample. The differentially expressed genes (DEGs) were identified by comparing raw readcounts using the DESeq 2 (Love, Huber & Anders, 2014). Genes with FC ≥ 1.0 and p-value <0.05 were identified as DEGs. We used the OmicShare tools (https://www.omicshare.com/tools) and KOBAS 2.0 (Xie et al., 2011) software to test the statistical enrichment of DEGs in KEGG pathways, and the significance of KEGG terms was assessed using the Bonferroni corrected Fisher exact test (P < 0.05). We then selected the pathways associated with flavonoid biosynthesis for a more detailed analysis.

qRT-PCR verification
In order to verify the accuracy of the RNA-seq, a qRT-PCR analysis was used to assess the quality of the RNA-seq data. Total RNA was extracted from root tissue with TRIzol according to the manufacturer's instructions (TIANGEN, China), and the isolated RNA was reverse-transcribed into cDNA with the RevertAid First Strand cDNA Synthesis Kit (Thermo Fisher, China). The expression of randomly selected genes was monitored by qRT-PCR using the SYBR Green qPCR Mastermix (TIANGEN, China) real-time PCR system by Bio-Rad CFX 96 TM following the manufacturer's instructions. Detailed information about the primer sequences for qRT-PCR was provided by the primer3 platform (http://frodo.wi.mit.edu/primer3/; Table S6). The qRT-PCR was used on three biological replicates with the expression levels of the genes determined using the 2 −− CT method. Expression levels were normalized against the GAPDH. Data are represented as mean values ± standard deviation, and the GraphPad Prism 6 software was employed to draw the histogram.

Metabolite profiling using UHPLC-QTOF-MS
In this study, 12 samples from ACS and AC (six biological replicates, respectively) were scanned for metabolite determination by UHPLC-QTOF-MS, and the samples were consistent with RNA-seq.25 mg of root tissue sample after liquid nitrogen grinding was weighed and placed in a 1.5 ml EP tube, and 500 µL extract solution (acetonitrile: methanol: water = 2: 2: 1) containing isotopically-labelled internal standard mixture was added to each EP tube. After a 30s vortex, the samples were placed in the TissueLyser at 35 Hz for 4 min and sonicated for 5 min in an ice-water bath. The samples were then incubated for 1 h at −40 • C and centrifuged at 12,000 rpm for 15 min at 4 • C. The supernatant was transferred to a fresh EP tube and dried at room temperature. Then, the samples were reconstituted in 300 µL 50% acetonitrile by sonication for 10 min, then centrifuged at 13,000 rpm for 15 min at 4 • C, and 75 µL of supernatant was transferred to a fresh glass vial for LC-MS analysis.

Metabolomics data analysis
The UHPLC-MS/MS analyses were applied to the 1290 Infinity series UHPLC System (Agilent Technologies) coupled with a TripleTOF 6600 mass spectrometer (AB Sciex) from Biomarker Technologies (Beijing, China). The MS raw data were converted to the mzXML format by ProteoWizard, and processed by R package XCMS (version 3.2). This process included peak deconvolution, alignment and integration. Minfrac and cut off were set as 0.5 and 0.3, respectively. An in-house MS/MS database was applied for metabolites identification. The principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA) were performed using R. We employed a univariate analysis to calculate statistical significance; metabolites with VIP ≥ 1, fold change ≥ 2.0 or ≤ 0.50, and a p-value <0.05, were considered differentially expressed metabolites (DEM).
Volcano plots were used to filter the metabolites of interest based on the log2 (FC) and log10 (p-value) of the metabolites. The KEGG database was used to annotate the differentially expressed metabolites with a p-value less than 0.05 set as the threshold. The enrichment factor represented the ratio between the proportion of differentially expressed metabolites in the pathway and all the metabolites in the pathway; the greater the value, the greater the degree of enrichment.

Combined transcriptome and metabolome analyses
To uncover the regulatory mechanism of flavonoid biosynthesis, a correlation analysis was performed using Pearson's correlation coefficient to calculate the correlation coefficient (CC). The p-value of DEG and DEM, and the |CC|>0.80 and CCP <0.05 were used for the cluster analysis. The correlation matrix was imported into the Cytoscape software to visualize the DEG and DEM network. The canonical correlation analysis (CCA) is statistical technique for studying associations between two sets of variables (Hotelling, 1936), and applied to the integration of data originating from different ''omics'' technologies (Le Cao, Gonzlez & Djean, 2009).

Total saponin, flavonoid, and phenolic content in Radix Ardisia from different origins
The total saponin content (TSC), total flavonoid content (TFC), and total phenol content (TPC) in Radix Ardisia from different origins were measured using the colorimetric method. The TSC in ACS was markedly higher than in ACSV (P < 0.05), however, there was no difference between the TSC in ACS and AC. The TFC and TPC in ACS were dramatically higher than in AC (P < 0.05) (Fig. 1). The differences in gene expression at varying origins in Radix Ardisia may lead to differences in the accumulation of secondary metabolites. We performed RNA-seq to examine the differences in gene expression between ACS and AC, and explore the regulation of flavonoid biosynthesis.

RNA sequencing and assembly results
We compared transcriptional profiles, using RNA-Seq. The root samples from ACS and AC were each sequenced in six replicates. Twelve RNA-Seq libraries of root tissue generated approximately 89.73 GB of clean data. The high-quality reads totaled 52,249 Unigenes, with an average length of 1440 bp; N50 was 2336 bp, the alignment rate was more than 67.52%, and the Q30 base percentages were greater than or equal to 94.85% (Table 1). The results indicated that the assembly quality was reliable, and satisfactory for further analysis. The raw sequencing data were deposited in the NCBI Bioproject database under accession number PRJNA739135. The Unigenes data were deposited in the Transcriptome Shotgun Assembly(TSA) Database under accession number GJZC00000000, and SUBID was SUB11534240.

Identification of differentially expressed genes
A total of 52,249 Unigenes were identified, which were expressed in at least one sample (Table S1). DESeq 2 was used to analyze the differentially expressed genes (DEGs) in Ardisia Crispa (Thunb.) DC. from origins ACS and AC. There were 17,685 DEGs between ACS vs. AC: 8,854 were upregulated and 8,831 were downregulated (Table S2; Fig. 2). To confirm the reliability of the gene expression, 20 genes were randomly selected for quantitative real-time polymerase chain reaction (qRT-PCR) detection. The qRT-PCR showed that the tendency of gene expression was similar to the RNA-Seq results (Figs. 3A and 3B). The results showed that the RNA-Seq results were reliable in this study.

DEG functional enrichment analysis
A KEGG pathway analysis showed that 139 pathways were represented in the transcriptome dataset (Table S3). Interestingly, these DEGs in ACS and AC were mainly enriched in the metabolism, genetic information processing, environment information processing, cellular process, and organismal systems, involved in Inositol phosphate metabolism, Other glycan degradation, Homologous recombination, Mismatch repair, Phosphatidylinositol signaling system, Glycosylphosphatidylinositol (GPI)-anchor biosynthesis, Autophagyother, Ubiquitin mediated proteolysis, Basal transcription factors, Galactose metabolism,  Table 2).

Metabolic differences in ACS and AC
RNA-Seq results demonstrated significant differences in metabolism between ACS and AC. Therefore, we investigated the changes in the metabolic compositions of ACS and AC.
In this study, we used 12 samples to survey the differences in the metabolic constituents of ACS and AC with six biological replicates. First, the principal component (PC1) in ESI+ mode (34% of the total variables) and PC1 in ESI − mode (47.20%) were clearly separated between the ACS and AC groups (Figs. S1A and S1B). The VIP values of the first two principal components of the multivariate orthogonal partial least square discriminant (OPLS-DA) (Figs. S1C and S1D), and VIP ≥ 1, fold change ≥2.0 or ≤ 0. 50, and p-value <0.05 to screen for differentially expressed metabolites (DEM). A total of 943 and 1,250 DEMs were identified between the ACS and AC groups in the ESI+ (ESI −) mode, respectively (Table 3; Fig. 6).

Correlation analysis between RNA-seq and metabolites uncovers the regulatory pathway of flavonoid biosynthesis
To explore the correlation between gene expression and metabolites, we performed correlation analyses of the metabolites related to flavonoid biosynthesis and the transcripts. The profiles of the metabolites and gene expression in ACS and AC were compared using a correlation coefficient and a canonical correlation analysis (CCA). These correlations and the CCA analysis results are shown in Figs. S2 and S3. The results indicated that metabolites such as Succinate, Salicylic acid, 2-Hydroxyphenylacetic acid, p-Hydroxycinnamaldehyde, L-Tyrosine, Phenylpyruvate, N-Acetyl-L-phenylalanine, Phenylacetic acid, Sinapic acid, Sinapyl alcohol, 4-Hydroxycinnamic acid, Alpha-N-Phenylacetyl-L-glutamine, Naringenin, 2-Phenylacetamide, Luteolin, Catechin, and Quercetin were more likely to be regulated in the flavonoid biosynthesis pathway. The correlation analysis between flavonoid-related genes and metabolites showed that 48 Key Unigenes (4CL, AOMT, CHS, CHI, DFR, LAR. . . ) were significantly correlated with metabolites ( Fig. 8; P < 0.05, |r| >0.8; Table 4). These results could provide insight into the relationship between genetically regulated metabolites and the metabolic impact on gene expression (Fig. 9).

DISCUSSION
In this study, a high-quality transcriptome database of Ardisia Crenata Sims (ACS) and Ardisia Crispa (Thunb.) A.DC. (AC) was generated based on RNA-seq technology to demonstrate the gene expression of Radix Ardisia from different origins. The reliability of the transcriptome results was verified using qRT-PCR. The metabolites of ACS and AC were generated using the UHPLC-QTOF-MS approach. The correlation analysis between the metabolites and genes was then performed to explore the regulation of flavonoid biosynthesis in ACS and AC. The botanical origin of Radix Ardisia includes both ACS and AC, which are closely related species in appearance and structure, but their chemical composition and content are not completely the same. Because of the multiple origins of Radix Ardisia, there are many varieties of medicinal materials created from the plant, which can seriously affect the quality and medication safety of Radix Ardisia use. In the previous study, we used DNA barcoding to distinguish the original species from Radix Ardisia; the results showed that ITS sequencing can distinguish AC and ACS . The total flavone content (TFC) and total phenol content (TPC) in ACS was dramatically higher than in AC (P < 0.05). Previously, we employed the UPLC-QE-HF-MS/MS to identify and analyze the flavonoids in Radix Ardisia from different sources and were able to identify a total of 17 flavonoids, including nine flavonols, three flavane-3-alcohols and three other types of flavonoids (Li et al., 2021). The results also suggested that the difference in gene expression may lead to the difference in the synthesis and accumulation of flavonoid metabolites. Transcriptome and metabolome analyses have been widely used to study the regulatory mechanisms in the secondary metabolic biosynthetic pathways of medicinal plants, including Chamomile (Tai et al., 2020), Ginseng (Fan et al., 2019), Primula Oreodoxa (Zhao et al., 2019), Anoectochilus Roxburghii (Zhang et al., 2020, Rheum (Liu et al., 2020), Sophora Flavescens (Wei et al., 2021), Ziziphora Bungeana (He et al., 2020. Therefore, we performed comparative transcriptome and metabolome analyses to examine the differences in gene expression in ACS and AC, and explore the regulation of flavonoid biosynthesis. The DEGs involved in flavonoid biosynthesis included those integral to phenylpropanoid biosynthesis, phenylalanine metabolism, flavonoid biosynthesis, flavonol and flavone biosynthesis and anthocyanin biosynthesis. There were 128 DEGs in ACS and AC identified in the second developmental stage, including CHS, CHI, DFR, F3 5 H, FLS, LAR, and PAL. Flavonoids are a large category of secondary metabolites ubiquitous in medicinal plants. Flavonoids biosynthesize through the phenylpropanoid pathway from substrate phenylalanine initially which is then converted to cinnamic acid through PAL catalysis (Ferrer et al., 2008;Li et al., 2020). We found that the expression of PAL genes was higher in ACS vs. AC. CHS is the key enzyme in the phenylpropane synthesis of flavonoids, and the expression of CHS in ACS was higher than AC. CHS in flavonoid biosynthesis    can catalyze one molecule of p-cinnamoyl-CoA and three molecules of malonyl-CoA to produce naringenin chalcone. CHI can convert naringenin chalcone into naringenin, which can be converted by F3H into dihydrokaempferol (Qiang et al., 2020;He et al., 2018;Fig. 5).
In this study, we identified 11 metabolites involved in flavonoid biosynthesis, including the upstream phenylpropane metabolic pathway of flavonoid biosynthesis, containing L-Phenylalanine, p-Hydroxycinnamaldehyde, L-Tyrosine, Ferulic acid, Sinapic acid, Sinapyl alcohol, 4-Hydroxycinnamic acid, Naringenin, Luteolin, Catechin, and Quercetin. Compared with AC, the content of Naringenin, Luteolin, Catechin, and Quercetin were significantly increased in ACS. The regulatory network and CCA analysis showed that Naringenin, Luteolin, Catechin, and Quercetin were more likely to be key compounds in the flavonoid biosynthesis pathway, also including 4CL, AOMT, CHS, CHI, DFR, F3 5 H, FLS, and LAR. Naringenin, Luteolin, Catechin, and Quercetin, as flavanones from the flavonoids family, have shown anti-inflammatory, and antioxidant activities (Naraki, Rezaee & Karimi, 2021;Aziz, Kim & Cho, 2018;Nakano et al., 2019;Hou et al., 2019). The contents of the flavonoid components are different in ACS and AC, which may lead to differences in the efficacy of Radix Ardisia from different botanical origins. Further studies are needed to explore this possibility.

CONCLUSION
In summary, we found the total flavone content (TFC) and total phenol content (TPC) differed in the roots of ACS compared to AC. We also identified flavonoid biosynthesisrelated genes with the majority more highly expressed in ACS than in AC. Furthermore, we explored the regulatory relationship between genes and flavonoid biosynthesis and metabolism. Correlation analyses can be very helpful for pinpointing candidate regulatory genes linked to compositional changes in ACS and AC. The data generated in the study will be an invaluable resource for further studies involving functional genomics, molecular biology, and plant breeding in ACS and AC. • Jiehong Zhao analyzed the data, prepared figures and/or tables, and approved the final draft.

ACS
• Xiu Dong analyzed the data, prepared figures and/or tables, and approved the final draft.
• Ying Zhou conceived and designed the experiments, authored or reviewed drafts of the article, and approved the final draft.

Field Study Permissions
The following information was supplied relating to field study approvals (i.e., approving body and any reference numbers): Field experiments were approved by the Guizhou University of Traditional Chinese Medicine (project number: GUTCM0059).

Data Availability
The following information was supplied regarding data availability: