Combined Analysis of the Metabolome and Transcriptome Identified Candidate Genes Involved in Phenolic Acid Biosynthesis in the Leaves of Cyclocarya paliurus

To assess changes of metabolite content and regulation mechanism of the phenolic acid biosynthesis pathway at different developmental stages of leaves, this study performed a combined metabolome and transcriptome analysis of Cyclocarya paliurus leaves at different developmental stages. Metabolite and transcript profiling were conducted by ultra-performance liquid chromatography quadrupole time-of-flight tandem mass spectrometer and high-throughput RNA sequencing, respectively. Transcriptome identification showed that 58 genes were involved in the biosynthesis of phenolic acid. Among them, 10 differentially expressed genes were detected between every two developmental stages. Identification and quantification of metabolites indicated that 14 metabolites were located in the phenolic acid biosynthetic pathway. Among them, eight differentially accumulated metabolites were detected between every two developmental stages. Association analysis between metabolome and transcriptome showed that six differentially expressed structural genes were significantly positively correlated with metabolite accumulation and showed similar expression trends. A total of 128 transcription factors were identified that may be involved in the regulation of phenolic acid biosynthesis; these include 12 MYBs and 10 basic helix–loop–helix (bHLH) transcription factors. A regulatory network of the phenolic acid biosynthesis was established to visualize differentially expressed candidate genes that are involved in the accumulation of metabolites with significant differences. The results of this study contribute to the further understanding of phenolic acid biosynthesis during the development of leaves of C. paliurus.


Introduction
Natural phenolic compounds can be commonly found in plants and form the largest group of phytochemicals. Polyphenols are secondary metabolites of plants and contribute to the bitterness, astringency, color, flavor, odor in food [1]. Polyphenols can be divided into different groups, such as phenolic acids, flavonoids, tannins, stilbenes and lignans [1]. These compounds have different roles, such as signaling molecules, plant defense agents and auxin transport regulators [2]. In addition, these compounds have been well investigated due to their antioxidant activity and ability to scavenge free radicals. They also have various health benefits such as anti-inflammatory, antibacterial, anti-proliferative, anti-cancer, anti-oxidant properties [3][4][5].
and Family Planning Commission of China [37]. It has been suggested that the biological active substances in C. paliurus leaves are responsible for its therapeutic effects [38]. Due to its various health benefits, a large production of C. paliurus leaves is required [39]. The accumulation of phenolic acids and the expression patterns of phenolic acid biosynthesis genes in the leaves of C. paliurus have not been investigated at different developmental stages. In this study, integrated transcriptomics and metabolomics techniques were used to investigate the changes in phenolic acid content and the different expressed gene involved in the synthesis pathway of phenolic acid, we aimed to find out that: 1) the phenolic acid synthesis pathway of C. paliurus leaves; 2) the regulatory mechanisms underlying the synthesis of phenolic acid in the leaves of C. paliurus at different developmental stages. The results of the present study add information on the accumulative dynamics of phenolic compounds in the leaves of C. paliurus at different developmental stages and provide a valuable reference for determining time for the harvest of leaves.

Expression of Phenolic Acid Biosynthesis Genes in Leaves of C. Paliurus at Different Developmental Stages
The phenolic acid synthesis pathway of C. paliurus leaves was constructed based on the detected metabolites in reference to the phenylpropanoid biosynthesis in the KEGG database ( Figure 1). A total of 58 genes were identified to be involved in the phenolic acid biosynthesis pathway ( Figure 2A). Comparison of the four stages of development identified a total of 10 DEGs, including one PAL gene (TRINITY_DN87383_c2_g1), one C4H gene (TRINITY_DN83539_c2_g6), one F5H gene (TRINITY_DN86835_c0_g1), one COMT gene (TRINITY_DN95472_c1_g2), two CAD genes (TRINITY_DN88256_c1_g1, TRINITY_DN92541_c0_g1), two UGT72E genes (TRINITY_DN91186_ c0_g1, TRINITY_DN93270_c1_g1), one HCT gene (TRINITY_DN97112_c1_g2) and one caffeoyl-CoA O-methyltransferase gene (TRINITY_DN84484_c2_g1) ( Figure 2B). The expressions of two DEGs (TRINITY_DN84484_c2_g1, TRINITY_DN88256_c1_g1) decreased with progressing developmental stages and the gene expression was highest during the F1 stage. The expression of only one DEG (TRINITY_DN93270_c1_g1) increased at first and then decreased and was highest during the F2 stage. The other DEGs followed different expression trends but all of their gene expression levels were highest during the F4 stage. A total of 22 transcription factors (TFs), including 12 MYB TFs and 10 basic helix-loop-helix (bHLH) TFs, were involved in the phenolic acid biosynthesis pathway ( Figure 2C).

Identified Metabolites Involved in Phenolic Acid Biosynthesis Pathway
A total of 14 metabolites involved in the phenolic acid biosynthetic pathway were detected, including six phenolic acids (cinnamic acid, caffeic acid, chlorogenic acid, ferulic acid, sinapic acid and p-coumaric acid) ( Table 1). Differentially accumulated metabolites (DAMs) were defined as those that exhibit a fold change ≥ 2 or a fold change ≤ 0.5 and a variable importance in project (VIP) ≥ 1 between the stages of leaf development. Eight DAMs were detected, including two phenolic acids (p-coumaric acid, sinapic acid) and other metabolites were identified as syringin, coniferin, sinapyl alcohol, sinapoyl malate, L-phenylalanine and coniferyl alcohol. The DAMs contents showed different expression trends during the four developmental stages but all the eight metabolite contents were highest at the F4 stage.
(TRINITY_DN93270_c1_g1) increased at first and then decreased and was highest during the F2 stage. The other DEGs followed different expression trends but all of their gene expression levels were highest during the F4 stage. A total of 22 transcription factors (TFs), including 12 MYB TFs and 10 basic helix-loop-helix (bHLH) TFs, were involved in the phenolic acid biosynthesis pathway ( Figure 2C).

Identified Metabolites Involved in Phenolic Acid Biosynthesis Pathway
A total of 14 metabolites involved in the phenolic acid biosynthetic pathway were detected, including six phenolic acids (cinnamic acid, caffeic acid, chlorogenic acid, ferulic acid, sinapic acid and p-coumaric acid) ( Table 1). Differentially accumulated metabolites (DAMs) were defined as those that exhibit a fold change ≥ 2 or a fold change ≤ 0.5 and a variable importance in project (VIP) ≥ 1 between the stages of leaf development. Eight DAMs were detected, including two phenolic acids (pcoumaric acid, sinapic acid) and other metabolites were identified as syringin, coniferin, sinapyl

Comprehensive Analysis of Metabolome and Transcriptome
To clearly determine the association between the genes and metabolites, the transcriptome and the metabolite data were integrated by correlation analysis and networks were constructed to determine transcript-metabolite correlations. The metabolite and transcriptome data were log 2 -transformed prior to correlation analysis. Only correlation pairs with a Pearson correlation coefficient (PCC) ≥ 0.9 and a p value ≤ 0.05 were included in the analysis. A total of 64 related pairs were identified and visualized using Cytoscape software (version 3.6.1) (Supplementary material 1). The visualized network showed that a total of 47 nodes were connected by 64 edges. These 47 nodes include 32 genes and 15 metabolites. Furthermore, 43 pairs had a positive correlation and 21 pairs were negatively correlated ( Figure 3). In the phenolic acid biosynthesis, a total of six candidate DEGs were found, including PAL (TRINITY_ DN87383_c2_g1), C4H (TRINITY_ DN83539_c2_g6), CAD (TRINITY_DN92541_c0_g1), COMT (TRINITY_DN95472_c1_g2), HCT (TRINITY_DN97112_c1_g2) and UGT72E (TRINITY_ DN93270_c1_g1). These results indicated that several DEGs were highly correlated with the corresponding metabolites that are involved in phenolic acid biosynthesis. The transcriptome data validated the authenticity and accuracy of the metabolic analysis.

Screening of Transcription Factors Related to Phenolic Acid Biosynthesis and Phylogenetic Analysis
To systemically identify unknown putative transcription factors that control the phenolic acid biosynthesis of C. paliurus during the different developmental stages, co-expression analysis was employed between metabolic pathway genes and differentially expressed transcription factor (TF) genes. Six phenolic acid synthesis genes were previously screened (genes encoding PAL, C4H, CAD, COMT, UGT72E and HCT) and were selected as "guiding genes" to identify co-expression specific relationships. A total of 386 differentially expressed TFs were identified via comparison with the PlantTFDB database. Correlation analysis was performed between guiding genes and differentially expressed TFs and correlation pairs with PCC ≥ 0.9 and p-value ≤ 0.05 were selected. A total of 414 related pairs were identified and visualized using Cytoscape software (version 3.6.1) (Supplementary material 2). The visualized network showed that a total of 134 nodes were connected by 414 edges and these 134 nodes included six guiding genes and 128 TFs ( Figure 4). The members of MYB, BHLH and ERF families were the most abundant positive correlation TFs among the identified co-expressed TFs (p-value ≤ 0.05).
Since MYB and bHLH are two of the most important transcription factor families that are involved in the biosynthesis of phenolic acid [23,40], the MYB and bHLH gene of C. paliurus was selected to construct a phylogenetic tree with Arabidopsis R2R3MYB TFs and Arabidopsis bHLH TFs, respectively. 12 MYB TFs could be divided into six clusters, where the TRINITY_DN86884_c1_g7, TRINITY_DN92774_c0_g3, TRINITY_DN89360_c1_g9, TRINITY_DN91730_c2_g1 and TRINITY_ DN97533_c2_g5 genes showed homology with AtMYB39, AtCDC5 and AtMYB91 and were clustered into a biggest cluster (Supplementary material 3). The other MYB genes were divided into another five groups (Supplementary material 3). Ten bHLH TFs could be divided into eight clusters, where TRINITY_ DN97140_c3_g1 and TRINITY_ DN85072_c1_g3 showed homology AtbHLH001 and AtbHLH002 and were clustered into one cluster (Supplementary material 4).

Regulatory Network of Phenolic Acid Biosynthesis
The phenolic acid synthesis pathway of C. paliurus leaves was constructed in reference to the phenylpropane biosynthesis method in the KEGG database ( Figure 5). The network was established by combining the results of all metabolites, genes and TFs to more directly show the relationship between TF-regulated gene expression, gene expression and metabolite accumulation. Transcription factor bind to the promoter of structural genes. Therefore, we hypothesized that the TFs activate the expression of genes of by binding to the promoters of these structural genes, inducing the accumulation of the metabolites described above. The structural genes of the phenolic acid biosynthesis pathway are regulated by many MYB and bHLH TFs, including PAL, C4H, CAD, COMT and HCT genes. All MYB and bHLH TFs, screened by correlation, showed similar expression trends and were highly expressed during the F4 stage, which was similar to the trend of most structural DEGs ( Figure 2C). Specifically, 12 MYB and 10 bHLH TFs and 10 structural DEGs were candidate genes obtained by this study. These can be used for investigating the regulatory mechanism underlying phenolic acid biosynthesis in the leaves of C. paliurus at different developmental stages.

RT-qPCR Validation of the Transcriptomic Data
The abundance of gene transcripts of enzymes and TFs related to the C. paliurus phenolic acids biosynthesis were determined via qRT-PCR analysis. Based on the differences in expression levels of genes at the four different developmental stages, 18 genes were screened for qRT-PCR, including six structural genes (PAL, C4H, HCT, CAD, COMT and UGT72E), six MYB TFs and six bHLH TFs. All of these 18 selected genes had similar expression patterns than identified in the RNA-seq data ( Figure 6). Therefore, the data obtained in this study can be used to study the phenolic acids biosynthesis and metabolism-related genes in C. paliurus.

RT-qPCR Validation of the Transcriptomic Data
The abundance of gene transcripts of enzymes and TFs related to the C. paliurus phenolic acids biosynthesis were determined via qRT-PCR analysis. Based on the differences in expression levels of genes at the four different developmental stages, 18 genes were screened for qRT-PCR, including six structural genes (PAL, C4H, HCT, CAD, COMT and UGT72E), six MYB TFs and six bHLH TFs. All of these 18 selected genes had similar expression patterns than identified in the RNA-seq data ( Figure  6). Therefore, the data obtained in this study can be used to study the phenolic acids biosynthesis and metabolism-related genes in C. paliurus.

Discussion
Phenolic acids in plants are primarily derived via the phenylpropanoid biosynthetic pathway [2]. PAL, C4H, 4CL, F5H and COMT are major enzymes in the phenolic acids biosynthesis pathway,

RT-qPCR Validation of the Transcriptomic Data
The abundance of gene transcripts of enzymes and TFs related to the C. paliurus phenolic acids biosynthesis were determined via qRT-PCR analysis. Based on the differences in expression levels of genes at the four different developmental stages, 18 genes were screened for qRT-PCR, including six structural genes (PAL, C4H, HCT, CAD, COMT and UGT72E), six MYB TFs and six bHLH TFs. All of these 18 selected genes had similar expression patterns than identified in the RNA-seq data ( Figure  6). Therefore, the data obtained in this study can be used to study the phenolic acids biosynthesis and metabolism-related genes in C. paliurus.

Discussion
Phenolic acids in plants are primarily derived via the phenylpropanoid biosynthetic pathway [2]. PAL, C4H, 4CL, F5H and COMT are major enzymes in the phenolic acids biosynthesis pathway,

Discussion
Phenolic acids in plants are primarily derived via the phenylpropanoid biosynthetic pathway [2]. PAL, C4H, 4CL, F5H and COMT are major enzymes in the phenolic acids biosynthesis pathway, which action leads to the production of different phenolic acid compounds. CAD, CCR and UGT72E enzymes mediate the synthesis of different phenolic acid derivatives. HCT, 4CL and E2.1.1.104 lead to the production of different phenolic acid compounds. In this study, all the 10 genes were found differentially expressed at four developmental stages of C. paliurus leaves ( Figure 2B). However, the expression trends of the 10 genes during four different leaf developmental stages were different. For example, expression of C4H, HCT, COMT and PAL were highest during F4 stage, followed by F1 and F2 stages and lowest during F3 stage; while expression of F5H was highest during F4 stage, followed by F3 and F1 stages and lowest during F2 stage. These genes may play important roles in the accumulation of all examined phenolic acids. It is well known that phenylpropanoid biosynthesis is a very complex process and that multiple enzyme metabolites are produced; moreover, several enzymes exist in many isoforms [41]. Gayoso et al. reported that of six PAL genes examined in tomato roots, PAL2 was the most highly expressed, followed by PAL3, PAL4 and PAL6 [42]. Other reports identified two BnC4H genes in rapeseed and two Cs4CL genes in tea [43,44]. This study also showed that multiple genes encode the same enzyme and have different expression patterns. For example, five PAL genes and seven 4CL genes were found in C. paliurus. These findings suggest that different genes of the multiple gene family play different roles in the phenolic acid biosynthetic pathway. Further research is required to determine the role these genes play during the biosynthesis of specific phenolic acids.
Metabolomics is an emerging omics technology that, similar to genomics and proteomics, can be used to qualify and quantify metabolites of small molecular weight within the cells of an organism [45]. Plant metabolomics, a new field in the post-genome era [46], has been widely applied for the investigation of patterns of metabolite accumulation. Furthermore, this technology was used to investigate the underlying genetic basis of these patters by identifying genes involved in the metabolism, which is currently a topic of interest in modern plant biology. Metabolites are the final products of cell biological regulation process and their levels can be regarded as the response of plant development to genetic and environmental changes [47]. This study performed metabolomics analysis of C. paliurus leaves at different developmental stages and identified a total of 14 metabolites of phenolic acid biosynthetic pathway, eight of which were differentially accumulated. L-phenylalanine, as a precursor of the phenolic acid biosynthesis pathway, accumulated during the first three stages and reached a peak during F3 stage, while decreased significantly during the F4 stage ( Table 1). The other differentially expressed metabolites also showed different accumulation trend during the four developmental stages, they were actually highest during the F4 stage. These results indicated that phenolic acids and their derivatives accumulate in leaf during the development. Non-differentially expressed metabolites may be affected by upstream or downstream pathways and do not show similar expression trends.
Since transcriptome analysis is regarded as a central way to study the expression level, structure and function of genes to identify phenotypic traits, the combined analysis of both transcriptome and metabolome has increasingly become a popular and practical tool for the mining of new genes that may be involved in various metabolic pathways [48]. Thus, this method will likely become an effective tool that can be used to investigate the mechanisms involved in the regulation of the phenolic acid biosynthesis in C. paliurus. In this study, metabonomic data and transcriptome profiling analysis were conducted in combined analysis to identify genes involved in phenolic acid biosynthesis. This enabled the search for useful information about the accumulation and regulation of phenolic acids in the leaves of C. paliurus during different developmental stages. Six DEGs that were significantly associated with metabolites were screened by correlation network analysis (PAL, C4H, COMT, HCT, CAD and UGT72E genes). PAL is the key and start enzyme in the phenylpropanoid pathway and controls the flux of precursors into the phenol network [49]. Both PAL activity and PAL expression are closely related to the accumulation of phenolic acid [50,51]. This study showed that the expression levels of the PAL gene followed an opposite trend to that of phenylalanine but showed the same trend as the content of synthetic product cinnamic acid. These results further confirm that PAL is related to the accumulation of cinnamic acid. C4H is the second key enzyme in the general phenylpropanoid pathway and in addition to PAL, C4H determines the flux through the phenylpropanoid metabolism [52]. COMT mediates the metabolism of coumaric acid to ferulic acid [53] and is also involved in the formation of sinapoyl residues and S-lignin from 5-hydroxyferuloyl derivatives [54]. This study identified significant positive or negative correlations between C4H, COMT enzyme genes and metabolites and a similar trend was found between DEGs and metabolites. These results identified the structural genes that play an important role in the phenolic acid biosynthesis pathway in the leaves of C. paliurus at different developmental stages.
TFs are necessary for the regulation of gene expression [55]. Normally, TF proteins function through the combination of their own DNA-binding domain and the cis-acting element of their target genes [56,57]. TFs are critical proteins that regulate gene expression and signal transduction networks during plant growth and development [58]. They also participate in various biological process, including developmental regulation, defense elicitation and stress responses [59][60][61][62]. Based on correlation network analysis, a total of 128 TFs were identified to be related to guiding genes. The three largest families of TFs, APETALA2/ethylene responsive element binding protein (AP2/EREBP), MYB-(R1)R2R3 and bHLH, play a variety of roles throughout the plant life-cycle [63]. This study identified 12 MYB and 10 bHLH TFs. Although it has been considered that only MYB or bHLH could regulate the phenylpropanoid metabolism and enhance the production of secondary metabolites, many studies confirmed that they can play more prominent roles in the form of a complex (e.g., MBW, MYB/bHLH/WD40 protein complex) [64][65][66]. This study showed that the selected MYB and bHLH TFs have similar expression trends and are also similar to the changes of structural genes and metabolite content. This indicates that MYB and bHLH TFs regulate the phenolic acid biosynthesis pathway. Research has shown that the MYB/bHLH complex may significantly elevate the production of secondary metabolites [67]. Based on the phylogenetic tree constructed between the screened MYB and bHLH gene and the Arabidopsis R2R3-MYB and bHLH TF ( Figures S1 and S2), we suggested that TRINITY_DN94789_c0_g2 MYB, TRINITY_DN94332_c1_g4 MYB, TRINITY_DN97140_c3_g1 bHLH and TRINITY_DN85072_c1_g3 bHLH genes may have the function in regulating the phenolic acid biosynthesis in the leaves of C. paliurus during different developmental stages. The possible reason might be due to the clear function of the homologous Arabidopsis thaliana MYB TFs ( Figure S1, green cluster) and bHLH TFs ( Figure S2, blue cluster). Heterologous expression in Salvia miltiorrhiza of either snapdragon ROSEA1 or A. thaliana MYB75/PAP1 boosts the level of both rosmarinic and salvianolic acids. This showed that AtPAP1 was a positive transcriptional activator of the phenolic acid biosynthesis [68,69]. Overexpression of PAP1, ROSEA1, VvMYB5a or StAN1 resulted in elevated expressions of the genes encoding enzymes for PAL, C4H, 4CL, HCT, COMT and CAD [69]. The reported MYB75, MYB90, MYB111, MYB113 and MYB114 were identified to have regulatory function in the phenylpropanoid biosynthesis pathway [70]. Both of TRINITY_DN94789_c0_g2 and TRINITY_DN94332_c1_g4 MYB genes are highly similar to the CAD, PAL, HCT and COMT gene expression trends and showed significant correlations. bHLH1 showed a strong association with phenylpropanoid expression and the expression patterns of AN1, bHLH1 and WD40 suggest that they are determinants of the amounts of phenylpropanoids that a given genotype will contain [40]. These results indicate that the above two MYB TFs and two bHLH TFs may regulate the expression of these four enzyme genes and thus affect the accumulation of phenolic acids.
Although the other five groups of Arabidopsis MYB genes ( Figure S1) and all the nine groups of Arabidopsis bHLH genes ( Figure S2) have not been reported to be involved in the regulation of the phenylpropanoid pathway, they have other functions. For example, AtMYB59 regulates the plant root development by controlling the cell-cycle progression at the root tips [71] and AtMYB99 controls anther development and/or functionality [72]. Both AtMYB88 and AtMYB124/FLP act on stomatal differentiation and patterning by restricting divisions late in the stomatal cell lineage and inducing terminal differentiation [73]. AtbHLH113 in Arabidopsis interacts with PAP1/MYB75 modulating anthocyanin biosynthesis [74]. Studies showed that TFs have functional redundancy [75]. However, the function of these TFs should be verified experimentally. Further research is required to identify whether these MYB and bHLH TFs play a regulatory role in the phenolic acid biosynthetic pathway, for instance by chromatin immunoprecipitation assay or by over expressing the candidate TF in plant.
The developmental stages of C. paliurus leaves are defined as follows: youngest leaf at the F1 stage was the smallest fully expanded leaf, while the matured leaf at the F4 stage referred to full leaf enlargement and full leaf thickness; F2 and F3 referred to intermediate developmental stages between F1 and F4 [76]. For each stage, we collected samples from three plants, which yielded three biological replicates for RNA-seq analysis and metabolome analysis. All samples were immediately frozen in liquid nitrogen after collection and stored at −80 • C until further analysis.

Metabolite Extraction, Multiple Reactions Monitoring and Parameter Setting
Multiple reactions monitoring (MRM) was performed by Metware Biotechnology Co., Ltd. (Wuhan, China). Zirconia beads were added to freeze-dried leaves and were then crushed for 1.5 min at 30 Hz, using a mixing mill (MM400, Retsch, Haan, Germany). 100 mg of the powder was weighed and 1 mL of a 70% aqueous methanol solution was added, followed by extraction overnight at 4 • C. The following ESI source operation parameters were used: ion source: turbo spray; source temperature: 500 • C; ion spray voltage (IS): 5500 V; ion source gas I (GSI), gas II (GSII) and curtain gas (CUR) were set to 55, 60 and 25.0 psi, respectively; the collision gas (CAD) was high. Instrument tuning and mass calibration were performed with 10 and 100 µmol/L polypropylene glycol solutions in QQQ and LIT modes, respectively. QQQ scans were acquired as MRM experiments with collision gas (nitrogen) at 5 psi. The declustering potential (DP) and collision energy (CE) for individual MRM transitions were measured with further DP and CE optimization. A specific set of MRM transitions was monitored for each period according to the metabolites that were eluted within this period. Metabolites with significant differences in content were defined as having a variable importance in project (VIP) ≥1 and a fold change of ≥ 2 or ≤ 0.5. The MRM for each cultivar was performed in triplicate. Three spears were used for each repetition.

Transciptomic Analysis
The total RNA was extracted from each sample using the Total RNA Extractor (Trizol) kit (B511311, Sangon, Shanghai, China) according to the manufacturer's protocol. After the removal of the genomic DNA contaminations by the digestion of DNase I. Three cDNA libraries were generated for every developmental stage using the VAHTSTM mRNA-seq V2 Library Prep Kit (Illumina, San Diego, CA, USA) following the manufacturer's protocols. In total, 12 cDNA library were constructed during four developmental stages. Index codes were added to attribute sequences to the specific samples. The first strand cDNA was synthesized using random hexamer primer and M-MuLV Reverse Transcriptase (RNase H-), while the second strand cDNA synthesis was subsequently performed using DNA polymerase I and RNase H. Remaining overhangs were converted into blunt ends via exonuclease/polymerase activities. Library fragments were purified with the AMPure XP system (Beckman Coulter Company, Beverly, MA, USA). Polymerase chain reactions (PCR) were performed with Phusion High-Fidelity DNA polymerase (Thermo Fisher Scientific Inc., Waltham, MA, USA), universal PCR primers and Index (X) Primer. The library quality was assessed on the Bioanalyzer 2100 system (Agilent Technologies Inc. Santa Clara, CA, USA). After quantifying and pooling, paired-end sequencing of these libraries was performed on HiSeq XTen sequencers (Illumina, San Diego, CA, USA) by Novogen Co., Ltd. (Beijing, China).
Raw data or raw reads of high-throughput sequencing, including gene ID and sequence, were stored in the FASTA file format. Raw reads were filtered by Trimmomatic (version 0.36) to gain clean reads, which were de novo assembled into transcripts using Trinity (version 2.0.6) (parameter: min_kmer_cov 2) (Trinity Technologies, Irvine, CA, USA). Transcripts with a minimum length of 200 bp were clustered to decrease redundancy. For each cluster, the one with the longest region of high-quality sequence data was selected as a Unigene. Unigenes were blasted and annotated against NCBI Nr (NCBI non-redundant protein database), SwissProt, TrEMBL, CDD (Conserved Domain Database), Pfam and KOG (eukaryotic Orthologous Groups) databases (E-value < 1e-5). According to the priority of the best alignment results, the corresponding amino acid sequence of Unigene ORF was determined. At the same time, TransDecoder (version 3.0.1) was used to predict the CDS sequences of the unaligned Unigene. According to the annotation results of SwissProt and TrEMBL, GO (Gene Ontology Database) function annotation information was obtained. KEGG (Kyoto Encyclopedia of Genes and Genomes) annotation was conducted by KAAS (version 2.1) (KEGG automatic annotation server).

Differentially Expressed Genes Analysis
The quality control sequence was mapped to the assembled transcript using Bowtie 2 (version 2.3.2). The statistical analysis of the alignment results used RSeQC (version 2.6.1). The count and expression values of single gene readings were calculated using salmon (version 0.8.2). Transcripts per million (TPM) was used to eliminate the influence of gene lengths and sequencing discrepancies, enabling direct comparison of gene expressions between samples. The differentially expressed genes (DEGs) between the two samples were identified using DESeq2 (version 1.12.4). Genes were considered at a q-value < 0.001 and |FoldChange| > 2. When the normalized expression of a gene was zero between two samples, its expression value was adjusted to 0.01 (since 0 cannot be displayed on a log plot). If the normalized expression of a specific gene in two libraries was lower than 1, the further differential expression analysis was conducted excluding this gene.

Real-Time Quantitative PCR
Six candidate genes, six MYB and six bHLH transcription factor genes, involved in phenolic acid biosynthesis, were selected for real-time quantitative PCR verification. Real-time PCR was performed on CFX Connect (Bio-Rad Laboratories Inc. Hercules, CA, USA) by using HiScript II reverse transcriptase according to the manufacturer's protocols (Vazyme Biotech Co. Ltd., Nanjing, China). The specific primers for glycosyltransferase genes were designed using Primer Premier 5.0 (Supplementary material 5). A constitutively expressed gene (β-Actin-1) was used as internal control [77]. The relative expression values of genes were calculated as 2 −∆∆Ct . Three biological replicates and three technical replicates were used for all qRT-PCRs.

Transcription Factors Prediction and Analysis
Transcription factors were predicted with PlantTFDB [78]. The correlation coefficient between genes and metabolites and between genes and transcription factors, was calculated with the corrplot package in R-3.6.1 and the correlation network was visualized using Cytoscape software (version 3.6.1). The phylogenetic tree of the predicted transcription factors and the known transcription factors in Arabidopsis thialiana was generated with 2000 bootstrap replicates by the neighbor-joining method using software Mega X ver. 10.0.2.

Statistical Analysis
The data are displayed as means ± standard deviations (SD). One-way analysis of variance (ANOVA) was used to evaluate the significance at the 0.05 level. All statistical analyses were conducted by SPSS 17.0 software (SPSS Inc., Chicago, IL USA).

Accession Numbers
All raw data have been deposited at the National Center for Biotechnology Information (NCBI) database (under the BioProject accession number PRJNA 548403).

Conclusions
This study used a combination of metabolome and transcriptome to interpret the relationships between key genes and metabolites involved in biosynthetic pathways. Both candidate genes and metabolites were identified that are involved in the phenolic acid biosynthetic pathway. The candidate genes include six structural genes (PAL, C4H, HCT, CAD, COMT and UGT72E), 12 MYB TFs and 10 bHLH TFs. Although these genes were predicted by the bioinformatics analysis, the validation of the transcriptomic expression were verified by qRT-PCR and these candidate genes provide valuable information and useful references to better understand the regulation of the phenolic acid biosynthesis pathway and the accumulation of metabolites in the leaves of C. paliurus during different developmental stages. Nevertheless, the specific mechanism still requires further research.