De novo transcriptome sequencing and anthocyanin metabolite analysis reveals leaf color of Acer pseudosieboldianum in autumn

Leaf color is an important ornamental trait of colored-leaf plants. The change of leaf color is closely related to the synthesis and accumulation of anthocyanins in leaves. Acer pseudosieboldianum is a colored-leaf tree native to Northeastern China, however, there was less knowledge in Acer about anthocyanins biosynthesis and many steps of the pathway remain unknown to date. Anthocyanins metabolite and transcript profiling were conducted using HPLC and ESI-MS/MS system and high-throughput RNA sequencing respectively. The results demonstrated that five anthocyanins were detected in this experiment. It is worth mentioning that Peonidin O-hexoside and Cyanidin 3, 5-O-diglucoside were abundant, especially Cyanidin 3, 5-O-diglucoside displayed significant differences in content change at two periods, meaning it may be play an important role for the final color. Transcriptome identification showed that a total of 67.47 Gb of clean data were obtained from our sequencing results. Functional annotation of unigenes, including comparison with COG and GO databases, yielded 35,316 unigene annotations. 16,521 differentially expressed genes were identified from a statistical analysis of differentially gene expression. The genes related to leaf color formation including PAL, ANS, DFR, F3H were selected. Also, we screened out the regulatory genes such as MYB, bHLH and WD40. Combined with the detection of metabolites, the gene pathways related to anthocyanin synthesis were analyzed. Cyanidin 3, 5-O-diglucoside played an important role for the final color. The genes related to leaf color formation including PAL, ANS, DFR, F3H and regulatory genes such as MYB, bHLH and WD40 were selected. This study enriched the available transcriptome information for A. pseudosieboldianum and identified a series of differentially expressed genes related to leaf color, which provides valuable information for further study on the genetic mechanism of leaf color expression in A. pseudosieboldianum.


Background
Leaf color is one of the most important characteristics of ornamental plants, and plants with colored foliage were often called "colored-leaf plants" [1,2]. Some researchers have analyzed and determined systematically the pigments and physiological indexes of the leaves of coloredleaf plants [3,4]. Result showed that the change of leaf color is closely related to the synthesis and accumulation of anthocyanins in leaves [5]. Anthocyanins are one of the important secondary metabolites of plants and they often have anti-cancer, anti-oxidation and antiatherosclerosis properties [6]. Anthocyanins confer orange, red, magenta, violet and blue and the biosynthetic pathway leading to floral or pulp pigment accumulation had been well characterized and the genes encoding relevant enzymes and transcriptional factors have been isolated [7,8]. The molecular mechanisms of the anthocyanin biosynthesis pathway also have been comprehensively reported. However, most of the researches mainly focused on f fruit color [9] and petal color [10][11][12], and anthocyanin biosynthesis in colored-leaf plants has rarely been researched prior to this study. In recent years, some scholars have identified PAL, CHS, CHI, DFR, ANS, F3H, F3'H, F3'5'H [13,14] and a few related regulatory genes such as MYB, bHLH and WDR in color changing of colored-leaf plants [15,16]. The process of anthocyanin synthesis and accumulation is relatively complex, and is regulated by multiple enzymes and transcription factors [17], as well as being influenced by external environmental factors such as light [18], water stress [19], and temperature [20]. Thus the mechanism of leaf color change in colored-leaf plants needs to be further studied.
Acer pseudosieboldianum is a small deciduous tree belonging to the Acer genus of the family Aceraceae. Because of its beautiful shape and brilliant leaves, it is an often used autumn leaf ornamental tree species [21]. In addition, it has high economic value, whose woods can be used for making utensils and leaves can be used as dyes [22]. Recently, some scholars have reported and studied the introduction, cultivation, and breeding of A. pseudosieboldianum [23,24]. However the key genes affecting leaf color change have not been determined yet, and relative information is relatively scarce. This fact means that the molecular regulatory mechanisms related to leaf color formation needs further study.
In recent years, transcriptome high-throughput sequencing technology has been widely used to study the mechanism of leaf color in various plants [25,26]. In this study, de novo transcriptome sequencing assembly, annotation, and bioinformatic analysis on leaves from A. pseudosieboldianum were performed at different colorchanging stages in autumn. The DEGs at different transformation stages were analyzed and validated. At last, combined this data with anthocyanin metabolism analysis data, some genes related to anthocyanin synthesis were identified. This study provides a theoretical basis for studying the molecular mechanism of leaf color in A. pseudosieboldianum.

Contents of anthocyanin in the leaves
In order to explore the mechanism of pigment formation in A. pseudosieboldianum leaves, we carried out qualitative analysis of anthocyanin components in the middle (M) and last stage (A) of leaf color transformation (The anthocyanin content was extremely low in early stage (B), Therefore, only M and A stage were analyzed). According to our UPLC-Q-TOF-MS data, five anthocyanins were identified (

Production statistics of sequencing data
In order to understand the molecular mechanism of color change in A. pseudosieboldianum leaves in autumn, sequencing was performed using the Illumina Hiseq 2500 (Additional file 1: Table S1). A total of 67.47 Gb of clean data was obtained from these sequencing results, and the percentage of Q30 bases was 93.10 % or more. After assembly, 50,501 unigenes were identified. Among these there were 20,706 unigenes over 1 kb in length, and the error rate of sequencing was less than 0.1 %, which indicates that the quality of sequencing data was good and could be used for subsequent analysis.

Functional annotation and classification
Unigene sequence was then compared with gene sequences in the NR, Swiss-Prot GO, COG, KOG, eggNOG 4.5, and KEGG databases using BLAST software (e < 0.00001). 35,316 unigenes were identified, accounting for 70.01 % of the 50,501 unigenes. 12,984 unigenes were annotated in the COG database, 25,375, 12,487 and 19,460 unigenes were annotated in the GO, KEGG, and KOG databases respectively. 25,226 unigenes were annotated in the Pfam database. 19,796 unigenes and 32,498 unigenes were also annotated in the Swanshot and eggNOG databases respectively ( Table 2).
According to NCBI NR database and E-value distribution, the number of unigenes annotated in our dataset was 35,024, of which 71.53 % of these unigenes (E < 10 − 50 ) had strong homology and 47.87 % of these unigenes (E < 10 − 100 ) had very strong homology (Fig. 2a).Ten popular-related species were also annotated based on the NCBI NR database (Fig. 2b). The highest homology to A. pseudosieboldianum was Citrus sinensis, accounting for 12.25 % homology, followed by Citrus clementina, which accounted for 9.74 % homology.
GO databases are divided into three categories: biological process, cellular component and molecular function, which are further divided into 42 functional subgroups. Biological process had the largest number of annotated unigenes, included metabolic process and cellular process with 13,141 (51.78 %) unigenes and 11,546 (45.5 %) unigenes, respectively. The cellular component class mainly included cell and cell part, with 11,886 (46.84 %) unigenes and 11,806 (46.53 %) unigenes, respectively. The molecular function category mainly included catalytic activity and binding, and there were 12,691 (50.01 %) unigenes and 1, 1049 (43.54 %) unigenes (Fig. 3).
In addition, Annotation data about COG and KEGG were found in Additional file 2: Fig. S1 and Additional file 3: Table S2, respecially.

Differentially Expressed Genes (DEGs)
In order to explore the genes related to anthocyanin biosynthesis in A. pseudosieboldianum at different colorchanging stages, the differential expression of A. pseudosieboldianum samples at different color-changing stages were then analyzed. The results showed that there were 16,521 DEGs in the three color-changing periods of A. pseudosieboldianum (Fig. 4a). Comparing between the   In order to further understand the function of these respective DEGs, we carried out KEGG pathway enrichment analysis in the three stages of A. pseudosieboldianum. Our results showed that there were 16,521 differentially expressed genes in the three stages (B, M and A). The anthocyanin biosynthesis pathways related to leaf tone control were significantly enriched in B vs. M and B vs. A up-regulated genes. Phenylalanine metabolic pathways were significantly enriched in B vs. M and B vs. A up-regulated genes (Additional file 4: Table S3; Additional file 5: Table S4).

Screening of different transcription factors for anthocyanin biosynthesis
Transcription factors play an important role in plant development and secondary metabolism. In this experiment, we screened out 31 MYB genes, 15 bHLH genes, and 28 WD40 protein genes from the three DEGs of B, M and A stages of A. pseudosieboldianum. In the 31 MYB genes, 17 were up-regulated and 14 down regulated (Additional file 6: Table S5). In the 15 bHLH genes, 6 were up-regulated and 9 down regulated. In the  qRT-PCR confirmation of RNA-seq data In order to verify the accuracy of our sequencing data, we selected eight genes involved in anthocyanin biosynthesis, and analyzed the expression level in leaves of different color from these three different stages of A. pseudosieboldianum by qRT-PCR. The results showed that all of these selected genes had similar expression patterns than identified in the RNA sequencing data (Fig. 6). Therefore, the data obtained in our study can be used to analyze the anthocyanin biosynthesis and metabolism gene in A. pseudosieboldianum.

Discussion
A. pseudosieboldianum is a wild ornamental maple native to Northeast China. Like A. palmatum Thunb., A. pseudosieboldianum belongs to Sect. Palmata Paxand Ser. Palmata (Pax) Pojark. There were many cultivars of A. palmatum and they had strong ecological adaptability [27]. However, there are few varieties of A. pseudosieboldianum, which was still in the wild state or in scenic forests, and are rarely used in urban greening even if the maple leaves are red and beautiful in autumn and have high ornamental value. At present, transcriptome sequencing technology has been used to study vegetables color formation [28], flower color mechanisms [10,29], fruit development [30,31]. Some scholars have analyzed the color mechanism of the related species in Acer [32]. However, due to the lack of genomic reference sequences, the molecular mechanism of leaf color is difficult to decipher in A. pseudosieboldianum. The change of anthocyanin content in plants was shown to be related to the differential expression of key genes encoding structural enzymes in the anthocyanin biosynthesis pathway [10]. The different genes including PAL, CHS, ANS, UFGT, FLS, C4H, 4CL, DFR and ANR were identified in the flavonoid biosynthesis pathway from the purple bud tea plant by transcriptome sequencing [33]. In this study, we used transcriptome sequencing technology to sequence and compare three different coloring stages of A. pseudosieboldianum leaves in autumn. We detected four PAL, one CHS, one CHI, two F3H, two F3'H, one F3'5'H, two DFR, one ANS, and six UFGT genes in the flavonoid anthocyanin complex related to leaf color in A. pseudosieboldianum. Three GT genes were down regulated in the M vs. A stage, which indicated that the change of leaves from green to red was controlled by multiple single genes.
Both F3'H and F3'5'H belong to the cytochrome P450 superfamily [34]. F3'H is an important intermediate in the synthesis of cyaniding, and F3'5'H is a key enzyme in the synthesis of blue flower anthocyanin. Masukawa T [35] reported that F3'H could make red cyanidin accumulate in purple and red root radishes. F3'5'H mainly accumulated in the blue waterlily [11]. Many important flower crops can't produce turquoise, meaning they cannot appear blue. In this study, two F3'H and one F3'5'H were detected, but the expression of F3'5'H was very small, which may be caused leaf colour did not appear blue.
DFR is the key enzyme that catalyzes the conversion of dihydroflavonol to corresponding colorless geranium delphinium and cyanidin [36]. The main function of ANS is to oxidize colorless proanthocyanidins to produce colored anthocyanidins, which are the first colored compound in the anthocyanin synthesis pathway [37]. ANS was originally identified in a maize A2 mutant and cloned by the transposon tagging technique [38]. GT is mainly responsible for transforming unstable anthocyanins into stable anthocyanins. Studies have shown that the expression of UFGT is different in different varieties [39]. For example, anthocyanin accumulation in apple was positively correlated with UFGT activity. The change of UFGT activity in grape leads to the change of their phenotype from white to red [40]. In this study, UFGT (c103768.graph_c0) expression in A. pseudosieboldianum leaves first accumulated and then was consumed in the process of leaf color formation, which was consistent with the conclusion that UFGT consumption was needed for paeoniae anthocyanin synthesis.
At present, the research about anthocyanin biosynthesis structural genes has been gradually improved, and the research on transcription factors has become the focus. The transcription factors may also be one of the important indicators of causing A. pseudosieboldianum to turn green and red. Now MYB transcription factors for anthocyanin biosynthesis have been identified and isolated in many plants. Some studies have shown that MYB transcription factor can enhance or inhibit some aspects of regulation [10]. It was found that PqMYB113 was a transcription factor promoting anthocyanin synthesis in the leaves of peony, while PqMYB4 was a transcription factor inhibiting anthocyanin synthesis in the leaves of peony. In this study, we found that ApMYB4 gene was down-regulated in the stage of green to red transformation, which is consistent with the previous research results, indicating that ApMYB4 gene may be a transcription factor promoting anthocyanin synthesis in A. pseudosieboldianum.
It is worth mentioning that Rosinidin O-hexoside was found in A. pseudosieboldianum leaves, although the content was very small. There was no Rosinidin Ohexoside found in Acer in previous studies. The distribution of Rosinidin O-hexoside in plants is very limited and has only been reported in Catharanthus roseus [41] and Primula [12]. In addition, the cyanidin 3-glucoside contents could be used as a quantitative index to determine the color of Acer palmatum 'atropurpureum' [42]. It was also found that the leaf color changing from green to red in A. palmatum was the result of the increase of the mass fraction of cyanidin galactoside and the decrease of the mass fraction of chlorophyll [43]. In this study, the contents of differential metabolites were very high about Cyanidin 3, 5-O-diglucoside. The above research results revealed that Cyanidin was important anthocyanin in Acer, and played a key role of leaf color change in autumn. This study provided the basis for molecular breeding theory for ornamental plant leaf color improvement.

Conclusions
In this study, five anthocyanins were detected in the leaves of A. pseudosieboldianum, especially, Cyanidin 3, 5-O-diglucoside played an important role for the final leaf color. A total of 50,501 unigenes were produced at three stages of leaf color changing among which 16,521 DEGs and 64 unigenes were identified as color-related homologous genes. Four PAL, one CHS, one CHI, two F3H, two F3'H, one F3'5'H, two DFR, one ANS and six GT about anthocyanin synthesis pathway were detected. Combined with the detection of metabolites, the gene pathways related to anthocyanin synthesis were conducted. Also, related regulatory genes include MYB, bHLH, and WD40 were found. This study provides a theoretical basis for the formation of leaf color in A. pseudosieboldianum.

Plant materials and treatments
The materials tested were A. pseudosieboldianum plants that were five years old from Tianchi Square, Yanji City, Jilin Province (The plant materials were identified by Professor Liu Ji-Sheng from Agriculture college, Yanbian University, engaged in dendrological research for many years). Three leaf samples were collected separately at three different stages (early stage: B; mid-stage: M; last stage: A) with three replicate libraries per stage (Fig. 7). Three stages were September 21, 2018 (B) September 30, 2018 (M) and October 11, 2018 (A). All flesh samples were frozen immediately in liquid nitrogen, and then stored in a refrigerator at -80 ℃ for transcriptome sequencing and anthocyanin metabolite analysis.

Extraction identification and data analysis of anthocyanin metabolites
Leaf tissue samples of A. pseudosieboldianum were ground to a powder, and 100 mg of powder was dissolved in 1.0 ml extract solution (70 % methanol aqueous solution). The dissolved sample was placed in a refrigerator overnight at 4 ℃, and then vortexed three times during the period to improve the extraction rate. After centrifugation, the supernatant was reserved and the sample was filtered with a microporous filter membrane, and then stored in a sample bottle for LC-MS / MS analysis. We used multiple reaction monitoring (MRM) for qualitative and quantitative analysis of metabolites by mass spectrometry. Combining single variable statistical analysis and multivariate statistical analysis, we calculated the fold-change, and called a metabolite as a differential metabolite when its value was between a fold change ≥ 2 and a fold change ≤ 0.5. The differential metabolites were then annotated using the KEGG database [44].

RNA isolation library construction and RNA-Seq
Total RNA was extracted from leaf samples using an RNA extraction kit (Beijing Tiangen in China). Agarose electrophoresis and the Agilent 2100 Bioanalyzer were used to determine the concentration, purity and integrity of RNA samples. Then, PolyA mRNA was reverse transcribed into cDNA, and the construction and sequencing of the cDNA library was completed by the BMK Technology company in Beijing. Raw reads were obtained using an Illumina Hiseq 2500 sequencing platform, and after filtering, clean reads were obtained. Contigs were assembled by overlapping information between sequences, transcripts were locally assembled, and unigenes were obtained by homologous clustering and splicing of transcripts with Tgicl and Phrap software, respectively [45].

De novo assembly and functional annotation
After obtaining high quality sequencing data, it was necessary to assemble the genomic sequence of A. pseudosieboldianum. First, Trinity software parsed the sequencing reads into shorter fragments (K-mers), extends these fragments into longer fragments (Contig), and uses the overlap between these fragments to determine the fragment set (Component). Finally, using the dual methods of De Bruijn mapping and sequencing read information analysis, each transcript sequence was identified in each fragment set. The Unigene sequence was compared with the gene sequence in NR [46], Swiss-Prot [47], GO [48], COG [49], KOG [50], egg-NOG4.5 [51], KEGG database by Blast software [52] (e < 0.00001). Using KOBAS 2.0 [53], the KEGG orthology result of unigenes from KEGG was obtained, and after predicting the amino acid sequence of each unigene, we used HMMER [54] software to compare with the Pfam [55] database, select unigenes whose BLAST parameter E-values were not greater than 1e − 5 and whose HMMER parameter E-values were not more than 1e − 10 , and thus, finally obtained a unigene with annotation information.

Expression and differentially expressed unigene annotation
Bowtie [56] was used to compare the sequenced reads with a unigene library, and RSEM [57] was used to estimate the expression level. The expression abundance of each corresponding unigene was expressed by its FPKM [58] value. It is a common method for estimating gene expression level in transcriptome sequencing data analysis. The use of FPKM values can eliminate the influence of gene length and sequencing on calculations of gene expression. When detecting differentially expressed genes, DESeq2 was used to analyze the differentially expressed genes between the sample groups and the differentially expressed gene sets between two different conditions were identified. In the process of differential expression analysis, the Benjamini−Hochberg method was used to correct the significance p-value of the original hypothesis test, so as to reduce the false positives in independent statistical hypothesis testing for a large number of gene expression values. In the screening process, the criterion was that the FDR (False Discovery Rate) was less than 0.01 and the difference factor FC (Fold Change) was greater than or equal to 2. Between these two factors, the FC represented the ratio of expression between two samples (groups).

Gene validation and expression analysis
In order to validate our differential gene expression analysis, RT-qPCR was used to validate the differentially expressed genes related to anthocyanin biosynthesis [59]. We used a fluorescence quantitative Kit (2×SYBR ® green premix) and an analytikjena-qTOWER 2.2 fluorescence quantitative PCR instrument for quantitative analysis. The primer sequences can be found in Additional file 7:

Statistical analysis
The procedure was repeated three times for each sample and the relative expressions were calculated using the 2 −ΔΔCt method. Excel and GraphPad Prism 5 were used for chart preparation. The R-3.4.2 was used to conduct the heatmap.

Funding
The high throughput sequencing, the editing and publishing fee were financially supported by the National Natural Science Foundation of China (No. 32060692) and the Science and Technology Development project of Jilin province (No. 20200402112 NC). The funding agencies were not involved in the experimental design of the study, data collection, analysis and interpretation or writing the manuscript.

Availability of data and materials
Raw-reads data were deposited in the NCBI Sequence Read Archive (SRA) with accession number of PRJNA596335. The Transcriptome Shotgun Assembly project has been deposited at DDBJ/EMBL/GenBank under the accession GJBB00000000.

Declarations
Ethics approval and consent to participate The plants under this study are not rare or endangered. The samples were collected in their wild populations in non-protected areas; no any legal authorization/license is required.

Consent for publication
Not Applicable.