RNA-sequencing analysis reveals betalains metabolism in the leaf of Amaranthus tricolor L.

Amaranth plants contain large amounts of betalains, including betaxanthins and betacyanins. Amaranthin is a betacyanin, and its molecular structure and associated metabolic pathway differ from those of betanin in beet plants. The chlorophyll, carotenoid, betalain, and flavonoid contents in amaranth leaves were analyzed. The abundance of betalain, betacyanin, and betaxanthin was 2–5-fold higher in the red leaf sectors than in the green leaf sectors. Moreover, a transcriptome database was constructed for the red and green sectors of amaranth leaves harvested from 30-day-old seedlings. 22 unigenes were selected to analyze the expression profiles in the two leaf sectors. The RNA-sequencing data indicated that many unigenes are involved in betalain metabolic pathways. The potential relationships between diverse metabolic pathways and betalain metabolism were analyzed. The validation of the expression of 22 selected unigenes in a qRT-PCR assay revealed the genes that were differentially expressed in the two leaf sectors. Betalains were biosynthesized in specific tissues of the red sectors of amaranth leaves. Almost all of the genes related to betalain metabolism were identified in the transcriptome database, and the expression profiles were different between the red sectors and green sectors in the leaf. Amaranth plants consist of diverse metabolic pathways, and the betalain metabolic pathway is linked to a group of other metabolic pathways.


Introduction
Plant pigments mainly include anthocyanins, betalains, carotenoids, and chlorophylls [1]. Anthocyanins are commonly used as natural colorants [2][3][4][5]. However, betalains, which are water-soluble nitrogen pigments, are not only more hydrophilic and have a higher tinctorial strength, they also have physiological functions, including anti-oxidative [6] and anti-cancer [7]. activities. Thus, betalains may be useful for developing novel products relevant to the food and medical industries.
Betalains are mainly existed in Caryophyllales species, with the exception of Caryophyllaceae and Molluginaceae species [8][9][10], as well as some higher fungi [11,12]. Betalains  Meanwhile, cell biological methods may be applied to conduct cell-and tissue-specific analyses of gene expression and physiological activities.
Researchers have analyzed the available genome and transcriptome databases to make considerable advances in the characterization of betalain metabolism. For example, some key genes related to betalain biosynthesis (e.g., bvMYB1 and CYP76AD1) have been identified in the beet genome database [38,39]. Additionally, researchers have identified the genes encoding CYP76AD5 and CYP76AD6, which catalyze the first tyrosine hydroxylation step, similar to CYP76AD1 [43,49]. Analyses of transcriptome databases resulted in the identification of the key betalain metabolism genes in pitaya [50], bougainvilleas [51], amaranth [52] and other plants [53,54]. Xu et al [45] described a competitive relationship between betalains and flavonoids in the inflorescences of Bougainvillea species. On the basis of these results, researchers have examined and functionally verified key betalain metabolism genes in betalain-containing plant species, including Opuntia species [55], Suaeda salsa [56], Portulaca [57], Parakeelya mirabilis [58]. A recent study concluded that catalase-phenol oxidase (CATPO), which exhibits catalase and tyrosinase activities, may be involved in betaxanthin synthesis. Furthermore, the CATPO gene was identified, and the encoded protein exhibits PPO activity and affects betaxanthin metabolism in red amaranth (Amaranthus caudatus). [59]. From the result, there were other enzymes nor PPO-type tyrosinase involved in the betalains biosynthetic pathway.
However, there has been no systematic study on the relationship between the betalain metabolic pathway and other secondary metabolic pathways. Therefore, we applied high-throughput sequencing technology to sequence the cDNA corresponding to the red and green sectors of amaranth leaves. The subsequent analyses of the GO and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases and quantitative real-time polymerase chain reaction (qRT-PCR) assays revealed some differentially expressed genes (DEGs) related to betalain and flavonoid metabolism. This study revealed a relatively complex network involving the betalain biosynthesis pathway and related metabolic pathways. This network differs from known models. Thus, the data presented herein may be useful for more thoroughly characterizing betalain metabolism.

Determination of betalain, flavonoid, chlorophyll, and carotenoid contents in amaranth leaves
An analysis of the betacyanin and betaxanthin contents in the green and red leaf sectors indicated these two pigments were more abundant in the red sectors than in the green sectors. In contrast, there were no differences in the flavonoid contents of the two leaf sectors (Fig 1A). Within the red leaf sectors, the betacyanin and betaxanthin contents were similar (betaxanthin to betacyanin ratio of 0.98). However, within the green leaf sectors, betaxanthin was more abundant than betacyanin (betaxanthin to betacyanin ratio of 2.31). Moreover, the total betalain content was 3.09-fold higher in the red leaf sectors than in the green leaf sectors. Furthermore, the betacyanin and betaxanthin contents were 5.15-fold and 2.20-fold higher in the red leaf sectors than in the green leaf sectors, respectively ( Fig 1B).
However, an analysis of the chlorophyll and carotenoid contents in the red and green leaf sectors revealed a lack of differences between the two sectors (Fig 2). Perhaps betalains are hydrophilic pigments, they accumulate in leaves only in the epidermal subcells on the abaxial surface. Chlorophyll and carotenoids are normally synthesized in the other leaf tissues and cells. Therefore, photosynthates accumulate during plant growth and development, and the associated metabolic activities are influenced by photosynthetic capacity.

Transcriptome sequencing data assembly of A. tricolor
Because betalains were present only in the red sectors of amaranth leaves, we extracted RNA separately from the red and green leaf sectors to generate cDNA libraries for sequencing. After removing adapter sequences, ambiguous reads, and low-quality reads, approximately 15.08 Gb high-quality clean reads were obtained, with 29,914,176 reads (7.53 Gb) and 29,913,707 reads (7.53 Gb) corresponding to the red and green leaf sectors, respectively. The raw data were deposited in the NCBI Sequence Read Archive database https://www.ncbi.nlm.nih.gov/sra/ SRR5930345 (accession number: SRR5930345). The generated sequencing data are summarized in Table 1. The Q30 percentages (sequencing error rate < 0.1%) for the red and green leaf sectors were 93.58% and 93.37%, respectively. Additionally, the GC contents for the red and green leaf sectors were 43.19% and 42.32%, respectively. These values indicated that the quantity and quality of the sequencing data were sufficient to ensure an accurate sequence assembly with adequate transcriptome coverage.
All high-quality reads were assembled de novo using the Trinity program, which resulted in 9,414,697 contigs, with a total length of 407,232,755 bp, a mean length of 43.26 bp, and an N50 of 44 bp. On the basis of the paired-end sequence data, 151,663 transcripts were generated, with a total length of 180,734,750 bp, a mean length of 1191.69 bp, and an N50 of 2,168 bp. Transcripts were assembled into 80,026 unigenes, with a total length of 53,799,761 bp, a mean length of 672.28 bp, and an N50 of 1,202 bp. The length distribution of the contigs, transcripts and unigenes was demonstrated in Table 2 and S1 Fig.

Unigene annotation and analysis of amaranth leaf
To annotate genes and predict the encoded proteins, all assembled unigene sequences were used as queries in a BLASTX search (E-value < 1 × 10 −5 ) of the plant proteins in the Nr, Swiss-Prot, KEGG, COG, and KOG databases. A total of 34,350 significant BLAST hits (42.92% of all unigenes) were obtained (Table 3), with 33,893 unigenes (42.35% of all unigenes) annotated according to the Nr database and 22,508 unigenes (28.13% of all unigenes) annotated according to the Swiss-Prot database. Unigenes with lengths of 300-1,000 bp and > 1,000 bp accounted for 12,017 (34.98%) and 12,664 (36.87%) of the 34,350 annotated unigenes, respectively. Unigenes with lengths of 300-1,000 bp and > 1,000 bp accounted for 11,894 (35.09%) and 12,647 (37.31%) of the unigenes annotated according to the Nr database, respectively. Unigenes with lengths of 300-1,000 bp and > 1,000 bp accounted for 7,712 (34.26%) and 8,962 (39.82%) of the unigenes annotated according to the Swiss-Prot database, respectively. Many genes with unknown functions were detected in the transcriptome database.

Identification of genes participating in the betalains metabolism from the transcriptome database
On the basis of previous investigations by Teng et al. [59] and Zheng et al [52], we identified the unigenes involved in betalain metabolism in the transcriptome following a comparison of

Functions annotation of differentially expressed genes
We identified 2,178 unigenes that were differentially expressed between the green and red leaf sectors, including 1,320 and 858 genes whose expression levels were respectively up-and down-regulated in the red leaf sectors (Fig 3). Hierarchical clustering analysis was performed on the screened differentially expressed genes. The genes with the same or similar expression profiles were clustered (Fig 4).
All DEG sequences were used as queries in a BLASTX search (E-value < 1 × 10 −5 ) of the plant proteins in the Nr, Swiss-Prot, KEGG, COG, and KOG databases. A total of 1,580 (72.54%) unigenes were annotated.

GO annotation
921 DEGs were assigned to the three main GO categories and 50 sub-categories (functional groups). A large proportion of the genes in the biological process category belonged to the "metabolic process" (516), "cellular process" (438), and "single-organism process" (427) subcategories. Genes belonging to the "cell part" (321), "membrane" (181), and "organelles" (159) sub-categories were heavily represented among the genes in the cellular component category. Meanwhile, "catalytic activity" (564) and "binding" (398) were the dominant sub-categories in the molecular function category. Within each sub-category, the number of genes functionally annotated at each developmental stage was similar to the total number of annotated genes ( Fig  5).
Expression profiles of unigenes analysed by qRT-PCR. We completed a qRT-PCR assay to analyze the expression levels of 10 significant DEGs related to photosynthesis, phenylpropane synthesis, plant hormone signal transduction, and phenylalanine metabolism to validate the accuracy of the RNA-Seq data. The expression patterns of these genes according to the qRT-PCR were consistent with the RNA-Seq data, implying that the RNA-Seq data were suitable for subsequent analyses (Fig 7).
Meanwhile, we selected 5 and 7 genes involved in the flavonoid biosynthesis and the betalain metabolic pathways to detect the expression level by using qRT-PCR assay, respectively. The expression levels of some flavonoid biosynthesis genes, including PAL, CHS, and F3H, were higher in the red leaf sectors than in the green leaf sectors, while the FS and WRKY13 expression levels exhibited the opposite pattern (Fig 8A). To investigate the raw material supply-and-demand relationship between betalain biosynthesis and the synthesis of other metabolites in amaranth leaves, we analyzed the flavonoid metabolic pathway as an example. We completed a qRT-PCR assay to examine the expression levels of genes involved in the betalain metabolic pathways. The expression levels of genes related to betalain metabolism, such as CATPO, PPO, CYP76AD1, DODA, DOPA5-GT, and B5-GT, were higher in the red sectors than in the green sectors. The TyDC gene expression levels exhibited the opposite pattern. These results implied that betalain biosynthesis differs between the red and green leaf sectors of amaranth plants (Fig 8B).

Discussion
The conversion of tyrosine to L-DOPA during betalain biosynthesis in amaranth plants may require a complex enzyme-mediated process Polyphenol [60] Of these PPOs, tyrosinase, which is a special type of PPO, is believed to be active during the first two steps of betalain biosynthesis. Specifically, it catalyzes the hydroxylation of L-tyrosine to generate L-DOPA [41]. after which it catalyzes the oxidation of L-DOPA to produce dopaquinone. [61] cloned the BvcTYR gene, which encodes the tyrosinase responsible for the hydroxylation of Ltyrosine, and determined that this tyrosinase is involved in the first step of the betalain biosynthetic pathway. [62] recently hypothesized that low tyrosinase activity explains the relatively low betalain content of yellow beets. Two genes encoding copper-containing PPOs mediating betalain accumulation have been cloned from cultured cells [63]. Tyrosinases exhibiting in vitro tyrosine hydroxylase and/or DOPA oxidase activities were subsequently purified from red beet roots (Gandía-Herrero et al. 2004), Swiss chard leaves [64], and S. salsa seedlings [65]. However, there is still a lack of conclusive genetic evidence that tyrosinase is involved in betalain biosynthesis. In this study, we also observed that the PPO expression level is consistent with the betalain contents in the red and green leaf sectors. We believe that tyrosine is optimally used in the red leaf sectors because PPO is more highly expressed in the red leaf sectors than in the green leaf sectors. In fact, tyrosinases are widely distributed throughout the plant kingdom, including in betalain-producing Caryophyllales species. These tyrosinases still need to be comprehensively characterized in future studies.
The AcCATPO enzyme in Amaranthus cruentus was recently purified by Teng et al. [59]. This enzyme exhibits catalase and tyrosinase activities, and it is able to catalyze reactions involving a monophenol (tyrosine) and a diphenol (L-DOPA). Consequently, AcCATPO may be important for betalain biosynthesis, especially during the first two steps of the betalain biosynthetic pathway. Furthermore, a positive correlation between AcCATPO transcript abundance and the ratio of betaxanthins to betacyanins implies that AcCATPO might be mainly involved in the biosynthesis of betaxanthins. However, we observed that CATPO is more highly expressed in the red leaf sectors than in the green leaf sectors, which is consistent with the ratio of betacyanins to betaxanthins. Because CATPO catalyzes the conversion of tyrosine to L-DOPA, which is involved in the production of both betacyanins and betaxanthins, we speculate that CATPO mediates the biosynthesis of both betacyanins and betaxanthins.
Previous studies of betalain biosynthesis in beets identified three new cytochrome P450 enzymes, namely CYP76AD1, CYP76AD5, and CYP76AD6, which hydroxylate tyrosine (monophenolase activity) [43]. These three enzymes reportedly contribute to the initiation of the betalain biosynthetic pathway in tobacco, potato, tomato, and eggplant [66]. In the current study, we identified a cytochrome P450 gene (CYP76AD1) in the amaranth database. The expression levels of this gene were consistent with the betalain contents in the red and green leaf sectors. Although tyrosinase is thought to be the enzyme involved in the initial step of betalain biosynthesis, future studies will need to confirm which enzyme (or enzymes) is involved in this step. There are studies that have suggested tyrosine originated from pre-tyrosine or tyramine [67,68]. On the basis of the data presented herein, we speculate that there are multiple PPO enzymes or other enzymes involved in the conversion of tyrosine to L-DOPA in the betalain biosynthetic pathway of amaranth plants.

A transcriptome analysis and qRT-PCR verification can increase the accuracy of the analyses of amaranth metabolic activities
Amaranth plants are an important source of vital dietary components [19][20][21], including betalains, flavonoids, alkaloids, and other metabolites [25][26][27][28]. Photosynthesis provides the organic materials, energy, and oxygen required for almost all life activities, including secondary metabolism. In this study, we first constructed a betalain transcriptome database, which we then analyzed to detect the various amaranth metabolic pathways, including those related to phenylpropane synthesis (35 unigenes), plant hormone signal transduction (22 unigenes), phenylalanine metabolism (22 unigenes), flavonoid biosynthesis (14 unigenes), photosynthesis (including antenna proteins) (25 unigenes), and betalain metabolism (1 unigene). Thus, our analysis of the amaranth transcriptome revealed many of the metabolic pathways in amaranth plants. Additionally, the DODA expression level, which influences betalain production, differed significantly between the red and green leaf sectors. However, according to Teng [59] and Zheng et al [52], We also identified unigenes associated with betalain metabolism based on comparisons of homologous gene sequences. The unigenes that were differentially expressed between the red and green leaf sectors were identified based on the following criteria for unique sequence reads: FC � 2 and FDR < 0.01. The expression levels of CATPO, PPO, MYB, CYP76AD1, DODA, DOPA5-GT, B5-GT, and TyDC were significantly different between the red and green leaf sectors. The expression level of only TyDC was down-regulated in the red leaf sectors relative to the corresponding expression level in the green leaf sectors. The expression levels of all other genes exhibited the opposite pattern. We speculated that betalains in leaves accumulate only in the epidermal subcells of abaxial leaves, which explains the lack of difference in the expression of some genes between the red and green leaf sectors. Moreover, our results suggest there is also no difference in the flavonoid, chlorophyll, and carotenoid contents between the two examined leaf sectors. Furthermore, the expression levels of the other unigenes, as determined by qRT-PCR, were largely consistent with the data in the transcriptome database, verifying the reliability of the transcriptome database. Combining the transcriptome and qRT-PCR analyses increased the accuracy of our investigation of amaranth metabolic activities. Additionally, the MYB transcription factors involved in betalain metabolism have been cloned and subsequently verified in a qRT-PCR assay [69].

Hormones influencing the synthesis and accumulation of betalains
There are many external factors influencing the synthesis and accumulation of betalains, especially light and hormones. Betalain biosynthesis is differentially affected by various hormones, including IAA, ABA, and 6-BA. In this study, unigenes related to plant hormones were expressed in the red leaf sectors of A. tricolor plants, although these hormones are also present in the green leaf sectors. Previous studies revealed that plant hormones increase, decrease, or have no effect on betalain turnover. For example, ABA can inhibit cell growth and betalain accumulation in Phytolacca americana by decreasing the availability of a betalain precursor (i.e., free tyrosine) [70]. Other studies demonstrated that 6-BA can promote betalain biosynthesis in some plant species, including A. tricolor r [52], S. salsa [71], These studies indicated that betalain metabolism may be regulated by 6-BA via changes to the DODA expression level. Several studies have verified DODA as a key gene for betalain metabolism. Moreover, GA can maintain skotomorphogenesis [72], implying it might indirectly inhibit betalain biosynthesis [73]. We previously reported that GA 3 negatively regulates betalain accumulation in amaranth seedlings, and this effect is more pronounced at high GA 3 concentrations. Furthermore, 2,4-D reportedly may induce the accumulation or degradation of betalains [74][75][76][77]. In the current study, an analysis of a transcriptome database resulted in the identification of many unigenes related to hormone metabolism as well as flavonoid and alkaloid pathways. These observations were related to pathway interactions and source-sink relationships associated with betalain metabolism.

Many metabolic pathways co-exist in the A. tricolor L.
Betalains have been sub-divided into betacyanins and betaxanthins. There are four main types of betacyanins (amaranthin, betanin, gomphrenin, and decarboxybetanin), of which amaranthin is the main betacyanin among Amaranthus species. In amaranth plants, betalain biosynthesis involves multiple metabolic pathways (Fig 9). Photosynthetic products form aromatic amino acids, including tyrosine, phenylalanine, and tryptophan, via the shikimic acid pathway. Tyrosine is required for protein biosynthesis, while also serving as a precursor of some metabolites, including plastoquinone, tocopherols, rosmarinic acid, isoquinoline alkaloids, catecholamines, and betalains [62]. The classic betalain metabolic pathway involves a series of enzymes, such as PPO, CYP76AD1/5/6, DODA, and glucosyltransferase. However, some reactions in this pathway occur spontaneously.
Betalains are secondary metabolites derived from L-tyrosine, which is hydroxylated to L-DOPA through the monophenolase activity of tyrosinase (or polyphenoloxidase). The L-DOPA is subsequently oxidized to dopaquinone by the diphenolase activity of tyrosinase, and then spontaneously converted to cyclo-DOPA [40]. The L-DOPA can also be directly converted to cyclo-DOPA in a reaction catalyzed by CYP76AD1-α. Additionally, L-DOPA can be converted to seco-DOPA by DODA (4,5-DOPA-extradiol dioxygenase) and then spontaneously form betalamic acid, which is a chromophore in all betalains. The addition of an amine or amino acid to betalamic acid may result in the production of yellow betaxanthin [78]. Betalamic acid may also combine with cyclo-DOPA to spontaneously form betanidin, which is a key intermediate of betacyanin biosynthesis [79]. Glucose is added to the 5-hydroxyl group of betanidin in a reaction catalyzed by betanidin-5-O-glucosyltransferase to form betanin, which is responsible for the characteristic color of red beets (Beta vulgaris ssp.). A previous study confirmed that stress can increase betalain production and up-regulate B-5GT expression in red beet leaves, while betalain accumulation is inhibited by the transient expression of B-5GT in the antisense orientation [80]. Thus, B-5GT is important for betalain metabolism. Additionally, cDOPA 5-O-glucosyltransferase converts cyclo-DOPA to cDOPA 5-O-glucoside, which combines with betalamic acid to spontaneously form betanin. The betanin is subsequently converted to amaranthin, which is the main pigment in amaranth plants. Amaranthin may also be synthesized in other pathways. For example, cDOPA 5-O-glucoside can be transformed to cDOPA-glucuronylglucoside and then to amaranthin. In this study, we screened the transcriptome database to identify the key genes in the betalain metabolic pathway, and we observed that the gene expression tendencies were associated with betalain content.
Amaranthin may also be synthesized via phenylpropane and phenylalanine metabolic activities. The metabolites might be precursors related to flavonoid and lignin metabolism. The fact that PAL and C4H can convert phenylalanine to p-coumaric acid suggests there is a link between betalain biosynthesis and flavonoid metabolism. The p-coumaric acid can be subsequently converted to ferulic acid, caffeic acid, and sinapic acid. The dihydroindoles of betacyanins might be derived from dopamine precursors, phenylalanine and tyrosine. These precursors are converted to trans-cinnamic acid, caffeic acid, and p-coumaric acid, which are important substances that link phenylpropane metabolism to flavonoid metabolism [44]. The increasing conversion of p-coumaric acid to amaranthin decreases the precursors available for flavonoid biosynthesis, thereby affecting the flavonoid content. Meanwhile, lignin synthesis is inhibited by the increasing use of tyrosine and phenylalanine in the pathway that transforms L-tyrosine to DOPA and then amaranthin [18]. We observed that PAL is more highly expressed in the red leaf sectors than in the green leaf sectors, indicating that the production of caffeic acid, which may provide the dihydroindoles in amaranthin, increases with increasing p-coumaric acid levels during phenylpropane metabolism. Meanwhile, we determined that CHS and F3H expression levels are higher in the red leaf sectors than in the green leaf sectors, potentially leading to a greater availability of flavonoid precursors in the red leaf sectors. These differences may be responsible for the diversity in the flavonoid production of the two leaf sectors. In contrast, the FS gene expression level is lower in the red leaf sectors than in the green leaf sectors, which may affect the types of flavonoid that are biosynthesized. In this study, there were no differences in the total flavonoid contents of the red and green leaf sectors. Our findings suggest that phenylalanine metabolites are not only precursors of flavonoid or lignin metabolic activities, they may also contribute to amaranthin biosynthesis.
Dopamine is a betalain metabolic intermediate that is involved in the production of betalains and isoquinoline alkaloids, which are derived from L-tyrosine. Following a series of enzymatic reactions, L-tyrosine is converted to dopamine, which is transformed into an isoquinoline alkaloid and is involved in betalain biosynthesis. Although isoquinoline alkaloids exist in amaranth plants, the relationship between isoquinoline alkaloids and betalains remains unclear.

Conclusions
In amaranth plants, betalain metabolism is associated with a group of metabolic pathways. Additionally, multiple source-sink relationships exist between betalain biosynthesis and the synthesis of other metabolites.

Materials
Amaranth (A. tricolor cv. Dahong) was grown on garden soil in pots in the greenhouse (28 ±2˚C), located in the campus of Fujian Agriculture and Forestry University, China.
Leaves at the fourth and fifth positions were collected from ten 30-day-old amaranth seedlings (Fig 10). The green and red sectors without veins from each seedling were respectively collected for RNA extraction and determination of chlorophyll, carotenoid, betalain, and flavonoid contents.

Determination of chlorophyll, carotenoid, betalain, and flavonoid contents in amaranth leaves
The chlorophyll, carotenoid, betalain, and flavonoid contents in the red and green leaf sectors of amaranth plants were determined. Chlorophyll and carotenoid contents were analyzed as described by [81], Meanwhile, the betalain and flavonoid contents were determined as described by [52] and Li et al. [82], respectively.

Total RNA extraction and quality detection
The RNA of all the red sectors or green sectors from the ten seedlings was mixed respectively. Total RNA was extracted from the collected leaf samples as previously described [47]. Specifically, the RNA was extracted with Trizol reagent (Invitrogen, USA) and treated with DNase I to eliminate any residual genomic DNA. The extracted RNA was quantified with the Nano-Drop 2000 spectrophotometer (Thermo, USA), and its integrity was assessed with the Agilent 2100 Bioanalyzer (Agilent Technologies) and by denaturing agarose gel electrophoresis with ethidium bromide staining. Each RNA sample concentration and the amount was � 300 ng/ μL and 15 μg, respectively.

Construction, detection, and sequencing of transcriptome libraries
A cDNA library was constructed for each sample according to the 'Preparing Samples for Sequencing of mRNA Test Kits' instructions (Illumina). The cDNA library quality and fragment lengths were evaluated with the Agilent 2100 DNA 1000 Kit, after which the cDNA libraries were sequenced with the Illumina HiSeq 2500 system (125-bp paired-end reads).

Sequencing data assembly and gene annotation analysis
The raw data generated from the sequencing of each cDNA library were transformed into sequence data (i.e., raw data or raw reads) by base calling. Adapter fragments, duplicated sequences, low-quality reads with ambiguous bases ("N"), and reads with more than 10% of bases with a Q-value < 30 were removed from the raw reads to yield the clean reads for subsequent analyses. The clean reads were included in a de novo transcriptome assembly, which was completed with the Trinity program. Firstly, the sequencing reads were broken into shorter fragments (K-mer) using Trinity software. These shorter fragments were combined to form longer fragments (i.e., contigs) with no ambiguous basesby overlapping reads of a k-1 nucleotideslength Thirdly, by using the overlap between these longer fragments, the component was obtained. Finally, the De Bruijn graph method and sequencing read information were used to identify transcripts in each component. The resulting sequences were designated as unigenes. The ORFs of unigene sequences longer than 200 bp were identified with the getorf program (http://emboss.bioinformatics.nl/cgi-bin/emboss/getorf). The unigene sequences were then used as queries in a BLASTX search (E-value < 1 × 10 −5 ) of the NCBI non-redundant (Nr), Swiss-Prot, GO, Clusters of Orthologous Groups (COG), euKaryotic Orthologous Groups (KOG), and KEGG databases. The enriched KEGG pathways among the unigenes were identified with the KOBAS 2.0 program. We used the HMMER program to compare the predicted amino acid sequences encoded by unigenes with sequences in the Pfam database to functionally annotate the unigenes.

Analysis of differentially expressed genes
The Bowtie program was used to compare the reads for each sample with the sequences in a unigene library. On the basis of this comparison, gene expression levels were estimated with the RSEM v1.3.0 program (2016). The FPKM (fragments per kilobase of transcript per million mapped reads) value was used to represent unigene expression levels. After normalizing the unigene expression levels, genes that were differentially expressed between the red and green leaf sectors were identified with the DESeq program. A p-value < 0.05 was applied as the threshold after the adjustment for multiple comparisons based on the Benjamini and Hochberg false discovery rate (FDR) method. The unique reads with a fold-change (FC) value � 2 and FDR < 0.01 were identified as DEGs.
We used the MapMan 3.5.1R2 program to compare the unigene sequences with Arabidopsis thaliana metabolism pathways. Some of the matching genes related to flavonoid metabolism were selected for further study. Additionally, the betalain-related genes (i.e., associated with the betalain biosynthetic pathway) were also selected.

Quantitative real-time polymerase chain reaction analysis
Ten DEGs were analyzed in a qRT-PCR assay to validate the accuracy of the RNA-sequencing (RNA-Seq) data. The genes related to betalain biosynthesis and flavonoids were then analyzed. Gene-specific primers designed with the DNAMAN 6.0 program (LynnonBiosoft, America) were synthesized by Shanghai Bio-engineering Co., Ltd. Details regarding the primers are presented in S5A and S5BA Table. The RNA samples used to construct cDNA libraries were used for the qRT-PCR analysis, which was completed with the SYBR Green I Master Mix (Takara) and the LightCycler 480 qRT-PCR instrument (Roche, Switzerland). All samples were analyzed in triplicate, with three biological replicates per sample. The EF1a gene was used as an internal reference for calculating relative unigene expression levels. Specific details regarding the qRT-PCR method were previously described [47].

Statistical analysis
Quantitative results of the gene expression and component contents analyses were presented in terms of means ± SDs of at least three biological replicates. The gene expression and component contents were analysed by one-way analysis of variance (ANOVA) followed by Duncan's test using SPSS version 19.0. These pictures were made using the GraphPad Prism 6.0 software and Excel 2013.
Supporting information S1