Transcriptomic profiling of two Pak Choi varieties with contrasting anthocyanin contents provides an insight into structural and regulatory genes in anthocyanin biosynthetic pathway

The accumulation of anthocyanin in horticultural crops not only improves their stress tolerances but also their nutritional values. Many key regulatory and structural genes in anthocyanin biosynthesis have been identified in model plants, but limited information is available for non-model plant species featured with colored leaves. In this study, two Pak Choi varieties with green or purple leaves were selected to analyze the anthocyanin biosynthesis through RNA-Seq. A total of 2475 unigenes were differentially expressed between these tested varieties, including 1303 down-regulated and 1172 up-regulated genes in the purple-leafed one. The reliability of the RNA-Seq was further confirmed by using real-time quantitative PCR. Kyoto Encyclopedia of Genes and Genomes enrichment analysis of the differentially expressed genes revealed ‘flavonoid biosynthesis’ was the only enriched pathway in the purple-leafed variety: In the pathway of phenylpropanoid metabolism, Bra017210, Bra039777, and Bra021637 were expressed at higher levels in the purple-leafed variety;　among the early anthocyanin biosynthetic genes, Bra037747 transcripts were only detected in the purple-leafed variety but not in the green-leafed one; among the late anthocyanin biosynthetic genes, Bra027457, Bra013652, Bra019350, Bra003021, Bra035004, and Bra038445 were all up-regulated in purple-leafed variety; and genes encoding anthocyanin-related transcription factors, such as Bra016164, and genes encoding anthocyanin transportation, such as GST F12, were also identified as up-regulated ones in the purple-leafed variety. The current result provided a valuable insight into the anthocyanin accumulation in the purple-leafed variety of Pak Choi and a bioinformatic resource for further functional identification of key allelic genes determining the difference of anthocyanin content between Pak Choi varieties.


Background
Most autotrophic higher plants in temperate regions are featured with leaf coloration during the progression of leaf senescence in autumn or under stress conditions mainly due to the accumulation of anthocyanins [1]. However, there are plants having inherent colored leaves (e.g. red or purple) during their entire life cycles. Horticultural crops with high anthocyanin contents are considered valuable for the health-promoting effect of anthocyanins [2]. The accumulation of anthocyanin is also proposed as an important strategy for plant tolerance against abiotic or biotic stresses at least partially through its photoprotective function against excessive sunlight and its antioxidant function for mitigating the signal transduction of and the damage effect of reactive oxygen species [3].
The anthocyanin biosynthesis pathway genes were regulated by transcription factors, such as the conserved MBW complex composed of MYB, basic Helix-Loop-Helix (bHLH) and WD40 subunit proteins in higher plants [8][9][10]. This MBW complex played essential roles in the synergistic regulation of anthocyanin accumulation through inhibiting or activating the expression of related genes [11][12][13][14], and this complex per se is repressed by MYBL2 and JAZ family proteins [15]. Furthermore, this MYBL2-JAZ repressor complex is sequestered by DELLA proteins [15]. This JAZ-DELLA-MYBL2 module upstream of the MBW complex together mediated abiotic stress-caused anthocyanin accumulation in Arabidopsis [15]. While the transferability of this knowledge into non-model plants needs to be further tested, it is of particular interest to understand the regulatory mechanism of persistent accumulation of anthocyanin at high levels in horticultural plants where high contents of anthocyanins are often desirable.
Pak Choi (Brassica Campestris L. ssp. chinensis L. Makino), one of the most consumed vegetables in China and phylogenetically close to Arabidopsis contains different varieties with highly varied levels of anthocyanin [16,17], which makes Pak Choi an ideal plant system for studying the mechanism underlying persistence accumulation of anthocyanin. The objective of this study was to identify and characterize key genes and regulating elements involved in leaf anthocyanin accumulation in Pak Choi through RNA-Seq with Brassica rapa L. spp. pekinensis as reference.

Plant materials
Two varieties of Pak Choi with different leaf colors were selected in this study: 'Jingguan' with green leaves (abbreviated as 'G' in this study), and 'Zizuan' with purple leaves (abbreviated as 'P'). Seeds of 'G' were from Beijing Vegetable Research Center, Beijing Academy of Agriculture and Forestry Sciences, and those of 'P' were from Beijing Dongsheng Seed Company. The seeds were sown in pots at 25°C in July 2015 in a modern greenhouse located in Chinese Academy of Forestry Sciences, Beijing, China. When the seedlings had five leaves in total, the third leaf from the bottom was sampled and stored at -80°C before further processing. Six biological replicates were applied for anthocyanin content measurements, and three replicates for the RNA-Seq analysis.

Analysis of total anthocyanin
Fresh leaf tissue (0.2 g) was collected for the measurement of anthocyanin content. In brief, anthocyanins were extracted in 10 ml acidified methanol (99 CH 3 OH:1HCl, v/v) at 4°C for 24 h in the dark, clarified by centrifugation at 12,000 × g for 2 min, and the absorbance of supernatants was determined using a UV-visible spectrophotometer (UV-2550, Shimadzu, Japan) from optical density (OD) at 530 and 600 nm. Anthocyanin concentrations were expressed as U•g -1 , where U was calculated as (OD530-OD600)/0.1 [18].

RNA extraction and library preparation for transcriptome analysis
A total of 3 μg RNA per sample was applied for the RNA sample preparations. Sequencing libraries were generated using NEBNext® Ultra™RNA Library Prep Kit for Illumina® (NEB, USA) according to manufacturer's instructions. By using poly-T oligo-attached magnetic beads, mRNA was purified from total RNA. Fragmentation was operated using divalent cations under elevated temperature in NEBNext First Strand Synthesis Reaction Buffer (5X). First strand cDNA was synthesized using random hexamer primer and M-MuLV Reverse Transcriptase (RNase H -). Second strand cDNA synthesis was then performed using DNA Polymerase I and RNase H. Remaining overhangs were converted into blunt ends via exonuclease/polymerase activities. After adenylation of 3′ ends of DNA fragments, NEB-Next Adaptor with hairpin loop structure were ligated to be ready for hybridization. To select cDNA fragments of 150~200 bp in length, the library fragments were purified with AMPure XP system (Beckman Coulter, Beverly, USA). Then 3 μl USER Enzyme (NEB, USA) was used with size-selected, adaptor-ligated cDNA at 37°C for 15 min followed by 5 min at 95°C before PCR. Then PCR was performed with Phusion High-Fidelity DNA polymerase, Universal PCR primers and Index (X) Primer. In the end, PCR products were purified (AMPure XP system) and the assessment of library quality was conducted on the Agilent Bioanalyzer 2100 system. The library preparations were sequenced on an Illumina Hiseq 2500 platform and 125 bp paired-end reads were generated.

Analysis of Illumina sequencing results
Clean data (clean reads) were obtained by deleting reads containing poly-N, reads containing adapters, and low quality reads from raw data. At the same time, Q20, Q30, GC-content of the clean data were calculated. Based on clean data of high quality, the downstream analyses were conducted.

Quantification of gene expression levels and differential expression analysis
Reference genome and gene model annotation files were downloaded from Brassica database website directly [http://brassicadb.org/brad/] [19]. Clean data were mapped back onto the reference genome using TopHat v2.0.12. The HTSeq v0.6.1 software was used to count the reads numbers mapped onto each gene. And the expected number of Fragments per Kilobase of transcript sequence per Millions base pairs sequenced (FPKM) of each gene was calculated based on the length of the gene and reads count mapped to this gene. Differential expression analysis of two varieties (three biological replicates per variety) was performed using DEseq program [20]. Genes with an adjusted P-value <0.05 found by DESeq were considered as differentially expressed. Gene Ontology (GO) enrichment analysis of the differentially expressed genes (DEGs) was conducted by the GO seq R packages based Wallenius non-central hyper-geometric distribution [21], which can adjust for gene length bias in DEGs. KOBAS 2.0 [22] was used to test the statistical enrichment of DEGs in Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways.

Real-time quantitative RT-PCR (RT-qPCR) assay
The expression patterns of nine genes involved in the anthocyanin pathway (Gene ID: Bra013652, Bra019350, Bra027457, Bra037747, Bra017520, Bra017523, Bra026967, Bra016164, and, Bra032635) were analyzed using qRT-PCR. cDNA was synthesized using Reverse-Tra Ace qPCR RT Kit (Toyobo, Japan) according to the manufacturer's recommendation. The reverse transcription reaction system contained 2 μL 5 × RT buffer, 0.5 μL primer mix, 0.5 μL RT enzyme mix, 2 μL RNA template, 5 μL ddH 2 O. Three biological replicates were applied for each gene expression analysis. Gene-specific primers were designed according to the reference unigene sequence using the online tool [GenScript Real-time PCR (TaqMan) Primer Design, https://www.genscript.com/sslbin/app/primer]. The cDNA diluted to 100 ng/μL was used for qPCR assay with each gene-specific primers and SYBR® Green Real time PCR Master Mix (Toyobo, Japan) on the Bio-Rad iQ5 real time system. Reactions were performed at 96°C for 1 min, 40 cycles of 95°C for 15 s, 60°C for 15 s and 72°C for 45 s. All primers for RT-qPCR are listed (Additional file 1: Table S1).

Quantification of anthocyanin contents of two Pak Choi varieties
Two Pak Choi varieties were used in this study with contrasting colors that the purple variety (hereafter abbreviated as 'P') had 11.703 U•g -1 of anthocyanins while the green one (hereafter abbreviated as 'G') only had 0.320 U•g -1 (Table 1).

RNA-sequencing of two Pak Choi varieties with different leaf colors
Leaves at the same growth stage were sampled from these two varieties and their transcriptomes were profiled using Illumina sequencing. In total, 172,380,454 and 168,368,500 raw reads were generated from the 'G' and 'P' libraries, respectively (Table 2). To ensure the quality of the libraries, adaptor reads, ambiguous reads and low-quality reads were removed (Fig. 1). Finally,  Table 2).

Differentially expressed genes (DEGs) between the two Pak Choi varieties
Normalized-FPKM (fragments per kilobase per million) were used to quantify the transcript levels. The FPKM of the two varieties ('G' and 'P') were almost the same (Fig. 2) for unbiased analysis of their transcript profiles. The DEGs were analyzed to identify candidate genes related to the synthesis of anthocyanins. DEGs (padj < 0.05) were defined as genes that were significantly enriched or depleted in one variety relative to the other one. Between 'G' and 'P' , a total of 2475 DEGs were identified, including 1303 down-regulated and 1172 upregulated DEGs in 'P' (Additional file 2: Table S2).

Verification of RNA-Seq data by real-time quantitative RT-PCR
To confirm the reliability of the RNA-Seq data, the relative expression levels of nine DEGs were examined by real-time quantitative PCR (RT-qPCR). Among the tested genes, one encoded a member of the oxidoreductase family protein, four transcription factor genes, and four anthocyanin biosynthetic enzymes. Consistent with the RNA-Seq data, relative expression levels of all selected genes were higher in the purple-leafed variety than in the green-leafed variety. This result further confirmed the reliability of the RNA-Seq data ( Table 3).  suggested their variety-specific functions. In the category of biological processes, the highly enriched DEGs in 'P' included those involved in 'carboxylic acid metabolic process' , 'oxoacid metabolic process' , 'organic acid metabolic process' , 'organic acid biosynthetic process' , 'carboxylic acid biosynthetic process' and 'RNA-dependent DNA replication'. In the molecular function category, the 'oxidoreductase activity' and 'oxidoreductase activity, acting on the CH-OH group of donors, NAD or NADP as acceptor' were the mostly highly enriched (Fig. 3). GO analysis (corrected p-value < 0.2) was also conducted for the downregulated DEGs in 'P' (Fig. 4). In the category of biological processes, 'single-organism process' and 'single-organism cellular process' were the mostly highly enriched ones; and in the category of molecular function, the top enriched terms were 'anion binding' , and 'carbohydrate derivative binding' , (Fig. 4). The result of KEGG pathway enrichment analysis for DEGs showed that the only enriched pathway of DEGs (corrected-P value is almost 0.05) was 'flavonoid biosynthesis' [KEGGmap00941 (http://www.genome.jp/ dbget-bin/www_bget?map00941)] (Fig. 5), and there were 11 DEGs between 'P' and 'G' involved in this pathway (Fig. 6). Most of these genes were upregulated in purple-leafed variety. For examples, the biosynthetic genes of anthocyanins, Bra027457, Bra013652, Bra019350, Bra003021, Bra035004, and Bra038445 were all up-regulated in 'P' (Table 4). In the phenylpropanoid metabolism, Bra017210, Bra039777, and Bra021637 were expressed at a higher level in 'P' than those in 'G'. However, Bra006985 had higher expression in 'G' than those in 'P'. For biosynthetic genes in upstream of the anthocyanin biosynthesis pathway, Bra037747 was expressed in 'P' rather than in 'G' , yet Bra029212 was expressed at a lower level in 'P' than those in 'G'. Genes involved in anthocyanin transport  Table 3 The expression patterns of selected genes using real-time quantitative RT-PCR and RNA-Seq were also identified, and the pair of allelic genes (Bra008570 and Bra023602) encoded Glutathione Stransferase F12 were both significantly up-regulated in the purple-leafed variety (Table 4). Transcription factors play essential roles in the transcriptional regulation of structural genes in anthocyanin biosynthesis. Therefore candidate transcription factors potentially involved in anthocyanin biosynthesis were further pinpointed in the RNA-Seq analysis. It was found that Bra036145 was suppressed, while Bra037887, Bra016164 and Bra039283 were all upregulated in purple-leafed variety (Table 5). Yet, Bra011772 was expressed at significantly lower level in purple-leafed variety than green-leafed one. The   Table 5.

Discussion
The increased accumulation of anthocyanin in leafy vegetables improves their commercial and healthy values. Yet, there are limited studies on mechanisms underlying the anthocyanin biosynthesis in leafy vegetables [23]. In this study, transcriptomic analysis of the green-leafed and purple-leafed varieties of Pak Choi was conducted using RNA-Seq technology. The current results not only identified Pak Choi's vital structural genes involved in anthocyanin biosynthesis, but also revealed the potential regulatory mechanism involved in anthocyanin biosynthesis, transport and accumulation in the purple leafed variety.  One objective of this study is to identify the structural genes involved in anthocyanin biosynthetic pathway. The current result confirmed the presence of all known structural genes in this pathway in Pak Choi, which result was in agreement with previous proposal that anthocyanin biosynthesis pathway and structural genes are relatively conserved across higher plants [24]. Yet, at the transcriptional level, great variations were found among these structural genes between 'P' and 'G'. In the variety 'P' , most genes in flavonoid pathway had significantly higher expression levels. For examples, our results showed that the expression of Bra017210 and Bra039777, encoding for PAL1 and PAL2 respectively, were both higher in 'P' than in 'G' , suggesting that the phenylpropanoid pathway which is upstream of anthocyanin biosynthesis was more active in the purple leafed variety [25,26]. 4CL2 was another enzyme catalyzing the formation of hydroxycinnamic acid derivatives [27], and our study also revealed that the homologous gene of 4CL2 was up-regulated in 'P'.
In the anthocyanin biosynthetic pathway, structural genes can be classified into two types: Early Biosynthetic  Genes (EBGs) and Late Biosynthetic Genes (LBGs) [28]. The EBGs, including CHS, CHI, F3H, F3'H, and F3'5'H, catalyze the production of flavonols and other flavonoid compounds, while the LBGs, including DFR, ANS/ LDOX, and the UDP-glucose: flavonoid-3-O-glucosyltransferase (UFGTs), are specifically for anthocyanin biosynthesis [5,29]. Our results showed that both early biosynthetic genes (Bra037747 and Bra029212) and late biosynthetic genes (Bra027457, Bra013652, Bra019350, Bra003021, Bra035004, and Bra038445) concurrent with the high anthocyanin content in 'P'. O-methyltransferase (OMT) plays important roles in catalyzing the methylation of anthocyanins [30]. And in 'P' , the expression level of Caffeic acid 3-O-methyltransferase 1 and Caffeoyl-CoA O-methyltransferase 1 were both higher than those in 'G' , while Quercetin 3-O-methyltransferase 1 and Caffeic acid 3-O-methyltransferase OS were lower in 'P' than in 'G'. Together, these results confirmed that most structural genes were up-regulated for biosynthesis of anthocyanin which was in accordance with previous studies [31]. On the other side, exceptions did exist for the expression of the anthocyanin structural genes. For example, Bra029212 was expressed at a lower level in 'P' than those in 'G'. We reasoned that the possible reasons for such exceptions were that these genes either did not directly participate in the flavonoid biosynthesis pathway, or their expressions were suppressed due to the high-expression level and compensation effect of their redundant allelic genes. Nevertheless, these structural genes in anthocyanin biosynthesis were associated with the level of anthocyanin contents in 'P' and 'G'.
On the other hand, transcriptional regulation of the structural genes in anthocyanin biosynthesis pathway was an important regulatory strategy for anthocyanin formation and accumulation in Arabidopsis [28,32]. These regulatory genes can be mainly classified into two groups: positive and negative regulatory genes. Unlike bHLH and WD proteins that may have broader and overlapping regulatory targets, MYB proteins are the key components providing specificity for the subsets of regulated genes [33]. It has been reported that three transcription factors (MYB11, MYB12, and MYB111) activate early biosynthetic genes of the anthocyanin biosynthetic pathway in Arabidopsis [34]. While in Pak Choi, there's only one ortholog of AtMYB111, but no ortholog of AtMYB11 and AtMYB12 was found either due to the absence of these genes in the Pak Choi genome or because of their extremely low expression level in both of these two Pak Choi varieties. The bHLH gene family regulates anthocyanin biosynthesis through formation of MBW ternary complexes [35]. In Pak Choi, only one bHLH transcriptional factor (TRANSPARENT TESTA8, TT8) was found that potentially involved in the regulation of anthocyanin biosynthetic genes. In Arabidopsis, TT8 not only regulates anthocyanidin production towards the synthesis of proanthocyanidins in seeds, but is also involved in the regulation of anthocyanin biosynthesis in vegetative tissues and cell cultures [35]. Our transcriptional result was also consistent with the functional role of TT8 in Pak Choi. There are several transcription factors including two R3-type single MYB proteins (MYBL2 and CPC) and three members of the LBD (LATERAL ORGAN BOUNDAR-IES DOMAIN) family that act as negative regulators of activities of WBM complexes decreasing anthocyanin biosynthesis in A. thaliana [5,36]. Yet, the corresponding orthologs in Pak Choi (Bra016164 and Bra039283) showed higher expression levels in 'P' than in 'G' , probably either due to the different but contradictory function of these two genes in Pak Choi or due to the feedback effect of the high anthocyanin content in 'P' that in turn inhibited their expressions.
The flavonoid transporters also involved in the vacuolar transport of anthocyanins and proanthocyanidin precursors and might contribute to the accumulation of anthocyanin in plant as well [37][38][39]. In Arabidopsis, three genes, TT12, TT19 and AHA10, have been found to be related to the transport of anthocyanins. However, our results showed that Bra009033 was down regulated while genes encoding ATPase 10 and ATPase 1 were upregulated in purple-leafed variety. Glutathione Stransferases (GSTs) act as non-enzymatic carrier proteins enabling intracellular shuttling of endogenous compounds such as anthocyanin in plants. Previous gene knockout experiments revealed that GSTs were involved in anthocyanin transport [40] and the deposition of anthocyanin pigment into the vacuole [41]. The differential expressions of the GST encoding genes were found in this study, indicating a possible link between GST and the transportation of anthocyanin in Pak Choi as well.

Conclusions
We performed RNA-Seq for two varieties of Pak Choi with contrasting different anthocyanin contents. Nine structural genes in the anthocyanin biosynthetic pathway were up-regulated in the purple-leafed variety. Among them, the late biosynthetic genes including Bra027457, Bra013652, Bra019350, Bra003021, Bra035004, and Bra038445, probably played more pivotal roles in the biosynthetic process of anthocyanin. In addition, key transcription factor and transporter genes, such as Bra016164 and Bra009033, were also identified for the regulation of anthocyanin accumulation and transportation. The results in this study paved groundwork for further functional study on anthocyanin-related genes that will ultimately decipher the mechanism underlying persistently high anthocyanin contents in certain varieties of Pak Choi, and such knowledge shall also be useful for the other leafy horticultural crops.