Transcriptome sequencing and comparative analysis for mining genes related to flower color variation in Iris laevigata Fisch.

Background: Iris laevigata Fisch. (Iridaceae) is an important aquatic plant with dark blue flowers. I. laevigata var. alba Ling Wang, L. Su & F. J. Shang is especially rare, it has white perianths with white anthers with a blue tinge. I. laevigata var. alba provides a good reference material for revealing complex and variations in anthocyanin catabolism networks. Modern molecular biology technology will facilitate the study of the mechanisms underlying metabolite variation. However, molecular research on I. laevigata is limited due to the lack of sequence data. Results: In this study, RNA-Seq was performed on I. laevigata and I. laevigata var. alba at flowering stage. Two libraries were sequenced using Illumina Hiseq 2000 platform. Clean data of each sample reached 7.01 Gb. A total of 64,537 unigenes were obtained after assembly, including 28,936 unigenes annotated to seven public databases. Then, 143 unigenes were putative homologs to color-related genes, including 1 up-regulated and 12 down-regulated unigenes. Combined with reverse transcript polymerase chain reaction (RT-PCR), a number of important color-related genes were tested. In perianths, 5 flavonoids (4 anthocyanins and 1 flavone) were detected using HPLC at flowering stage. Then, the putative anthocyanin metabolic process of flowering stage and differential genes related to flower color variation were put forward. A new hypothesis about the absence of phenotypic color of I. laevigata var. alba was proposed. It was suggested that the loss of anthocyanin was due to the interaction of multiple genes. First, upstream metabolic fluxes were redistributed by up-regulated CHI. Second, synergism of down-regulated genes (F3H/DFR/ANS) and competition for substrates between DFR and FLS resulted in relatively low biological flux and multi-shunt of metabolic pathways. As a result, the anthocyanins content in I. laevigata var. alba is very limited, which leads to the variation of flower color phenotype. Conclusions: This study provides mass sequence data by the deep sequencing of I. laevigata and I. laevigata var. alba for the first time. Combined with the detection and metabolite analysis of flavonoids, it will increase our understanding of the molecular mechanism of the phenotypic variation in I. laevigata var. alba and provide the basis for molecular breeding of unique flower colors. modification, protein turnover, chaperones (9.89%), translation, ribosomal structure and biogenesis (9.21%), carbohydrate transport and metabolism, (7.53%), amino acid transport and metabolism (7.01%), energy production and conversion (6.04%). in white perianths. These proves that the delphinidin pigment is off I. laevigatas var. alba the metabolic flow. In order to study the biochemical basis of I. laevigatas var. alba color phenotype, the metabolic spectra of perianths were compared, and the compounds related to color pigmentation are discussed. Previous have shown that the blue-to-white phenotypic changes in I. laevigatas are mainly due to the absence of anthocyanins. Meanwhile, naringenin which was key node compound in anthocyanin synthesis pathway not in I. laevigatas var. alba This implies that a certain portion of the pathway before naringenin formation may be blocked or shunted. The gene closely related to naringenin synthesis is chalcone isomerase (CHI). Naringenin is both a catalytic product of CHI and a catalytic substrate for F3'H, DFR, F3H. However, no naringenin is inconsistent with the significant up-regulation of CHI gene in white perianths in transcriptome sequencing data. In addition, naringenin is a necessary metabolite of the pelargonidin pigment synthesis pathway. Naringenin’s absence in white flowers means the disruption site of pelargonidin was at naringenin or earlier in the pathway. The objective was to determine why CHI was significantly up-regulated, but naringenin was not detected. The pathway of naringenin must be blocked in some steps in white perianths. We analyzed and simulated the related metabolic pathways and structural genes. instructions. The cDNA obtained by reverse transcription was used as a template to perform q-PCR. The q-PCR analysis was run on an ABI 7500 real-time PCR detection system (using BiesysTEMS) using the SyBR® Premix ExtAGTM kit (TakaaBio Inc.). The PCR reaction was at 35 cycles (95 °C for 5 min, 95°C for 15 s; 60°C for 30 s; 72°C for 30 s; 72°C for 10 min). After RT-qPCR, a melting curve was generated to test the specificity of the product. The Actin gene was selected as an internal reference gene. To ensure the authenticity and reproducibility of the results, at least two independent biological replicates and three technical replicates were performed for each treatment, and differential analysis of the two conditions/groups was performed using the DESeq R package (1.10 1). Relative expression levels apply to the 2 −ΔΔCt method.

there are six major anthocyanidins (chromophores of anthocyanins); pelargonidin, cyanidin, peonidin (3' O-methyl cyanidin), delphinidin, petunidin (3' O-methy delphinidin) and malvidin (3',5' O-dimethy delphinidin) [19][20]. Among these, malvidin and petunidin were regarded as major anthocyanins in I. laevigata and I. ensata [21]. However, the means by which the flux is controlled in the branched flavonoid pathway has remained largely unknown for I. laevigata and I. laevigata var. alba. Changes in flux in the three different branches of ABP may be caused by one of the following reasons: loss of function, changes in gene expression of encoding branching enzymes, or alteration of the substrate specificity making the enzyme unable to metabolize the specific precursor [22]. Therefore, it is very important to screen the key genes that control the branching pathway. In addition, flavones and flavonols also affect pigmentation [23][24][25], which changes the color of the pigment. For example, studies have shown that the co-staining effect of anthocyanin-flavonoids contributes to the expression of the gray-purple color of Iris species [26]. The intermolecular co-pigmentation of flavonoids and anthocyanins isolated from the Iris of the Netherlands causes the petals to be bluish [27].
In this study, the RNA-Seq project for I. laevigata and I. laevigata var. alba was carried out for the first time using Illumina (Illumina, Biomarker Technologies Inc.) sequencing technology. Combined with HPLC analysis, the pathways and products related to the anthocyanin metabolism of I. laevigata were revealed. Attempts were made to screen out the major metabolic branches affecting the perianth pigmentation and candidate genes for ABP blockade. In addition, this study will create new breakthroughs for the targeted breeding of Iris.

The main color composition of I. laevigatas
To clarify the metabolic pathways of flower color variation in I. laevigatas, we compared the compounds and contents of flavonoids (Table 1). As speculated, I. laevigatas contain three anthocyanin components responsible for color pigmentation: Del, Cy and Pel. In contrast, no color anthocyanins or derivatives were detected in I. laevigatas var. alba. This suggests that the co-coloration of these three pigments (Del, Cy and Pel) forms the color phenotype of blue flowers. The total amount of the three anthocyanins in blue flowers is 84 times greater than white flowers. This illustrates that the loss of anthocyanins and the decrease in anthocyanin content are the important factors affecting pigment formation. It is worth noting, that in blue flowers Cy content is the primary anthocyanin, having approximately 1.2 and 1.5 times greater content than Pel and Del, respectively.
Core genes in the ABP were studied in detail. The results demonstrated that most of the uni-transcripts had significant changes in expression level. Most genes showed lower transcript abundance in white perianths than in blue perianths, except CHI (Fig.3). So naringenin, a key product in blue and white perianths, was tested to understand the pigment metabolic pathway. Surprisingly, naringenin appeared in blue flowers and was not detected in white flowers (Fig.2).

RNA-sequence and assembly
In order to analyze the metabolism of anthocyanins in different colors, two libraries were constructed with blue and white perianths during the full bloom period for high-throughput sequencing (Fig. 1B, D).
The Clean Data of each sample reached 7.01 Gb, and the Q30 base percentage was 91.70% and above.
The GC content of white flower was 51.98%, and the GC content of blue flower was 52.23%. Through the above sequencing quality control, high quality Clean Data can be obtained and analyzed in-depth.
Subsequently, 108,768 transcripts and 64,537 unigenes were recombined, and the N50 of transcript and unigene (the maximum length of the nucleotide sequence covering 50% of all genes) was 1,399 nt and COG database is constructed based on the phylogenetic relationships of bacteria, algae and eukaryotes, and it can be used for direct homologous classification of gene products. There are 7,499 unigenes that aligned with 25 COG classifications. Of the 25 COG categories, the cluster for general function prediction was only (26.74%) representing the largest group, followed by replication, recombination and repair, transcription (13.2%), and signal transduction mechanisms (11.43%). Zero and a small percentage (lower than 0.1 %) of unigenes were assigned to the number of unigenes which involved extracellular structures and nuclear structure, respectively. It is worth noting that a large number of unigenes were assigned to posttranslational modification, protein turnover, chaperones (9.89%), translation, ribosomal structure and biogenesis (9.21%), carbohydrate transport and metabolism, (7.53%), amino acid transport and metabolism (7.01%), energy production and conversion (6.04%).
According to the results of expression, the white perianths have 1,908 differential genes relative to blue perianths, of which 818 are up-regulated and 1,090 are down-regulated (Additional file 2 Table S1).
Then in the further functional annotation of the identified differentially expressed genes, a total of 1,458 functional genes were annotated. GO analysis showed that a total of 3,993 unigenes can be divided into three categories: biological processes, cellular components, and molecular functions (Additional file 1: Figure S1). A large number of differential genes are classified as biological processes and molecular functions. The biological process category mainly includes molecular processes (904, 22.6%) and cellular processes (97, 2.4%), while the molecular function category: consists of catalytic activity (132, 3.3%) and binding (2,744, 68.7%) (Additional file 3: Table S2). The 11 top-hit species based on NCBI NR annotation are shown in Fig.4. More than 60% of unigenes could be annotated with the sequences from the 3 species, such as Elaeis guineensis, Phoenix dactylifera and Musa acuminata. Among them, Elaeis guineensis has the highest homology (Fig.4).

GO functional enrichment analysis of DEGs
The GO database structure is divided into several levels. The lower the level, the more specific the functions represented by nodes. From the above figure, we can see the annotations of DEGs and all genes in the secondary functions of GO. The nodes with obvious differences between red and blue columns may be related to the differences. with obvious proportion differences show that the enrichment trend of differentially expressed genes is different from that of all genes. Therefore, we can focus on whether this function is related to the differences. The results showed that metabolic, cellular, and single-organism processes were the most significant enrichment GO terms under the biological process category. Catalytic activity, binding, and transporter activity were the most enriched in the molecular function category.

KEGG function enrichment analysis of DEGs
KEGG database is the main public database about pathways, which helps us to further understand the function of genes [28].

Genes related to color development
In order to obtain the relevant gene information related to I. laevigata colore, all the node genes involved in the secondary metabolic pathways of anthocyanin flower material (anthocyanin synthesis pathway, flavone and flavonol synthesis pathway, flavonoid synthesis pathway) were extracted from the transcriptome database. A total of 143 unigenes in the I. laevigata transcriptome database were matched to these three metabolic pathways (Table 3). A comprehensive search was performed in the I. laevigatas transcriptome annotation results according to the standard gene names and synonyms provided by the KEGG database ( Table 3). The corresponding gene was then mapped to the reference pathway provided by KEGG according to the EC identification number corresponding to the annotation results. The data set includes more than 90% of the annotated sequence genes in the biosynthesis of flavonoids ( Fig. 6-1), while the other two pathways cover only a small fraction ( Fig. 6-2, 6-3). In many cases, changes in the accumulation of anthocyanins are related to the amount of expression of the genes encoded by the relevant enzymes [32][33][34][35]. The green box ( Fig.6-1) is a significant down-regulated gene in white perianths, and the red box is a significantly up-regulated gene in white perianths. In the flavonoid biosynthesis pathway, there were significant changes in the expression levels of 12 key genes.
Only one of the CHI genes was up-regulated in white perianths, and other genes were down-regulated.
Subsequently, in order to verify the reliability of the transcriptome sequence data, we used the designed primers to quantitatively analyze the six core genes. The sequences of the six core genes are not only related to the color formation of I. laevigatas, but also their expression levels are different. The transcriptome sequencing data was reliable due to the high correlation coefficient between the transcriptome sequence and q-PCR (r 2 =0.5198) (Fig.7). Our q-PCR results are basically consistent with the results obtained by the RNA-Seq method, and only the individual sequences in CHS, CHI and F3'H are slightly different from the sequencing results. The expression of individual sequences of these three genes was significantly increased in white flowers (Fig.8).

Discussion
According to Table 1, blue perianths contain three anthocyanin compounds responsible for color pigmentation: Cy, Pel and Del. In contrast, anthocyanins were absent except for a small amount of delphinidins in white I. laevigatas var. alba. Furthermore, the content of delphinidin in blue perianths is 22 times greater than in white perianths. These results proves that the delphinidin pigment pathway is not cut off in I. laevigatas var. alba, but decreased the metabolic flow. In order to study the biochemical basis of I. laevigatas var. alba lacking color phenotype, the metabolic spectra of perianths were compared, and the compounds related to color pigmentation are discussed. Previous studies have shown that the blue-to-white phenotypic changes in I. laevigatas are mainly due to the absence of anthocyanins. Meanwhile, naringenin which was key node compound in anthocyanin synthesis pathway was not detected in I. laevigatas var. alba (Fig.2). This implies that a certain portion of the pathway before naringenin formation may be blocked or shunted. The gene closely related to naringenin synthesis is chalcone isomerase (CHI). Naringenin is both a catalytic product of CHI and a catalytic substrate for F3'H, DFR, F3H. However, no naringenin is inconsistent with the significant up-regulation of CHI gene in white perianths in transcriptome sequencing data. In addition, naringenin is a necessary metabolite of the pelargonidin pigment synthesis pathway. Naringenin's absence in white flowers means the disruption site of pelargonidin was at naringenin or earlier in the pathway. The objective was to determine why CHI was significantly up-regulated, but naringenin was not detected.
The pathway of naringenin must be blocked in some steps in white perianths. We analyzed and simulated the related metabolic pathways and structural genes.
Studies have shown that DFR has strict substrate specificity and the substrate specificity of DFR can be altered by minor changes in DFR [36]. CHI may have the same characteristics as DFR. It is understood that CHI is a chalcone isomerase. CHIs ultilize pinocembrin chalcone and pinocembrin chalcone.
When CHI gene is overexpressed, CHI may be easier to catalyze other chalcone(such as pinocembrin chalcone) rather than naringenin chalcone. Naringenin chalcone is the precursor to naringenin. This is also an important factor in the synthesis of Pel, because there is no substrate naringenin. Therefore, it can be inferred that the Pel pathway is blocked upsteam. At this point, metabolic flow has to be diverted to Caffeoyl-CoA pathway to produce Del and Cy anthocyanins. In view of the down-regulated expression of downstream genes (F3H, DFR, ANS), the metabolic flux decreases step by step. Finally, only a small amount of delphinidin was detected. In addition the CHI gene does not have enzymatic activity and fails to catalyze the formation of naringein, which results in a redirection of the flux of metabolites. Delphinidin was detected in the white perianths, indicating that the presence of eriodictyol in I. laevigatas var. alba, while naringin, as one of the substrates of eriodictyol, was not detected.
A structural or regulatory blockage in the ABP would decrease the amount of flavonoid intermediates below the blockage, but would increase the amount of intermediates in upstream side branches (depending on the dynamics of metabolite flux through the pathway) [37]. Consistent results were found in I. lutescens, where the colorless anthocyanins (including chalcones, flavones, and flavonols) is higher than that of purple [38]. Lou et al. [7] concluded that the limitation of flux in upstream reactions and the multishunt process in downstream reactions led to the process of elimination of red pigmentation in the white flowers of M. armeniacum. They argue that the proportion of corresponding products (flavonols and anthocyanins) changed due to the competition between FLS and DFR on the basis of consistency of substrates, caused a redirection of the flux of metabolites through a side-chain to other products [7]. Similar arguments may exist between I. laevigatas and I. laevigatas var. alba.
Previous studies have shown that inactivation of CHI was an enhancer accompanying with one or more flavonoid pathway genes, including flavonol synthase (FLS) and ANR [39][40][41][42][43][44]. The genetic and biochemical effects of chalcone isomerase-like (CHIL) are still unclear, but CHIL overexpression can increase the accumulation of wild-type Arabidopsis proanthocyanidins and flavonols [45]. When FLS competes for substrates with DFR, excessive CHI genes enhance FLS's competitiveness to substrates, especially when DFR is down-regulated. Previous research have shown that FLS can promote the conversion of dihydroflavonols to flavonols, resulting in the relatively higher accumulation of anthoxanthins (flavones and flavonols) in the nearly white flowers than in deeper color flowers [46].
The competition between FLS and DFR for substrates seems to exist, controlling metabolic flux to anthoxanthins (flavones and flavonols) or anthocyanins branches respectively, thus affecting the variation of flower color. Furthermore, studies show that Ipomoea CHI mutants have been shown to produce pale yellow flowers that accumulate chalcone 2'-O-glucoside rather than lacking flavonoid [47][48][49]. Chalcone 2'-O-glucoside is one of the naringenin chalcone by-products. Therefore, naringenin chalcone is more easily glucosidated to form chalcone 2'-O-glucoside. Similary, RT results showed that CHS was significantly up-regulated in white perianths. We speculated that CHI was also co-expressed with the CHS gene as an enhancer, which resulted in the tranformationsfer of naringenin chalcone into In addition, peonidin, which is the methylated derivative of Cy. is detected in blue perianths. This indicates that the anthocyanins synthesis in the late stage of Cy is accomplished by methylation Additional research is needed to better understand and detect the related metabolites.

Conclusions
Although anthocyanins have been elaborated in some Iris species, the current information is far from complete in relation to the mutation mechanism behind the phenotype of I. laevigata var. alba. The transcriptomics data combined with liquid chromatography provides an effective method for identifying the key genes responsible for the absence of flower color. One hundred and forty-three (143) unigenes were identified as putative homologues of color-related genes, including 1 up-regulated and identified. These anthocyanin components include Pel, Del and Cy. Among these, Cy is the primary pigment. We combined the metabolites and transcriptome data of the anthocyanin synthesis pathway to simulate the expression patterns of related structural genes in the flavonoid metabolic pathway.
Through differential comparison, reverse metastasis of corresponding metabolites, we speculated that the upstream flux is redistributed by up-regulated CHI, the downstream gene (F3H, DFR, ANS) is down-regulated, and the competition between DFR and FLS for substrates make the color change from blue to white. This indicates that the absence of flower color is not determined by a single factor, but by the interaction and coordination of genes. This provided molecular biological references for the study of I. laevigata, especially variation in flower color.

Pigment extraction and HPLC analysis
Anthocyanins were detected by HPLC. The freshly ground petals weighed 3.0 g and were extracted in the 2:1:1 v/v/v mixture (absolute ethanol: water: HCl) and homogenized. The material was then placed in a constant volume 50 ml colorimetric tube ultrasonic extraction performed for 30 min, followed by shaking for 1 min., with hydrolysis in a boiling water bath for 1 h and then cooled. The volume was adjusted to a volume of 50 ml twice with the above mixed solution, and then allowed to stand. The supernatant was collected using a 0.45 μm reinforced nylon membrane filter for HPLC analysis.
Standards for anthocyanins were provided by Technology Innovation Co., Ltd. (Qingdao, China).
Liquid chromatography was performed using a ThermoFisher U3000 HPLC. The HPLC included a C18 column (250×4.6 mm, 5 μm), a flow rate 0.8 mL/min, injection volume 10 μL, column temperature 35°C, and anthocyanidin chromatography. The anthocyanidin was measured at 530 nm, and anthocyanin determination content was based 100 g fresh weight in milligrams, and was repeated three times to record the peak area.

RNA extraction, cDNA library construction and RNA sequencing
Total RNA extraction from I. laevigata was extracted using a Qubit® RNA kit, and RNA purity was checked by 1% agarose gel electrophoresis and a Nanophotometer® spectrophotometer (IMPLEN, CA, USA). RNA integrity was assessed using the RNA Nano 6000 Assay Kit from the Agilent Bioanalyzer 2100 System. (Agilent Technologies, CA, USA). In order to increase the understanding of the molecular basis of the flower color variation of I. laevigata, two different flower blooms were used to construct cDNA libraries (Fig.1B, D). RNA sequencing was commissioned by Beijing Baimai Biotechnology Co., Ltd. The eukaryotic mRNA was enriched with magnetic beads with Oligo (dT).
The mRNA was randomly interrupted by the addition of Fragmentation Buffer. The first cDNA strand was synthesized using random hexamers using mRNA as a template, and then the second cDNA strand was synthesized by adding buffer, dNTPs, RNase H and DNA polymerase I. The cDNA was purified by AMPure XP beads. The purified double-stranded cDNA was subjected to end repair. A tail was added and the sequencing linker was ligated, and then AMPure XP beads were used for fragment size selection.Finally, a cDNA library was obtained by PCR enrichment. After the library was constructed, the concentration and insert size of the library were detected using Qubit2.0 and Agilent 2100, respectively, and the effective concentration of the library was accurately quantified by Q-PCR method to ensure the library quality. High-throughput sequencing was performed with HiSeq2500 after passing the library.

Transcription assembly and gene function annotation
Data filtering was performed on raw data, and the joint sequences and low quality reads were removed to obtain high quality clean data. The cDNA library was sequenced based on Sequencing By Synthesis (SBS) technology. After sequencing quality control, 14.81 Gb clean data was obtained. The Q30 base percentage of each sample was not less than 91.70%. After obtaining high-quality sequencing data, the sequencing reads were broken into shorter fragments (K-mer) using the Trinity platform, and then these small fragments were extended into longer fragments (Contig) and the overlap between the fragments was utilized. The fragment was obtained, and finally the transcript sequence was identified in each fragment set by using the method of De Bruijn graph and sequencing read information. Unigene sequences were aligned with NR, Swiss-Prot, GO, COG, KOG, eggNOG4.5, and KEGG databases using BLAST software. KEGG Orthology results of unigene in KEGG were obtained using KOBAS 2.0, and the amino acid sequence of unigene was predicted to be used. HMMER software was compared with the Pfam database to obtain unigene's annotation information. In order to obtain as much of the I. laevigata anthocyanin gene sequence as possible ,, KEGG was used to establish a reference map of the secondary metabolic pathway related to the I. laevigata flower color The corresponding reference sequence was downloaded from the public database based on the nodal enzyme name and EC identification code provided by the KEGG database. All flower color related candidate genes annotated in the I. laevigata database and the corresponding reference sequences were subjected to BLASTx alignment.

Gene expression and co-expression analysis
The reads obtained by sequencing were compared with the unigene library by Bowtie, and the expression level was estimated according to the comparison result and RSEM. The expression abundance of the corresponding unigene is expressed by the FPKM value. In the process of differential expression analysis, the well-established Benjamini-Hochberg method was used to correct the p-value of the original hypothesis test, and finally the corrected p-value. FDR (False Discovery Rate) was a key indicator for differentially expressed gene screening to reduce false positives caused by independent statistical hypothesis testing of expression values of a large number of genes. In the screening process, the FDR is less than 0.01 and the difference fold FC (Fold Change) is greater than or equal to 2 as a screening criterion. FC represents the ratio of the expression levels between the two samples (groups).
The hierarchically clustered analysis of the differentially expressed genes was performed, and the genes with the same or similar expression behavior were clustered to display the differential expression patterns of the gene sets under different experimental conditions.

Real-time quantitative PCR
Real-time quantitative PCR (q-PCR) specific primers were designed using Primer 5 software and the detailed information is shown in Additional file 4. Total RNA was extracted and the synthesis of the first cDNA strand was performed as described in the PrimeScriptTM RT reagent Kit ( TaKaRa) instructions. The cDNA obtained by reverse transcription was used as a template to perform q-PCR.
The q-PCR analysis was run on an ABI 7500 real-time PCR detection system (using BiesysTEMS) using the SyBR® Premix ExtAGTM kit (TakaaBio Inc.). The PCR reaction was at 35 cycles (95°C for 5 min, 95°C for 15 s; 60°C for 30 s; 72°C for 30 s; 72°C for 10 min). After RT-qPCR, a melting curve was generated to test the specificity of the product. The Actin gene was selected as an internal reference gene. To ensure the authenticity and reproducibility of the results, at least two independent biological replicates and three technical replicates were performed for each treatment, and differential analysis of the two conditions/groups was performed using the DESeq R package (1.10 1). Relative expression levels apply to the 2 −ΔΔCt method.

Consent for publication
Not applicable.

Availability of data and materials
The datasets supporting the conclusions of this article are included within the article and its additional files.