De novo sequencing and analysis of the cranberry fruit transcriptome to identify putative genes involved in flavonoid biosynthesis, transport and regulation

Cranberries (Vaccinium macrocarpon Ait.), renowned for their excellent health benefits, are an important berry crop. Here, we performed transcriptome sequencing of one cranberry cultivar, from fruits at two different developmental stages, on the Illumina HiSeq 2000 platform. Our main goals were to identify putative genes for major metabolic pathways of bioactive compounds and compare the expression patterns between white fruit (W) and red fruit (R) in cranberry. In this study, two cDNA libraries of W and R were constructed. Approximately 119 million raw sequencing reads were generated and assembled de novo, yielding 57,331 high quality unigenes with an average length of 739 bp. Using BLASTx, 38,460 unigenes were identified as putative homologs of annotated sequences in public protein databases, including NCBI NR, NT, Swiss-Prot, KEGG, COG and GO. Of these, 21,898 unigenes mapped to 128 KEGG pathways, with the metabolic pathways, secondary metabolites, glycerophospholipid metabolism, ether lipid metabolism, starch and sucrose metabolism, purine metabolism, and pyrimidine metabolism being well represented. Among them, many candidate genes were involved in flavonoid biosynthesis, transport and regulation. Furthermore, digital gene expression (DEG) analysis identified 3,257 unigenes that were differentially expressed between the two fruit developmental stages. In addition, 14,473 simple sequence repeats (SSRs) were detected. Our results present comprehensive gene expression information about the cranberry fruit transcriptome that could facilitate our understanding of the molecular mechanisms of fruit development in cranberries. Although it will be necessary to validate the functions carried out by these genes, these results could be used to improve the quality of breeding programs for the cranberry and related species.

The cranberry fruit is an accessory fruit that develops from the development of the inferior ovary, which consists of the ovary wall and the floral tube [18]. Fruit ripening has an impact on the levels of various phytochemicals, such as vitamins, flavonoids and the phenolic acid derivatives. Hence, the study of the growth and ripening of cranberry fruits is an important field of research, as it influences the quality of the fruit, affecting pigment accumulation, flavor, nutritional value, and functionality. In particular, pigment accumulation during fruit ripening takes place from a bright green to white, then pink, and finally red, conferring the natural pigmentation to mature fruits [19,20]. Given the distribution of these phytochemicals vary differently among different plants, even within different populations of the same species and different organs of plants, for this reason the molecular mechanisms of their biosynthesis, transport and regulation might be diverse and complex. Therefore, it is essential to use modern genetic tools for dissect out the complexities involve with phytochemicals in cranberry.
There has been some previous research into the cranberry, including simple sequence repeats (SSRs) mining and validation, genetic map construction, and quantitative trait loci (QTL) analysis [21][22][23]. Recently, a sequencing of one cultivar of the cranberry was published [24]. However, information about the development of the fruit of the cranberry is still scarce. The application of nextgeneration sequencing (NGS) technologies such as deepsequencing dependent RNA sequencing (RNA-Seq), in particular de novo sequencing, provides a cost-effective means for sequencing the transcriptome of an organism [19][20][21][22][23]. Among the new-generation sequencing methods, 454 pyrosequencing techniques and Illumina sequencing are widely used to analyze transcriptomes [25][26][27][28][29]. Compared with 454 pyrosequencing, Illumina HiSeq 2000 costs less and has a much greater output. This makes HiSeq 2000 an enabling approach for high-throughput RNA-Seq. HiSeq 2000 sequencing technology has become a valuable tool for the study of fruit trees, such as the black raspberry [30], Lycium chinense [31], and the longan [32].
In this work, we present a de novo assembly of the fruit transcriptome of V. macrocarpon using high-throughput Illumina HiSeq 2000 sequencing. Furthermore, differential gene expression between white and red fruits was investigated to reveal differential regulation of key pathways. This study provides an important genetic resource for understanding molecular mechanisms on the cranberry fruit ripening, and the results may be helpful for further gene expression and functional genomic studies, and molecular breeding of the cranberry.

Results and discussion
cDNA sequence generation, de novo assembly and mapping to the cranberry genome To obtain a complete profile of the cranberry transcriptome during fruit development, two cDNA libraries were built for two fruit developmental stages: white fruit (W) and red fruit (R). 59,986,374 and 59,690,570 raw reads were generated from the W and R libraries, respectively. A summary of these sequencing results are presented in Table 1. After removing low quality short sequences, 52,413,112 and 53,352,808 clean reads for W and R, respectively, remained and were used for assembly. The Q20 percentages (sequencing error rate <1 %) and GC percentages obtained from the W and R libraries were 97.86 % and 46.93 %, and 97.93 % and 47.24 %, respectively. These results suggest that the sequencing data have sufficient quantity and quality to ensure accurate sequence assembly and adequate transcriptome coverage. Cleaned reads from each library were assembled independently using Trinity tool. Inchworm assembly of reads, the first step of Trinity, resulted in 105,377 and 114,265 contigs with mean sizes of 359 bp and 367 bp, and N50s of 739 bp and 783 bp, for W and R, respectively. After clustering with the TGICL software [33], the contigs were assembled into 69,540 unigenes for W with a mean length of 510 bp and an N50 of 763 bp, and 66,917 unigenes for R with a mean length of 597 bp and an N50 of 1,020 bp. At last, these two sets of unigenes were merged with TGICL resulting in a final assembly of 57,331 All-unigenes (with a total length of~42 Mb), with a mean size of 739 bp and an N50 of 1,209 bp (Additional file 1). The N50 value is one of the most popular metrics to assess assembly quality, which reflects a continuous and complete assembly [34]. The N50 value of the cranberry fruit transcriptome is longer than that reported in previous studies on blueberry fruit transcriptomes (1,100 bp) [35]. The unigene size distribution showed the following: 76.1 % (43,643) of the unigenes were between 300 and 1,000 bp in length; 22.2 % (12,720) of the unigenes were between 1,000 and 3,000 bp; and 1.7 % (969) were more than 3,000 bp long (Fig. 1). The assembled unigenes were also mapped to the cranberry genome to examine the accuracy of the transcriptomes. 47,316 (82.5 %) unigenes were mapped to the cranberry genome sequence using the NCBI Mega BLAST program [36]. The remaining unigenes that did not map to the cranberry genome may be owing to differences between cultivars, gaps in the genome sequence or too short exons.

Functional annotation by searching similarity
For validation and annotation of the assembled unigenes, all unigenes were aligned against the NCBI nonredundant protein (NR), Swiss-Prot protein, Kyoto Encyclopedia of Genes and Genomes (KEGG), Clusters of Orthologous Group (COG) and Gene Ontology (GO) databases using BLASTx, and the nucleotide database (NT) by BlastN with an E-value threshold of 1e-5. A total of 38,460 unigenes (67.08 %) could be matched to known genes in the public databases. This suggests that many genes of unknown function play important roles during cranberry fruit development, and that this process is unexpectedly complex ( Table 2). The identity distribution and species distribution were analyzed (Fig. 2). According to the E-value distribution of the top hits in the NR databases,   (Fig. 2a).
For the similarity distribution of the predicted proteins, 67.1 % of the sequences had a similarity higher than 60 %, whereas 32.8 % showed similarity between 18 % and 60 % (Fig. 2b). The species distribution of the top BLAST hits against the NR database for the cranberry fruit transcriptome showed that these unigenes had the greatest number of matches with genes of Vitis vinifera (37.5 %), followed by other species including Prunus persica (9.5 %), Solanum lycopersicum (9.3 %), Ricinus communis (7.1 %), Populus trichocarpa (6.9 %), Fragaria vesca subsp. vesca (4.0 %) and Glycine max (2.8 %) (Fig. 2c). The other 22.9 % unigenes had first hits with other species. This indicates that the transcriptome of V. macrocarpon is more closely related to that of V. vinifera than to other plant genomes that are present in current public databases. In summary, the annotation results suggest that the HiSeq 2000 sequencing project in this study generated a substantial number of assembled transcripts of cranberry fruits.

Functional classification by GO and COG
GO is a useful program for annotating and analyzing the functional categorization of annotated genes. To facilitate the organization of the cranberry transcripts into putative functional groups, GO terms were assigned using Blas-t2GO [37]. A total of 27,286 unigenes (47.59 %) were assigned into GO ontologies based on their similarity to sequences with previously known functions, including 31,474 sequences assigned to the molecular function category, 104,552 to the biological process category and 83,221 to the cellular component category ( Table 2 and Fig. 3). The assigned sequences were divided into 55 functional subcategories. Because 86 % of the unigenes were assigned to more than one GO term, the total number of GO terms was larger than the total number of the unigenes with GO assignments. "Cellular process", "cell" and "catalytic activity" were the dominant subcategories in the biological process, cellular component and molecular function categories, respectively. Moreover, "metabolic process", "single-organism process", "cell part", "organelle" and "binding" were also well represented, which suggests that many novel genes involved in extensive metabolic activities could be playing important roles during the growth and development stages of the fruit. However, few genes were assigned to the categories "locomotion", "extracellular matrix", "extracellular matrix part", "extracellular region part", "virion", "virion part", "channel regulator activity", "metallochaperone activity", "nutrient reservoir activity", "protein tag", and "translation regulator activity". Overall, these GO terms account for almost half of the overall unigenes in the cranberry fruit transcriptomic dataset, suggesting that a large number of cranberry unigenes may be conserved across different plant species. The COG database can phylogenetically predict and classify the completeness of the unigene sequences. Based on sequence homology, 14,758 unigenes (25.7 % of all the unigenes) were assigned into 25 COG categories (Fig. 4). "General function prediction only" represented the largest group (4,511, 15.5 %), followed by "transcription" (2,690, 9.24 %), "translation, ribosomal structure and biogenesis" (2,317, 7.96 %) and "posttranslational modification, protein turnover, chaperones" (2,292, 7.87 %). "RNA processing and modification" (174, 0.60 %), "extracellular structures" (11, 0.04 %) and "nuclear structure" (4, 0.01 %) were the smallest groups. It is noteworthy that 816 unigenes (5.5 %) were classified into the group of "secondary metabolites biosynthesis, transport and catabolism", suggesting that those secondary metabolite processes play important roles in cranberry fruit development.

Candidate genes involved in flavonoid biosynthesis
Given that cranberries are rich in flavonoids, we focus on identifying the candidate genes involved in flavonoid biosynthesis. In plants, flavonoids are a group of polyphenolic secondary metabolites, which play many diverse physiological functions [38]. The flavonoid biosynthetic pathway has largely been characterized in Arabidopsis thaliana, Zea mays, and V. vinifera [39][40][41]. However, the overall molecular mechanism of flavonoid biosynthesis and accumulation in the cranberry is not fully understood.
In the anthocyanin branch, the dihydroquercetins and dihydromyricetins are converted to leucocyanidins or leucodelphinidins by dihydroflavonol 4-reductase (DFR, 20 unigenes). Leucoanthocyanidin dioxygenase (LDOX, 4 unigenes), known as anthocyanidin synthase (ANS), converts When comparing the cranberry transcriptome to the blueberry fruit transcriptome [35], we found that the cranberry fruit transcriptome data show more abundant flavonoid biosynthesis enzymes (Table 3), especially the number of UFGT enzymes. This suggests that the cranberry contains diverse flavonoid compounds, which have different chemical, size, three-dimensional shape, and physical and biochemical properties. In a previous study that carried out cranberry genome assembly and generated a leaf transcriptome dataset, it was suggested that the cranberry might be lacking two enzymes, F3′5′H and LAR [24]. However, we found seven unigenes encoding F3′5′H and five unigenes encoding LAR in the cranberry fruit transcriptome. Given that flavonoid biosynthesis has temporal and spatial properties, RNA-Seq data of special organs or tissues from different growth and developmental stages could dissect out the complexities involved in flavonoid biosynthesis. Therefore, our cranberry fruit transcriptome undoubtedly provides a powerful supplement to the previous cranberry genome and leaf transcriptome dataset. Furthermore, the metabolomics results showed that the American cranberry contains small amounts of Malvidin, Pelargonidin, Delphinidin, Petunidin, (+)-catechin and (epi)gallocatechins [6,7]. Therefore, it is not surprising that F3′5′H and LAR are expressed in cranberry fruit.

Candidate genes involved in flavonoid transport
According to previous research performed in plant species including Arabidopsis and grape, two mechanisms have been proposed to explain both flavonoid transport from the ER to the vacuole, and the reverse transport from storage sites to other cell targets [47]. Flavonoids could be accumulated into vacuoles or cell walls by the membrane transporter-mediated transport (MTT) system. The proton gradient between the cytosol and the vacuole (or cell wall) by H + -ATPases (and H + -PPases in the tonoplast) has been proposed to be the main driving force for the transport of some flavonoids [48]. ATP-binding cassette (ABC) transporters have also been claimed to play a role in sequestration of flavonoids into the vacuole [49][50][51]. In particular, multidrug and toxic compound extrusion protein (MATE) transporters have been described as participating  in flavonoid vacuolar sequestration in the tomato, Arabidopsis and grapes [48,52,53]. Besides the MTT system, the membrane vesicle-mediated transport (MVT) mechanism could also be involved in flavonoid accumulation. The MVT mechanism involves flavonoid-containing vesicles releasing their content into the accumulation targets by fusion [54]. These vesicles require vacuolar sorting receptor (VSR) proteins and soluble N-ethylmaleimidesensitive factor attachment protein receptors (SNARE) proteins to be addressed to the correct compartment and fuse to the membrane target [47]. Additionally, glutathione S-transferase (GST) gene has also been demonstrated to catalyze the conjugation of the tripeptide glutathione (c-Glu-Cys-Gly, GSH) as flavonoid binding proteins, and also be responsible for vesicle uploading or vacuolar transport [55,56]. In the cranberry fruit transcriptome, 341 unigenes encoding ABC, 75 unigenes encoding MATE, 174 unigenes encoding H + -ATPases and 41 unigenes encoding H + -PPases were identified. Moreover, 78 unigenes encoding GST, 6 unigenes encoding VSR and 30 unigenes encoding SNARE were found. These results imply that the two distinct transport mechanisms (MTT and MVT) could both be present, and transport may be a multifactorial process in the cranberry, involving different strategies and the contribution of several enzymes. Unlike similar studies in the carnation and the grape vine, we did not find bilitranslocase (BTL), which is a putative flavonoid that is localized in the liver and gastric mucosa in mammals [57,58]. These findings provide a powerful genomic tool to research the mechanisms of transport and accumulation of flavonoids in the cranberry.

Candidate transcription factors involved in flavonoid biosynthesis and transport
The main structural genes encoding enzymes involved in the flavonoid biosynthetic pathway have been studied in many species, including Arabidopsis, maize, the petunia, the snapdragon, apples and grapes [59]. In particular, the expression of structural genes involved in flavonoid synthesis is largely controlled by basic helix-loop-helix (bHLH) transcription factors (TFs), MYB proteins and WD-repeat-containing proteins [60,61]. bHLH TFs belong to multigenic families and are structurally organized into basic helix-loop-helix DNA-binding conserved motifs [62,63]. MYB TFs are a large gene family that is thought to be one of the most important plant regulatory gene families [64,65]. In plants, MYB TFs have been demonstrated to be involved in control of phenylpropanoid secondary metabolism, including the biosynthesis of flavonoids and anthocyanins [66][67][68][69]. WD-repeatcontaining proteins consist of four or more copies of the WD (tryptophan-aspartate) repeat, which is a sequence motif approximately 31 amino acids long that encodes a structural repeat [70]. It has been reported that certain MYB TFs interact with bHLH and WD40 proteins to form a MYB-bHLH-WD40 complex in the regulation of the flavonoid biosynthetic pathway [71,72]. Moreover, TFs also control the regulation of flavonoid transport. For example, Arabidopsis R2R3-MYB TF (AtTT2) regulates the expression of the MATE transporter gene TT12 to control the flavonoids in developing siliques; and the maize ABCC transporter protein ZmMRP3 is regulated by the R (bHLH family) and C1 (R2R3-MYB) TFs to control anthocyanin transport [73,74]. Based on our results, a total of 669 unigenes were identified as putative TFs or regulators, with 641 of the unigenes belonging to 65 subclasses ( Table 4). The unigenes encoding WD40 (202), bHLH (53), MYB (41) and WRKY (41) are the most abundant in the cranberry fruit transcriptome. By comparison to the number of WD40 unigenes detected in the blueberry fruit transcriptome, the cranberry shows more than twice the number of WD40 unigenes. These results imply that the WD40 proteins may be playing an important role in the regulation of flavonoid biosynthesis and transport. Because all major TF subclasses known to be involved in regulating anthocyanin biosynthesis are highly expressed in the cranberry fruit transcriptome, we conclude that flavonoid biosynthesis and transport are regulated by TFs in the cranberry, such as WD40, bHLH, MYB and WRKY. The control mechanism of flavonoid biosynthesis and transport is complex in the cranberry, and may be independently regulated by a single TF, or controlled by combinations of TF complexes.

SSR motif discovery
SSRs have been widely used in the study of genetic identification and fingerprint mapping, due to their high polymorphic information content, simple technology, and high reproducibility. The transcriptome is also an important resource for rapid and cost-effective development of genetic markers. To further evaluate the quality of assembly and develop new molecular markers, the cranberry unigenes generated in this study were used to identify SSRs. The distribution of mono-, di-, tri-, quad-, penta-and hexanucleotide SSRs in these unigenes are shown (Table 5). A total of 14,473 SSRs were identified in 11,980 unigenes. Of the11,980 unigenes, 2,215 and 851 unigenes contained more than one SSR and SSRs in compound formation, respectively. This indicates that nearly 21 % of the cranberry unigenes contained SSRs, which is higher than the SSR ratio (8.6 %) reported in the cranberry leaf transcriptome, but lower than the SSR ratio (37.8 %) in the cranberry genome [24]. The largest fraction of SSRs identified was dinucleotide repeats (62.4 %), followed by tri-(17.4 %) and mono-repeats (17.3 %). The di-nucleotide repeats were the most abundant type, accounting for 62.4 %, which is higher than the findings in the cranberry genome (44 %) and the leaf transcriptome (35 %) [24]. SSR motifs were further analyzed for the number of repeating units, and the most represented repeats of potential SSRs was nine (Additional file 3). The most frequent mono-, di-, tri-, quad-, and penta-nucleotide motifs were A, GA, GAA, AAAG and AAAAG, accounting for 8.7 %, 18 %, 1.4 %, 0.3 % and 0.8 %, respectively. These results show that the SSRs identified in the cranberry fruit transcriptome were GA rich, which is similar to the earlier reports on the cranberry genome (16.5 %) and the leaf transcriptome (15 %). The SSRs identified in the present study provide an abundant resource for molecular marker studies in the cranberry.

SSR marker validation
We designed primer pairs for 12 unigenes from the GO term "response to stimulus/chemicals". The sequences of the primers are in Additional file 4. Silver stained polyacrylamide gels results showed that 11 pairs (91.7 %) successfully amplified bands. Among the 11 primer pairs, four primer pairs that showed polymorphic bands and seven that showed monomorphic bands. The four polymorphic SSR loci are shown in Fig. 6. The primers for CL2296.Contig2 amplified bigger band (~200 bp) than expected size (Fig. 6a), that may be due to the regions between the two primers contain introns. The primers for CL284.Contig2 amplified expected band (Fig. 6b), the primers for CL2216.Contig1 and CL2216.Contig4 both amplified two alleles (Fig. 6c, d). Thus, these results indicate that the novel SSR primers show good transferability to cultivars of cranberry. However, the polymorphic rate is low owing to the conserved nature of expressed sequence tag SSRs (EST-SSRs) [75]. Therefore, the novel primers developed may represent highly-conserved genes involved in the response to stimuli/chemicals, which would be useful for the breeding of resistant cranberry cultivars and other Vaccinium plants in future studies.

Difference in gene expression between white and red fruit
To obtain a digital expression profile of the differentially expressed genes between the two fruit development stages in the cranberry, we used the FPKM (Fragments per kb per million fragments) method to perform gene expression analysis between the W and R libraries [76]. The transcripts with at least a two-fold difference between white and red fruits are shown in Fig. 7. A total of 3,257 unigenes were identified as differentially expressed genes (DEGs) between the two developmental stages of cranberry fruit (Fig. 8). Among them, 2,125 and 1,132 unigenes were highly expressed in the red and white fruits, respectively. More genes were highly expressed in red fruits, suggesting that more genes were involved in complex metabolites during the full ripening stage of the fruit. Moreover, 3,010 and 1,328 unigenes were only expressed in red and white fruits, respectively. This indicates that the unigenes may be specifically expressed in the different development stages of cranberry fruits. We identified 1,246 DEGs that were assigned to 49 significantly enriched GO terms. The most abundant GO terms were "metabolic process", "cell and cell part" and "catalytic activity" in the biological process, cellular component and molecular function categories, respectively (Fig. 9). Moreover, there are some interesting biological processes which are associated with fruit development, such as "developmental process", "growth", "reproduction", "reproductive process", "antioxidant activity" and "transporter activity".
To validate the data from our digital expression analysis, quantitative Real Time PCR (qRT-PCR) assays were performed on 10 DEGs involved in the phenylpropanoid biosynthesis, flavone and flavonol biosynthesis, and flavonoid biosynthetic pathways ( Table 6). The qRT-PCR expression pattern of these unigenes is shown in Fig. 10. Except for the unigenes encoding UDP-glycosyltransferase (CL754.Contig1_All), cytochrome P450 (CL1200.Conti-g1_All, CL1200.Contig2_All, CL3597.Contig3_All, CL3597.Contig4_All, CL3597.Contig5_All and Unige-ne7973_All), flavonoid 3′ hydroxylase (Unige-ne20161_All), carboxylesterase (CL1068.Contig2_All), and UDP-glycosyltransferase (Unigene 24441_All), the unigenes were up-regulated in the red fruit stage. The results of the qRT-PCR are consistent with our digital expression data (Table 6). This also showed an association with flavonoid production, and it is likely that these genes are involved in the fruit development stage. Plant cytochrome P450s are a large superfamily of heme-containing monooxygenases, which are involved in a wide range of secondary metabolite biosynthetic reactions, such as terpenoids, phenylpropanoids, and nitrogen-containing compounds, including alkaloids, cyanogenic glucosides, and glucosinolates [77]. Therefore, the selected unigenes encoding cytochrome P450s may play important roles in flavonoid accumulation in cranberry fruit.

Conclusions
A high-quality transcriptomic dataset of the cranberry fruit was obtained using the Illumina HiSeq 2000 sequencing platform. A significant number of important metabolic pathways and functions associated with the unigenes were identified. Moreover, a large number of SSRs were identified that can be used for subsequent marker development, genetic linkage and QTL analysis. Many candidate genes that are potentially involved in flavonoid biosynthesis, transport and regulation were identified and are worthy of further functional research. Our study provides the largest number of unigenes to date, and lays the groundwork for in-depth transcriptomic profiling of cranberry fruits.

Plant material and tissue collection
Six-year-old cranberry plants (V. macrocarpon 'Bergman') were grown in Changchun, Jilin Province in northeast China. This cultivar is a cross between 'Early Black' and 'Searles' with a basic chromosome number x = 12 (2x = 2n = 24) [78]. It propagates by asexual propagation of cutting and micro-propagation, which permits homogeny among individual plants. Ten plants were selected as a sample. We removed the weak buds and opened the flowers, keeping the flowers at the full pink bud stage with forceps. The flowers can be thinned to one per stem to ensure a better fruit set. Fruit samples (almost 20 fruits of the same maturation degree) were collected based on the skin color of the fruit and the development period in days. Fruit Fig. 7 Gene transcription profile between W and R libraries. For comparing gene expression level between the two libraries, each library was normalized to 1 million tags. Red dots represent transcripts more prevalent in R library, green dots show those present at a lower frequency in W library and blue dots indicate transcripts that did not change significantly Fig. 8 The number of up-regulated and down-regulated genes between W and R libraries Fig. 9 Histogram of GO analysis of DEGs sampling included two ripening stages, the white fruit (30 days after full bloom, stage W) and red fruit (60 days after full bloom, stage R). For both of the stages, all of the fruit, including pulp, skin and seeds, were mixed, and immediately frozen in liquid nitrogen and kept at −80°C until they were used for sample preparation.

Library preparation and sequencing
Total RNA was isolated by TRIzol® Reagent (Invitrogen, Carlsbad, CA, USA). The RNA quantity and quality were assessed by a Nanodrop 2000 instrument (Thermo Scientific, USA) and an Agilent 2100 Bioanalyzer (Agilent Technologies, California, USA). The cDNA libraries were constructed by mixing equal quantities of RNA from each fruit stage, including pericarp, pulp and seeds, according to the manufacturer's instructions (Illumina, San Diego, CA, USA). First, magnetic beads with oligo(dT) molecules were used to enrich for poly(A) mRNA from 20 μg of total RNA. Then, samples were fragmented into short pieces and reverse transcribed into cDNA with a PrimeScript™ 1st Strand cDNA Synthesis Kit (Takara). The secondstrand cDNA was then synthesized using buffer, dNTPs, RNase H, and DNA polymerase I. Short fragments were purified using the QIAquick PCR Purification Kit (QIA-GEN) and resolved with EB buffer for end repair and single nucleotide A (adenine) tailing. After this, short fragments were connected with sequencing adapters, and enriched by 15 cycles of PCR amplification to select suitable fragments for the final cDNA library. The quantification and qualification of the cDNA library were characterized by an Agilent 2100 Bioanalyzer and ABI Ste-pOnePlus Real-Time PCR System, and sequenced on the Illumina HiSeq™ 2000 pair-end system. Illumina sequencing was performed at the Beijing Genomics (BGI) Center in Shenzhen, China.

Sequence assembling and analysis
The raw reads generated from the Illumina sequencing were pre-processed with the filter-fq software (unpublished software). The whole sequences of reads only with adaptor, reads containing more than 5 % unknown nucleotides, and low quality reads (reads containing more than 20 % bases with Q-value ≤ 10) were eliminated. The clean reads were assembled into contigs using the Trinity tool with the following parameters: seqType = fq, min_contig_length = 100, min_glue = 3, group_pairs_distance = 250, path_rein-forcement_distance = 85, bfly_opts = ′-V 5 -edge-thr = 0.05 -stderr', min_kmer_cov = 3, CPU = 7 [79]. First, the RNA-Seq data were assembled into longer, gapless unique sequences with the Inchworm software. The resulting sequence from Inchworm was defined as contigs. Subsequently, the contigs were clustered and complete de Bruijn graphs were constructed for each cluster using the Chrysalis software. Then, the Butterfly software was used to track the paths that reads and pairs of reads took within the graphs. As multiple samples from the same species were sequenced, unigenes from each sample's assembly were taken for further processing with the sequence clustering software TGICL, for sequence splicing and overlap removal to acquire non-redundant unigenes that were as long as possible [33]. At last, the unigenes from each sample were assembled for unique All-unigene sequences by TGICL software. For gene family clustering, the unigenes were divided into two classes: clusters and singletons. The former were prefixed 'CL' and the cluster id followed this prefix. In one cluster, the similarity between unigenes was more than 70 %. For the singletons, the prefix was 'unigene'. To assess the quality of the de novo assembly, a similarity search against the cranberry genome was further conducted using the Mega BLAST algorithm with a score greater than 200 [36].
All unigene sequences were aligned by BLASTx (Evalue < 0.00001) to protein databases such as NR, Swiss-Prot, GO, COG and KEGG, and to NT using the BlastN (E-value < 0.00001), retrieving proteins with the highest sequence similarity with the given unigenes, along with their protein functional annotations [80,81]. When a unigene was not aligned to any of the above databases, ESTScan software was used to decide upon the sequence direction [82]. For NR annotation, the Blast2GO program was used to get the GO annotation of unigenes [37]. Then, WEGO software was used to conduct GO functional classification for all unigenes and to understand the distribution of the gene functions [83].

SSR detection and validation
MicroSAtellite (MISA) (http://pgrc.ipk-gatersleben.de/ misa/misa.html) was used to identify SSRs in the assembled transcriptome of the cranberry fruit. SSR motifs were identified between one and six nucleotides in size and with 1, 2, 3, 4, 5 and 6 repeats for mono-, di-, tri-, quad-, penta-, and hexamers, respectively. The sequences with a minimum of 150 bp flanking region from SSR loci were used to design primers using PRIMER 3.0 software [84]. The major primer design parameters were as follows: a PCR product length of 80-300 bp; a primer size of 18-28 bp (optimum 23 bp); a melting temperature of 51-58°C (optimum 60°C); and a temperature difference between the forward primer and reverse primer less than 2°C. Then, the final primers were obtained by filtering rules as follows: no SSRs in the primer; align the primers to the unigene sequence with the 5′ site allowed three mismatches and the Fig. 10 The expression profiles of 10 unigenes between W and R libraries in cranberry 3′ site allowed one mismatch; remove the primers which aligned to more than one unigene; find SSRs on the product sequences by ssr_finder (http://www.fresnostate.edu/ ssrfinder/) and keep the product for which the result of ssr_finder is the same as the result of MISA. Thirteen cranberry cultivars were used to validate the SSR markers by PCR. Fresh leaves were collected for genomic DNA extraction by the cetyltrimethylammonium bromide (CTAB) method [85] with a minor modification. The PCR reaction was performed in a 20 μl reaction volume containing 2 μl of 10× PCR Buffer, 0.4 μl of dNTPs (10 mmol/L), 0.4 μl of genomic DNA (100 ng/μl), 0.8 μl of both the forward primer and reverse primer (10 μM), and 0.1 μl of Taq polymerase (5 U/μl). The PCR conditions used were as follows: initial denaturation for 2 min at 94°C, followed by 35 cycles of denaturation for 30 s at 95°C, annealing for 30 s at 55°C (primer specific), and extension for 30 s at 72°C, followed by a final extension for 5 min at 72°C, ending at 4°C. The amplification products were separated by 6 % native polyacrylamide gel electrophoresis and were visualized by silver staining.

Differential gene expression analysis
An analysis of statistical comparison was performed to predict genes with different expression levels using the method previously described by Audic and Claverie [86]. The calculation of unigene expression used the FPKM method and SOAP (http://soap.genomics.org.cn/soapaligner.html), which were able to eliminate the influence of different gene lengths and sequencing levels on the calculation of gene expression [76]. Therefore, the calculated gene expression could be directly used to compare the difference in gene expression between the samples. The FPKM formula was: where C is the number of cleaned reads that were uniquely aligned to one unigene; N is the total number of cleaned reads that were uniquely aligned to all unigenes; and L is the base number in the CDS (Coding sequence) of one unigene. The expression between two samples can also be assessed with the FDR (False Discovery Rate) method, which is a statistical method used in multiple hypothesis testing to correct for P-value [87]. In our analysis, an estimated absolute value of log 2 -fold change of ≥2 and FDR adjusted P-value ≤ 0.001 were used as the threshold to judge the significance of DEGs. DEGs were subsequently carried into GO functional analysis and KEGG pathway analysis.

Confirmation of gene expression by qRT-PCR
Total RNA was extracted as described for the cDNA library preparation and sequencing. Each RNA sample was treated with PrimeScript™ RT Reagent Kit with gDNA Eraser (Takara), to remove residual genomic DNA and reverse transcribe into cDNA, according to the manufacturer's protocol. The primers for qRT-PCR were designed using the Primer Express® Software v3.0.1 (Applied Biosystems), and are listed in Additional file 6. The actin gene of cranberry (CL7164.Contig2_All) was used as the internal housekeeping gene control. qRT-PCR was performed using the SYBR Premix Ex Taq Kit (Takara) according to the manufacturer's instructions. The PCR was carried out on a Stratagene Mx3000P instrument (Agilent, USA) with the following conditions: denaturation at 95°C for 15 s; 40cycles of 95°C for 15 s, and 60°C for 30 s. All qRT-PCR reactions were repeated three times, with three technical replications per experiments. The results were normalized to the expression level of the constitutive actin gene. A relative quantitative method (2 -ΔΔCt ) was used to evaluate the quantitative variation.