Transcriptome datasets from leaves and fruits of the sweet cherry cultivars ‘Bing’, ‘Lapins’ and ‘Rainier’

Sweet cherry fruits from different cultivars differ in their pre- and post-harvest qualities. Here we present transcriptome datasets from leaves and mature fruits of three sweet cherry cultivars (‘Bing’, ‘Lapins’ and ‘Rainier’), obtained using 454 GS-FLX technology (454 Life Sciences, Roche).

Value of the data
These sweet cherry transcriptome datasets may be used to improve the genome assembly and annotation of Prunus avium [1].
Transcriptome datasets may be used to mine for Single Nucleotide Polymorphisms (SNPs) that are polymorphic between these sweet cherry cultivars.
Transcriptome datasets may be used to identify differentially spliced transcripts in sweet cherry leaves and fruits.
These datasets may be used to identify differentially expressed genes that may correlate with the phenotypic variabilities among these and other sweet cherry cultivars.
Profiling of leaf and fruit transcriptome datasets may be used to identify genetic targets for molecular marker-assisted breeding programs [2–4].

Data
Here we report the transcriptome datasets of leaves and fruits of three sweet cherry cultivars ('Bing', 'Lapins' and 'Rainier') with contrasting pre- and post-harvest fruit qualities (Table S1). Total RNA sequencing resulted in 956,609 raw reads, comprising 362 Mb of data, with an average read length of 378 bp. After trimming, 938,279 reads remained (98% of raw reads), with an average read length of 380 bp. Reads that mapped to the predicted genes of the Prunus avium genome (n = 43,673 genes) [1] corresponded to 55% of the filtered reads and covered on average 24% of all genes in all samples. The data are summarized in Table 1. Predicted genes were also functionally annotated using Blast2GO (Table S2).
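The read statistics above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, assuming that "Mb" denotes 10^6 bases; the variable names are illustrative and not part of the original analysis.

```python
# Figures quoted in the text.
raw_reads = 956_609
trimmed_reads = 938_279
raw_megabases = 362  # assumption: "Mb" means 10^6 bases

# Share of raw reads retained after trimming, in percent.
retained_pct = 100 * trimmed_reads / raw_reads

# Mean raw read length implied by the total yield and the read count.
mean_raw_length = raw_megabases * 1_000_000 / raw_reads

print(f"reads retained after trimming: {retained_pct:.1f}%")
print(f"implied mean raw read length: {mean_raw_length:.0f} bp")
```

Both values agree with the rounded figures reported in the text (98% and 378 bp).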
A Principal Component Analysis of the predicted genes from all the tissue samples in the sweet cherry cultivar transcriptome datasets was performed in order to determine the similarities between the whole RNA profiles of these samples (Fig. 1).
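As an illustration of this kind of analysis, the sketch below runs a Principal Component Analysis on a small samples-by-genes count matrix using a singular value decomposition. It is not the procedure used to produce Fig. 1: the toy counts, the sample labels, the log10 transform and the pseudocount of 1 are all assumptions made for the example.

```python
import numpy as np

def pca(counts, n_components=2):
    """PCA of a samples x genes matrix via SVD.

    Returns (scores, explained_variance_ratio).
    """
    # Log-transform with a pseudocount of 1 (assumption) to damp large counts.
    x = np.log10(np.asarray(counts, dtype=float) + 1.0)
    # Center each gene (column) so components describe variation between samples.
    x = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    variance = s ** 2
    ratio = variance[:n_components] / variance.sum()
    return scores, ratio

# Hypothetical read counts: rows are samples, columns are four made-up genes.
samples = ["Bing_leaf", "Lapins_leaf", "Rainier_leaf",
           "Bing_fruit", "Lapins_fruit", "Rainier_fruit"]
counts = np.array([
    [120,  5, 40,  0],
    [110,  8, 35,  1],
    [130,  4, 45,  0],
    [ 10, 90, 38, 20],
    [ 12, 85, 42, 25],
    [  8, 95, 36, 22],
])

scores, ratio = pca(counts)
for name, (pc1, pc2) in zip(samples, scores):
    print(f"{name:14s} PC1={pc1:+.2f} PC2={pc2:+.2f}")
print("explained variance ratio:", np.round(ratio, 3))
```

In this toy matrix the leaf and fruit samples fall on opposite sides of the first component, which is the kind of grouping such a plot is meant to reveal.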
We analyzed the Differentially Expressed Genes (DEGs) of these transcriptome datasets using three criteria for their expression profiles: (i) the number of reads that mapped to each contig was more than or equal to 10 when compared to another cultivar (dual comparison); (ii) the average fold change in transcript accumulation was more than or equal to 2-fold between cultivars; and (iii) the Z-Test p-value was less than or equal to 0.05. Under these criteria, 138 and 52 genes were identified to be differentially represented in the pairwise comparison of the leaf or fruit transcriptome datasets, respectively (Table 2 and Table 3). The transcriptomic analyses for all samples are available in Table S3 and Table S4. Table S5 and Table S6 contain the complete list of genes that are differentially represented in these transcriptome datasets using the pairwise comparison of the cultivar-specific transcriptome datasets of fruit and leaf tissues, respectively.

Experimental design, materials, and methods

Plant material
Sweet cherry leaf and fruit samples from the 'Bing' cultivar were collected from La Palma Experimental Station, Faculty of Agronomy PUCV, Quillota (latitude 32°54' S and longitude 71°12' W). Samples from the 'Lapins' cultivar were collected from Los Andes (latitude 32°49' S and longitude 70°35' W). Samples from the 'Rainier' cultivar were collected from San Francisco de Mostazal (latitude 33°59' S and longitude 70°41' W). The fruits from these three cultivars were harvested at commercial maturity. All fresh samples were frozen in liquid nitrogen and stored at -80°C until used for total RNA isolation. These cultivars were selected because they are commercially important and have contrasting pre- and post-harvest fruit quality (Table S1).

Library construction and deep sequencing
Total RNA was isolated using the protocol of Meisel et al. [5]. The quality and quantity of the RNA were determined spectrophotometrically (A260/A280 = 1.8 and A260/A230 = 2.0) and electrophoretically using a denaturing formaldehyde agarose gel.
Library construction and 454 FLX deep sequencing (454 Life Sciences, Roche) were performed by the Center for Integrated Biotechnology, Washington State University, using 1/8 plate. Trimming and quality filters were applied to the sequences using the CLC Genome Workbench software, version 11.0.1 (CLC Bio [http://www.clcbio.com]) [6].

RNA sequence analysis
The predicted coding sequences within the Prunus avium genome (43,673 predicted genes [1]) were used as the reference set to map the transcripts of these transcriptome datasets. The sequences from each cultivar and tissue were mapped separately against this reference using the RNA-seq function of the CLC Genome Workbench version 11.0.1 with the following parameters: similarity, 0.9; length fraction, 0.6; insertion/deletion cost, 3; mismatch cost, 3; and unspecific match limit, 10.
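To make the two fractional thresholds concrete, the sketch below checks whether a single alignment would satisfy a length fraction of 0.6 and a similarity of 0.9. It assumes the common reading of these parameters (at least 60% of the read must align, and the aligned part must be at least 90% identical); it does not reproduce the CLC scoring scheme, the mismatch and insertion/deletion costs, or the unspecific match limit. The function names are hypothetical.

```python
import math
from fractions import Fraction

# Thresholds quoted in the text, held as exact fractions to avoid rounding issues.
LENGTH_FRACTION = Fraction(6, 10)
SIMILARITY = Fraction(9, 10)

def min_aligned_bases(read_length):
    """Smallest aligned length that satisfies the length fraction."""
    return math.ceil(read_length * LENGTH_FRACTION)

def passes_thresholds(read_length, aligned_length, matches):
    """True if the alignment is long enough and similar enough (assumed semantics)."""
    if aligned_length < min_aligned_bases(read_length):
        return False
    return Fraction(matches, aligned_length) >= SIMILARITY

# A 380 bp read, the average trimmed read length reported above.
print(min_aligned_bases(380))
print(passes_thresholds(380, aligned_length=300, matches=270))
print(passes_thresholds(380, aligned_length=200, matches=200))
```

Under these assumptions, a read of average length would need at least 228 aligned bases, of which at least 90% match the reference.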
The relative transcript abundance in these datasets was obtained as the number of unique reads mapped to each gene. Transcript abundance was compared between datasets using a Z-Test [7]. This test compares read counts by considering the proportion that each count contributes to the total sum of counts in its dataset, thereby correcting for dataset size. For visual inspection, the relative transcript abundance values were log10-transformed and then normalized by the quantile method, which provided the best fit to the results [8,9].
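The comparison described above, together with the three selection criteria given in the Data section, can be sketched as follows. This is a minimal illustration using a standard pooled two-proportion Z-test, which may differ in detail from the test in [7]. Reading criterion (i) as "at least 10 reads in at least one of the two datasets" is an assumption, as are the function names and the example counts.

```python
import math

def z_test(x1, n1, x2, n2):
    """Two-sided pooled two-proportion Z-test.

    x1, x2: reads mapped to the gene in each dataset.
    n1, n2: total mapped reads in each dataset.
    """
    p1, p2 = x1 / n1, x2 / n2
    p0 = (x1 + x2) / (n1 + n2)  # pooled proportion
    se = math.sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0, 1.0
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

def fold_change(x1, n1, x2, n2):
    """Ratio of the larger to the smaller proportion."""
    p1, p2 = x1 / n1, x2 / n2
    high, low = max(p1, p2), min(p1, p2)
    if high == 0:
        return 1.0  # gene absent from both datasets
    return math.inf if low == 0 else high / low

def is_differential(x1, n1, x2, n2, min_reads=10, min_fold=2.0, alpha=0.05):
    """Apply criteria (i) to (iii); the reading of (i) is an assumption."""
    _, p_value = z_test(x1, n1, x2, n2)
    return (max(x1, x2) >= min_reads
            and fold_change(x1, n1, x2, n2) >= min_fold
            and p_value <= alpha)

# Hypothetical counts for one gene in two datasets of equal size.
print(z_test(40, 100_000, 10, 100_000))
print(is_differential(40, 100_000, 10, 100_000))
# Same proportions in datasets of different size: not differential.
print(is_differential(40, 200_000, 10, 50_000))
```

Working with proportions rather than raw counts is what corrects for dataset size: in the last call, a fourfold difference in raw counts disappears once each count is divided by its dataset total.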

Functional annotation
Functional annotation was performed on the genes differentially represented in the pairwise comparison of the leaf and fruit transcriptome datasets, using the coding regions predicted in the Prunus avium genome [1] and the Blast2GO CLC plugin version 1.11.9 [10] as described in [11]. A best-hit annotation was determined for the differentially represented gene sets by using these genes in a BLASTX (version 2.6.0) analysis against the NCBI nr database with an e-value cutoff of 1e-6. An InterProScan analysis (version 5.31-70) with Blast2GO default parameters was also performed [12]. Blast2GO was also used for gene ontology mapping, with the program defaults being applied for all annotation steps and a False Discovery Rate (FDR) cut-off at the 0.05% probability level. The data from the InterPro terms, enzyme classification codes (EC), and metabolic pathways (KEGG, Kyoto Encyclopedia of Genes and Genomes) were merged with GO terms to provide a larger accumulation of evidence to support the annotations presented in the supplemental tables (Table S2).
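The best-hit selection described above can be illustrated outside Blast2GO. The sketch below assumes BLAST tabular output with the default twelve columns (query, subject, percent identity, alignment length, mismatches, gap opens, query start and end, subject start and end, e-value, bit score) and keeps, for each query, the highest-scoring hit with an e-value of at most 1e-6. The identifiers and values are hypothetical, and ranking by bit score is an assumption rather than the documented behavior of the plugin.

```python
import csv
import io

EVALUE_CUTOFF = 1e-6  # cutoff quoted in the text

def best_hits(tabular_text, evalue_cutoff=EVALUE_CUTOFF):
    """Return {query: (subject, evalue, bitscore)} for the best hit per query."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > evalue_cutoff:
            continue  # hit is not significant enough
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best

# Hypothetical hits; every identifier and number is made up for the example.
rows = [
    ("gene_A", "prot_1", "98.0", "120", "2", "0", "1", "360", "1", "120", "1e-60", "230"),
    ("gene_A", "prot_2", "80.0", "110", "22", "0", "1", "330", "5", "114", "1e-30", "120"),
    ("gene_B", "prot_3", "35.0", "60", "39", "0", "10", "189", "3", "62", "0.002", "38.5"),
]
sample = "\n".join("\t".join(row) for row in rows)

hits = best_hits(sample)
for query, (subject, evalue, bitscore) in hits.items():
    print(query, subject, evalue, bitscore)
```

Here gene_A keeps its strongest hit, while gene_B is left without annotation because its only hit exceeds the e-value cutoff.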

Transparency document. Supporting information
Transparency data associated with this article can be found in the online version at https://doi.org/10.1016/j.dib.2019.01.044.

Appendix A. Supporting information
Supplementary data associated with this article can be found in the online version at https://doi.org/10.1016/j.dib.2019.01.044.

Appendix B. Supporting information
Supplementary data associated with this article can be found in the online version at https://doi.org/10.1016/j.dib.2019.01.044. These data include Google maps of the most important areas described in this article.