Draft genome data of Prunus avium cv. ‘Stella’

Total cellular DNA of Prunus avium cv. ‘Stella’ was isolated from emerging leaf tissue and sequenced using Roche 454 GS FLX Titanium and Illumina HiSeq 2000 high-throughput sequencing (HTS) technologies. Sequence data were filtered and trimmed to retain nucleotides corresponding to a Phred score of 30, and assembled with CLC Genomics Workbench v.6.0.1. A total of 107,531 contigs were assembled and organized into 185 scaffolds, with a maximum length of 132,753 nucleotides and an N50 value of 4,601. The average depth of coverage was 135.87 nucleotides, and the median depth of coverage was 31.50 nucleotides. The draft ‘Stella’ genome presented here covers 77.8% of the estimated 352.9 Mb P. avium genome and is expected to facilitate genetics and genomics research focused on identifying genes and quantitative trait loci (QTL) underlying important agronomic and consumer traits.


Value of the Data
• The sequence data from the first self-fertile named sweet cherry cultivar generated via mutation breeding will be useful for genomics research.
• The genomic data from the 'Stella' cultivar are valuable for understanding the genetic diversity of sweet cherry, since this fruit crop experienced a genetic bottleneck during its domestication.
• Plant breeders, bioinformatics scientists, genomics and genetics scientists, and biotechnologists will benefit from these data.
• These data can be used in developing trait-linked molecular markers and quantitative trait loci to develop superior cultivars. The collective information will also aid in performing functional characterization of genes.

Data Description
Cultivated sweet cherry varieties are an outcome of the domestication of the wild cherry Prunus avium L., and are thought to have originated in the region between the Black and Caspian Seas [1, 2]. The data from Prunus avium cv. 'Stella' were generated using two different next-generation sequencing platforms. Single, 8 kb paired-end (PE), and 20 kb PE reads were generated using pyrosequencing on the 454 GS FLX instrument, and 100 bp PE reads were generated on the Illumina HiSeq 2000 instrument. A total of 18.38 GB of data were generated and, with the genome size estimated to be 352.9 Mb, these data represent 81.68× coverage of the genome. Specific details about the data are summarized in Table 1.
A draft genome assembly of Prunus avium cv. 'Stella' was developed using CLC Genomics. It consisted of 107,531 assembled contigs organized into 185 scaffolds, with a maximum length of 132,753 nucleotides (nt) and an N50 value of 4,601. Average depth of coverage was 135.87 nt, and median depth of coverage was 31.50 nt. The assembly summary report is attached as Supplementary file 1. In addition, contig information (consensus length, total read count, reads in pairs, and average coverage) is summarized in Supplementary file 2.
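For readers less familiar with the N50 statistic reported above, the following is a minimal sketch of how it is commonly computed from a list of contig or scaffold lengths: the lengths are sorted from longest to shortest, and N50 is the length at which the running total first reaches half of the summed length. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the 'Stella' assembly or from CLC Genomics output.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    account for at least half of the total assembled length.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")

    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical lengths for illustration only (not 'Stella' data).
toy_lengths = [2, 3, 4, 5, 6]
print(n50(toy_lengths))  # total is 20; 6 + 5 = 11 reaches half, so N50 is 5
```

Applied to the consensus lengths listed in Supplementary file 2, a calculation of this kind would be expected to give a value comparable to the one in the assembly report, although assemblers can differ in which sequences they include.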

Leaf Material and Genomic DNA Purification
Leaf material from the 'Stella' sweet cherry cultivar was collected from WSU IAREC, Prosser. Total genomic DNA was extracted from young leaf tissue using the cetyltrimethylammonium bromide (CTAB) phenol-chloroform extraction method [3]. Extracted DNA pellets were air dried, suspended in 50 μl of nuclease-free water, and incubated at 37 °C with DNase-free RNase for 30 min. RNase was inactivated by incubating the tubes at 65 °C for 10 min. DNA was quantified using a NanoDrop 8000 spectrophotometer (Thermo Scientific, Waltham, MA, USA), and 50 ng of extracted genomic DNA was electrophoresed on a 1% agarose gel and compared to a lambda DNA dilution series (100, 80, 60, 40, 20, 10 ng) to confirm quality and quantity.

Genome Sequencing and Assembly
The 'Stella' sweet cherry genome was sequenced using multiple sequencing platforms. The data were primarily generated on the Illumina platform, where 76× Illumina data were obtained from standard 2 × 100 sequencing on the Illumina HiSeq 2000. Read files were obtained after initial sorting and filtering of the data via Illumina's standard data processing. Additional sequencing data were generated on the 454 sequencing platform, accounting for 1.18 Gb of data.
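The relationship between sequence yield and fold coverage used in this section can be sketched as a simple ratio of sequenced bases to genome size. The example below assumes that "Gb" denotes gigabases and that coverage is expressed relative to the estimated 352.9 Mb genome; the function names are illustrative and the printed values are derived arithmetic, not figures reported by the authors.

```python
def fold_coverage(total_bases, genome_size_bases):
    """Fold coverage = sequenced bases / genome size."""
    if genome_size_bases <= 0:
        raise ValueError("genome size must be positive")
    return total_bases / genome_size_bases


def bases_for_coverage(coverage, genome_size_bases):
    """Inverse relation: bases needed to reach a given fold coverage."""
    if genome_size_bases <= 0:
        raise ValueError("genome size must be positive")
    return coverage * genome_size_bases


GENOME_SIZE = 352.9e6  # estimated P. avium genome size in bases (from the text)
READS_454 = 1.18e9     # 454 yield, assuming "1.18 Gb" means gigabases

# Approximate fold coverage contributed by the 454 data under these assumptions.
print(f"454 data: about {fold_coverage(READS_454, GENOME_SIZE):.2f}x")

# Approximate number of bases implied by 76x of Illumina data.
print(f"76x Illumina: about {bases_for_coverage(76, GENOME_SIZE) / 1e9:.2f} Gb")
```

Because the calculation depends on the estimated genome size and on whether raw or filtered reads are counted, values obtained this way are approximate.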
A reference-based assembly of cherry genomic Illumina and 454 data was performed using the CLC Genomics assembler v 7.0 with the peach genome v 2.0 as the reference [4] and the default parameters: length fraction = 0.5, similarity fraction = 0.8. Additionally, a de novo assembly was generated from the Illumina reads using CLC Genomics v 7.0 with the default Illumina assembly parameters: minimum contig length = 200, mismatch cost = 2, insertion cost = 3, deletion cost = 3, length fraction = 0.5, similarity fraction = 0.8. This assembly generated 96,080 contiguous sequences with an N50 of 4,130 from a total of 136,453,160 high-quality reads.
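As a rough illustration of what the length fraction and similarity fraction settings control, the sketch below treats a read as acceptable when at least half of its length is aligned and at least 80% of the aligned positions match. This is a simplified reading of the two parameters and an assumption on our part; it is not the CLC Genomics implementation, which additionally scores alignments using the mismatch, insertion, and deletion costs listed above. The function and argument names are hypothetical.

```python
def read_passes_thresholds(read_length, aligned_length, matching_bases,
                           length_fraction=0.5, similarity_fraction=0.8):
    """Simplified check of length fraction and similarity fraction.

    read_length    -- total length of the read
    aligned_length -- number of read positions included in the alignment
    matching_bases -- aligned positions identical to the reference or contig
    """
    if read_length <= 0:
        raise ValueError("read length must be positive")
    if aligned_length <= 0:
        return False  # nothing aligned, so the read cannot pass

    covers_enough = aligned_length / read_length >= length_fraction
    similar_enough = matching_bases / aligned_length >= similarity_fraction
    return covers_enough and similar_enough


# Hypothetical 100 bp reads, for illustration only.
print(read_passes_thresholds(100, 60, 50))  # 60% aligned, about 83% identical
print(read_passes_thresholds(100, 40, 40))  # only 40% of the read is aligned
print(read_passes_thresholds(100, 60, 45))  # 60% aligned, but 75% identical
```

Under this reading, lowering either fraction admits more divergent reads, which can be helpful when mapping to a related reference such as peach but may also increase spurious placements.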

Ethics Statement
This work did not involve human subjects, animal experiments, or data collected from social media platforms.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.