Nanopore sequencing data and structural variants identified in Prunus avium seedlings derived through mutagenesis

DNA from four sweet cherry seedlings derived from gamma-irradiated female parents was sequenced using nanopore technology (Oxford Nanopore MinION). Total data yield was 8.07 Gb, ranging from 0.92 to 3.36 Gb per sample, and the average length of mapped reads ranged from 22 to 24 kbp. The sequence data were then analysed to identify and characterize variants relative to a published sweet cherry reference genome. Small and medium-sized indels (55–135 bp) were detected, as well as structural variants, including several large indels and complex variants. Of these, 20 variants were located within protein-coding gene sequences, including genes encoding a putative F-box protein, an ADP-ribose glycohydrolase, a predicted 26S protease regulatory subunit, an E3 ubiquitin-protein ligase, a UDP-galactose/UDP-glucose transporter, an alpha/beta hydrolase domain-containing protein, a rhodanese-like domain-containing protein, a cytochrome P450, a phosphoinositide phosphatase, a cysteine synthase-like protein, phosphoenolpyruvate carboxylase 4, and several uncharacterized proteins. These variants could have functional and phenotypic consequences that are useful in basic research and breeding.


Value of the Data
• Mutation breeding can be used to introduce novel traits such as self-compatibility and dwarfing.
• Irradiation commonly introduces large-scale lesions in DNA, including chromosomal rearrangements and large deletions.
• Breeders and geneticists working on sweet cherry (or related Prunus species) can benefit from these data.
• These data can be used to guide targeted phenotyping experiments (including proteomics/metabolomics) to characterize the effects of the identified mutations, and to develop markers for tracking the mutations in progeny for breeding or research purposes.

Data Description
Historically, mutation breeding has been used in sweet cherry to introduce novel traits such as self-compatibility and dwarfing [1, 2]. Long-read DNA sequencing technologies, such as nanopore sequencing, are well suited to detecting large-scale changes in DNA structure. The data presented herein include the raw nanopore sequencing data referenced in the Data Availability section. In addition, four tables and two supplementary data files are included. Table 1 summarizes the total number of reads (sequences) and the total number of nucleotides sequenced for each of the four sweet cherry samples. Table 2 lists the percentage of raw sequence data that mapped to the sweet cherry reference genome, and the average lengths of mapped and unmapped reads. Table 3 lists the short and medium-sized indels (up to 135 bases) detected in the sequence analysis, and Table 4 describes the larger structural variants. Tables 3 and 4 also list any predicted genes affected by these variants. Supplementary File 1 contains all manuscript tables in Excel format. Supplementary File 2 contains QC reports for the sequencing reads of each sample. The structural variants (from Table 4) are listed first, followed by the short indels. Collectively, these data demonstrate the utility of nanopore sequencing for genome characterization in sweet cherry, and the variants identified herein provide a foundation for further research in functional genetics and breeding.
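Tables 1 and 2 report per-sample read counts, total bases, the mapped fraction, and mean read lengths. The Python sketch below shows how such summaries could be computed from a list of read lengths and per-read mapping flags. The input format and the function name `summarize_reads` are hypothetical, and because the text does not say whether the mapped percentage in Table 2 is computed over reads or over bases, the sketch reports both.

```python
from statistics import mean


def summarize_reads(read_lengths, mapped_flags):
    """Per-sample read summary in the spirit of Tables 1 and 2.

    read_lengths: read lengths in bases (hypothetical input format).
    mapped_flags: parallel booleans, True if the read mapped to the reference.
    Both read-based and base-based mapping percentages are returned because
    the text does not state which one Table 2 uses (assumption).
    """
    if len(read_lengths) != len(mapped_flags):
        raise ValueError("read_lengths and mapped_flags must have equal length")
    if not read_lengths:
        raise ValueError("no reads supplied")
    mapped = [n for n, ok in zip(read_lengths, mapped_flags) if ok]
    unmapped = [n for n, ok in zip(read_lengths, mapped_flags) if not ok]
    total_bases = sum(read_lengths)
    return {
        "total_reads": len(read_lengths),
        "total_bases": total_bases,
        "percent_reads_mapped": 100.0 * len(mapped) / len(read_lengths),
        "percent_bases_mapped": 100.0 * sum(mapped) / total_bases,
        "mean_mapped_length": mean(mapped) if mapped else 0.0,
        "mean_unmapped_length": mean(unmapped) if unmapped else 0.0,
    }


# Toy example with three hypothetical reads (not data from this study).
summary = summarize_reads([30000, 20000, 10000], [True, True, False])
print(summary["total_bases"], summary["mean_mapped_length"])  # 60000 25000
```

In practice the lengths and mapping flags would come from the read files and the alignment output; the toy values here only illustrate the arithmetic.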

Plant Material
The plant material consisted of seedling progeny of the irradiated sweet cherry varieties 'Royal Ann', 'Bada', and 'Bing'. Irradiation was accomplished by placing newly sprouted shoots of each variety in a radiation chamber with a cobalt-60 (⁶⁰Co) gamma-ray source. Following irradiation, the shoots were immediately grafted onto a rootstock for propagation. Mutant shoots with reduced or compact growth were repropagated by budding (a form of grafting using single buds). When the mutants proved unstable (likely due to chimerism), open-pollinated seed from the mutant trees was collected and planted, and the less vigorous seedlings were selected and propagated vegetatively via budding/grafting. A planting of 12 selections (vegetatively propagated seedling progeny), each with three replicates, was established at the Oregon State University Mid-Columbia Agricultural Research and Extension Center in Hood River, OR. Of the 12 selections, four were sequenced: 1-15, 2-2, 3-1, and 3-14.

DNA Extraction and Nanopore Sequencing
Tissue from newly expanded, field-grown leaves was ground to a fine powder in liquid nitrogen using a mortar and pestle. DNA was extracted using a CTAB-based buffer and washed with 70% ethanol, and the dried pellet was resuspended in low-EDTA buffer (10 mM Tris, 0.1 mM EDTA, pH 8.0). The DNA was quantified using a NanoDrop spectrophotometer and diluted to a concentration of 150 ng/μL. Prior to sequencing, DNA fragments <25 kb were removed using a Circulomics Short Read Eliminator (SRE) Kit [3]. A total of 9 μg of DNA (the maximum for the SRE kit) was processed for each sample according to the manufacturer's instructions and resuspended in 50 μL of the provided elution buffer. DNA repair, end-prep, native barcode ligation (for multiplexing), and adapter ligation/cleanup were performed using reagents supplied and/or recommended by Oxford Nanopore Technologies (ONT), with the exceptions that Agencourt AMPure XP beads were

Table 2 Read mapping statistics for each cherry sample.

Table 3 List of short and medium-sized indels identified for each sample, their genomic location, length, supporting evidence, and genes containing variant breakpoints.
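The quantities given in the extraction protocol imply simple volume arithmetic: delivering 9 μg of DNA at 150 ng/μL requires 9000 ng ÷ 150 ng/μL = 60 μL. A minimal Python sketch of this calculation follows, together with a C1V1 = C2V2 dilution helper. The function names are hypothetical, and the 600 ng/μL stock concentration in the usage line is an illustrative value, not one reported for these samples.

```python
def volume_for_mass(mass_ug: float, conc_ng_per_ul: float) -> float:
    """Volume (uL) of DNA solution that contains mass_ug micrograms."""
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return mass_ug * 1000.0 / conc_ng_per_ul  # ug -> ng, then divide by ng/uL


def dilution_volumes(stock_ng_per_ul: float, target_ng_per_ul: float,
                     final_ul: float) -> tuple[float, float]:
    """C1*V1 = C2*V2: return (stock volume, buffer volume) in uL."""
    if not 0 < target_ng_per_ul <= stock_ng_per_ul:
        raise ValueError("target must be positive and no higher than stock")
    stock_ul = target_ng_per_ul * final_ul / stock_ng_per_ul
    return stock_ul, final_ul - stock_ul


# 9 ug at the 150 ng/uL working concentration described in the text.
print(volume_for_mass(9, 150))  # 60.0
# Hypothetical 600 ng/uL stock diluted to 60 uL at 150 ng/uL.
print(dilution_volumes(600, 150, 60))  # (15.0, 45.0)
```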
Structural variants, indels, and putative chromosomal breakpoints were identified using the CLC "Indels and Structural Variants" tool with the following parameters: P-value threshold = 0.001; Maximum number of mismatches = 3; Minimum quality score = 20; Minimum relative consensus coverage = 0.5; Filter variants = Yes; Minimum number of reads = 2; Ignore broken pairs = No; Create breakpoints = Yes; Create indel variants = Yes; Create structural variations = Yes. A detailed report giving the positions of all identified variants was also generated. The data were additionally filtered to retain variants, indels, and breakpoints located in genes, and the resulting selections were extracted. The final numbers of structural variants and indels that passed these filtering parameters for each genotype were: 1-15, 9 structural variants and 3 indels; 2-2, 2 structural variants and 2 indels; 3-1, 7 structural variants and 1 indel; 3-14, 4 structural variants and 0 indels (Table 3, Table 4, Supplementary File 1).
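The filtering thresholds listed for the variant-calling step can be illustrated with a small Python sketch that applies a P-value cutoff, a minimum read-support count, and a minimum relative consensus coverage, then tallies the genic SVs and indels that remain. This is not how CLC evaluates these statistics internally: the record fields, the comparison directions (≤ for the P-value, ≥ for the others), and the example calls are all hypothetical, and read-level parameters such as maximum mismatches and minimum quality score are omitted.

```python
from dataclasses import dataclass

# Thresholds taken from the parameter list in the text; the comparison
# directions below (<= for P-value, >= for the others) are assumptions.
P_VALUE_THRESHOLD = 0.001
MIN_READS = 2
MIN_RELATIVE_CONSENSUS_COVERAGE = 0.5


@dataclass
class VariantCall:
    """Hypothetical, simplified call record (not CLC's output schema)."""
    chrom: str
    start: int
    end: int
    kind: str                 # "SV" or "Indel" (illustrative labels)
    p_value: float
    supporting_reads: int
    relative_consensus_coverage: float
    in_gene: bool             # set by a separate gene-overlap step


def passes_filters(call: VariantCall) -> bool:
    """True if the call meets all three illustrative thresholds."""
    return (call.p_value <= P_VALUE_THRESHOLD
            and call.supporting_reads >= MIN_READS
            and call.relative_consensus_coverage >= MIN_RELATIVE_CONSENSUS_COVERAGE)


def count_genic_calls(calls):
    """Count filtered calls that fall in genes, grouped by kind."""
    counts = {"SV": 0, "Indel": 0}
    for call in calls:
        if passes_filters(call) and call.in_gene:
            counts[call.kind] = counts.get(call.kind, 0) + 1
    return counts


# Hypothetical calls for one genotype (not data from this study).
calls = [
    VariantCall("chr1", 100, 5000, "SV", 1e-5, 4, 0.8, True),
    VariantCall("chr1", 9000, 9060, "Indel", 1e-4, 2, 0.6, True),
    VariantCall("chr2", 500, 700, "SV", 0.01, 10, 0.9, True),     # P-value too high
    VariantCall("chr3", 10, 90, "Indel", 1e-6, 1, 0.9, True),     # too few reads
    VariantCall("chr4", 10, 20000, "SV", 1e-6, 5, 0.9, False),    # intergenic
]
print(count_genic_calls(calls))  # {'SV': 1, 'Indel': 1}
```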

Annotation with Overlap Information
The GFF file containing the gene annotation for the Prunus avium reference genome pseudomolecule (v1.0.a1) was imported into CLC to generate Gene, Exon, and CDS tracks [6]. To identify which putative variant breakpoints were located in coding regions of the sweet cherry genome, the CLC "Annotate with Overlap Information" feature was used to add information from the imported gene tracks to the variant datasets called for each genotype. Gene IDs and annotation information for indels and structural variants are shown in Tables 3 and 4.
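The overlap annotation step amounts to an interval-intersection test between variant breakpoints and gene coordinates. The Python sketch below parses gene features from GFF3 lines (1-based, inclusive coordinates) and reports the genes overlapping a query interval. It is a simplified stand-in for the CLC "Annotate with Overlap Information" feature, not a reproduction of it: only gene features are modelled (the exon and CDS tracks are not), and the sequence names, gene IDs, and breakpoint positions in the example are hypothetical.

```python
def parse_gff_genes(lines):
    """Return (seqid, start, end, gene_id) tuples for GFF3 'gene' features.

    GFF3 has nine tab-separated columns; start and end are 1-based and
    inclusive, and column 9 holds key=value attributes separated by ';'.
    """
    genes = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "gene":
            continue
        attrs = dict(item.split("=", 1) for item in fields[8].split(";") if "=" in item)
        genes.append((fields[0], int(fields[3]), int(fields[4]), attrs.get("ID", ".")))
    return genes


def genes_overlapping(genes, seqid, start, end):
    """IDs of genes whose [start, end] interval overlaps the query interval."""
    return [gene_id for chrom, g_start, g_end, gene_id in genes
            if chrom == seqid and g_start <= end and start <= g_end]


# Hypothetical annotation lines and breakpoints (not from the reference genome).
gff_lines = [
    "##gff-version 3",
    "chr1\texample\tgene\t1000\t2000\t.\t+\t.\tID=geneA",
    "chr1\texample\tmRNA\t1000\t2000\t.\t+\t.\tID=geneA.1;Parent=geneA",
    "chr1\texample\tgene\t5000\t6000\t.\t-\t.\tID=geneB",
]
genes = parse_gff_genes(gff_lines)
print(genes_overlapping(genes, "chr1", 1990, 1990))  # ['geneA']
print(genes_overlapping(genes, "chr1", 3000, 3000))  # []
print(genes_overlapping(genes, "chr1", 1500, 5500))  # ['geneA', 'geneB']
```

A linear scan is adequate for the handful of variants per genotype reported here; for genome-wide queries, sorting genes by start position or using an interval tree would avoid checking every gene for every breakpoint.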

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability
Structural Variant Detection in Four Sweet Cherry F1s Derived from Irradiated Parents (Original data) (NCBI SRA Database).