De novo genome and transcriptome resources of the Adzuki bean borer Ostrinia scapulalis (Lepidoptera: Crambidae)

We present a draft genome assembly with de novo prediction and automated functional annotation of coding genes, together with a reference transcriptome of the Adzuki bean borer, Ostrinia scapulalis, based on RNA sequencing of various tissues and developmental stages. The genome assembly spans 419 Mb, has a GC content of 37.4% and includes 26,120 predicted coding genes. The reference transcriptome holds 33,080 unigenes and contains a high proportion of a set of genes conserved across eukaryotes and arthropods, which was used to assess the quality of the reconstructed transcripts. The new genomic and transcriptomic data presented here significantly enrich the public sequence databases for the Crambidae and Lepidoptera, and represent useful resources for future research on the evolution and adaptation of phytophagous moths. The genome and transcriptome assemblies have been deposited and made accessible via an NCBI BioProject (id PRJNA390510) and the LepidoDB database (http://bipaa.genouest.org/sp/ostrinia_scapulalis/).



Value of the data
The draft genome represents the first available genome assembly for O. scapulalis. The reference transcriptome of O. scapulalis will allow comparative expression studies.
The new genomic and transcriptomic data enrich the public sequence databases for the Crambidae and Lepidoptera.
The data represent pangenomic resources for future research on the evolution and adaptation of phytophagous moths.

Data
The Adzuki bean borer, Ostrinia scapulalis (hereafter OSCA), is a Palaearctic phytophagous moth feeding on various dicotyledons, including hop (Humulus lupulus), mugwort (Artemisia vulgaris) and hemp (Cannabis sativa) [1]. In Europe, it partly co-occurs with its sibling species, the European corn borer, Ostrinia nubilalis, a major pest of maize (Zea mays). Previous studies demonstrated that O. scapulalis and O. nubilalis are specialized on their respective host plants [2–7] and that their genetic divergence is low enough for them to be considered sibling species [8]. Yet a few genomic sequences and rearrangements are much more divergent than the rest of the genomic background [1,9–11]. These genomic regions are of particular interest for understanding the divergence process between O. scapulalis and O. nubilalis. To further investigate host adaptation and divergence between these two sibling species at a pangenomic scale, we developed new genomic and transcriptomic resources consisting of an OSCA draft genome and a related reference transcriptome. The latter extends a published transcriptomic set generated with Roche 454 sequencing technology [12].

De novo draft genome
Diapausing larvae were collected from mugwort stems in 2008 near Amiens (Picardie, France) and stored in 95% ethanol at −20 °C. Whole genomic DNA was extracted with a CTAB-based method [13]. DNA quality and integrity were assessed by agarose gel electrophoresis and NanoDrop spectrophotometry. The sex of each sampled larva was determined by molecular co-amplification of markers specific to each sex chromosome (Z and W in Lepidoptera), as described in Orsucci et al. [6]. Only samples of the homogametic ZZ sex (males in Lepidoptera) were retained for library construction. A 2 × 100 bp shotgun paired-end library and two mate-pair libraries with 3 kb (2 × 50 bp) and 8 kb (2 × 100 bp) inserts were generated, each from the DNA extract of a single larva, using the Illumina TruSeq™ and Nextera Mate Pair Library Preparation kits, respectively. All libraries were sequenced by LGC Genomics GmbH (Berlin, Germany) on an Illumina HiSeq 2000 platform using the paired-end protocol. Between 234 and 351 million raw DNA reads were generated per library (Table 1). Assembly and scaffolding of the cleaned reads were done with the software Allpaths-LG […] (Table 4). Of these coding genes, 80.3% could be functionally annotated. Furthermore, 19,023 OSCA genes were assigned to 9,785 ortholog groups (Fig. 1), of which 93% were shared with at least one of the three lepidopteran species Bombyx mori, Danaus plexippus or Spodoptera frugiperda.
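Headline assembly figures such as total span and GC content, together with the usual contiguity metric N50, can be recomputed from any FASTA file of scaffolds. The Python sketch below does this with the standard library only. It is not the pipeline used in this study, the toy scaffolds are invented, and it assumes GC content is computed over unambiguous A/C/G/T bases with gap Ns excluded (some tools include Ns in the denominator, which gives a lower value).

```python
"""Minimal assembly summary: scaffold count, total length, GC content, N50.

Sketch only: GC content is computed over A/C/G/T bases (gap Ns excluded),
which is one common convention, not necessarily the one used in the study.
"""
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {record_id: upper-case sequence}."""
    records: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line.upper())
    return {rid: "".join(parts) for rid, parts in records.items()}


def n50(lengths: List[int]) -> int:
    """Largest length L such that sequences of length >= L cover half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def assembly_stats(seqs: Dict[str, str]) -> Dict[str, float]:
    """Summarize scaffold count, total span, GC% (over A/C/G/T) and N50."""
    lengths = [len(s) for s in seqs.values()]
    joined = "".join(seqs.values())
    gc = joined.count("G") + joined.count("C")
    acgt = gc + joined.count("A") + joined.count("T")
    return {
        "n_scaffolds": len(lengths),
        "total_length": sum(lengths),
        "gc_percent": 100.0 * gc / acgt if acgt else 0.0,
        "n50": n50(lengths),
    }


# Invented toy scaffolds: 18 bp in total, 8 G/C among 14 A/C/G/T bases.
toy = """>scf1
ATGCGCNNNNAT
>scf2
GGCC
>scf3
AT
"""
stats = assembly_stats(read_fasta(toy.splitlines()))
print(stats)
```

For a real assembly, the lines would come from an open file handle, e.g. `read_fasta(open("assembly.fasta"))`, where the file name is hypothetical.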

De novo transcriptome
In March 2011, diapausing larvae were collected from mugwort stems in Nadarzin (Poland) and reared in the laboratory to obtain fresh tissues from the following developmental stages: eggs, and whole bodies and hemolymph of fifth-instar larvae. At the adult stage, we sampled males and females separately and dissected each into head-thorax and abdomen. In total, we prepared 7 RNA extracts […] [12] on the best matches indicated that 61% of the CDS were reconstructed over at least 60% of the length of the corresponding lepidopteran reference protein, whereas 43% of the transcript CDS were assembled at full length.
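Reference-coverage figures of this kind are commonly derived from the best protein hit of each transcript. The Python sketch below illustrates the calculation under stated assumptions: hits are in BLAST+ tabular format produced with `-outfmt "6 qseqid sseqid evalue bitscore sstart send slen"`, the highest-bitscore hit is taken as each transcript's best match, the 60% threshold follows the text, and the 95% cutoff standing in for "full length" is a hypothetical choice, not the criterion used in this study. The toy hits are invented.

```python
"""Classify transcripts by the fraction of their best reference protein covered.

Sketch only: assumes tabular hits with columns
qseqid sseqid evalue bitscore sstart send slen; the 'full' cutoff is hypothetical.
"""
from typing import Dict, Iterable, Tuple

Hit = Tuple[int, int, int]  # (sstart, send, slen) on the reference protein


def best_hits(rows: Iterable[str]) -> Dict[str, Hit]:
    """Keep, per transcript, the alignment span of its highest-bitscore hit."""
    best: Dict[str, Tuple[float, Hit]] = {}
    for row in rows:
        if not row.strip():
            continue
        qseqid, _sseqid, _evalue, bitscore, sstart, send, slen = row.split("\t")
        score = float(bitscore)
        hit = (int(sstart), int(send), int(slen))
        if qseqid not in best or score > best[qseqid][0]:
            best[qseqid] = (score, hit)
    return {q: hit for q, (_, hit) in best.items()}


def reference_coverage(sstart: int, send: int, slen: int) -> float:
    """Fraction of the reference protein length spanned by the alignment."""
    return (abs(send - sstart) + 1) / slen


def summarize(rows: Iterable[str], partial: float = 0.6,
              full: float = 0.95) -> Dict[str, float]:
    """Percent of transcripts whose best hit covers >= partial / >= full of it."""
    covs = [reference_coverage(*h) for h in best_hits(rows).values()]
    n = len(covs)
    return {
        "n_with_hit": n,
        "pct_ge_partial": 100.0 * sum(c >= partial for c in covs) / n if n else 0.0,
        "pct_full_length": 100.0 * sum(c >= full for c in covs) / n if n else 0.0,
    }


toy_hits = [
    "t1\tP1\t1e-50\t300\t1\t400\t400",   # best hit of t1: covers 100% of P1
    "t1\tP2\t1e-20\t120\t10\t100\t500",  # weaker hit of t1: ignored
    "t2\tP3\t1e-30\t200\t51\t300\t400",  # covers 250/400 = 62.5% of P3
    "t3\tP4\t1e-10\t80\t1\t100\t500",    # covers 100/500 = 20% of P4
]
report = summarize(toy_hits)
print(report)
```

A different definition of "full length" only requires changing the `full` argument; the percentages are expressed relative to transcripts that have at least one hit.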