Data on draft genomes and transcriptomes from females and males of the flour moth, Ephestia kuehniella

We present genomes and pupal transcriptomes of the Mediterranean flour moth, Ephestia kuehniella. The moth is a worldwide storage pest as well as a laboratory species with a considerable background in developmental biology, genetics, and cytogenetics. The sequence data were derived from a highly inbred laboratory strain and, hence, display very little heterozygosity. Female and male genomes and transcriptomes are represented separately in two sets each of raw and assembled sequence data. They are intended as a basis for developing new strategies in pest control, for elucidating the molecular adaptation of the species to its peculiar lifestyle, and for research on sex chromosome structure, sex determination and sex-specific gene activity. As a test, all genes known or suspected to have a role in sex determination were extracted from the data. Raw sequencing data and assemblies are available at the European Nucleotide Archive under accession number PRJEB49052.


Specifications
Subject: Entomology and insect science
Specific subject area: Insects, Lepidoptera, Genomics, Transcriptomics
Type of data: Raw data from DNA and RNA sequencing of females and males (fastq files); genome assembly (fasta files); transcriptome assembly (fasta files)

Value of the Data
• The source of the genomes and transcriptomes, E. kuehniella, is a storage pest with worldwide distribution. It is also a favorable laboratory species and has a rich background in developmental biology, genetics, and cytogenetics.
• Researchers developing new molecular strategies of pest control will benefit from the data, as will those interested in insect phylogeny, genetic adaptation to the peculiar lifestyle of the species, and its sex determination, sex chromosome content and sex-specific expression of genes.
• The female and male genomes and transcriptomes are from a highly inbred line and have a very low level of heterozygosity. This makes them especially valuable for female-versus-male comparisons. The developmental stage, mid-pupa, is a stage when genes involved in morphogenesis and sex differentiation are supposed to be active.

Data Description
The dataset contains draft genome and draft pupal transcriptome assemblies, separately for females and males of the Mediterranean flour moth, E. kuehniella (Lepidoptera, Pyralidae), together with two sets of raw sequencing data referred to as 'Mainz' and 'Novogene'. For dataset 'Mainz', data from a highly inbred line were collected from a single female and a single male individual for the transcriptome assembly and from two female and two male individuals for the genome assembly. RNA libraries were submitted to paired-end Illumina sequencing, and DNA libraries were sequenced using paired-end and 8 kb mate-pair Illumina sequencing technology (Table 1).

Table 1. Accession numbers for the European Nucleotide Archive (ENA) for the sequencing data and GenBank IDs for the E. kuehniella orthologs of genes known or suspected to have a role in sex determination.

A second set of raw data (dataset 'Novogene') was obtained by pooling 5 females and 5 males separately from the same inbred E. kuehniella strain and can be obtained under the same study accession number. RNA and DNA libraries from this dataset were sequenced using paired-end sequencing, and the RNA-seq data were used to perform a second transcriptome assembly for each sex (accession numbers for transcriptome assemblies derived from the 'Novogene' dataset: female ERS8464945, male ERS8464946).

Genome sizes for the haploid female and male genomes were estimated using the 'Novogene' data. Estimates based on a k-mer approach were 363 Mb (megabases) for the haploid female genome and 365 Mb for the male genome. This is considerably less than 440 Mb, the value determined by flow cytometry and confirmed by Feulgen cytometry [1]. Assembled genomes were 357 Mb (female) and 354 Mb (male) with an N50 of 11,860 bp (female) and 12,636 bp (male). The longest contigs were ∼197 kb in the female genome assembly and ∼426 kb in the male assembly. GC content was very similar between the two sexes (∼36%).
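The principle behind a k-mer based genome size estimate can be illustrated with a short sketch. The text does not state which tool or parameters were used, so the following is only a generic illustration under simple assumptions: a k-mer depth histogram is already available, low-depth k-mers are treated as sequencing errors, and the haploid genome size is approximated as the total number of k-mers divided by the depth of the main peak. The function names, the `min_depth` cutoff and the toy histogram are hypothetical and are not taken from the dataset.

```python
from typing import Dict


def find_main_peak(hist: Dict[int, int], min_depth: int = 5) -> int:
    """Return the depth with the most distinct k-mers at or above min_depth.

    Depths below min_depth are ignored because they are assumed to be
    dominated by k-mers that contain sequencing errors (an assumption).
    """
    candidates = {depth: count for depth, count in hist.items() if depth >= min_depth}
    if not candidates:
        raise ValueError("histogram has no entries at or above min_depth")
    return max(candidates, key=candidates.get)


def estimate_genome_size(hist: Dict[int, int], min_depth: int = 5) -> float:
    """Approximate genome size as (total k-mer occurrences) / (main peak depth)."""
    peak_depth = find_main_peak(hist, min_depth)
    total_kmers = sum(depth * count for depth, count in hist.items() if depth >= min_depth)
    return total_kmers / peak_depth


# Hypothetical toy histogram: depth -> number of distinct k-mers at that depth.
toy_hist = {1: 1000, 2: 200, 18: 50, 19: 80, 20: 100, 21: 80, 22: 50, 40: 10}
print(find_main_peak(toy_hist))        # main peak at depth 20
print(estimate_genome_size(toy_hist))  # 7600 k-mer occurrences / 20 = 380.0
```

Applied to the reported values, simple arithmetic shows that the k-mer estimate for the female genome (363 Mb) corresponds to 82.5% of the 440 Mb determined by flow cytometry.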
Further assembly details are shown in Table 2. The completeness of the genome and transcriptome assemblies was assessed using BUSCO with the lepidoptera_odb10 lineage dataset (Table 2). The 'Novogene' dataset was used to estimate heterozygosity. As expected for a highly inbred line, heterozygosity was very low. For the female genome, heterozygosity was estimated at between 0.152 and 0.156%, and for the male genome at between 0.034 and 0.037%. The higher estimate of heterozygosity in females is probably due to the fact that females are the heterogametic sex in E. kuehniella and have WZ sex chromosomes, while males are homogametic with a ZZ sex chromosome pair.

As a test of the dataset, we searched for the E. kuehniella orthologs of all genes known or suspected to have a role in its sex determination. EkMasc and EkMascB, the orthologs of Masculinizer (Masc) from Bombyx mori, were recently described to produce the primary signal of the sex-determining cascade in E. kuehniella [2]. Our assemblies allowed us to extract these genes as well as Ekdsx, EkPSI, EkIMP, EkTra2, EkSxl and EkHSP70, together with the sex-specific splice variants of Ekdsx (GenBank accession numbers: OU228360-OU228368; see Table 1 for details).

Table 2. Assembly statistics for the genome and transcriptome assemblies and results from the benchmarking universal single-copy orthologs (BUSCO) analysis against lepidoptera_odb10 as the reference dataset for genome and transcriptome completeness. Percentages of genes per assembly from the BUSCO analysis are shown for complete single-copy and duplicated genes as well as for fragmented genes (5286 genes in total).
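Summary statistics of the kind reported for the assemblies (total length, N50, longest contig, GC content) can be recomputed from any of the deposited fasta files. The sketch below is a minimal illustration and not the procedure used by the authors; the function names and the toy sequences are hypothetical. N50 is taken here as the length of the contig at which the cumulative length, counted from the longest contig downwards, first reaches half of the total assembly length.

```python
from typing import Iterable, List


def read_fasta_sequences(lines: Iterable[str]) -> List[str]:
    """Collect the sequences of a FASTA file given as an iterable of lines."""
    sequences: List[str] = []
    chunks = None  # stays None until the first header line is seen
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if chunks is not None:
                sequences.append("".join(chunks))
            chunks = []
        elif chunks is not None:
            chunks.append(line.upper())
    if chunks is not None:
        sequences.append("".join(chunks))
    return sequences


def n50(lengths: List[int]) -> int:
    """Length of the contig at which the cumulative sum reaches half the total."""
    if not lengths:
        raise ValueError("no contig lengths given")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    return min(lengths)  # not reached for non-empty input; kept for completeness


def gc_fraction(sequences: List[str]) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    gc = sum(seq.count("G") + seq.count("C") for seq in sequences)
    acgt = sum(seq.count(base) for seq in sequences for base in "ACGT")
    return gc / acgt if acgt else 0.0


# Hypothetical toy assembly with two contigs.
toy_fasta = ">contig1\nGGCC\nAATT\n>contig2\nGCAT\n".splitlines()
toy_seqs = read_fasta_sequences(toy_fasta)
toy_lengths = [len(seq) for seq in toy_seqs]
print(sum(toy_lengths), max(toy_lengths), n50(toy_lengths), gc_fraction(toy_seqs))
```

For a real assembly, the lines of an opened fasta file can be passed to `read_fasta_sequences` in place of the toy list.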

Sample collection
E. kuehniella strain L has been kept in laboratory cultures for more than 80 years. For sequencing, female and male mid-stage pupae were selected. One female and one male pupa each were used for paired-end and 8 kb mate-pair genome sequencing as well as for paired-end transcriptome sequencing for dataset 'Mainz'. The 'Novogene' dataset was derived from mid-stage pupae for RNA and DNA sequencing (five females and five males, pooled by sex for sequencing).

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability
Sequencing read data and assemblies for Ephestia kuehniella draft genomes and transcriptomes (Original data) (European Nucleotide Archive (ENA)).