The genome sequence of the Olive Pearl, Udea olivalis (Denis & Schiffermüller, 1775) [version 1; peer review: awaiting peer review]

We present a genome assembly from an individual male Udea olivalis (the Olive Pearl; Arthropoda; Insecta; Lepidoptera; Crambidae). The genome sequence is 624.4 megabases in span. Most of the assembly is scaffolded into 31 chromosomal pseudomolecules, including the Z sex chromosome. The mitochondrial genome has also been assembled and is 15.3 kilobases in length.


Background
The superfamily Pyraloidea includes over 15,000 species of moths adapted to a wide diversity of habitats. Molecular phylogenetic analysis divides the superfamily into two sister clades, each usually given family status: the Pyralidae and Crambidae (Regier et al., 2012). Udea olivalis, sometimes given the common name the Olive Pearl, is a widespread member of the latter clade. Similar to several related 'crambids', U. olivalis holds its wings at rest in a flat delta shape, clearly showing the grey-brown ground colour of the forewings marked with cream spots, including a diagnostic central trapezoid shape (Asher et al., 2013).
U. olivalis is distributed patchily across northern and central Europe and can be locally common in southern England, Wales, Northern Ireland, lowland regions of Scotland and eastern regions of Ireland (GBIF Secretariat, 2022; NBN Atlas Partnership, 2021; National Biodiversity Data Centre, 2023). The moth is found in woodlands, hedgerows and suburban gardens and in some regions is more frequent in woodlands on calcareous soils (Davey, 2019). The larvae feed on a wide range of herbaceous plants, including woundworts Stachys sp., nettle Urtica dioica, ground ivy Nepeta hederacea, dog's mercury Mercurialis perennis, dock Rumex sp. and hop Humulus lupulus (Beirne, 1952). In Britain and Ireland, the species is univoltine, with the adults on the wing primarily in June and July (Asher et al., 2013; NBN Atlas Partnership, 2021; National Biodiversity Data Centre, 2023).
A genome sequence for U. olivalis will facilitate studies investigating adaptations to polyphagy and contribute to the growing set of genomic resources for understanding the evolutionary diversification of Lepidoptera.

Genome sequence report
The genome was sequenced from one male Udea olivalis (Figure 1) collected from Wytham Woods, Oxfordshire (biological vice-county: Berkshire), UK (latitude 51.77, longitude -1.34). A total of 36-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 19 missing joins or mis-joins and removed one haplotypic duplication, reducing the scaffold number by 16.07%.
The final assembly has a total length of 624.4 Mb in 47 sequence scaffolds with a scaffold N50 of 21.5 Mb (Table 1). Most (99.87%) of the assembly sequence was assigned to 31 chromosomal-level scaffolds, representing 30 autosomes and the Z sex chromosome. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 2–Figure 5; Table 2). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission.
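The summary statistics quoted above can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the assembly pipeline: the function names and the toy scaffold lengths are hypothetical, and the pre-curation scaffold count of 56 is not reported in the text but is back-calculated here from the final count (47) and the stated 16.07% reduction.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembly span.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def scaffolds_before_curation(final_count, fractional_reduction):
    """Back-calculate the scaffold count before curation (an inference,
    not a reported value) from the final count and fractional reduction."""
    return round(final_count / (1 - fractional_reduction))


# Toy lengths (hypothetical, not the real scaffolds): total span is 150,
# and the two largest sequences (50 + 40 = 90) already cover half of it.
toy_n50 = n50([50, 40, 30, 20, 10])

# Values taken from the text: 47 final scaffolds, 16.07% reduction.
pre_curation = scaffolds_before_curation(47, 0.1607)
reduction_pct = (pre_curation - 47) / pre_curation * 100

print(toy_n50, pre_curation, round(reduction_pct, 2))
```

Under these assumptions, a drop from 56 to 47 scaffolds reproduces the reported 16.07% reduction. Applying n50 to the real scaffold lengths would require the deposited assembly itself.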
Metadata for specimens, spectral estimates, sequencing runs, contaminants and pre-curation assembly statistics can be found at https://links.tol.sanger.ac.uk/species/1002971.

Sample acquisition and nucleic acid extraction
A male Udea olivalis was collected from Wytham Woods, Oxfordshire (biological vice-county: Berkshire), UK (latitude 51.77, longitude -1.34) on 16 June 2021. The specimen was taken from the orchard by Douglas Boyes (University of Oxford) using a net. The specimen was identified by the collector and snap-frozen on dry ice.
The ilUdeOliv2 sample was prepared at the Tree of Life laboratory, Wellcome Sanger Institute (WSI). The sample was weighed and dissected on dry ice with tissue set aside for Hi-C sequencing. Whole organism tissue was disrupted using a Nippi Powermasher fitted with a BioMasher pestle. DNA was extracted at the WSI Scientific Operations core using the Qiagen MagAttract HMW DNA kit, according to the manufacturer's instructions.

Sequencing
Pacific Biosciences HiFi circular consensus DNA sequencing libraries were constructed according to the manufacturers' instructions. DNA sequencing was performed by the Scientific Operations core at the WSI on the Pacific Biosciences SEQUEL II (HiFi) instrument. Hi-C data were also generated from tissue of ilUdeOliv2 that was set aside for the purpose using the Arima2 kit and sequenced on the Illumina NovaSeq 6000 instrument.

Genome assembly, curation and evaluation
Assembly was carried out with Hifiasm (Cheng et al., 2021) and haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using YaHS (Zhou et al., 2023). The assembly was checked for contamination and corrected using the gEVAL system (Chow et al., 2016) as described previously (Howe et al., 2021). Manual curation was performed using gEVAL, HiGlass (Kerpedjiev et al., 2018) and Pretext (Harry, 2022). The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2022), which runs MitoFinder (Allio et al., 2020) or MITOS (Bernt et al., 2013) and uses these annotations to select the final mitochondrial contig and to ensure the general quality of the sequence. To evaluate the assembly, MerquryFK was used to estimate consensus quality (QV) scores and k-mer completeness (Rhie et al., 2020). The genome was analysed within the BlobToolKit environment (Challis et al., 2020) and BUSCO scores (Manni et al., 2021; Simão et al., 2015) were calculated. Table 3 contains a list of software tool versions and sources.
The genome sequence is released openly for reuse. The Udea olivalis genome sequencing initiative is part of the Darwin Tree of Life (DToL) project. All raw sequence data and the assembly have been deposited in INSDC databases. The genome will be annotated using available RNA-Seq data and presented through the Ensembl pipeline at the European Bioinformatics Institute. Raw data and assembly accession identifiers are reported in Table 1.
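As a reading aid for the consensus quality (QV) scores mentioned under assembly evaluation: QV is a Phred-scaled value, QV = -10 log10(E), where E is the estimated per-base error probability. The sketch below shows the conversion in both directions. The function names are hypothetical, and the QV of 60 used in the example is a placeholder chosen for illustration, not the value reported for this assembly.

```python
import math


def qv_to_error_rate(qv):
    """Convert a Phred-scaled QV to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate):
    """Convert a per-base error probability to a Phred-scaled QV."""
    if not 0 < error_rate <= 1:
        raise ValueError("error rate must be in (0, 1]")
    return -10 * math.log10(error_rate)


# Hypothetical QV, for illustration only (not this assembly's score).
example_qv = 60
assembly_span_bp = 624.4e6  # assembly span taken from the text

per_base_error = qv_to_error_rate(example_qv)
expected_errors = per_base_error * assembly_span_bp

print(f"QV {example_qv}: about {per_base_error:.1e} errors per base, "
      f"roughly {expected_errors:.0f} expected errors across the assembly")
```

Each increase of 10 in QV corresponds to a tenfold lower estimated error rate, which is why small differences in QV between assemblies can reflect large differences in base-level accuracy.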