The genome sequence of the White-shouldered Marble, Apotomis turbidana (Hübner, 1825)

We present a genome assembly from an individual male Apotomis turbidana (the White-shouldered Marble; Arthropoda; Insecta; Lepidoptera; Tortricidae). The genome sequence is 720.5 megabases in span. Most of the assembly is scaffolded into 28 chromosomal pseudomolecules, including the assembled Z sex chromosome. The mitochondrial genome has also been assembled and is 16.8 kilobases in length. Gene annotation of this assembly on Ensembl identified 22,646 protein-coding genes.


Background
Apotomis turbidana (Hübner, 1825) is a moth of the family Tortricidae. A. turbidana is one of many moths in its family whose cryptic colouration resembles bird droppings; these species form a polyphyletic group known as the 'bird-dropping tortricids'. The forewing markings are a mixture of white and charcoal-grey, and the species exhibits only minor variation in this pattern (Bradley et al., 1979). There is, however, a distinct form from the west of Ireland with silver strigulations and a brownish colour in place of the charcoal-grey markings (Bradley et al., 1979).
Larvae feed between spun leaves of birch (Betula) from April to May (Bradley et al., 1979; Elliott et al., 2018). Elsewhere in Europe, larvae have also been recorded feeding on Salix, Populus and Quercus (Bradley et al., 1979; Hancock et al., 2015). Pupation occurs between spun leaves or within the larval habitation in June, and adult moths can be found between June and July (Bradley et al., 1979; Elliott et al., 2018; Hancock et al., 2015). Adults have also been found in August and September, suggesting a possible second generation (Elliott et al., 2018). Adult moths fly from before dusk and come to light (Bradley et al., 1979; Elliott et al., 2018; Hancock et al., 2015).
The moth is widespread across Great Britain and Ireland, where it is found in habitats with birch woodland (Bradley et al., 1979). Globally, the species is found in northern and central Europe, ranging east to Siberia (Bradley et al., 1979; GBIF Secretariat, 2022; Hancock et al., 2015).
A genome of Apotomis turbidana will facilitate research into the evolution of cryptic colouration in Lepidoptera and its genomic basis. The genome of A. turbidana was sequenced as part of the Darwin Tree of Life Project, a collaborative effort to sequence all named eukaryotic species in the Atlantic Archipelago of Britain and Ireland. Here we present a chromosomally complete genome sequence for A. turbidana, based on one male specimen from Wytham Woods, Oxfordshire, UK.

Genome sequence report
The genome was sequenced from one male Apotomis turbidana specimen (Figure 1) collected from Wytham Woods, Oxfordshire, UK (latitude 51.77, longitude -1.34). A total of 32-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 52-fold coverage in 10X Genomics read clouds was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 75 missing joins or mis-joins and removed 22 haplotypic duplications, reducing the assembly length by 1.35% and the scaffold number by 48.81%, and increasing the scaffold N50 by 9.16%.
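The percentages and fold-coverage figures above are simple ratios, and a short sketch may make the arithmetic explicit. The helper names `percent_change` and `sequenced_bases` are illustrative and not part of any pipeline used here. The pre-curation scaffold count of 84 is not reported in the text; it is back-calculated from the final count of 43 and the stated 48.81% reduction, so it should be read as an assumption. The yield estimate likewise assumes coverage was computed against the final 720.5 Mb span, which may not be how the authors derived it.

```python
def percent_change(before: float, after: float) -> float:
    """Signed percentage change going from `before` to `after`."""
    return (after - before) / before * 100.0


def sequenced_bases(fold_coverage: float, genome_size_bp: float) -> float:
    """Approximate total bases sequenced for a given fold coverage."""
    return fold_coverage * genome_size_bp


# Scaffold number before curation: ASSUMED (back-calculated), not reported.
scaffolds_before = 84
# Scaffold number after curation: reported in the text.
scaffolds_after = 43

print(round(percent_change(scaffolds_before, scaffolds_after), 2))  # -48.81

# Rough HiFi yield in gigabases, ASSUMING coverage is relative to 720.5 Mb.
hifi_gb = sequenced_bases(32, 720.5e6) / 1e9
print(f"{hifi_gb:.1f} Gb")
```

Under these assumptions the scaffold reduction reproduces the reported 48.81%, and 32-fold HiFi coverage corresponds to roughly 23 Gb of sequence.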
The final assembly has a total length of 720.5 Mb in 43 sequence scaffolds with a scaffold N50 of 27.1 Mb (Table 1). Most (99.98%) of the assembly sequence was assigned to 28 chromosomal-level scaffolds, representing 27 autosomes and the Z sex chromosome. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 2–Figure 5; Table 2). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
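For readers unfamiliar with the scaffold N50 statistic quoted above, the following is a minimal sketch of the usual definition: the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of the assembly span. The function name `n50` and the toy lengths are illustrative only; they are not the actual scaffold lengths of this assembly, and this is not the tool used to produce Table 1.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    half_span = sum(ordered) / 2
    running_total = 0
    for length in ordered:
        running_total += length
        # The first scaffold that brings the running total to half the span
        # defines the N50.
        if running_total >= half_span:
            return length
    return ordered[-1]  # not reached for non-negative lengths


# Toy example (hypothetical lengths): span is 100, half is 50.
# 40 alone is below 50; 40 + 30 = 70 reaches it, so the N50 is 30.
print(n50([40, 30, 20, 10]))  # 30
```

An N50 of 27.1 Mb therefore means that scaffolds of at least 27.1 Mb together account for at least half of the 720.5 Mb assembly.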

Sample acquisition and nucleic acid extraction
A male Apotomis turbidana specimen (ilApoTurb1) was collected from Wytham Woods, Oxfordshire (biological vice-county: Berkshire), UK (latitude 51.77, longitude -1.34) on 13 June 2020. The specimen was taken from woodland habitat by Douglas Boyes (University of Oxford) using a light trap. The specimen was identified by the collector and snap-frozen on dry ice.
DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute (WSI). The ilApoTurb1 sample was weighed and dissected on dry ice with tissue set aside for Hi-C sequencing. Whole organism tissue was disrupted using a Nippi Powermasher fitted with a BioMasher pestle. High molecular weight (HMW) DNA was extracted using the Qiagen MagAttract HMW DNA extraction kit. Low molecular

Sequencing
Pacific Biosciences HiFi circular consensus and 10X Genomics read cloud DNA sequencing libraries were constructed according to the manufacturers' instructions. DNA sequencing was performed by the Scientific Operations core at the WSI on Pacific Biosciences SEQUEL II (HiFi) and HiSeq X Ten (10X) instruments. Hi-C data were also generated from tissue of ilApoTurb1 using the Arima v2 kit and sequenced on the HiSeq X Ten instrument.

Genome assembly, curation and evaluation
Assembly was carried out with Hifiasm (Cheng et al., 2021), and haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). One round of polishing was performed by aligning 10X Genomics read data to the assembly with Long Ranger ALIGN and calling variants with FreeBayes (Garrison & Marth, 2012). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using SALSA2 (Ghurye et al., 2019). The assembly was checked for contamination and corrected using the gEVAL system (Chow et al., 2016) as described previously (Howe et al., 2021).

Genome annotation
The BRAKER2 pipeline (Brůna et al., 2021) was used in the default protein mode to generate annotation for the Apotomis turbidana assembly (GCA_905147355.2) in Ensembl Rapid Release.

This data note presents high-quality, foundational genomic resources for the tortricid moth species Apotomis turbidana (commonly known as the White-shouldered Marble). The report is extremely clearly set out, and uses appropriate and cutting-edge tools to support the construction, analysis, annotation and evaluation of the target genome. The quality of the assembly is excellent, and the compilation and presentation of key statistics is clear and makes information and genome resources very easily available to the research community. The information provided in the natural history section of the introduction, whilst concise, is nonetheless comprehensive, relevant and engaging. The sole error I identified is typographical and minor: a stray "." in the genome annotation methods section after the accession number.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes