The genome sequence of the Riband Wave, Idaea aversata (Linnaeus, 1758)

We present a genome assembly from an individual male Idaea aversata (the Riband Wave; Arthropoda; Insecta; Lepidoptera; Geometridae). The genome sequence is 437 megabases in span. The whole assembly is scaffolded into 30 chromosomal pseudomolecules, including the assembled Z sex chromosome. The mitochondrial genome has also been assembled and is 17.5 kilobases in length. Gene annotation of this assembly on Ensembl identified 10,165 protein coding genes.


Background
The Riband Wave Idaea aversata (Linnaeus, 1758) is a small member of the Sterrhinae subfamily of geometrid moths (wingspan up to 30 mm). It has been recorded from across the Palearctic region and throughout the UK, where it is commonly attracted to light traps from June to September. The larvae feed on low-lying plants such as bedstraw, dandelion, dock, and knotgrass, and overwinter while still relatively small.
Like other members of the subfamily, the Riband Wave has wave-like patterns on the wings, and this species is readily distinguished from similar species by a distinct kink in the third (most distal) line near the leading edge of the forewing. Overall colouration is quite variable, with most individuals grey or sandy coloured, but with some red/orange examples. The species occurs in two principal recognised forms: a banded form (sometimes referred to as form aversata), which has a dark ribbon-like crossband running across the fore- and hindwings; and form remutata, which lacks the dark band and shows only the wave-like pattern of crosslines.
Although the two forms are often claimed to occur in equal numbers, Leverton found the highest frequency of the banded (aversata) form to be around 30% in 2001, and only in Kent, with a frequency nearer 15% in Yorkshire and a clear decline in frequency with higher latitude (Leverton, 2001). In the 1950s, Ford claimed a frequency of "about 5%" for the banded form (Ford, 1955), and Bergmann considered the unbanded form to be "common" (Bergmann, 1938).
Breeding experiments in the 1930s and 1940s showed that a single locus determines banding, and that the banded f. aversata is dominant to the unbanded f. remutata (Bergmann, 1938; Ford, 1953; Hawkins, 1937; Hawkins, 1952). Of the three possible genotypes, AA and Ab produce banded individuals and are generally indistinguishable from each other (although homozygotes may have a slightly darker band (Ford, 1953)), and bb individuals are unbanded. Ford (1955) suggested that there is heterozygote advantage. Other experimenters have examined other aspects of colouration: Bergmann considered a reddish base colour to be recessive to the dominant grey (Bergmann, 1938); Cockayne considered a pink form to be dominant to grey (Cockayne, 1945); and Knill-Jones showed that an orange base colour was only found in the banded form, and that a reddish form was dominant (Knill-Jones, 2002).
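To illustrate what a single dominant locus implies for the reported phenotype frequencies, the following is a minimal sketch that converts a banded-phenotype frequency into allele and genotype frequencies. It assumes Hardy-Weinberg proportions, which is an assumption of this example rather than a claim of the studies cited (Ford's suggestion of heterozygote advantage would, if correct, shift the real values). The function names are hypothetical, and the 30% and 15% inputs are the approximate Kent and Yorkshire figures quoted above.

```python
import math


def allele_frequencies(banded_fraction):
    """Return (p, q) for dominant allele A and recessive allele b.

    Assumes Hardy-Weinberg proportions: unbanded (bb) frequency = q**2,
    so q = sqrt(1 - banded_fraction). This is an illustrative assumption.
    """
    if not 0.0 <= banded_fraction <= 1.0:
        raise ValueError("banded_fraction must be between 0 and 1")
    q = math.sqrt(1.0 - banded_fraction)
    return 1.0 - q, q


def genotype_frequencies(p, q):
    """Expected genotype frequencies under Hardy-Weinberg proportions."""
    return {"AA": p * p, "Ab": 2 * p * q, "bb": q * q}


if __name__ == "__main__":
    # Approximate banded-form frequencies quoted in the text.
    for label, banded in [("Kent", 0.30), ("Yorkshire", 0.15)]:
        p, q = allele_frequencies(banded)
        freqs = genotype_frequencies(p, q)
        print(f"{label}: p(A)={p:.3f}, q(b)={q:.3f}, "
              f"AA={freqs['AA']:.3f}, Ab={freqs['Ab']:.3f}, bb={freqs['bb']:.3f}")
```

Under these assumptions most banded moths would be heterozygotes, because the dominant allele is the rarer one at both localities.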
A genome assembly for the Riband Wave I. aversata will be invaluable to future work aimed at determining the genetic basis of colour polymorphism, and will contribute to the growing set of genomic resources for studying lepidopteran ecology and evolution.

Genome sequence report
The genome was sequenced from one male I. aversata of the unbanded form (Figure 1) collected in Wytham Woods (latitude 51.77, longitude -1.34). A total of 56-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 86-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected four missing joins or mis-joins and removed one haplotypic duplication, reducing the assembly length by 1.7%, the scaffold number by 9.09%, and the scaffold N50 by 1.4%.
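The coverage and curation figures above can be related to absolute quantities with simple arithmetic. The sketch below is illustrative only: it assumes that fold coverage is expressed relative to the final 436.7 Mb assembly span, and it back-calculates a pre-curation scaffold count from the reported 9.09% reduction and the final count of 30 scaffolds. Neither derived number is stated in the text, and the function names are hypothetical.

```python
def sequenced_bases(fold_coverage, genome_span_bp):
    """Approximate total bases sequenced for a given fold coverage."""
    return fold_coverage * genome_span_bp


def count_before_reduction(count_after, percent_reduction):
    """Infer a starting count from a final count and a percentage reduction."""
    return count_after / (1.0 - percent_reduction / 100.0)


# Assumption: coverage is measured against the final assembly span.
ASSEMBLY_SPAN_BP = 436.7e6

hifi_bases = sequenced_bases(56, ASSEMBLY_SPAN_BP)
tenx_bases = sequenced_bases(86, ASSEMBLY_SPAN_BP)
scaffolds_before = count_before_reduction(30, 9.09)

if __name__ == "__main__":
    print(f"PacBio HiFi: about {hifi_bases / 1e9:.1f} Gb")
    print(f"10X read clouds: about {tenx_bases / 1e9:.1f} Gb")
    print(f"Inferred scaffolds before curation: about {scaffolds_before:.1f}")
```

A reduction of 9.09% that ends at 30 scaffolds is consistent with a starting point of 33 scaffolds, since 3/33 is approximately 9.09%.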
The final assembly has a total length of 436.7 Mb in 30 sequence scaffolds with a scaffold N50 of 15.2 Mb (Table 1). The whole assembly sequence was assigned to 30 chromosomal-level scaffolds, representing 29 autosomes and the Z sex chromosome. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 2-Figure 5; Table 2). The assembly has a BUSCO v5.3.2 (Manni et al., 2021) completeness of 98.3% (single 97.9%, duplicated 0.4%) using the lepidoptera_odb10 reference set (n = 5,286). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
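For readers less familiar with the scaffold N50 statistic quoted above, the following minimal sketch shows how N50 (and the related L50) are conventionally computed from a list of scaffold lengths. The lengths in the example are hypothetical toy values, not the lengths of the I. aversata scaffolds, and the function names are illustrative.

```python
def n50_and_l50(lengths):
    """Return (N50, L50) for a collection of scaffold lengths.

    N50 is the length of the scaffold at which the running total, taken
    from longest to shortest, first reaches half of the total span.
    L50 is the number of scaffolds needed to reach that point.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one scaffold length is required")
    half_span = sum(ordered) / 2
    running_total = 0
    for count, length in enumerate(ordered, start=1):
        running_total += length
        if running_total >= half_span:
            return length, count
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    # Hypothetical toy lengths (arbitrary units), for illustration only.
    toy_lengths = [40, 30, 20, 10]
    n50, l50 = n50_and_l50(toy_lengths)
    print(f"N50 = {n50}, L50 = {l50}")
```

In the toy example the total span is 100, so the running total first reaches 50 at the second-longest scaffold.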
Genome annotation report
Annotation of the GCA_907269075.1 assembly was generated using the Ensembl genome annotation pipeline. The resulting annotation includes 10,165 protein coding genes with an average length of 12,352.21 bp and an average coding length of 1,502.47 bp, and 962 non-protein coding genes. There is an average of 7.08 exons and 6.08 introns per canonical protein coding transcript, with an average intron length of 1,676.92 bp. A total of 4,000 gene loci have more than one associated transcript. The annotation identified a repeat content of 44.89%.
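The exon and intron averages above differ by exactly one because a transcript with n exons contains n - 1 introns. The sketch below shows one way such summary statistics could be derived from exon coordinates. It assumes 1-based inclusive coordinates and uses hypothetical toy transcripts; it is not the Ensembl pipeline's own code.

```python
def intron_intervals(exons):
    """Return the introns lying between consecutive exons.

    Exons are (start, end) tuples in 1-based inclusive coordinates
    (an assumption of this sketch).
    """
    ordered = sorted(exons)
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]


def summarise(transcripts):
    """Mean exons, mean introns, and mean intron length across transcripts."""
    exon_counts = [len(exons) for exons in transcripts.values()]
    introns = [iv for exons in transcripts.values() for iv in intron_intervals(exons)]
    intron_lengths = [end - start + 1 for start, end in introns]
    return {
        "mean_exons": sum(exon_counts) / len(exon_counts),
        "mean_introns": len(introns) / len(exon_counts),
        "mean_intron_length": sum(intron_lengths) / len(intron_lengths),
    }


if __name__ == "__main__":
    # Hypothetical toy transcripts, for illustration only.
    toy = {
        "tx1": [(100, 200), (301, 400), (1001, 1100)],
        "tx2": [(50, 150), (451, 600)],
    }
    print(summarise(toy))
```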

Sample acquisition and nucleic acid extraction
A male Idaea aversata (ilIdaAver1) was collected from Wytham Woods, Oxfordshire (biological vice-county: Berkshire), UK (latitude 51.77, longitude -1.34) on 29 June 2019 using a light trap. The specimen was collected and identified by Douglas Boyes (University of Oxford) and snap-frozen on dry ice by Peter Holland. The specimen is of the unbanded form remutata.
DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute (WSI). The ilIdaAver1 sample was weighed and dissected on dry ice with tissue set aside for Hi-C sequencing. Whole organism tissue was disrupted using a Nippi Powermasher fitted with a BioMasher pestle. High molecular weight (HMW) DNA was extracted using the Qiagen MagAttract HMW DNA extraction kit. Low molecular weight DNA was removed from a 20 ng aliquot of extracted DNA using the 0.8X AMPure XP purification kit prior to 10X Chromium sequencing; a minimum of 50 ng DNA was submitted for 10X sequencing. HMW DNA was sheared into an average fragment size of 12-20 kb in a Megaruptor 3 system with speed setting 30. Sheared DNA was purified by solid-phase reversible immobilisation using AMPure PB beads with a 1.8X ratio of beads to sample to remove the shorter fragments and concentrate the DNA sample.
The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit dsDNA High Sensitivity Assay kit. Fragment size distribution was evaluated by running the sample on the FemtoPulse system.

Sequencing
Pacific Biosciences HiFi circular consensus and 10X Genomics read cloud DNA sequencing libraries were constructed according to the manufacturers' instructions. DNA sequencing was performed by the Scientific Operations core at the WSI on Pacific Biosciences SEQUEL II (HiFi) and HiSeq X Ten (10X) instruments. Hi-C data were also generated from tissue of ilIdaAver1 using the Arima v2 kit and sequenced on the Illumina NovaSeq 6000 instrument.
[...] and Pretext (Harry, 2022). The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2022), which performed annotation using MitoFinder (Allio et al., 2020). The genome was analysed and BUSCO scores generated within the BlobToolKit environment (Challis et al., 2020). Table 3 contains a list of all software tool versions used, where appropriate.

Genome annotation
The Ensembl gene annotation system (Aken et al., 2016) was used to generate annotation for the I. aversata assembly (GCA_907269075.1). Annotation was created primarily through alignment of transcriptomic data to the genome, with gap filling via protein-to-genome alignments of a select set of proteins from UniProt (UniProt Consortium, 2019).

Amanda Markee
McGuire Center for Lepidoptera and Biodiversity, University of Florida, Gainesville, Florida, USA
The authors here provide a well-documented chromosome-level assembly for Idaea aversata, a geometrid moth found throughout the UK. This species has unique genotypes related to wing pattern variation, making the full genome useful for studying color polymorphism in Lepidoptera. The methods outlined for isolating nucleic acids were clear and reproducible. I would, however, add catalog numbers for each kit that was mentioned, if this is in line with the format for this journal. Sequencing methods and materials were clearly explained, and downstream bioinformatic methods were written in the chronological order of the workflow, making them easy to follow. In the genome sequence report section, as in the genome annotation report section, I recommend summarizing more information from the figures, including other genome metrics such as L50, BlobPlot results, and GC content. Data availability and reuse potential are clearly outlined in the manuscript. My last recommendation is to clarify which select set of proteins was used from the UniProt database for the protein annotation (i.e. from insects, arthropods, or geometrids specifically).

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Lepidoptera, genomics, silk evolution
Reviewer Expertise: Evolutionary biology, population genomics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.