The genome sequence of Svensson’s copper underwing, Amphipyra berbera Rungs, 1949 [version 1; peer review: 1 approved]

We present a genome assembly from an individual male Amphipyra berbera (Svensson’s copper underwing; Arthropoda; Insecta; Lepidoptera; Noctuidae). The genome sequence is 582 megabases in span. The majority (99.97%) of the assembly is scaffolded into 31 chromosomal pseudomolecules, with the Z sex chromosome assembled.


Background
Amphipyra berbera (Svensson's copper underwing) is a large noctuid moth with broad, brown forewings patterned with pale zigzags and hindwings suffused with copper brown. The moth has been recorded across much of Eurasia and North Africa; in the UK it is common across England and Wales, with scattered records from Scotland. Svensson's copper underwing is morphologically very similar to A. pyramidea (copper underwing) and was initially considered a subspecies, A. pyramidea berbera (Rungs, 1949), until it was recognised as a separate species in 1968 (Fletcher, 1968). Adults can be distinguished by the extent of the copper colouration on the underside of the hindwing, and larvae can be separated by the colour of the dorsal point on abdominal segment eight. The status of A. berbera as a distinct species is supported by mitochondrial COI barcode data (Chen et al., 2013). Larvae of A. berbera feed on the leaves of several deciduous trees, frequently oak (Quercus). In the UK, adults fly from June to September and are attracted to sweet substances, including tree sap. The species overwinters in the egg stage, and pupation occurs underground.

Genome sequence report
The genome was sequenced from one male A. berbera (Figure 1) collected from Wytham Woods, Oxfordshire (biological vice-county: Berkshire), UK (latitude 51.772, longitude -1.338). A total of 41-fold coverage in Pacific Biosciences single-molecule long reads and 71-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 8 missing joins or misjoins, reducing the assembly length by 0.01% and the scaffold number by 15.38%.
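The coverage and curation figures above can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the published pipeline: it assumes the 582 Mb assembly span as the genome size, and it assumes a pre-curation count of 39 scaffolds, a value inferred here from the 33 scaffolds reported for the final assembly and the stated 15.38% reduction rather than given in the text. The function names are hypothetical.

```python
def fold_coverage_to_bases(coverage, genome_span_bp):
    """Approximate sequenced bases implied by a fold-coverage figure."""
    return coverage * genome_span_bp


def percent_reduction(before, after):
    """Percentage decrease from `before` to `after`."""
    return 100.0 * (before - after) / before


# Assumption: the assembly span (582 Mb) stands in for the genome size.
GENOME_SPAN_BP = 582_000_000

pacbio_bases = fold_coverage_to_bases(41, GENOME_SPAN_BP)
tenx_bases = fold_coverage_to_bases(71, GENOME_SPAN_BP)
print(f"PacBio yield: ~{pacbio_bases / 1e9:.1f} Gb")
print(f"10X yield:    ~{tenx_bases / 1e9:.1f} Gb")

# Assumption: 39 scaffolds before curation (inferred, not reported).
print(f"Scaffold reduction: {percent_reduction(39, 33):.2f}%")
```

Under these assumptions, a drop from 39 to 33 scaffolds reproduces the reported 15.38% reduction.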
The final assembly has a total length of 582 Mb in 33 sequence scaffolds with a scaffold N50 of 20.1 Mb (Table 1). The majority, 99.97%, of the assembly sequence was assigned to 31 chromosomal-level scaffolds, representing 30 autosomes (numbered by sequence length) and the Z sex chromosome (Figures 2-5; Table 2). The assembly has a BUSCO v5.1.2 (Manni et al., 2021) completeness of 98.9% (single 98.5%, duplicated 0.5%) using the lepidoptera_odb10 reference set. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
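For readers unfamiliar with the scaffold N50 statistic quoted above, it is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. The following is a minimal sketch of that definition; the function name and the toy lengths are illustrative and are not the A. berbera scaffold lengths.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Scaffolds are taken from longest to shortest; the N50 is the length
    of the scaffold at which the running total first reaches at least
    half of the summed length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Toy example (hypothetical lengths, arbitrary units).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))
```

In the toy example the total is 54, and the three longest sequences (10, 9 and 8) sum to 27, exactly half, so the N50 is 8.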

Methods
Sample acquisition and DNA extraction
A single male A. berbera (ilAmpBerb1) was collected from Wytham Woods, Oxfordshire (biological vice-county: Berkshire), UK (latitude 51.772, longitude -1.338) by Douglas Boyes, UKCEH, using a light trap. The specimen was identified by the collector and preserved on dry ice.
DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute. The ilAmpBerb1 sample was weighed and dissected on dry ice, with tissue set aside for Hi-C sequencing. Abdomen tissue was cryogenically disrupted to a fine powder using a Covaris cryoPREP Automated Dry Pulveriser, receiving multiple impacts. Fragment size analysis of 0.01-0.5 ng of DNA was then performed using an Agilent FemtoPulse. High molecular weight (HMW) DNA was extracted using the Qiagen MagAttract HMW DNA extraction kit. Low molecular weight DNA was removed from a 200-ng aliquot of extracted DNA using a 0.8X AMPure XP purification kit prior to 10X Chromium sequencing; a minimum of 50 ng DNA was submitted for 10X sequencing. HMW DNA was sheared to an average fragment size of 12-20 kb in a Megaruptor 3 system with speed setting 30. Sheared DNA was purified by solid-phase reversible immobilisation using AMPure PB beads with a 1.8X ratio of beads to sample to remove the shorter fragments and concentrate the DNA sample. The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit dsDNA High Sensitivity Assay kit. Fragment size distribution was evaluated by running the sample on the FemtoPulse system.

Sequencing
Pacific Biosciences HiFi circular consensus and 10X Genomics read cloud DNA sequencing libraries were constructed according to the manufacturers' instructions. Sequencing was performed by the Scientific Operations core at the Wellcome Sanger Institute on Pacific Biosciences SEQUEL II and Illumina HiSeq X instruments. Hi-C data were generated from head/thorax tissue using the Arima v2 Hi-C kit and sequenced on an Illumina NovaSeq 6000 instrument.

Genome assembly
Assembly was carried out with Hifiasm (Cheng et al., 2021); haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). One round of polishing was performed by aligning 10X Genomics read data to the assembly with longranger align and calling variants with freebayes (Garrison & Marth, 2012). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using SALSA2 (Ghurye et al., 2019). The assembly was checked for contamination and corrected using the gEVAL system (Chow et al., 2016) as described previously (Howe et al., 2021). Manual curation (Howe et al., 2021) was performed using gEVAL, HiGlass (Kerpedjiev et al., 2018) and Pretext. The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2021). The genome was analysed and BUSCO scores generated within the BlobToolKit environment (Challis et al., 2020). Table 3 contains a list of all software tool versions used, where appropriate.