The genome sequence of the Leopard Moth, Zeuzera pyrina (Linnaeus, 1761)

We present a genome assembly from an individual male Zeuzera pyrina (the Leopard Moth, Arthropoda; Insecta; Lepidoptera; Cossidae). The genome sequence is 687 megabases in span. Most of the assembly is scaffolded into 31 chromosomal pseudomolecules, including the assembled Z sex chromosome. The mitochondrial genome has also been assembled and is 15.3 kilobases in length. Gene annotation of this assembly on Ensembl identified 22,738 protein-coding genes.


Background
The Leopard Moth Zeuzera pyrina is a large (40-60 mm wingspan) moth in the family Cossidae. The common name derives from the 'leopard-spotted' pattern of black and grey blotches on white, partly translucent forewings. The species is distributed widely across Europe, with additional scattered records from Russia, the Middle East, Japan and North Africa (GBIF Secretariat, 2022). The moth is also found in eastern Canada and north-eastern regions of the United States, where it was accidentally introduced in the nineteenth century (GBIF Secretariat, 2022; Solomon, 1995). In the UK, the species has been recorded across southern counties of England and south-east Wales but is not found in northern counties (Randle et al., 2019). A record from Scotland in 2022 almost certainly arose from a single larva inside a garden shrub transported in the horticultural trade (Eagleson, 2022). The larvae of Z. pyrina bore inside the trunks and branches of living deciduous trees, where they live for two or three years, tunnelling and feeding on wood, before pupating underneath the bark. Digestion of lignocellulose seems to be aided by production of cellulase enzymes by bacteria in the larval gut (Dehghanikhah et al., 2020). The adults do not feed.
The polyphagous wood-boring habit has allowed Z. pyrina to reach pest status in many countries, causing damage and yield loss to commercial crops such as nuts, olives and fruit. Examples include damage to olive plantations in Italy and Egypt (Guario et al., 2002; Hegazi et al., 2015), walnut trees in Iran (Saeidi et al., 2022) and apple orchards in Greece, Bulgaria and Italy (Haniotakis et al., 1999; Kutinkova et al., 2006; Pasqualini & Natale, 1999). Control measures that have been attempted include application of insect growth inhibitors, organophosphate pesticides, pheromone traps and entomopathogenic nematodes (Ashtari et al., 2011; Guario et al., 2002; Salari et al., 2021).
A genome sequence for Z. pyrina will be of great interest in understanding the interactions between insects and their bacterial symbionts, and may facilitate development of targeted pest control methods.

Genome sequence report
The genome was sequenced from one male Z. pyrina specimen (Figure 1) collected in Wytham Woods, UK (latitude 51.77, longitude -1.33). A total of 38-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 68-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected three missing or mis-joins and removed three haplotypic duplications, reducing the scaffold number by 14.29%.
The final assembly has a total length of 686.9 Mb in 36 sequence scaffolds with a scaffold N50 of 24.6 Mb (Table 1). The whole assembly sequence was assigned to 31 chromosomal-level scaffolds, representing 30 autosomes and the Z sex chromosome. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 2–Figure 5; Table 2). The assembly has a BUSCO v5.3.2 (Manni et al., 2021) completeness of 98.7% (single 98.4%, duplicated 0.3%) using the lepidoptera_odb10 reference set. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
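For readers unfamiliar with the scaffold N50 statistic quoted here, it is the length L such that scaffolds of length at least L together span at least half of the assembly; N90 is defined in the same way for 90% of the span. The following is a minimal Python sketch of that definition, not the code used to produce the reported values. The function name `nxx` and the toy scaffold lengths are illustrative assumptions and do not come from the assembly.

```python
def nxx(lengths, fraction=0.5):
    """Return the Nxx statistic for a list of scaffold lengths.

    Scaffolds are sorted from longest to shortest and their lengths are
    accumulated; the result is the length of the scaffold at which the
    running total first reaches `fraction` of the total assembly span.
    fraction=0.5 gives N50, fraction=0.9 gives N90.
    """
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")

    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length


# Toy lengths (hypothetical, for illustration only; not from ilZeuPyri1.1).
toy_lengths = [10, 8, 6, 4, 2]
n50 = nxx(toy_lengths)        # half of the 30-unit span is reached at the 8-unit scaffold
n90 = nxx(toy_lengths, 0.9)   # 90% of the span is reached at the 4-unit scaffold
```

Applied to the full list of scaffold lengths of an assembly, the same function would give its scaffold N50 and N90.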

Sample acquisition and nucleic acid extraction
Two Z. pyrina specimens (ilZeuPyri1 and ilZeuPyri2) were collected in Wytham Woods, Oxfordshire (biological vice-county: Berkshire), UK (latitude 51.77, longitude -1.33) on 25 June 2020. The specimens were caught in woodland habitat using a light trap. Both specimens were collected and identified by Douglas Boyes (University of Oxford) and snap-frozen on dry ice.
DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute (WSI). The ilZeuPyri1 sample was weighed and dissected on dry ice with head and thorax tissue set aside for Hi-C sequencing. Abdomen tissue was disrupted using a Nippi Powermasher fitted with a BioMasher pestle. High molecular weight (HMW) DNA was extracted using the Qiagen MagAttract HMW DNA extraction kit. Low molecular weight DNA was removed from a 20 ng aliquot of extracted [...]
RNA was extracted from abdomen tissue of ilZeuPyri2 in the Tree of Life Laboratory at the WSI using TRIzol, according to the manufacturer's instructions. RNA was then eluted in 50 μl RNAse-free water and its concentration assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit RNA Broad-Range (BR) Assay kit. The integrity of the RNA was analysed using the Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay.

Sequencing
Pacific Biosciences HiFi circular consensus and 10X Genomics read cloud DNA sequencing libraries were constructed according to the manufacturers' instructions. Poly(A) RNA-Seq libraries were constructed using the NEB Ultra II RNA Library Prep kit. DNA and RNA sequencing were performed by the Scientific Operations core at the WSI on Pacific Biosciences SEQUEL II (HiFi) and Illumina NovaSeq 6000 (RNA-Seq and 10X) instruments. Hi-C data were also generated from head and thorax tissue of ilZeuPyri1 using the Arima v2 kit and sequenced on the Illumina NovaSeq 6000 instrument.
[...] et al., 2022), which performed annotation using MitoFinder (Allio et al., 2020). The genome was analysed and BUSCO scores generated within the BlobToolKit environment (Challis et al., 2020). Table 3 contains a list of all software tool versions used, where appropriate.

Genome annotation
The BRAKER2 pipeline (Brůna et al., 2021) was used in the default protein mode to generate annotation for the Zeuzera pyrina assembly (GCA_907165235.1) in Ensembl Rapid Release.

Ethics and compliance issues
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the Darwin Tree of Life Project Sampling Code of Practice. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and

Lars Höök
Uppsala University, Uppsala, Sweden
The data note presents the genome assembly of the Leopard Moth, Zeuzera pyrina. The species has a relatively large genome of 687 Mb, assembled into 31 chromosome-sized scaffolds.
Production of the assembly is well motivated and will benefit several research areas.
All steps of the methods are explained in sufficient detail for reproducibility and use appropriate protocols and software.
Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Evolutionary genetics, Sex chromosome evolution
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 2. Genome assembly of Zeuzera pyrina, ilZeuPyri1.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 686,903,256 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (29,234,349 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (24,575,201 and 16,131,496 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the lepidoptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ilZeuPyri1.1/dataset/CAJRBB01.1/snail.

Figure 5. Genome assembly of Zeuzera pyrina, ilZeuPyri1.1: Hi-C contact map. Hi-C contact map of the ilZeuPyri1.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=br9HVfEoSM-9LbyeJXfSQQ.

Table 2. Chromosomal pseudomolecules in the genome assembly of Zeuzera pyrina, ilZeuPyri1.
Columns: INSDC accession; Chromosome; Size (Mb); GC%
supplied to, the Darwin Tree of Life Project. All efforts are undertaken to minimise the suffering of animals used for sequencing. Each transfer of samples is further undertaken according to a Research Collaboration Agreement or Material Transfer Agreement entered into by the Darwin Tree of Life Partner, Genome Research Limited (operating as the Wellcome Sanger Institute), and in some circumstances other Darwin Tree of Life collaborators.
The genome sequence is released openly for reuse. The Zeuzera pyrina genome sequencing initiative is part of the Darwin Tree of Life (DToL) project. All raw sequence data and the assembly have been deposited in INSDC databases. Raw data and assembly accession identifiers are reported in Table 1.

Table 3. Software tools and versions used.
The only issue I noted was in the third sentence of the "Genome sequence report" section where I wasn't sure what the authors meant by "...three missing or mis-joins..."

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.