The genome sequence of the Common Pug, Eupithecia vulgata (Haworth, 1809)

We present a genome assembly from an individual male Eupithecia vulgata (the Common Pug; Arthropoda; Insecta; Lepidoptera; Geometridae). The genome sequence is 454.7 megabases in span. Most of the assembly is scaffolded into 31 chromosomal pseudomolecules, including the assembled Z sex chromosome. The mitochondrial genome has also been assembled and is 17.1 kilobases in length.


Background
The Common Pug is a small (15-18 mm wingspan) geometrid moth, common across the UK and wider Palearctic, originally named as Phalaena vulgata by Adrian Hardy Haworth. Three subspecies are typically recognised in the UK: the widespread E. vulgata vulgata; E. vulgata scotia from Scotland (Cockayne, 1951); and E. vulgata clarensis from County Clare (Huggins, 1962), although some authors do not consider the latter two subspecies valid and propose instead that they should be considered forms (Riley & Prior, 2003). Common Pugs, especially males, are readily attracted to light, and peak flight time in the UK is between mid-May and mid-July, although some individuals have been reported as early as March or as late as September (NBN Atlas Partnership, 2021), and there can be a second emergence in August, particularly in the south. Larvae are polyphagous, and consume a range of deciduous trees including hawthorn, sallow, and oak, as well as shrubs and herbaceous plants including bramble, ragworts, hogweed and dandelion. E. vulgata was listed as 'Least concern' in a recent review of macro-moth status in Great Britain, based on records from 1594 hectads (10 km × 10 km grid squares), far exceeding the ≥15 hectads required to achieve this classification (Fox et al., 2019).
As with other Pugs, the forewings are held at right angles to the body when at rest, and the hindwings are covered by the forewings. Colouration is variable, with a typically reddish-brown base colour which may or may not include a whitish spot in the trailing corner and a darker discal spot, and usually with pale cross-lines angled at the leading edge. Identification is sometimes complicated by the co-occurrence of several colour morphs, including a melanic form (f. atropicta Dietze 1910) and another that lacks cross-lines but maintains the overall ground colour (f. unicolor Lempke 1951). As with other melanic moth species, it is possible that the cortex gene underlies the melanic form (van't Hof et al., 2019). The genome assembly reported here will aid the testing of this hypothesis and facilitate study of the genetic basis of the widespread colour variation.

Genome sequence report
The genome was sequenced from one male Eupithecia vulgata (Figure 1) collected from Wytham Woods, Oxfordshire, UK (latitude 51.77, longitude -1.32). A total of 44-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected nine missing joins or mis-joins and removed one haplotypic duplication, reducing the scaffold number by 2.44%.
The final assembly has a total length of 454.7 Mb in 40 sequence scaffolds with a scaffold N50 of 16.1 Mb (Table 1). Most (99.92%) of the assembly sequence was assigned to 31 chromosomal-level scaffolds, representing 30 autosomes and the Z sex chromosome. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 2-Figure 5; Table 2). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The estimated Quality Value (QV) of the final assembly is 68.5 with k-mer completeness of 100%, and the assembly has a BUSCO v5.3.2 (Manni et al., 2021) completeness of 97.8% (single 97.1%, duplicated 0.7%) using the lepidoptera_odb10 reference set (n = 5,286).
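For readers less familiar with the summary metrics quoted above, the following minimal Python sketch illustrates how a scaffold N50, a Phred-scaled QV and a percentage reduction in scaffold count are conventionally computed. It is not the pipeline used for this assembly. The function names and the toy scaffold lengths are hypothetical, and the pre-curation count of 41 scaffolds is an assumption chosen only because it is consistent with the reported 2.44% reduction to 40 scaffolds.

```python
def n50(lengths):
    """Return the N50: the length L such that scaffolds of length >= L
    together contain at least half of the total span."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def percent_reduction(before, after):
    """Percentage decrease from `before` to `after`."""
    return 100 * (before - after) / before


# Toy scaffold lengths (hypothetical, not the real assembly).
toy_scaffolds = [50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # 40: scaffolds of 50 and 40 cover 90 of 150

# QV 68.5, as reported for the final assembly.
print(qv_to_error_rate(68.5))

# Assumed 41 scaffolds before curation, 40 after (illustrative only).
print(round(percent_reduction(41, 40), 2))  # 2.44
```

Under the standard Phred definition, a QV of 68.5 corresponds to a per-base error probability of about 1.4 × 10^-7, or roughly one expected error per 7 Mb of sequence.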

Sample acquisition and nucleic acid extraction
Two Eupithecia vulgata specimens (ilEupVulg1 and ilEupVulg2) were collected from Wytham Woods, Oxfordshire (biological vice-county: Berkshire), UK (latitude 51.77, longitude -1.32) on 28 May 2021 and 16 June 2021, respectively. The specimens were taken from woodland habitat by Douglas Boyes (University of Oxford) using a light trap. The specimens were identified by the collector and snap-frozen on dry ice.
DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute (WSI). The ilEupVulg1 sample was weighed and dissected on dry ice. Whole organism tissue was disrupted using a Nippi Powermasher fitted with a BioMasher pestle. High molecular weight (HMW) DNA was extracted using the Qiagen MagAttract HMW DNA extraction kit. HMW DNA was sheared into an average fragment size of 12-20 kb in a Megaruptor 3 system with speed setting 30. Sheared DNA was purified by solid-phase reversible immobilisation using AMPure PB beads with a 1.8X ratio of beads to sample to remove the shorter fragments and concentrate the DNA sample. The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit dsDNA High Sensitivity Assay kit. Fragment size distribution was evaluated by running the sample on the FemtoPulse system.
Table 3 contains a list of software tool versions and sources.

Ethics and compliance issues
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Molecular ecology
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

This paper presents the genome assembly of the Common Pug, Eupithecia vulgata. The quality of the raw data is excellent and the sequencing and assembly methods employed are appropriate. Consequently, the resulting assembly reaches a high level of quality.
However, I have a few minor concerns regarding the manuscript that require further details for a more comprehensive evaluation:
○ It would be beneficial to include information about existing genomic resources for the genus Eupithecia. For instance, acknowledging that the genome of Eupithecia dodoneata has already been sequenced by DToL would enhance the context.
○ Although 31 chromosomes were recovered, it remains unclear whether this corresponds to the number expected from alternative techniques or from comparison with closely related species.
○ Providing summary statistics for the generated raw HiFi data, such as the number of reads, N50, mean, and median size, would assist in assessing the raw data's quality.
○ The depth of coverage for the Hi-C data should be explicitly mentioned.
○ Clarification is needed on how k-mer completeness was calculated, with corresponding plots or statistics provided. An estimate of the heterozygosity level would also be informative and should be included.
○ Specify the methodology employed to identify the Z chromosome in the assembly. If based on read coverage, include this data in Table 2.
○ To verify whether the chromosomes reach the telomeres, search for telomere motifs at the ends of the scaffolds.
○ Explain why the mitochondrial genome was not included in the primary assembly and needed to be assembled separately from the reads with MitoHiFi.
○ Scrutinize the scaffolds with higher GC content and lower coverage, labeled as arthropod in the BlobToolKit plot in Figure 3, and, if possible, investigate further to determine their origin (for instance, through collinearity with other pug genomes).
○ For reproducibility, please provide the parameters used for each software run, if any; otherwise write 'default parameters'.
○ Lastly, consider complementing the genome with annotation data, including RNA-Seq, and conducting comparisons with other pug genomes, such as the oak-tree pug. This additional information would enrich the manuscript.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound?

Are sufficient details of methods and materials provided to allow replication by others? Partly
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
Reviewer Report 24 January 2024
https://doi.org/10.21956/wellcomeopenres.21329.r72068
The manuscript by Boyes and colleagues reports the genome sequence of the Common Pug, Eupithecia vulgata, produced as part of the Darwin Tree of Life project. As such, the manuscript reports a genome assembly of high quality, obtained with a highly standardized and reproducible pipeline. The methodologies are clear and reproducible and the report of assembly metrics is straightforward and easy to follow. I only have a very few minor comments that the authors may choose to address at their discretion.
"31 chromosomal-level scaffolds, representing 30 autosomes and the Z sex chromosome": was this in line with previous cytogenetic estimates (if available)?
As a general comment that I make on any genome paper, a rough k-mer-based estimate of heterozygosity would be useful for anybody interested in planning future population genomics studies on this species.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes

Figure 2. Genome assembly of Eupithecia vulgata, ilEupVulg1.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 454,699,389 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (24,908,255 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (16,073,052 and 10,404,322 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the lepidoptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ilEupVulg1.1/dataset/CAMLCU01/snail.
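As a rough illustration of the binning described in the Figure 2 caption, the short Python sketch below works out the bin width implied by dividing the assembly span into 1,000 equal bins and how many such bins the longest and N50 scaffolds would span. The constants are taken from the caption. The function names are hypothetical, and this is a simple arithmetic check under those assumptions, not BlobToolKit's own implementation.

```python
# Values quoted in the Figure 2 caption.
ASSEMBLY_SPAN_BP = 454_699_389
N_BINS = 1_000
LONGEST_SCAFFOLD_BP = 24_908_255
N50_SCAFFOLD_BP = 16_073_052


def bin_width(span_bp, n_bins):
    """Width in bp of one bin when the span is split into equal bins."""
    return span_bp / n_bins


def bins_spanned(scaffold_bp, span_bp, n_bins):
    """Number of equal-width bins (possibly fractional) one scaffold covers."""
    return scaffold_bp / bin_width(span_bp, n_bins)


print(round(bin_width(ASSEMBLY_SPAN_BP, N_BINS)))  # 454699
print(bins_spanned(LONGEST_SCAFFOLD_BP, ASSEMBLY_SPAN_BP, N_BINS))
print(bins_spanned(N50_SCAFFOLD_BP, ASSEMBLY_SPAN_BP, N_BINS))
```

Each bin therefore represents roughly 455 kb of sequence, and the longest scaffold alone accounts for between 54 and 55 of the 1,000 bins.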

Figure 3. Genome assembly of Eupithecia vulgata, ilEupVulg1.1: GC coverage. BlobToolKit GC-coverage plot. Scaffolds are coloured by phylum. Circles are sized in proportion to scaffold length. Histograms show the distribution of scaffold length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ilEupVulg1.1/dataset/CAMLCU01/blob.

Figure 4. Genome assembly of Eupithecia vulgata, ilEupVulg1.1: cumulative sequence. BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all scaffolds. Coloured lines show cumulative lengths of scaffolds assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ilEupVulg1.1/dataset/CAMLCU01/cumulative.

Figure 5. Genome assembly of Eupithecia vulgata, ilEupVulg1.1: Hi-C contact map. Hi-C contact map of the ilEupVulg1.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=K1u-LjpZRhitcCllQNqeCA.

Table 3. Software tools: versions and sources.
The submission of materials by a Darwin Tree of Life Partner is subject to the Darwin Tree of Life Project Sampling Code of Practice. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project. All efforts are undertaken to minimise the suffering of animals used for sequencing. Each transfer of samples is further undertaken according to a Research Collaboration Agreement or Material Transfer Agreement entered into by the Darwin Tree of Life Partner, Genome Research Limited (operating as the Wellcome Sanger Institute), and in some circumstances other Darwin Tree of Life collaborators.
This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.