The genome sequence of the Pied Smudge, Ypsolopha sequella (Clerck, 1759)

We present a genome assembly from an individual Ypsolopha sequella (the Pied Smudge; Arthropoda; Insecta; Lepidoptera; Ypsolophidae). The genome sequence is 867 megabases in span. Most of the assembly is scaffolded into 30 chromosomal pseudomolecules with the Z sex chromosome assembled. The mitochondrial genome has also been assembled and is 15.3 kilobases in length. Gene annotation of this assembly on Ensembl identified 20,394 protein coding genes.


Background
Ypsolopha sequella (Clerck, 1759) is a micro moth of the family Ypsolophidae and the genus Ypsolopha. Adults in this group have relatively elongated forewings that are held close to the body at rest, which is often in a declining position. Of the 13 species in the genus known in Great Britain and Ireland (Agassiz et al., 2013), the adult Y. sequella is distinctive for the strongly contrasting black and white colouring in most specimens. Some have a suffusion of black scales among the white ground colour, and can be quite dark, but the black markings along the dorsum remain visible in most specimens, including a rabbit-shaped blotch below the thorax, for which this moth is often affectionately known as the 'bunny moth' (Wheeler, no date), although Porter (2002) gives it the vernacular 'Pied Smudge'.
In the UK, adults are on the wing between July and October (Sterling & Parsons, 2018), with a peak in sightings in August (Wheeler, no date). Adults are mostly nocturnal and are attracted to light, though rarely in numbers (Sterling & Parsons, 2012). They can also occasionally be found around the larval foodplants by day, resting high up on the leaves (Langmaid et al., 2018), or dislodged lower down on trunks (Agassiz, 1996). These foodplants are primarily Field Maple (Acer campestre) but also Sycamore (Acer pseudoplatanus). While some Ypsolopha species overwinter as adults, Y. sequella hibernates as an egg on the twigs of these trees (Agassiz, 1996). The greenish larva, which has the characteristic spindle shape of the genus, can then be found in a flimsy spinning on the leaves in May and June (Agassiz, 1996; Langmaid et al., 2018).
Both A. campestre and A. pseudoplatanus are frequently grown ornamentally, and the moth can therefore be found where they occur in woodland and where they have been planted in suburban areas (Agassiz, 1996). This planting may have aided the recent expansion of the moth, as it is thought to have for other species feeding on the same plants, such as Maple Prominent (Ptilodon cucullina) (Randle et al., 2019; Waring et al., 2017). Indeed, Y. sequella is one of several moth species expanding rapidly northwards into Scotland (Emmet et al., no date). Having first been recorded in the country in 1975 and then again in 1997 (Bland, 1998), it is now resident in a few southern counties (Knowler, 2012). Field Maple is scarcer in Scotland than in southern England, and here Y. sequella may depend on the more widespread Sycamore (Sterling & Parsons, 2018). Away from Scotland, the moth is common in southern England (Davis, 2012) and found throughout northern and central Europe into the Middle East (Agassiz, 1996).
The genome of Y. sequella was sequenced as part of the Darwin Tree of Life Project, a collaborative effort to sequence all named eukaryotic species in the Atlantic Archipelago of Britain and Ireland. Here we present a chromosomally complete genome sequence for Ypsolopha sequella, based on two specimens from Wytham Woods, Oxfordshire, UK.

Genome sequence report
The genome was sequenced from one male Y. sequella (Figure 1) collected from Wytham, Oxfordshire (biological vice-county: Berkshire), UK (latitude 51.77, longitude -1.33). A total of 27-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 39 missing joins or misjoins and removed seven haplotypic duplications, reducing the assembly length by 0.53% and the scaffold number by 6.83%.
The final assembly has a total length of 866.9 Mb in 150 sequence scaffolds with a scaffold N50 of 28.9 Mb (Table 1). Most (96.87%) of the assembly sequence was assigned to 30 chromosomal-level scaffolds, representing 29 autosomes and the Z sex chromosome (Figure 2-Figure 5; Table 2). Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size. The assembly has a BUSCO v5.3.2 (Manni et al., 2021) completeness of 98% (single 96.9%, duplicated 1.2%) using the OrthoDB v10 lepidoptera reference set. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
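The scaffold N50 quoted above follows the usual definition: sequences are sorted from longest to shortest, and the N50 is the length at which the running total first reaches half of the assembly span. A minimal Python sketch of that calculation is given below. The function name nx and the toy lengths are illustrative assumptions and are not taken from this assembly or from the tools used to produce it.

```python
def nx(lengths, fraction=0.5):
    """Return the Nx length for a collection of sequence lengths.

    Nx is the length of the shortest sequence in the smallest set of
    longest sequences whose combined length reaches `fraction` of the
    total span (fraction=0.5 gives N50, fraction=0.9 gives N90).
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards, accumulating span.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length


# Toy example (hypothetical lengths, not real scaffold sizes).
toy_lengths = [50, 40, 30, 20, 10]
toy_n50 = nx(toy_lengths)        # half of the 150 total is reached at the 40 sequence
toy_n90 = nx(toy_lengths, 0.9)   # 90% of the total is reached at the 20 sequence
```

Applied to the full list of scaffold lengths of an assembly, the same function would give the scaffold N50 and N90 values of the kind reported in Table 1 and Figure 2.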

* Assembly metric benchmarks are adapted from column VGP-2020 of "Table 1: Proposed standards and metrics for defining genome assembly quality" from Rhie et al. (2021).
** BUSCO scores based on the lepidoptera_odb10 BUSCO set using v5.

Sample acquisition and nucleic acid extraction
Two Y. sequella specimens (ilYpsSequ1 and ilYpsSequ2) were collected from the main track in Wytham, Oxfordshire (biological vice-county: Berkshire), UK (latitude 51.77, longitude -1.33) by Douglas Boyes, using a light trap. The specimens were identified by Douglas Boyes and snap-frozen on dry ice.
DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute (WSI). The ilYpsSequ1 and ilYpsSequ2 [...] Hi-C data were also generated from ilYpsSequ1 using the Arima v2 kit and sequenced on the HiSeq X Ten instrument.

Genome assembly
Assembly was carried out with Hifiasm (Cheng et al., 2021) and haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). The genome was analysed and BUSCO scores generated within the BlobToolKit environment (Challis et al., 2020). Table 3 contains a list of all software tool versions used, where appropriate.

Genome annotation
The BRAKER2 pipeline (Brůna et al., 2021) was used in the default protein mode to generate annotation for the Ypsolopha sequella assembly (GCA_934047225.1) in Ensembl Rapid Release.

Ethics/compliance issues
The

Michal Rindoš
Czech University of Life Sciences Prague, Prague, Czech Republic
The species description contains the basic information on the distribution and biology of the sequenced species. The description of the methods used to create this genome is practically the same as in the other species descriptions. The quality of the genome will most likely be excellent, as has been the case with other genomes.
Is the rationale for creating the dataset(s) clearly described?

Guang Yang
Fujian Agriculture and Forestry University, Fuzhou, Fujian, China
In the present article, the author provided the genome assembly of Ypsolopha sequella (the Pied Smudge; Arthropoda, Insecta, Lepidoptera, Ypsolophidae). The total genome size is 867 megabases with 30 chromosomes. Overall, the article is well-written and provides valuable information that could be further used. I just have a few minor suggestions.
1. It would be better to provide some more interpretation for the genome annotation either in terms of table(s) or figure(s).
2. It would be great if the author could provide the evolutionary relationship of Ypsolopha sequella with other lepidopteran species.
3. In the Background, please provide details on why the Ypsolopha sequella genome sequencing is necessary.
4. Please be consistent in using species names in the text, better to use abbreviated names such as Y. sequella.
5. Figures are blurry, better to use high-resolution pictures.

Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: insect molecular biology
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Kay Lucek
University of Basel, Basel, Switzerland
The authors present the chromosome level genome assembly of the micro moth Ypsolopha sequella. The assembly consists of 30 chromosomes including the Z chromosome, suggesting that a male individual was sequenced. Interestingly, a different individual was used to generate the Hi-C and 10X Chromium data. My only question about the presented note concerns this second individual: was it also a male, and did the use of a different individual interfere with the assembly?
Besides the W chromosome, the assembly is highly complete as suggested by the high BUSCO score. Overall, the presented assembly will be of great value to study genome as well as genome size evolution in Lepidoptera.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Speciation genomics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 2. Genome assembly of Ypsolopha sequella, ilYpsSequ2.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 866,879,050 bp assembly. The distribution of sequence lengths is shown in dark grey with the plot radius scaled to the longest sequence present in the assembly (51,978,369 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 sequence lengths (28,919,749 and 20,718,692 bp), respectively. The pale grey spiral shows the cumulative sequence count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the lepidoptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ilYpsSequ2.1/dataset/CAKOHE01/snail.

Figure 5. Genome assembly of Ypsolopha sequella, ilYpsSequ2.1: Hi-C contact map. Hi-C contact map of the ilYpsSequ2.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=A3UWotGzQ8WyAcqeFNsymw.

Reviewer Report 02 June 2023
https://doi.org/10.21956/wellcomeopenres.20809.r57965
© 2023 Lucek K. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Table 2. Chromosomal pseudomolecules in the genome assembly of Ypsolopha sequella, ilYpsSequ2.
INSDC accession | Chromosome | Size (Mb) | GC%
materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the Darwin Tree of Life Project Sampling Code of Practice. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project. Each transfer of samples is further undertaken according to a Research Collaboration Agreement or Material Transfer Agreement entered into by the Darwin Tree of Life Partner, Genome Research Limited (operating as the Wellcome Sanger Institute), and in some circumstances other Darwin Tree of Life collaborators. Members of the Tree of Life Core Informatics collective are listed here: https://doi.org/10.5281/zenodo.5013541. Members of the Darwin Tree of Life Consortium are listed here: https://doi.org/10.5281/zenodo.4783558.

Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.