The genome sequence of the Large Longhorn, Nematopogon swammerdamella (Linnaeus, 1758)

We present a genome assembly from an individual male Nematopogon swammerdamella (the Large Longhorn; Arthropoda; Insecta; Lepidoptera; Adelidae). The genome sequence is 699.5 megabases in span. Most of the assembly is scaffolded into 31 chromosomal pseudomolecules, including the Z sex chromosome. The mitochondrial genome has also been assembled and is 16.46 kilobases in length.


Background
The Longhorn moths (family Adelidae) are so named because their antennae are up to four times as long as the forewings. Nematopogon swammerdamella, the Large Longhorn, is the largest of the plain-coloured Longhorns in Britain, with a forewing length of 8 to 11 mm (wingspan 18-22 mm). In this species the antennae are almost twice (females) or two-and-a-half times (males) the length of the forewings (Sterling & Parsons, 2018).
Nematopogon swammerdamella is widespread in the western Palaearctic, extending into the eastern Palaearctic, from northern and eastern Ireland to southern Fenno-Scandinavia and the northern Mediterranean. There are relatively few records in eastern Europe and Italy. In the Atlantic Archipelago, it is common in England but less so in Scotland and Ireland (GBIF Secretariat, 2023).
The habitat of the Large Longhorn is woodland and parks with deciduous trees. This moth is univoltine, and adults fly in May and June. Adult females oviposit in plant stems. The larvae feed on dead leaves and decaying plant matter, and construct a portable case (Sterling & Parsons, 2018). Older caterpillars live in a bivalved case on the ground; they hibernate twice and pupate inside the case (Flemish Entomological Society, 2023).
The genome of the Large Longhorn, Nematopogon swammerdamella, was sequenced as part of the Darwin Tree of Life Project, a collaborative effort to sequence all named eukaryotic species in the Atlantic Archipelago of Britain and Ireland.
Here we present a chromosomally complete genome sequence for Nematopogon swammerdamella, based on one male specimen from Wytham Woods, Oxfordshire, UK.

Genome sequence report
The genome was sequenced from one male Nematopogon swammerdamella (Figure 1) collected from Wytham Woods, Oxfordshire, UK (51.77, -1.34). A total of 37-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 34 missing joins or mis-joins and removed 11 haplotypic duplications, reducing the assembly length by 0.81% and the scaffold number by 38.82%, and increasing the scaffold N50 by 1.54%. The final assembly has a total length of 699.5 Mb in 40 sequence scaffolds with a scaffold N50 of 24.0 Mb (Table 1). A summary of the assembly statistics is shown in Figure 2, while the distribution of assembly scaffolds on GC proportion and coverage is shown in Figure 3. The cumulative assembly plot in Figure 4 shows curves for subsets of scaffolds assigned to different phyla. Most (99.97%) of the assembly sequence was assigned to 31 chromosomal-level scaffolds, representing 30 autosomes and the Z sex chromosome. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 5; Table 2). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission.
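For readers unfamiliar with the scaffold N50 and N90 statistics quoted in this report, the following is a minimal sketch of how such values can be computed from a list of scaffold lengths. It is an illustration only, not the pipeline used for this assembly: the function name nx and the toy lengths are hypothetical, and the real ilNemSwae1.1 scaffold lengths are not reproduced here.

```python
from typing import Sequence


def nx(lengths: Sequence[int], fraction: float = 0.5) -> int:
    """Return the Nx length: the length of the scaffold at which the
    cumulative span of scaffolds, sorted longest first, first reaches
    `fraction` of the total assembly span."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    target = fraction * sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    raise AssertionError("unreachable: cumulative span always reaches the total")


# Hypothetical toy scaffold lengths (arbitrary units), for illustration only.
toy_lengths = [100, 80, 60, 40, 20]
n50 = nx(toy_lengths, 0.5)  # scaffold length at 50% of the total span
n90 = nx(toy_lengths, 0.9)  # scaffold length at 90% of the total span
print(n50, n90)
```

Applied to the real scaffold lengths, nx(lengths, 0.5) and nx(lengths, 0.9) would correspond to the N50 and N90 arcs described in the Figure 2 caption.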
Metadata for specimens, spectral estimates, sequencing runs, contaminants and pre-curation assembly statistics can be found at https://links.tol.sanger.ac.uk/species/753375.

Sample acquisition and nucleic acid extraction
Two male Nematopogon swammerdamella were netted in Wytham Woods, Oxfordshire (biological vice-county Berkshire), UK (latitude 51.77, longitude -1.34) on 2021-05-11. The specimens were collected by Will Langdon (University of Oxford) and Cass Baumberg, identified by Will Langdon, and preserved on dry ice. The specimen with ID Ox001344 (ToLID ilNemSwae1) was used for DNA sequencing, while the specimen with ID Ox001345 (ToLID ilNemSwae2) was used for Hi-C scaffolding. The species identification was confirmed by DNA barcoding.
The ilNemSwae1 sample was prepared for DNA extraction at the Tree of Life laboratory, Wellcome Sanger Institute (WSI).
The sample was weighed and dissected on dry ice with tissue set aside for Hi-C sequencing. Tissue from the whole organism was disrupted using a Nippi Powermasher fitted with a BioMasher pestle. DNA was extracted at the WSI Scientific Operations core using the Qiagen MagAttract HMW DNA kit, according to the manufacturer's instructions.

Sequencing
Pacific Biosciences HiFi circular consensus DNA sequencing libraries were constructed according to the manufacturers' instructions. DNA sequencing was performed by the Scientific Operations core at the WSI on a Pacific Biosciences SEQUEL IIe (HiFi) instrument. Hi-C data were also generated from whole organism tissue of ilNemSwae2 using the Arima2 kit and sequenced on the Illumina NovaSeq 6000 instrument.
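The 37-fold HiFi coverage reported for this genome is a ratio of sequenced bases to genome size. The sketch below shows that arithmetic under one explicit assumption: that coverage is expressed relative to the final assembly span of 699,520,461 bp. The note does not state which denominator was used, and the HiFi yield computed here is implied by that assumption, not a reported value. The function and variable names are hypothetical.

```python
def fold_coverage(total_read_bases: float, genome_span_bp: float) -> float:
    """Fold coverage = total sequenced bases / genome (or assembly) span."""
    if genome_span_bp <= 0:
        raise ValueError("genome span must be positive")
    return total_read_bases / genome_span_bp


# Assembly span as given in the Figure 2 caption.
ASSEMBLY_SPAN_BP = 699_520_461
# Coverage as given in the genome sequence report.
REPORTED_COVERAGE = 37

# ASSUMPTION: coverage was calculated against the assembly span.
# Under that assumption the implied HiFi yield is roughly 25.9 Gb.
implied_yield_bp = REPORTED_COVERAGE * ASSEMBLY_SPAN_BP
print(f"Implied HiFi yield: {implied_yield_bp / 1e9:.1f} Gb")
print(f"Coverage check: {fold_coverage(implied_yield_bp, ASSEMBLY_SPAN_BP):.0f}x")
```

If coverage was instead calculated against an independent genome size estimate (for example, from a k-mer spectrum), the implied yield would differ accordingly.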

they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project.
Further, the Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material

1. For the reasons mentioned above, I think that the Introduction would benefit from a short overview of the phylogenetic context of the taxon. The references below are two recent phylogenetic studies looking at the early evolution of Lepidoptera, and thus include information about the relationships between Adelidae and other early-diverging Lepidoptera lineages.
2. I was surprised to see the relatively high number of missing BUSCOs (6.7%), compared to other assemblies with similar sequencing coverage. Without being an expert in this clade, I presume that the set of conserved Lepidoptera genes in lepidoptera_odb10 comes primarily from studies in Ditrysia, which encompasses the vast majority of species (and research) in Lepidoptera. I thus wonder whether some of these "missing" genes partly reflect assembly completeness, but could also tell us about genes that evolved in Ditrysia and are therefore absent in Nematopogon. I'm not suggesting including a comparative analysis here, but I think, once again given the ancestral origin of this clade, that the relatively large fraction of "missing" genes is noteworthy.
3. Was MitoHiFi run with MitoFinder, MITOS, or both? This is not entirely clear in the methods.
4. The accessions for the individual chromosomes are provided in Table 2, but I'd suggest adding the accession links for the main assembly and haplotigs.
5. Please clarify how the Z chromosome was identified.

Jurate De Prins
Royal Belgian Institute of Natural Sciences, Brussels, Belgium
It is a great pleasure to see one more successful attempt to study the full genomes of European Microlepidoptera. I am very pleased that this approach gets the needed attention and support. Microlepidoptera have a very long and complex evolutionary history; many of them are narrowly host-specific and are external or internal feeders on different parts of plants. Therefore, they can be dispersed very easily to all parts of the world by human activities. The long-sequence genetic information obtained from reliably identified Microlepidoptera species is a needed step forward and I congratulate the authors for providing it. This is the major positive point of the article, and that is why I approve this manuscript. I am not a molecular specialist; my background is in karyology and taxonomy, so I will address those aspects of the article in more depth.
There is no doubt that studying the full genomes of European Microlepidoptera species is a long, time-consuming and costly commitment. There is one very strong demand from the community of researchers and from future generations of entomologists: that the identification of species should be correct. In this particular case, Nematopogon swammerdamella (Linnaeus, 1758) is identified correctly. However, in the majority of cases, Microlepidoptera specimens are identified not only from external characters but also from internal characteristics of the male and female genitalia. Therefore, the management of the project should assure the community that the identification of samples is correct.

Chen Wu
The New Zealand Institute of Plant and Food Research Limited, Auckland, New Zealand
Summary: The paper describes a male Nematopogon swammerdamella genome assembly sequenced and assembled from HiFi data. The assembly was built with a typical HiFi assembler and its quality was assessed using typical QC tools. However, a few things need to be described in more detail.
1. The abstract describes this as a male genome assembly that includes the Z chromosome, but there is no detail on how the Z chromosome was identified from the scaffold pool or what it looks like. I feel some description of the sex chromosomes is also required in the Background section.
2. I found it hard to reproduce the work from the method description, which lacks details such as parameters (whether default settings or altered options were used).
3. Some raw data information is missing, such as the k-mer spectrum and long-read length distribution. Where does the '37-fold coverage' come from? Do you have a genome size estimated by some method?
4. The paper only reports the primary assembly, but Hifiasm is able to produce two haplotype assemblies as well. It would be better to include some description and release of those, which might be useful for other researchers in future.
Reviewer Expertise: bioinformatics, genomics, transcriptomics

Figure 2. Genome assembly of Nematopogon swammerdamella, ilNemSwae1.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 699,520,461 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (31,243,614 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (23,985,694 and 15,279,983 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the lepidoptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ilNemSwae1.1/dataset/CAMPPU01/snail.

Figure 5. Genome assembly of Nematopogon swammerdamella, ilNemSwae1.1: Hi-C contact map of the ilNemSwae1.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=bZaoAHCdQ96v1XkBk4NOgg.

References
1. Rota J, Twort V, Chiocchio A, Peña C, et al.: The unresolved phylogenomic tree of butterflies and moths (Lepidoptera): Assessing the potential causes and consequences. Systematic Entomology. 2022; 47 (4): 531-550
2. Liao C, Yagi S, Chen L, Chen Q, et al.: Higher-level phylogeny and evolutionary history of nonditrysians (Lepidoptera) inferred from mitochondrial genome sequences. Zoological Journal of the Linnean Society. 2023; 198 (2): 476-493

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Partly
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: evolution of sex-limited polymorphisms in insects
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer Report 11 May 2024
https://doi.org/10.21956/wellcomeopenres.22349.r70476
© 2024 De Prins J. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
5. It would be good to have a synteny analysis with a closely related genome.
Is the rationale for creating the dataset(s) clearly described? Partly
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? No
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

Table 3 contains a list of relevant software tool versions and sources.

Wellcome Sanger Institute - Legal and Governance
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the 'Darwin Tree of Life Project Sampling Code of Practice', which can be found in full on the Darwin Tree of Life website here. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees

Is the rationale for creating the dataset(s) clearly described? Yes Are the protocols appropriate and is the work technically sound? Yes Are sufficient details of methods and materials provided to allow replication by others? Yes Are the datasets clearly presented in a useable and accessible format? Yes
I am very happy that the Tree of Life Consortium has started to pay attention to the chromosomes. I worked on Lepidoptera chromosomes many years ago and learned what a complex evolutionary history they went through to reach the present pattern. I am also very happy that my article published in 2001 was recently cited by the Tree of Life team in the top-quality journal Nature Ecology and Evolution; see https://www.nature.com/articles/s41559-024-02329-4. The sex mechanism, ZZ or Z0, is important for possible interspecific hybridization, and I hope that now that full genomes are available the authors will clarify it in the future.
Reviewer Expertise: Taxonomy, systematics, biodiversity data management
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.