The genome sequence of the iron prominent, Notodonta dromedarius (Linnaeus, 1767)

We present a genome assembly from an individual male Notodonta dromedarius (iron prominent; Arthropoda; Insecta; Lepidoptera; Notodontidae). The genome sequence is 342 megabases in span. The majority of the assembly, 99.35%, is scaffolded into 31 chromosomal pseudomolecules, with the Z sex chromosome assembled.


Background
Notodonta dromedarius (iron prominent) has rust-coloured wing markings that give the moth its common name. The species is widely distributed across Europe and is common throughout the UK; however, its abundance has decreased greatly at monitored sites over the past 50 years (Randle et al., 2019). There are two broods of N. dromedarius in the south of England, flying in May/June and August, but usually only a single brood in the north of England and in Scotland (Randle et al., 2019). The moth was one of the first members of the Notodontidae to have its sex pheromone chemically identified (Bestmann et al., 1993). The genome of N. dromedarius was sequenced as part of the Darwin Tree of Life Project, a collaborative effort to sequence all named eukaryotic species in the Atlantic Archipelago of Britain and Ireland. Here we present a chromosomally complete genome sequence for N. dromedarius, based on one male specimen from Wytham Woods, Oxfordshire, UK.

Genome sequence report
The genome was sequenced from a single male N. dromedarius (Figure 1) collected from Wytham Woods, Oxfordshire, UK (latitude 51.772, longitude -1.338). A total of 77-fold coverage in Pacific Biosciences single-molecule long reads (read N50 of 13 kb) and 112-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 6 missing joins or misjoins and removed 56 haplotypic duplications. This reduced the assembly length by 0.83% and the scaffold number by 28.57%, and increased the scaffold N50 by 3.08%.
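Fold coverage is the number of sequenced bases divided by the genome size. The coverage figures above can therefore be converted back into approximate data volumes. The short Python sketch below does that arithmetic. It assumes the genome size equals the 342 Mb assembly length. The helper names are illustrative and are not part of any pipeline used here.

```python
def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Mean sequencing depth: sequenced bases divided by genome size."""
    return total_bases / genome_size


def implied_yield(fold: float, genome_size: float) -> float:
    """Invert the relation: bases needed to reach a given fold coverage."""
    return fold * genome_size


# Assumption: genome size is approximated by the 342 Mb assembly length.
GENOME_SIZE = 342e6
hifi_bases = implied_yield(77, GENOME_SIZE)   # PacBio HiFi, 77-fold
tenx_bases = implied_yield(112, GENOME_SIZE)  # 10X read clouds, 112-fold
print(f"HiFi: {hifi_bases / 1e9:.1f} Gb, 10X: {tenx_bases / 1e9:.1f} Gb")
# prints "HiFi: 26.3 Gb, 10X: 38.3 Gb"
```

Under that assumption, 77-fold HiFi coverage corresponds to roughly 26 Gb of sequence and 112-fold 10X coverage to roughly 38 Gb. These are back-of-the-envelope estimates, not reported yields.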
The final assembly has a total length of 342 Mb in 145 sequence scaffolds, with a scaffold N50 of 12.1 Mb (Table 1). Of the assembly sequence, 99.35% was assigned to 31 chromosomal-level scaffolds, representing 30 autosomes (numbered by sequence length) and the Z sex chromosome (Figures 2–5; Table 2). The assembly has a BUSCO v5.1.2 (Simão et al., 2015) completeness of 98.9% (single 98.6%, duplicated 0.3%) using the lepidoptera_odb10 reference set. While not fully phased, the deposited assembly represents one haplotype. Contigs corresponding to the second haplotype have also been deposited.
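Scaffold N50 is a standard contiguity metric, and Figure 2 also reports N90. It is the length L such that scaffolds of length L or longer together cover at least 50% of the assembly. The Python sketch below implements that usual definition for any percentage x. The toy lengths are made up for illustration. This is not the code BlobToolKit uses.

```python
from typing import Sequence


def nx(lengths: Sequence[int], x: float = 50.0) -> int:
    """Return the Nx of a set of sequence lengths.

    Nx is the length L such that sequences of length >= L together
    cover at least x% of the total assembly length.
    """
    if not lengths:
        raise ValueError("no sequences")
    threshold = sum(lengths) * x / 100.0
    running = 0
    # Walk from the longest sequence down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    return min(lengths)  # only reached if x > 100


toy = [10, 8, 6, 4, 2]  # hypothetical scaffold lengths, total 30
print(nx(toy, 50), nx(toy, 90))  # 8 4
```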

Sample acquisition and nucleic acid extraction
A single male N. dromedarius (ilNotDrom1) was collected from Wytham Woods, Oxfordshire, UK (latitude 51.772, longitude -1.338) by Douglas Boyes, UKCEH, using a light trap. The specimen was identified by the same individual and preserved on dry ice.
DNA was extracted from head/thorax tissue at the Wellcome Sanger Institute (WSI) Scientific Operations core from the whole organism using the Qiagen MagAttract HMW DNA kit, according to the manufacturer's instructions. RNA was extracted (also from head/thorax tissue) in the Tree of Life Laboratory at the WSI using TRIzol (Invitrogen), according to the manufacturer's instructions. RNA was then eluted in 50 μl RNase-free water. Its concentration was assessed using a NanoDrop spectrophotometer and a Qubit Fluorometer with the Qubit RNA Broad-Range (BR) Assay kit. RNA integrity was analysed using the Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay.

Sequencing
Pacific Biosciences HiFi circular consensus and 10X Genomics read cloud DNA sequencing libraries, in addition to PolyA RNA-Seq libraries, were constructed according to the manufacturers' instructions. DNA and RNA sequencing was performed by the Scientific Operations core at the WSI on Pacific Biosciences SEQUEL II (HiFi), Illumina HiSeq X (10X) and Illumina HiSeq 4000 (RNA-Seq) instruments. Hi-C data were generated from abdomen tissue of the same specimen using the Arima v1 Hi-C kit and sequenced on HiSeq X.

Genome assembly
Assembly was carried out with HiCanu (Nurk et al., 2020); haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). One round of polishing was performed by aligning 10X Genomics read data to the assembly with longranger align and calling variants with freebayes (Garrison & Marth, 2012). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using SALSA2 (Ghurye et al., 2019). The MitoFinder (Allio et al., 2020). The genome was analysed and BUSCO scores were generated within the BlobToolKit environment (Challis et al., 2020). Table 3 lists the versions of all software tools used, where appropriate.
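purge_dups combines read depth with self-alignment of the assembly to find haplotypic duplicates. In a diploid, regions where both haplotypes collapsed into one contig receive reads from both copies. Where the haplotypes were assembled separately, each copy receives roughly half the reads.

The Python sketch below illustrates only that read-depth idea. It is not purge_dups's actual algorithm or its cut-offs. It flags contigs whose mean depth sits near half the diploid peak. The `Contig` type, the `classify_by_depth` helper, the tolerance and the example depths are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    mean_depth: float  # mean read depth across the contig


def classify_by_depth(contigs, diploid_peak, tolerance=0.25):
    """Label contigs by mean depth relative to the diploid depth peak.

    Hypothetical simplification: contigs within `tolerance` of half the
    diploid peak are candidate haplotigs; contigs below that window are
    low coverage; everything above it is retained. purge_dups additionally
    uses self-alignment evidence before removing any sequence.
    """
    haploid_peak = diploid_peak / 2
    labels = {}
    for c in contigs:
        if abs(c.mean_depth - haploid_peak) <= tolerance * haploid_peak:
            labels[c.name] = "candidate_haplotig"
        elif c.mean_depth < haploid_peak * (1 - tolerance):
            labels[c.name] = "low_coverage"
        else:
            labels[c.name] = "retain"
    return labels


contigs = [Contig("ctg1", 76.0), Contig("ctg2", 40.0), Contig("ctg3", 5.0)]
print(classify_by_depth(contigs, diploid_peak=77.0))
# {'ctg1': 'retain', 'ctg2': 'candidate_haplotig', 'ctg3': 'low_coverage'}
```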

Ethics/compliance issues
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner.The submission of materials by a Darwin Tree of Life Partner is subject to

Background section:
It would be good to provide some information on the ecology of the species. For example, in which habitat can it be found, what is the host plant of the caterpillar, and are adults pollinators?
Methods: There is an incongruence in this sentence: "DNA was extracted from head/thorax tissue at the Wellcome Sanger Institute (WSI) Scientific Operations core from the whole organism". Was DNA extracted from the whole organism or just the head and thorax?
Hi-C sequencing yield (number of reads) should be mentioned. Ideally, the percentage of read pairs mapping to a different contig prior to scaffolding should also be given. Some steps in the genome assembly could be better explained. For example, how were the 31 chromosomes inferred from the 146 scaffolds? Was it just a matter of size? How was the Z chromosome identified? Was it simply by depth of coverage from the PacBio reads mapped back to the assembly?
Most importantly, there is no mention of genome annotation (except a small note in the Data availability statement), while RNA sequencing is well described, and annotation is reported online as "complete". An entire section about genome annotation should be added, including the methods used, the number of protein-coding genes annotated, etc.

Luc Swevers
National Center for Scientific Research Demokritos, Athens, Greece
More background information can be presented (ecology, habitat). Although very few articles can be found on the presented species, much more information exists on the family Notodontidae and its characteristics (e.g. defense substances, coloration).
In Table 1, the author of the species name can be mentioned.
It can be explicitly stated that "iron prominent" is the common name.
The system of sex chromosomes can be mentioned.
Sequencing results are of high quality but can be explained better for the non-specialist. The purpose of RNA-seq can be explained.
In Figure 2, up to 31, each scaffold represents a chromosome (if this reviewer understands correctly).It can be mentioned that the sex chromosome represents the second largest scaffold.

Is the rationale for creating the dataset(s) clearly described? Partly
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Insect molecular biology and biotechnology
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Jason Hill
Uppsala University, Uppsala, Sweden
In this study the authors describe the genome assembly and annotation of the European moth Notodonta dromedarius. The methods employed represent the current highest standard for assembly and annotation projects. This is an excellent addition to the knowledge of the genomic architecture of Lepidoptera. It will be useful for deeper studies of Notodonta dromedarius and for comparative genomics work in a broader context.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes

Figure 1. Image of the ilNotDrom1 specimen taken prior to preservation and processing. Specimen shown next to a FluidX storage tube, 43.9 mm in length.

Figure 2. Genome assembly of Notodonta dromedarius, ilNotDrom1.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference, with each bin representing 0.1% of the 341,992,784 bp assembly. The distribution of chromosome lengths is shown in dark grey, with the plot radius scaled to the longest chromosome present in the assembly (14,515,539 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 chromosome lengths (12,059,830 and 8,218,830 bp, respectively). The pale grey spiral shows the cumulative chromosome count on a log scale, with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the lepidoptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ilNotDrom1.1/dataset/CAJHVG01/snail.
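The outer ring of the snail plot summarises base composition in the same 1,000 bins. Each bin covers about 342 kb (341,992,784 bp / 1,000).

The Python sketch below shows a simplified version of that binning. It assumes that sequences are concatenated longest-first and split into equal spans. This approximates BlobToolKit's behaviour but is not taken from its implementation. The `composition_bins` helper is hypothetical. Holding a whole 342 Mb genome as one string works for illustration but is memory-hungry.

```python
def composition_bins(sequences, n_bins=1000):
    """Return (GC%, AT%, N%) for each of n_bins equal spans.

    Sequences are ordered longest-first and concatenated before binning,
    so a bin may straddle the boundary between two sequences.
    """
    ordered = sorted(sequences, key=len, reverse=True)
    genome = "".join(ordered).upper()  # upper() folds soft-masked bases
    total = len(genome)
    if total < n_bins:
        raise ValueError("fewer bases than bins")
    results = []
    for i in range(n_bins):
        start = i * total // n_bins
        end = (i + 1) * total // n_bins
        chunk = genome[start:end]
        size = len(chunk)  # at least 1 because total >= n_bins
        gc = chunk.count("G") + chunk.count("C")
        at = chunk.count("A") + chunk.count("T")
        n = chunk.count("N")
        results.append((100 * gc / size, 100 * at / size, 100 * n / size))
    return results


print(composition_bins(["ATNN", "GGCCGC"], n_bins=2))
# [(100.0, 0.0, 0.0), (20.0, 40.0, 40.0)]
```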

Figure 5. Genome assembly of Notodonta dromedarius, ilNotDrom1.1: Hi-C contact map. Hi-C contact map of the ilNotDrom1.1 assembly, visualised in HiGlass. Chromosomes are shown in order of size from left to right and top to bottom.
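At its core, a Hi-C contact map like the one in Figure 5 is a matrix of read-pair counts between genomic bins. The Python sketch below shows only that counting step on a single concatenated coordinate axis. The `contact_matrix` helper and the toy read pairs are hypothetical. Real pipelines also filter pairs, for example by mapping quality, and often normalise the matrix before display in HiGlass.

```python
def contact_matrix(pairs, genome_length, bin_size):
    """Count read pairs between fixed-size bins on one concatenated axis.

    pairs: iterable of (pos1, pos2) 0-based coordinates of the two ends of
    each read pair, with chromosomes laid end to end in size order.
    Positions must lie in [0, genome_length).
    """
    n_bins = -(-genome_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for p1, p2 in pairs:
        i, j = p1 // bin_size, p2 // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror off-diagonal counts for symmetry
    return matrix


for row in contact_matrix([(0, 5), (12, 25), (3, 3)], genome_length=30, bin_size=10):
    print(row)
# [2, 0, 0]
# [0, 0, 1]
# [0, 1, 0]
```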

Reviewer Report 22 May 2024
https://doi.org/10.21956/wellcomeopenres.19339.r83090
© 2024 Hill J. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Table 3. Software tools used.
within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project. Each transfer of samples is further undertaken according to a Research Collaboration Agreement or Material Transfer Agreement entered into by the Darwin Tree of Life Partner, Genome Research Limited (operating as the Wellcome Sanger Institute), and in some circumstances other Darwin Tree of Life collaborators.

Is the rationale for creating the dataset(s) clearly described? Partly
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Partly
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.