The genome sequence of the Buff Ermine, Spilarctia lutea (Hufnagel, 1766)

We present a genome assembly from an individual female Spilarctia lutea (the Buff Ermine; Arthropoda; Insecta; Lepidoptera; Erebidae). The genome sequence is 584.8 megabases in span. Most of the assembly is scaffolded into 32 chromosomal pseudomolecules, including the assembled Z and W sex chromosomes. The mitochondrial genome has also been assembled and is 15.4 kilobases in length. Gene annotation of this assembly on Ensembl identified 18,304 protein-coding genes.


Background
The Tiger moths, Footmen, Cinnabar moths and several Ermine moths comprise a taxonomic group of Lepidoptera characterised by distinctly 'hairy' larvae with long setae, an ultrasonic sound-producing organ on the thorax of the adult and, in many cases, bright colours and production of toxic chemicals. For many years these moths were placed in their own family, Arctiidae, but this is now considered a subfamily, Arctiinae, of the family Erebidae. The Buff Ermine Spilarctia lutea (sometimes placed in the genus Spilosoma) is a widely distributed example. Found across Europe and east into Russia and Mongolia, the species is common across England, Wales, Northern Ireland and western counties of Scotland, with scattered records from Ireland and central Scotland (GBIF Secretariat, 2022; NBN Atlas, 2022).
The forewings of the adult are a sandy buff colour, often paler in females, with a broken diagonal line of black spots running from the apex to the middle of the trailing wing edge. The extent of the black markings can vary greatly, from being almost absent to being enlarged into pronounced streaks. The darker variants of S. lutea are uncommon in the wild, but have been studied in laboratory crosses that suggest several genetic loci contribute to wing patterning (South, 1961). In all cases, the abdomen is bright yellow dorsally with black patches. The adult moths are visually conspicuous, suggesting they may be aposematic. It has been noted that they are unpalatable to some birds and that their bodies contain (relatively low) concentrations of pharmacologically active substances (Rothschild, 1963). An intriguing hypothesis, proposed by Miriam Rothschild, is that the Buff Ermine S. lutea is a mimic of the more poisonous White Ermine moth S. lubricipeda, sitting somewhere on a spectrum between Batesian and Müllerian mimicry (Rothschild, 1963).
In the UK, adults of S. lutea are on the wing in June and July, laying eggs on the leaves of larval food plants including dandelion, dock, plantain and birch. If disturbed, the larvae drop to the ground and curl into a ring as a defence reaction, exposing their dense covering of hairs (Brooks, 1991). There is one generation per year with the pupal stage overwintering.
A genome sequence of S. lutea will be useful for comparing the molecular basis of toxin production between species and for understanding the genetic basis of wing patterning.

Genome sequence report
The genome was sequenced from a female Spilarctia lutea specimen (Figure 1) collected from Wytham Woods, UK (latitude 51.77, longitude -1.34). A total of 28-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 116-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 12 missing joins or mis-joins and removed one haplotypic duplication, reducing the scaffold number by 12.77%.
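The curation and coverage figures above can be cross-checked with simple arithmetic. The following Python sketch is illustrative only. It assumes that the 12.77% reduction is measured against the pre-curation scaffold count, that the final count is the 41 scaffolds reported for the final assembly, and that fold coverage is expressed relative to the 584.8 Mb assembly span. None of these assumptions is stated explicitly in the text, and the function names are hypothetical.

```python
def scaffolds_before_curation(final_count: int, reduction_pct: float) -> int:
    """Infer the pre-curation scaffold count from the final count and the
    percentage reduction (assumed to be relative to the pre-curation count)."""
    return round(final_count / (1 - reduction_pct / 100))


def sequenced_gigabases(fold_coverage: float, span_bp: float) -> float:
    """Approximate read yield in Gb, assuming coverage = yield / assembly span."""
    return fold_coverage * span_bp / 1e9


FINAL_SCAFFOLDS = 41   # scaffold count of the final assembly
SPAN_BP = 584.8e6      # assembly span in base pairs

before = scaffolds_before_curation(FINAL_SCAFFOLDS, 12.77)
reduction = (before - FINAL_SCAFFOLDS) / before * 100

hifi_gb = sequenced_gigabases(28, SPAN_BP)    # PacBio HiFi reads
tenx_gb = sequenced_gigabases(116, SPAN_BP)   # 10X Genomics read clouds

print(f"scaffolds before curation: {before} (reduction {reduction:.2f}%)")
print(f"approximate HiFi yield: {hifi_gb:.1f} Gb; 10X yield: {tenx_gb:.1f} Gb")
```

Under these assumptions, the reported reduction is consistent with 47 scaffolds before curation, and the stated coverages correspond to roughly 16.4 Gb of HiFi data and 67.8 Gb of 10X data.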
The final assembly has a total length of 584.8 Mb in 41 sequence scaffolds with a scaffold N50 of 20.2 Mb (Table 1). Most (99.98%) of the assembly sequence was assigned to 32 chromosomal-level scaffolds, representing 30 autosomes, and the Z and W sex chromosomes. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 2-Figure 5; Table 2). The assembly has a BUSCO v5.3.2 (Manni et al., 2021) completeness of 98.9% (single 98.1%, duplicated 0.9%) using the lepidoptera_odb10 reference set (n = 5286). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
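For readers unfamiliar with the scaffold N50 statistic quoted above, the following Python sketch shows the standard definition: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least 50% of the assembly span. The function name nx and the toy scaffold lengths are illustrative assumptions; this is not the software used to produce Table 1.

```python
def nx(lengths, x=50):
    """Return the Nx statistic for a list of scaffold lengths.

    Scaffolds are sorted longest first; Nx is the length of the scaffold at
    which the running total first reaches x% of the total span.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < x <= 100:
        raise ValueError("x must be in the interval (0, 100]")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length
    return min(lengths)  # not reached for valid input


# Toy example (hypothetical lengths, total span 300).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print("N50:", nx(toy_lengths, 50))
print("N90:", nx(toy_lengths, 90))
```

Applied to the real scaffold lengths of the assembly, the same calculation would be expected to give the N50 reported in Table 1.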

Sample acquisition and nucleic acid extraction
One female S. lutea specimen (ilSpiLutu1) was collected in Wytham Woods, Oxfordshire (biological vice-county: Berkshire), UK (latitude 51.77, longitude -1.34) on 22 May 2020. The specimen was caught in woodland habitat using a light trap. The specimen was collected and identified by Douglas Boyes (University of Oxford) and snap-frozen on dry ice.
DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute (WSI). The ilSpiLutu1 sample was weighed and dissected on dry ice with thorax tissue set aside for RNA and Hi-C sequencing. Abdomen tissue was disrupted using a Nippi Powermasher fitted with a BioMasher pestle. High molecular weight (HMW) DNA was extracted using the  and Qubit dsDNA High Sensitivity Assay kit. Fragment size distribution was evaluated by running the sample on the FemtoPulse system.
RNA was extracted from thorax tissue of ilSpiLutu1 in the Tree of Life Laboratory at the WSI using TRIzol, according to the manufacturer's instructions. RNA was then eluted in 50 μl RNAse-free water and its concentration assessed using a Nanodrop spectrophotometer and Qubit Fluorometer using the Qubit RNA Broad-Range (BR) Assay kit. Analysis of the integrity of the RNA was done using Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay. (Ghurye et al., 2019). The assembly was checked for contamination as described previously (Howe et al., 2021).

Genome annotation
The BRAKER2 pipeline (Brůna et al., 2021) was used in the default protein mode to generate annotation for the Spilarctia lutea assembly (GCA_916048165.1) in Ensembl Rapid Release.

Ethics and compliance issues
The materials that have contributed to this genome note have been supplied by a Darwin
The BUSCO pipeline indicated a high completeness of the genome with 99.98% of complete genes found; additional descriptive analyses are indicated in Table 1. Genome annotation predicted 18,304 protein-coding genes. Standard bioinformatic pipelines were used and software versions are indicated in Table 3. This high-quality assembly is a great contribution that will allow a wide range of questions related to Lepidoptera genomics and evolution to be addressed.
My only concern to mention would be that they sequenced RNA-Seq data but didn't include it when using the BRAKER pipeline to improve the certainty of the prediction, or at least it is not mentioned in the methods.

Zhaofu Yang
Northwest A&F University, Yangling, Shaanxi, China
This paper describes the genome assembly of Spilarctia lutea and provides important information about its size (~584 Mb) and scaffolding (32 chromosomes). The mitochondrial genome has also been assembled. The methods used for the sequencing and assembly are appropriate and technically sound. This assembly provides a valuable genomic resource for further investigations into the evolutionary patterns of Lepidoptera.
The distributional range of Spilarctia lutea should be rectified, as it has been reported in Northeast China. The generic placement of this species is likely confusing because it is subordinate to the genus Spilosoma Butler 1875 in LepIndex (https://www.nhm.ac.uk/ourscience/data/lepindex/search/). I suggest the authors consult some folks working on this group to confirm its status.
Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Systematics, DNA barcoding, Lepidoptera, Taxonomy
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 2. Genome assembly of Spilarctia lutea, ilSpiLutu1.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 584,787,352 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (28,626,442 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (20,170,318 and 13,734,192 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the lepidoptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ilSpiLutu1.1/dataset/CAJZHC01.1/snail.
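The size-ordered binning described in this caption can be illustrated with a short Python sketch. It is a simplified reading of the caption, not BlobToolKit's implementation: scaffolds are ordered longest first, the span is cut into equal slices, and each bin records the length of the scaffold in which that slice ends. The function name snail_bins and the toy lengths are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def snail_bins(lengths, n_bins=1000):
    """Return one scaffold length per bin for a size-ordered assembly.

    Bin i (1-based) holds the length of the first scaffold, longest first,
    whose cumulative end reaches i / n_bins of the total span.
    """
    ordered = sorted(lengths, reverse=True)
    cumulative = list(accumulate(ordered))
    total = cumulative[-1]
    bins = []
    for i in range(1, n_bins + 1):
        target = -(-total * i // n_bins)  # ceiling division on integers
        bins.append(ordered[bisect_left(cumulative, target)])
    return bins


# Each of the 1,000 bins covers 0.1% of the 584,787,352 bp assembly.
bin_span_bp = 584_787_352 / 1000

# Toy example (hypothetical lengths, total span 300) with 10 bins.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
toy_bins = snail_bins(toy_lengths, n_bins=10)
print(toy_bins)
print("N50:", toy_bins[4], "N90:", toy_bins[8])
```

In this scheme the value in the bin that ends at 50% of the span is the N50 and the value in the bin that ends at 90% is the N90, which is why both can be read directly off the plot as arcs.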

Figure 5. Genome assembly of Spilarctia lutea, ilSpiLutu1.1: Hi-C contact map. Hi-C contact map of the ilSpiLutu1.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=GwgyGmFqQMGAPGyNYPeTXw.
The paper presents a genome assembly from an individual female of Spilarctia lutea. The study was well done, and the results were clearly explained. The figures and tables are very clear, the images provided are of good resolution and the captions are extremely informative. Downloadable files are available in the repository. The techniques used were sufficient for what the authors proposed. I recommend the indexing of the data paper.
Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Animal genetics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
Reviewer Report 22 August 2023
https://doi.org/10.21956/wellcomeopenres.21138.r64914
© 2023 Escuer P. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paula Escuer
University of Neuchâtel, Neuchâtel, Canton of Neuchâtel, Switzerland
The authors presented the chromosome-level genome sequence of a female individual of the Buff Ermine (Spilarctia lutea). The final assembly shows a total length of 584.8 Mb in 41 scaffolds with a scaffold N50 of 20.2 Mb. It is assembled with 28-fold coverage of HiFi long reads, 116-fold coverage of 10X Genomics read clouds and chromosome conformation Hi-C data. The mitochondrial genome is also assembled and available. Of the assembly sequence, 99.98% is distributed in 32 chromosomal-level scaffolds, representing 30 autosomes plus the Z and W sex chromosomes.

Table 2. Chromosomal pseudomolecules in the genome assembly of Spilarctia lutea, ilSpiLutu1.
Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the Darwin Tree of Life Project Sampling Code of Practice. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project. All efforts are undertaken to minimise the suffering of animals used for sequencing. Each transfer of samples is further undertaken according to a Research Collaboration Agreement or Material Transfer Agreement entered into by the Darwin Tree of Life Partner, Genome Research Limited (operating as the Wellcome Sanger Institute), and in some circumstances other Darwin Tree of Life collaborators. Members of the Darwin Tree of Life Barcoding collective are listed here: https://doi.org/10.5281/zenodo.4893703. Members of the Tree of Life Core Informatics collective are listed here: https://doi.org/10.5281/zenodo.5013541. Members of the Darwin Tree of Life Consortium are listed here: https://doi.org/10.5281/zenodo.4783558.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.