The genome sequence of the Dark Crimson Underwing moth, Catocala sponsa Linnaeus, 1767

We present a genome assembly from an individual female Catocala sponsa (the Dark Crimson Underwing; Arthropoda; Insecta; Lepidoptera; Erebidae). The genome sequence spans 803.70 megabases. Most of the assembly is scaffolded into 32 chromosomal pseudomolecules, including the Z and W sex chromosomes. The mitochondrial genome has also been assembled and is 15.57 kilobases in length. Gene annotation of this assembly on Ensembl identified 13,493 protein-coding genes.


Background
Catocala sponsa, the Dark Crimson Underwing, is a rather large, cryptically coloured moth, except when it reveals its bright red hind wing. Previously considered a very localised speciality of the New Forest in southern England, with cycles of abundance and rarity (South, 1907), this species now seems to be expanding its range and is increasingly arriving in Britain as a migrant (Randle et al., 2019). In Kent, C. sponsa has been found breeding in oak woodland since 2019 (Perry, no date). It was still a lovely surprise when GRB found one, a presumed immigrant, in his Kent garden. Despite its large size, care is needed when identifying C. sponsa, in particular to distinguish it from the similar Light Crimson Underwing, C. promissa (Denis & Schiffermüller). In C. sponsa the fore wing has a contrastingly paler patch against a more uniformly dark background, and the hind wing has a sharply zigzagged black line within the red area (see Waring et al., 2017).
The larvae of C. sponsa are wonderfully camouflaged as oak twigs and feed on Pedunculate Oak, Quercus robur, from April to June (Henwood et al., 2020). They are specialised oak feeders, adapted to cope with tannins (Roslin & Salminen, 2008). In Britain, adults are on the wing mainly in July and August, and the eggs overwinter. Ranging widely across Europe and into Central Asia (GBIF Secretariat, 2024), C. sponsa seems to be increasing at the northern edge of its range, for example in Britain and Sweden (e.g., Franzén, 2004).
The species name 'sponsa', from the Latin for 'promised in marriage', is one of a series of playful names that Linnaeus (1767) gave to the red and blue 'underwings', which became the genus Catocala; Emmet (1991) speculates on whether Linnaeus was alluding to the flash of colour of otherwise hidden bridal underwear.
Here we present a chromosomally complete genome sequence for Catocala sponsa, based on one female specimen from Kent, England.

Genome sequence report
The genome of an adult female Catocala sponsa (Figure 1) was sequenced using Pacific Biosciences single-molecule HiFi long reads, generating a total of 61.86 Gb (gigabases) from 6.59 million reads and providing approximately 76-fold coverage. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data, which produced 90.89 Gb from 601.93 million reads, giving approximately 113-fold coverage. Specimen and sequencing information is summarised in Table 1.
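Fold coverage is the number of sequenced bases divided by the genome size. The sketch below reproduces that arithmetic from the yields reported above, under our own assumption (not stated in the note) that the 803.70 Mb assembly length stands in for the genome size. The reported figures may rest on a separate genome-size estimate, which would explain why the HiFi value comes out at about 77-fold here rather than 76-fold.

```python
def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Approximate mean sequencing depth: total bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Assumption: the final assembly length is used as the genome size.
ASSEMBLY_LENGTH_BP = 803.70e6

hifi_depth = fold_coverage(61.86e9, ASSEMBLY_LENGTH_BP)  # PacBio HiFi yield
hic_depth = fold_coverage(90.89e9, ASSEMBLY_LENGTH_BP)   # Hi-C yield

print(f"HiFi ~{hifi_depth:.0f}x, Hi-C ~{hic_depth:.0f}x")  # HiFi ~77x, Hi-C ~113x
```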
Manual assembly curation corrected 7 missing joins or mis-joins, reducing the scaffold number by 3.33%. The final assembly has a total length of 803.70 Mb in 57 sequence scaffolds, with a scaffold N50 of 27.1 Mb (Table 2), and contains 64 gaps in total. The snail plot in Figure 2 summarises the assembly statistics, and the distribution of assembly scaffolds by GC proportion and coverage is shown in Figure 3. The cumulative assembly plot in Figure 4 shows curves for subsets of scaffolds assigned to different phyla. Most (99.89%) of the assembly sequence was assigned to 32 chromosomal-level scaffolds, representing 30 autosomes and the Z and W sex chromosomes. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 5; Table 3). Chromosome Z was identified by synteny with Catocala fraxini (GCA_930367265.1), and chromosome W was assigned from read coverage statistics. While not fully phased, the deposited assembly represents one haplotype; contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled and is included as a contig in the multifasta file of the genome submission.
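The scaffold N50 quoted above is the length L such that scaffolds of length L or longer together contain at least half of the assembly. A minimal Python sketch of that definition follows; the toy lengths are hypothetical and are not the ilCatSpon1.1 scaffolds.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted longest first and their lengths summed until
    the running total reaches at least half of the total length; the
    length of the sequence that crosses that threshold is the N50.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths supplied")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: the full sum always reaches half")


# Hypothetical scaffold lengths in Mb (total 30, half 15): 10 + 8 = 18 >= 15.
print(n50([2, 10, 6, 4, 8]))  # 8
```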

Genome annotation report
The Catocala sponsa genome assembly (GCA_963564715.1) was annotated at the European Bioinformatics Institute (EBI) on Ensembl Rapid Release. The resulting annotation includes 25,901 transcribed mRNAs from 13,493 protein-coding and

Sample acquisition
An adult female Catocala sponsa (specimen ID NHMUK010884569, ToLID ilCatSpon1) was collected from Tonbridge, Kent, England, UK (latitude 51.19, longitude 0.29) on 2022-07-30, using actinic light. The specimen was collected and identified by Gavin Broad (Natural History Museum) and preserved by dry freezing at −80 °C.
The initial identification was verified by additional DNA barcoding, following the framework developed by Twyford et al. (2024). A small sample was dissected from the specimen and stored in ethanol, while the remaining parts of the specimen were shipped on dry ice to the Wellcome Sanger Institute (WSI). The tissue was lysed, the COI marker region was amplified by PCR, and the amplicons were sequenced and compared with the BOLD database, confirming the species identification (Crowley et al., 2023). Following whole genome sequence generation, the relevant DNA barcode region is also used alongside the initial barcoding data for sample tracking at the WSI (Twyford et al., 2024). The standard operating procedures for Darwin Tree of Life barcoding have been deposited on protocols.io (Beasley et al., 2023).
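Comparing an amplicon with reference barcodes ultimately rests on sequence similarity. As a hedged illustration only (BOLD's identification engine, its alignment step and its species-level thresholds are not reproduced here), the function below computes percent identity between two sequences that have already been aligned to the same length, ignoring gap columns. The function name and toy sequences are hypothetical.

```python
def percent_identity(query: str, reference: str) -> float:
    """Percent identity between two pre-aligned sequences.

    Both sequences must have the same length, with '-' marking gaps.
    Columns where either sequence has a gap are excluded.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for q, r in zip(query.upper(), reference.upper()):
        if q == "-" or r == "-":
            continue  # skip gap columns
        compared += 1
        matches += q == r  # True counts as 1
    if compared == 0:
        raise ValueError("no overlapping aligned positions")
    return 100.0 * matches / compared


# Hypothetical aligned fragments: 4 matches over 5 gap-free columns.
print(percent_identity("ACGT-A", "ACGTTC"))  # 80.0
```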

Nucleic acid extraction
The workflow for high molecular weight (HMW) DNA extraction at the WSI Tree of Life Core Laboratory comprises a sequence of core procedures: sample preparation, sample homogenisation, DNA extraction, fragmentation, and clean-up. In sample preparation, the ilCatSpon1 sample was weighed and dissected on dry ice (Jay et al., 2023). Tissue from the abdomen was homogenised using a PowerMasher II tissue disruptor (Denton et al., 2023a). HMW DNA was extracted at the WSI Scientific Operations core using the Automated MagAttract v2 protocol (Oatley et al., 2023). The DNA was sheared to an average fragment size of 12–20 kb in a Megaruptor 3 system at speed setting 31 (Bates et al., 2023). Sheared DNA was purified by solid-phase reversible immobilisation (Strickland et al., 2023): in brief, the method uses a 1.8X ratio of AMPure PB beads to sample to remove shorter fragments and concentrate the DNA. The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit dsDNA High Sensitivity Assay kit. Fragment size distribution was evaluated by running the sample on the FemtoPulse system.
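The 1.8X figure is a volume ratio of bead suspension to DNA sample. The arithmetic is simple but easy to get backwards, so a minimal sketch is given here; the function name and the 50 µL sample volume are hypothetical examples, not values from this protocol.

```python
def spri_bead_volume(sample_volume_ul: float, ratio: float = 1.8) -> float:
    """Volume of SPRI bead suspension (µL) to add for a bead:sample ratio (v/v)."""
    if sample_volume_ul <= 0 or ratio <= 0:
        raise ValueError("sample volume and ratio must be positive")
    return sample_volume_ul * ratio


# Hypothetical 50 µL sheared-DNA sample at the 1.8X ratio described above.
print(spri_bead_volume(50.0))  # 90.0
```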
RNA was extracted from abdomen tissue of ilCatSpon1 in the Tree of Life Laboratory at the WSI using the RNA Extraction: Automated MagMax™ mirVana protocol (do Amaral et al., 2023). The RNA concentration was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer.

Sequencing
Pacific Biosciences HiFi circular consensus DNA sequencing libraries were constructed according to the manufacturers' instructions. Poly(A) RNA-Seq libraries were constructed using the NEB Ultra II RNA Library Prep kit. DNA and RNA sequencing was performed by the Scientific Operations core at the WSI on Pacific Biosciences Revio (HiFi) and Illumina NovaSeq X (RNA-Seq) instruments. Hi-C data were also generated from abdomen tissue of ilCatSpon1 using the Arima-HiC v2 kit. The Hi-C library was sequenced with paired-end 150 bp reads on the Illumina NovaSeq 6000 instrument.

Assembly
The initial assembly of HiFi reads was performed using Hifiasm (Cheng et al., 2021) with the --primary option. The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2023), which runs MitoFinder (Allio et al., 2020) or MITOS (Bernt et al., 2013) and uses these annotations to select the final mitochondrial contig and to ensure the general quality of the sequence.
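Hifiasm writes its contigs as GFA graphs rather than FASTA; with --primary, the hifiasm documentation describes a primary and an alternate contig set (files ending p_ctg.gfa and a_ctg.gfa). The sketch below shows one way to extract the segment ('S') records of such a file as FASTA. It is a generic GFA-parsing illustration, not the Tree of Life pipeline's own conversion step, and the function name, toy records and contig names are hypothetical.

```python
import io
from typing import Iterator, TextIO


def gfa_segments_to_fasta(gfa: TextIO, width: int = 60) -> Iterator[str]:
    """Yield FASTA lines for each segment (S) record in a GFA stream.

    GFA segment lines are tab-separated: 'S', segment name, sequence,
    then optional tags. All other record types are skipped.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    for line in gfa:
        if not line.startswith("S\t"):
            continue  # header (H), link (L) and other non-segment lines
        fields = line.rstrip("\r\n").split("\t")
        name, seq = fields[1], fields[2]
        if seq == "*":
            continue  # sequence not stored in the graph; nothing to write
        yield f">{name}"
        for start in range(0, len(seq), width):
            yield seq[start:start + width]


# Hypothetical toy GFA; real hifiasm output is far larger.
toy_gfa = io.StringIO("H\tVN:Z:1.0\nS\tptg000001l\tACGTACGTAC\tLN:i:10\n")
print("\n".join(gfa_segments_to_fasta(toy_gfa, width=4)))
```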

Assembly curation
The assembly was decontaminated using the Assembly Screen for Cobionts and Contaminants

Genome annotation
The Ensembl Genebuild annotation system (Aken et al., 2016) was used to generate annotation for the Catocala sponsa assembly (GCA_963564715.1) in Ensembl Rapid Release.
Further, the Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use.
The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material

The authors have given the name Catocala sponsa in full in some places and in abbreviated form in others. The genus name may be given in full the first time and abbreviated thereafter, as in C. sponsa.
Above all, I confirm that the manuscript meets the necessary scientific standard and is suitable for indexing.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes

In my view, this genome report is well done and ready for approval. I just have a few very minor comments:
1) The Background passage starts by mentioning the status of this species in the 'New Forest', which must be somewhere in the UK. However, I am not familiar with this place and initially missed more general distribution information relevant to a worldwide readership. The Background section later gives a more general indication, but the authors could consider starting this section with general information on the species' ecology and distribution, and only then moving to the specific details relevant to the UK.
2) The Assembly section mixes past and present tense. This is again not a crucial issue, but I would recommend changing the whole passage to past tense.
3) The Assembly curation passage states that "The sex chromosome was identified by synteny …". I propose changing this to "The sex chromosomes were identified...", since this concerns both the W and Z, right?
Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: evolution; ecology; population genomics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 3. Genome assembly of Catocala sponsa, ilCatSpon1.1: BlobToolKit GC-coverage plot. Sequences are coloured by phylum. Circles are sized in proportion to sequence length. Histograms show the distribution of sequence length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Catocala_sponsa/dataset/GCA_963564715.1/blob.

Figure 4. Genome assembly of Catocala sponsa, ilCatSpon1.1: BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all sequences. Coloured lines show cumulative lengths of sequences assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Catocala_sponsa/dataset/GCA_963564715.1/cumulative.
For the three domain-level BUSCO lineages, the pipeline aligns the BUSCO genes to the UniProt Reference Proteomes database (Bateman et al., 2023) with DIAMOND (Buchfink et al., 2021) blastp. The genome is also split into chunks according to the density of the BUSCO genes from the taxonomically closest lineage, and each chunk is aligned to the UniProt Reference Proteomes database with DIAMOND blastx. Genome sequences that have no hit are then chunked with seqtk and aligned to the NT database with blastn (Altschul et al., 1990). All these outputs are combined with the blobtools suite into a blobdir for visualisation. The genome assembly and evaluation pipelines were developed using the nf-core tooling (Ewels et al., 2020), use MultiQC (Ewels et al., 2016), and make extensive use of the Conda package manager, the Bioconda initiative (Grüning et al., 2018), the Biocontainers infrastructure (da Veiga Leprevost et al., 2017), and the Docker (Merkel, 2014) and Singularity (Kurtzer et al., 2017) containerisation solutions.
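Splitting sequences into fixed-size windows before similarity searches keeps each query a manageable size. The pipeline uses seqtk for this step and, for the BUSCO-guided search, chunks according to gene density; neither implementation is reproduced here. The sketch below only illustrates plain fixed-size windowing with optional overlap, and the function name and window sizes are hypothetical.

```python
from typing import Iterator, Tuple


def chunk_sequence(seq: str, size: int, overlap: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (0-based start, window) pairs of at most `size` bases.

    Consecutive windows share `overlap` bases; the last window may be
    shorter than `size`.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    for start in range(0, len(seq), step):
        yield start, seq[start:start + size]
        if start + size >= len(seq):
            break  # this window already reached the end of the sequence


# Hypothetical 10 bp sequence split into 4 bp windows.
print(list(chunk_sequence("ACGTACGTAC", 4)))
print(list(chunk_sequence("ACGTACGTAC", 4, overlap=1)))
```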

Figure 5. Genome assembly of Catocala sponsa, ilCatSpon1.1: Hi-C contact map of the ilCatSpon1.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=LcOTwqcVQf6BFBvXAUEJ1Q.

Table 2. Genome assembly data for Catocala sponsa, ilCatSpon1.1.
* BUSCO scores based on the lepidoptera_odb10 BUSCO set using version 5.4.3. C = complete [S = single copy, D = duplicated], F = fragmented, M = missing, n = number of orthologues in comparison. A full set of BUSCO scores is available at https://blobtoolkit.genomehubs.org/view/Catocala_sponsa/dataset/GCA_963564715.1/busco.

RNA concentration was measured with the Qubit RNA Broad-Range Assay kit. Analysis of the integrity of the RNA was done using the Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay. Protocols developed by the WSI Tree of Life laboratory are publicly available on protocols.io (Denton et al., 2023b).
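The Table 2 footnote defines the compact BUSCO summary notation. A small parser, sketched below with a regular expression, makes the relationships between the fields explicit: C is the sum of S and D, and C, F and M together account for all n orthologues (up to rounding). The function name and the example string are invented for illustration; the values do not report the scores for this assembly.

```python
import re
from typing import Dict

# Compact BUSCO summary, e.g. "C:95.0%[S:94.0%,D:1.0%],F:2.0%,M:3.0%,n:100".
_BUSCO_SUMMARY = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(summary: str) -> Dict[str, float]:
    """Return the C, S, D, F and M percentages and the orthologue count n."""
    match = _BUSCO_SUMMARY.search(summary)
    if match is None:
        raise ValueError(f"not a BUSCO summary string: {summary!r}")
    scores: Dict[str, float] = {key: float(match.group(key)) for key in "CSDFM"}
    scores["n"] = int(match.group("n"))
    return scores


# Invented example values, not the ilCatSpon1.1 results.
result = parse_busco_summary("C:95.0%[S:94.0%,D:1.0%],F:2.0%,M:3.0%,n:100")
print(result)
# Complete = single-copy + duplicated; C + F + M should be about 100%.
assert abs(result["C"] - (result["S"] + result["D"])) < 0.2
assert abs(result["C"] + result["F"] + result["M"] - 100.0) < 0.5
```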

'Darwin Tree of Life Project Sampling Code of Practice', which can be found in full on the Darwin Tree of Life website. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project.