The genome sequence of the satellite, Eupsilia transversa (Hufnagel, 1766)

We present a genome assembly from an individual female Eupsilia transversa (the satellite; Arthropoda; Insecta; Lepidoptera; Noctuidae). The genome sequence is 467 megabases in span. The entire assembly (100%) is scaffolded into 32 chromosomal pseudomolecules with the W and Z sex chromosomes assembled. The complete mitochondrial genome was also assembled and is 15.5 kilobases in length. Gene annotation of this assembly on Ensembl has identified 18,065 protein coding genes.


Background
The satellite, Eupsilia transversa (Hufnagel, 1766), is a medium-sized noctuid moth, typically with a red-brown ground colour and a white to orange, reniform stigma on each wing. Each stigma has a small, diagnostic "satellite" dot on either side of it, giving the moth its vernacular name. The species shows a large degree of colour variation throughout its range and several aberrations have been named, mainly based on ground colour and colour of the stigmata (Heath & Emmett, 1983).
The satellite is found throughout Eurasia (Heath & Emmett, 1983); in Britain it is widespread and common throughout, and it is also widespread but more localised in Ireland. It occurs in one generation, with adults emerging in late September or October and overwintering, flying on milder nights until late April (Waring & Townsend, 2017).
The larvae, which can be found between April and July in a variety of habitats, are omnivorous, feeding at first on a wide range of trees and shrubs, and also on other larvae and aphids when they are larger. The larvae themselves are brown to blue-black with orange or yellow dorsal and subdorsal lines on the first and last body segments, as well as faint dorsal and subdorsal lines along the other segments. They often show white blotches and dashes along the subspiracular line. The larvae feed at night and hide in spun leaves by day, before forming a cocoon on the ground (Henwood & Sterling, 2020).
The adults can be attracted to light traps, but are more frequently encountered at 'sugar' (strong, sweet solutions painted onto tree trunks, fence posts, etc.). The satellite has been recorded feeding on ivy blossom, birch sap and sallow, and has also been noted feeding on berries including those of Guelder-rose (Gordon, 1913; Waring & Townsend, 2017).

Genome sequence report
The genome was sequenced from a single female E. transversa collected from Wytham Woods, Berkshire, UK (Figure 1). A total of 29-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 53-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 12 misjoins, reducing the scaffold number by 27.27%, and increasing the scaffold N50 by 3.29%.
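The coverage and curation figures above imply some rough quantities that are not stated in the text. The following is a back-of-envelope sketch only. It assumes that fold coverage is measured against the 467 Mb assembly span and that the 27.27% reduction ends at the 32 scaffolds of the final assembly. The function names are illustrative and are not part of any pipeline used in this study.

```python
# Back-of-envelope figures implied by the coverage and curation numbers.
# Assumption: fold coverage is relative to the 467 Mb assembly span.
ASSEMBLY_SPAN_BP = 467_000_000


def approx_yield_gb(fold_coverage, genome_span_bp=ASSEMBLY_SPAN_BP):
    """Approximate sequenced bases (in Gb) for a given fold coverage."""
    return fold_coverage * genome_span_bp / 1e9


def scaffolds_before(count_after, percent_reduction):
    """Scaffold count before curation, given the count after and the % reduction."""
    return round(count_after / (1 - percent_reduction / 100))


hifi_gb = approx_yield_gb(29)   # PacBio HiFi reads
tenx_gb = approx_yield_gb(53)   # 10X Genomics read clouds
pre_curation = scaffolds_before(32, 27.27)

print(f"HiFi yield: about {hifi_gb:.1f} Gb")
print(f"10X yield: about {tenx_gb:.1f} Gb")
print(f"Scaffolds before curation: about {pre_curation}")
```

Under these assumptions the HiFi data amount to roughly 13.5 Gb, the 10X data to roughly 24.8 Gb, and curation would have reduced the assembly from about 44 scaffolds to 32.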
The final assembly has a total length of 467 Mb in 32 sequence scaffolds with a scaffold N50 of 15.8 Mb (Table 1). The entire assembly sequence (100%) was assigned to 32 chromosomal-level scaffolds, representing 30 autosomes (numbered by sequence length) and the W and Z sex chromosomes (Figure 2-Figure 5; Table 2). The assembly has a BUSCO v5.3.2 (Manni et al., 2021) completeness of 99.2% (single 98.8%, duplicated 0.4%) using the lepidoptera_odb10 reference set (n=5,286). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
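For readers unfamiliar with the scaffold N50 statistic quoted above, the following minimal sketch shows the usual definition: the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of the assembly span. The function name and the toy lengths are illustrative assumptions. They are not the real chromosome sizes and not the tool used to produce Table 1.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches at least half
    of the summed length.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Toy example with made-up scaffold lengths (arbitrary units).
toy_lengths = [2, 4, 6, 8, 10]
print(n50(toy_lengths))
```

In the toy example the total is 30, and the running sum first reaches 15 at the second-longest scaffold, so the N50 is 8.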

Sample acquisition and nucleic acid extraction
A single female E. transversa specimen (ilEupTran1; genome assembly, Hi-C) was collected using a light trap from Wytham Woods, Berkshire, UK (latitude 51.774, longitude -1.331) by Liam Crowley (University of Oxford). The specimen was identified by Liam Crowley and snap-frozen on dry ice.
A single E. transversa specimen (ilEupTran2; RNA-Seq) of unknown sex was collected using a light trap from Lucas Road, High Wycombe, Buckinghamshire, UK (latitude 51.63, longitude -0.74) by David Lees (Natural History Museum). The specimen was identified by David Lees and dry frozen at -80 °C.
DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute. The ilEupTran1 sample was weighed and dissected on dry ice with tissue set aside for Hi-C sequencing. Thorax and abdomen tissue was cryogenically disrupted to a fine powder using a cryoPREP Automated Dry Pulveriser, receiving multiple impacts. High molecular weight (HMW) DNA was extracted using the Qiagen MagAttract HMW DNA kit. Fragment size analysis of 0.01-0.5 ng of DNA was then performed using an Agilent FemtoPulse.
RNA was extracted from thorax tissue of ilEupTran2 in the Tree of Life Laboratory at the WSI using TRIzol, according to the manufacturer's instructions. RNA was then eluted in 50 μl RNAse-free water and its concentration assessed using a Nanodrop spectrophotometer and Qubit Fluorometer using the Qubit RNA Broad-Range (BR) Assay kit. Analysis of the integrity of the RNA was done using Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay.

Sequencing
Pacific Biosciences HiFi circular consensus and 10X Genomics Chromium read cloud sequencing libraries were constructed according to the manufacturers' instructions. Sequencing was performed by the Scientific Operations core at the Wellcome Sanger Institute on Pacific Biosciences SEQUEL II (HiFi), Illumina NovaSeq 6000 (10X) and Illumina HiSeq 4000 (RNA-Seq) instruments. Hi-C data were generated in the Tree of Life laboratory from head tissue of ilEupTran1 using the Arima v2 kit and sequenced on a NovaSeq 6000 instrument.
The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2021), which performs annotation using MitoFinder (Allio et al., 2020). The genome was analysed and BUSCO scores generated within the BlobToolKit environment (Challis et al., 2020). Table 3 contains a list of all software tool versions used, where appropriate.

Genome annotation
The Ensembl gene annotation system (Aken et al., 2016) was used to generate annotation for the Eupsilia transversa assembly (GCA_914767815.1). Annotation was created primarily through alignment of transcriptomic data to the genome, with gap filling via protein-to-genome alignments of a select set of proteins from UniProt (UniProt Consortium, 2019).

I understand there is no discussion in these reports, but a section outlining how this genome compares to others of related species would be useful.

Is the rationale for creating the dataset(s) clearly described? No
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format?

Figure 1. Image of the female Eupsilia transversa specimen (ilEupTran1) taken prior to preservation and processing.

Figure 2. Genome assembly of Eupsilia transversa, ilEupTran1.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 466,922,763 bp assembly. The distribution of chromosome lengths is shown in dark grey with the plot radius scaled to the longest chromosome present in the assembly (20,783,224 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 chromosome lengths (15,778,023 and 10,879,162 bp), respectively. The pale grey spiral shows the cumulative chromosome count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the lepidoptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ilEupTran1.1/dataset/ilEupTran1_1.1/snail.

Table 2. Chromosomal pseudomolecules in the genome assembly of Eupsilia transversa, ilEupTran1.1.
INSDC accession Chromosome Size (Mb) GC%

Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the Darwin Tree of Life Project Sampling Code of Practice. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project. Each transfer of samples is further undertaken according to a Research Collaboration Agreement or Material Transfer Agreement entered into by the Darwin Tree of Life Partner, Genome Research Limited (operating as the Wellcome Sanger Institute), and in some circumstances other Darwin Tree of Life collaborators.