The genome sequence of the Large Skipper, Ochlodes sylvanus (Esper, 1777)

We present a genome assembly from an individual female Ochlodes sylvanus, the Large Skipper (Arthropoda; Insecta; Lepidoptera; Hesperiidae). The genome sequence is 380 megabases in span. Most of the assembly (99.97%) is scaffolded into 30 chromosomal pseudomolecules, including the assembled W and Z sex chromosomes. The mitochondrial genome has also been assembled and is 17.1 kilobases in length. Gene annotation of this assembly on Ensembl identified 13,451 protein-coding genes.


Background
The Large Skipper, Ochlodes sylvanus, is a Palearctic butterfly in the family Hesperiidae, found from western Europe to eastern Russia and Japan. The species was formerly also referred to as O. venatus, which is now generally considered a separate, similar species occurring in East Asia (Devyatkin, 1997). In the British Isles, the Large Skipper is restricted to England, Wales and southern parts of Scotland, and is absent from Ireland. The species is currently undergoing a northwards range expansion in the UK, most likely in response to climate change.
The Large Skipper is one of the most common skippers in Europe and a species of Least Concern both on the IUCN Red List of Europe (van Swaay et al., 2010) and the revised Red List for the UK (Fox et al., 2022). O. sylvanus shows a faint chequered pattern on both sides of the wings. Males can be differentiated from females by the presence of a conspicuous black sex brand on the forewing. The Large Skipper is univoltine in most of its range and overwinters as a larva. In the UK, adults are active from the end of May to mid-August (Newland & Tomlinson, 2010). In southern areas it is bivoltine, and adults can be found from the end of April to October (Vila et al., 2018). O. sylvanus is common in rough grassland around woodland edges, forest clearings and urban parks. The main habitat requirement is the presence of uncut, tall grasses. Because of this, populations are negatively affected by intensive agriculture (Newland & Tomlinson, 2010).
Caterpillars feed on a wide range of grasses (Robinson et al., 2010; Savela, 2022). They roll grass blades and use silken threads to build a shelter where they spend most of their lives, leaving only to feed. Females lay the dome-shaped, pearl-white eggs individually on the underside of grass blades.
The Large Skipper has a haploid chromosome number of 29 (Federley, 1938; Lorkovic, 1941), and its genome size has been estimated as 345 Mb using flow cytometry (Mackintosh et al., 2019). Here we present a chromosome-level assembly based on a female sample from Carrifran Wildwood, Scotland.

Genome sequence report
The genome was sequenced from one female O. sylvanus (Figure 1) collected from Carrifran Wildwood, Scotland, UK (55.40, …). A total of 23-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 84-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 101 missing joins and misjoins and removed 12 haplotypic duplications, reducing the assembly length by 1.06% and the scaffold number by 64.52%, and increasing the scaffold N50 by 8.56%. The final assembly has a total length of 380 Mb in 33 sequence scaffolds with a scaffold N50 of 14 Mb (Table 1). Most (99.97%) of the assembly sequence was assigned to 30 chromosomal-level scaffolds, representing 28 autosomes and the W and Z sex chromosomes (Figure 2 to Figure 5; Table 2). Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The assembly has a BUSCO v5.3.2 (Manni et al., 2021) completeness of 98.4% (single 97.9%, duplicated 0.5%) using the lepidoptera_odb10 reference set.
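The scaffold statistics quoted above can be illustrated with a minimal Python sketch. It is not the pipeline used for this assembly: the function names and the toy scaffold lengths are hypothetical, and the back-calculation assumes that the reported 64.52% reduction is expressed relative to the scaffold count before curation.

```python
def scaffold_n50(lengths):
    """Return the N50: the largest length L such that scaffolds of
    length >= L together cover at least half of the total span."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def pre_curation_count(final_count, percent_reduction):
    """Back-calculate the scaffold count before curation from the final
    count and a percentage reduction (assumed relative to the original)."""
    return round(final_count / (1 - percent_reduction / 100))


# Toy scaffold lengths (hypothetical, not taken from the real assembly).
toy_lengths = [50, 40, 30, 20, 10]
print(scaffold_n50(toy_lengths))       # 40

# Figures from the report: 33 final scaffolds, 64.52% fewer than before.
print(pre_curation_count(33, 64.52))   # 93
```

Under that assumption, the reported figures imply roughly 93 scaffolds before curation, since (93 - 33) / 93 is approximately 64.52%.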

Genome annotation report
The ilOchSylv3.1 assembly was annotated using the Ensembl rapid annotation pipeline (Table 1; GCA_905404295). The resulting annotation includes 28,177 transcribed mRNAs from 13,451 protein-coding and 3,418 non-coding genes. There are, on average, 7.20 exons and 6.20 introns per protein-coding transcript.

[…] HiSeq X Ten (10X) instruments. Hi-C data were also generated from whole organism tissue of ilOchSylv4 using the Arima v2 kit and sequenced on the HiSeq X Ten instrument.

Genome assembly
Assembly was carried out with Hifiasm (Cheng et al., 2021) and haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). One round of polishing was […] et al., 2019). The assembly was checked for contamination and corrected using the gEVAL system (Chow et al., 2016) as described previously (Howe et al., 2021). Manual curation was performed using gEVAL, HiGlass (Kerpedjiev et al., 2018) and Pretext (Harry, 2022). The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2021), which performed annotation using MitoFinder (Allio et al., 2020). The genome was analysed and BUSCO scores generated within the BlobToolKit environment (Challis et al., 2020). Table 3 contains a list of all software tool versions used, where appropriate.

Genome annotation
The Ensembl gene annotation system (Aken et al., 2016) was used to generate annotation for the O. sylvanus assembly (GCA_905404295.1). Annotation was created primarily through alignment of transcriptomic data to the genome, with gap filling via protein-to-genome alignments of a select set of proteins from UniProt (UniProt Consortium, 2019).
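The reported annotation averages (7.20 exons and 6.20 introns per protein-coding transcript) differ by exactly one, as expected when every transcript with n exons has n - 1 introns. The short Python sketch below shows this relationship on made-up data; the function name and the exon counts are hypothetical and are not drawn from the Ensembl annotation.

```python
def mean_exons_and_introns(exon_counts):
    """Given the exon count of each transcript, return the mean number
    of exons and the mean number of introns (exons - 1 per transcript)."""
    if not exon_counts:
        raise ValueError("need at least one transcript")
    n = len(exon_counts)
    mean_exons = sum(exon_counts) / n
    mean_introns = sum(count - 1 for count in exon_counts) / n
    return mean_exons, mean_introns


# Hypothetical exon counts for five transcripts (illustrative only).
exons, introns = mean_exons_and_introns([1, 5, 9, 12, 9])
print(round(exons, 2), round(introns, 2))  # 7.2 6.2
```

Whatever the distribution of exon counts, the two means always differ by one, so the pair 7.20 and 6.20 is internally consistent.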

Data availability
European

Li-Wei Wu
Tunghai University, Taichung, China

The report presents the genome sequence of Ochlodes sylvanus, with a genome size of 380 megabases (99.97% of the assembly scaffolded). The assembly comprises 30 chromosomal pseudomolecules (including the W and Z sex chromosomes) and a mitochondrial genome of 17.1 kilobases. This is a standard report describing genome data, characterized by its comprehensible analysis and writing style. It serves as an excellent reference for subsequent researchers. I have only minor suggestions regarding the readability of the content:

1. The "Scaffold statistics" and "Scale" used in Figure 2 are not entirely clear. For instance, the light gray color indicates scaffold count, while the dark gray color represents length. It is not evident how this figure effectively portrays the count and length of scaffolds. Additionally, in the lower-left section of Figure 2, two scales are mentioned, 380M and 18.3M. It is unclear where these scales are positioned. I suggest slightly modifying the presentation of Figure 2 to enhance reader comprehension.

2. In Figure 3, the labeling "Scaffolds are coloured by phylum" might be redundant. In addition, Figure 4 seems somewhat unnecessary, as the accompanying text already indicates that the genome is 380M in size. Perhaps it could be merged with Figure 3 or simply removed to make the report more concise.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format?