The genome sequence of the grizzled skipper, Pyrgus malvae (Linnaeus, 1758)

We present a genome assembly from an individual male Pyrgus malvae (the grizzled skipper; Arthropoda; Insecta; Lepidoptera; Hesperiidae). The genome sequence is 725 megabases in span. The majority (99.97%) of the assembly is scaffolded into 31 chromosomal pseudomolecules, with the Z sex chromosome assembled.


Background
The grizzled skipper, Pyrgus malvae, is a small butterfly, characteristic of chalk downland and woodland clearings, and other grassland habitats. Not to be confused with the term 'grisly' (i.e. extremely unpleasant or gruesome), P. malvae gets its common name from the tufts of long grey hair that cover its body and inner wings. Its wings bear a striking black and white checkerboard pattern, with alternating black and white stripes on the wing fringes and antennae. Notoriously difficult to follow, P. malvae has a fast and darting, low flight pattern. Pyrgus malvae is found throughout Europe, except for northern Scandinavia, several Mediterranean Islands and Iberia, southern France and Italy (where it is replaced by its sister species P. malvoides), with a range that extends eastwards across temperate Asia to Northern China and Korea (Tolman & Lewington, 2008). In the UK the species is found mainly in central and southern England, with a patchy distribution in Wales and the southwest. P. malvae typically exists in small populations (<100 adults) that are thought to form metapopulations across its range (Asher et al., 2001).
In the UK, P. malvae typically emerges in April and flies until June, although the date of first emergence is advancing, and in warm years may occur as early as March. It is univoltine in northern Europe and at higher altitudes, but is bivoltine elsewhere, and in the north it may be bivoltine when weather conditions are particularly favourable (Asher et al., 2001).
Pyrgus malvae larvae feed on a variety of host plants in the Rosaceae family, particularly agrimony (Agrimonia eupatoria), creeping cinquefoil (Potentilla reptans) and wild strawberry (Fragaria vesca) (Asher et al., 2001). When fully grown, the larva constructs a cocoon at the base of low vegetation, where it overwinters as a pupa. Adults feed on a wide variety of nectar sources, including bird's-foot trefoil (Lotus corniculatus), bugle (Ajuga reptans), buttercup (Ranunculus species), daisy (Bellis perennis), and dandelion (Taraxacum officinale). Males are territorial and exhibit either perching or patrolling behaviour according to habitat type (Brereton et al., 1998). They have two scent organs: the forewing costal fold and tibial tufts composed of specialised setae on the hind leg, which appear to be used to waft pheromones towards the female during courtship (Hernández-Roldán et al., 2014). Eggs are laid singly on the leaf underside of larval host plants, with the majority deposited on short vegetation in locations with a favourably warm microclimate and/or elevated nutritional content (Brereton et al., 1998).
Populations of P. malvae in the UK have declined markedly in the twentieth century (Brereton et al., 1998) and the species is a conservation priority in the UK (Brig, 2007). Encouragingly, P. malvae appears to be positively associated with grazed vegetation, and implementing grazing in habitat restoration regimes may offer a means to help reverse population declines (Wallis De Vries & Raemakers, 2001). At the European level, this species is listed as Least Concern in the IUCN Red List (Van Swaay et al., 2010). Pyrgus malvae has been reported as having 33 (Bigger, 1960; England) and 31 (Federley, 1938; Finland) chromosome pairs. The assembly described herein contains 31 chromosome pairs.

Genome sequence report
The genome was sequenced from a single male P. malvae (Figure 1) collected from Suatu, Cluj County, Romania (latitude 46.7648, longitude 23.9845). A total of 69-fold coverage in Pacific Biosciences single-molecule circular consensus (HiFi) long reads and 49-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 2 missing/misjoins and removed 1 haplotypic duplication, reducing the assembly length by 0.01% and the scaffold number by 7.69%.
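As a rough sanity check on these figures, the sketch below converts fold coverage into an approximate number of sequenced bases and back-calculates the scaffold count before curation from the reported percentage reduction. It assumes the 725 Mb span and the 36 final scaffolds reported for this assembly; the function names are illustrative and are not part of any pipeline used in this work.

```python
def approx_bases(coverage_fold: float, genome_span_bp: float) -> float:
    """Approximate total sequenced bases implied by a fold coverage."""
    return coverage_fold * genome_span_bp


def count_before_reduction(final_count: int, percent_reduction: float) -> float:
    """Back-calculate a starting count from a final count and a % reduction."""
    return final_count / (1.0 - percent_reduction / 100.0)


# Values taken from this note; the span is rounded, so results are approximate.
GENOME_SPAN_BP = 725e6

hifi_bases = approx_bases(69, GENOME_SPAN_BP)   # PacBio HiFi, 69-fold
tenx_bases = approx_bases(49, GENOME_SPAN_BP)   # 10X read clouds, 49-fold
scaffolds_before = count_before_reduction(36, 7.69)

print(f"HiFi bases: ~{hifi_bases / 1e9:.1f} Gb")
print(f"10X bases:  ~{tenx_bases / 1e9:.1f} Gb")
print(f"Scaffolds before curation: ~{round(scaffolds_before)}")
```

Under these assumptions, a 7.69% reduction ending at 36 scaffolds is consistent with 39 scaffolds before curation (3 of 39 removed or merged).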
The final assembly has a total length of 725 Mb in 36 sequence scaffolds with a scaffold N50 of 27.0 Mb (Table 1). The majority, 99.97%, of assembly sequence was assigned to 31 chromosomal-level scaffolds, representing 30 autosomes (numbered by sequence length), and the Z sex chromosome (Figure 2–Figure 5; Table 2). The assembly has a BUSCO v5.1.2 (Manni et al., 2021) completeness of 98.8% (single 98.3%, duplicated 0.4%) using the lepidoptera_odb10 reference set (n=5286). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
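For readers unfamiliar with the scaffold N50 statistic quoted above, the following minimal sketch shows how it is conventionally computed: scaffolds are sorted from longest to shortest, and N50 is the length of the scaffold at which the running total first reaches half of the assembly span. The function name and the toy lengths are illustrative assumptions, not values from this assembly.

```python
from typing import Iterable


def scaffold_n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_span = sum(ordered) / 2
    running_total = 0
    for length in ordered:
        running_total += length
        # First scaffold at which the cumulative length covers half the span.
        if running_total >= half_span:
            return length
    raise ValueError("lengths must be non-empty")


# Toy example (hypothetical lengths, arbitrary units): total span is 100,
# so half is 50; the cumulative sum reaches 70 at the second scaffold.
toy_lengths = [40, 30, 20, 5, 5]
print(scaffold_n50(toy_lengths))
```

Applied to the 36 scaffold lengths of the deposited assembly, the same procedure would be expected to give the reported value of 27.0 Mb.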

Genome annotation report
The ilPyrMalv3.1 genome has been annotated using the Ensembl rapid annotation pipeline (Table 1; https://rapid.ensembl.org/Pyrgus_malvae_GCA_911387765.1/). The resulting annotation includes 23,484 transcribed mRNAs from 12,096 protein-coding and 2,976 non-coding genes. There are 1.66 coding transcripts per gene and 7.83 exons per transcript.

Methods
Sample acquisition and nucleic acid extraction
A male P. malvae specimen (ilPyrMalv3, male, genome assembly) was collected from Suatu, Cluj County, Romania (latitude 46.7648, longitude 23.9845) using a net by Konrad Lohse, Alex Hayward, Dominik Laetsch and Roger Vila, who also identified the sample. A further two specimens (ilPyrMalv2, unknown sex, Hi-C; ilPyrMalv1, RNA-Seq) were collected from Baciu, Cluj County, Romania (latitude 46.8, longitude 23.5) using a net and were identified by the same team. All samples were snap-frozen at -80°C.
DNA was extracted from the whole organism of ilPyrMalv3 at the Wellcome Sanger Institute (WSI) Scientific Operations core using the Qiagen MagAttract HMW DNA kit, according to the manufacturer's instructions. RNA (from the whole organism of ilPyrMalv1) was extracted [...] SEQUEL II (HiFi), Illumina HiSeq X (10X) and Illumina HiSeq 4000 (RNA-Seq) instruments. Hi-C data were also generated from whole organism tissue of ilPyrMalv2 using the Arima v1 Hi-C kit and sequenced on an Illumina NovaSeq 6000 instrument.

Figure 1. Fore and hind wings of the Pyrgus malvae specimen from which the genome was sequenced. Top: Dorsal (left) and ventral (right) surface view of wings from specimen RO_PM_973 (ilPyrMalv3) from Suatu, Romania, used to generate Pacific Biosciences and 10X Genomics data. Bottom: Dorsal (left) and ventral (right) surface view of wings from specimen RO_PM_838 (ilPyrMalv2) from Cluj-Napoca, Romania, used to generate Hi-C data.

Genome assembly
Assembly was carried out with Hifiasm (Cheng et al., 2021); haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). One round of polishing was performed by aligning 10X Genomics read data to the [...] et al., 2020). Table 3 contains a list of all software tool versions used, where appropriate.

Genome annotation
The Ensembl gene annotation system (Aken et al., 2016) was used to generate annotation for the Pyrgus malvae assembly (GCA_911387765.1). Annotation was created primarily through alignment of transcriptomic data to the genome, with gap filling via protein-to-genome alignments of a select set of proteins from UniProt (UniProt Consortium, 2019).
The genome sequence is released openly for reuse. The P. malvae genome sequencing initiative is part of the Darwin Tree of Life (DToL) project. All raw sequence data and the assembly have been deposited in INSDC databases. Raw data and assembly accession identifiers are reported in Table 1.
