The genome sequences of the male and female green-veined white, Pieris napi (Linnaeus, 1758)

We present genome assemblies from a male and female Pieris napi (the green-veined white; Arthropoda; Insecta; Lepidoptera; Pieridae). The genome sequences of the male and female are 320 and 319 megabases in span, respectively. The majority of the assembly (99.79% of the male assembly, 99.88% of the female) is scaffolded into 24 autosomal pseudomolecules, with the Z sex chromosome assembled for the male and the Z and W chromosomes assembled for the female. Gene annotation of the male assembly on Ensembl has identified 13,221 protein-coding genes.


Introduction
Pieris napi, the green-veined white, is a small circumboreal butterfly that is widespread throughout the British Isles apart from Shetland and parts of the Scottish Highlands. Adults can be found laying eggs on wild brassicas over several generations from spring to the beginning of autumn. P. napi has seen recent increases in abundance in the UK (Fox et al., 2015) and is listed as Least Concern in the IUCN Red List (Europe) (van Swaay et al., 2009). This species has been used to investigate evolutionary dynamics in insect immune system genes, which were shown to harbour elevated genetic diversity and signals of either balancing or positive selection (Keehnen et al., 2018). P. napi has 25 pairs of chromosomes, a genome size of 349.8 Mb (Hill et al., 2019), and is female heterogametic (WZ). We note the recent production of a high-quality genome assembly for P. napi (Hill et al., 2019), and believe the sequences described here, generated as part of the Darwin Tree of Life project, will further aid understanding of the biology and ecology of this butterfly. Both male and female assemblies were produced to enable correct identification of, and discrimination between, the sex chromosomes.

Genome sequence report
The genomes were sequenced from a single male P. napi, ilPieNapi4, and a single female, ilPieNapi1, collected from Carrifran Wildwood, Scotland (latitude 55.400132, longitude -3.3352) (Figure 1). Hi-C data for both assemblies were generated from a second male P. napi, ilPieNapi5, collected from the same location (Figure 1). A total of 91-fold coverage in Pacific Biosciences single-molecule long reads and 107-fold coverage in 10X Genomics read clouds were generated for the male assembly; 60- and 52-fold coverage, respectively, were generated using the Pacific Biosciences and 10X Genomics technologies for the female assembly. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation of the male assembly corrected 33 missing joins or misjoins and removed seven haplotypic duplications, reducing the assembly size by 0.71% and scaffold number by 25.00%, and increasing the scaffold N50 by 3.75%. Manual assembly curation of the female assembly corrected 105 missing joins or misjoins and removed 28 haplotypic duplications, reducing the assembly size by 1.22% and scaffold number by 27.59%, and increasing the scaffold N50 by 1.98%.
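The fold-coverage and percentage-change figures quoted above follow from simple ratios. The sketch below is illustrative only: the function names and the example inputs are hypothetical and are not taken from the sequencing or curation records, and it assumes coverage is computed as total sequenced bases divided by assembly span.

```python
def fold_coverage(total_read_bases, assembly_span):
    """Approximate fold coverage: sequenced bases per base of assembly."""
    if assembly_span <= 0:
        raise ValueError("assembly_span must be positive")
    return total_read_bases / assembly_span


def percent_change(before, after):
    """Signed percentage change from `before` to `after`."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return (after - before) / before * 100


# Hypothetical inputs chosen only to illustrate the arithmetic.
example_coverage = fold_coverage(29_120_000_000, 320_000_000)
example_change = percent_change(64, 48)
print(f"{example_coverage:.0f}-fold coverage")
print(f"{example_change:.2f}% change in scaffold number")
```

A negative percentage change corresponds to a reduction, as reported for assembly size and scaffold number; a positive value corresponds to an increase, as reported for scaffold N50.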
The final male assembly has a total length of 320 Mb in 49 sequence scaffolds with a scaffold N50 of 13 Mb; the final female assembly has a total length of 319 Mb in 43 sequence scaffolds with a scaffold N50 of 13 Mb (Table 1). Of the male assembly sequence, 99.79% was assigned to 25 chromosomal-level scaffolds, representing 24 autosomes (numbered by synteny to the female assembly) and the Z sex chromosome; of the female assembly sequence, 99.88% was assigned to 26 chromosomal-level scaffolds, representing 24 autosomes (numbered by sequence length) and the W and Z chromosomes (Figure 2–Figure 5; Table 2). The assemblies have a BUSCO (Simão et al., 2015) v5.1.2 completeness of 99.1% (single 98.5%, duplicated 0.5%, fragmented 0.2%, missing 0.7%; male) and 99.0% (single 98.4%, duplicated 0.6%, fragmented 0.2%, missing 0.8%; female) using the lepidoptera_odb10 reference set. While not fully phased, the assemblies deposited are of one haplotype. Contigs corresponding to the second haplotype for each assembly have also been deposited.
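For readers unfamiliar with the scaffold N50 statistic reported above, the following is a minimal sketch assuming the usual definition: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. The function name and the example lengths are hypothetical and are not the actual scaffold lengths of either assembly.

```python
def scaffold_n50(lengths):
    """Return the N50 of a collection of scaffold lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical toy lengths, for illustration only.
toy_lengths = [10, 8, 6, 4, 2]
toy_n50 = scaffold_n50(toy_lengths)
print(toy_n50)
```

Because N50 depends on both the number and the sizes of scaffolds, joining scaffolds during curation tends to raise it, whereas removing duplicated sequence can change it in either direction.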

Gene annotation
The Ensembl gene annotation system (Aken et al., 2016) was used to generate annotation for the male Pieris napi assembly ilPieNapi4.1 (GCA_905231885.1, see https://rapid.ensembl.org/Pieris_napi_GCA_905231885.1; Table 1). The annotation was created primarily through alignment of transcriptomic data to the genome, with gap filling via protein-to-genome alignments of a select set of proteins from UniProt (UniProt Consortium, 2019) and OrthoDB (Kriventseva et al., 2008). The prediction tools CPC2 (Kang et al., 2017) and RNAsamba (Camargo et al., 2020) were used to aid determination of protein-coding genes.

Sample acquisition and nucleic acid extraction
Three male (ilPieNapi4, genome assembly; ilPieNapi5, Hi-C; ilPieNapi6, RNA-Seq) and one female (ilPieNapi1, genome assembly) P. napi specimens were collected from Carrifran Wildwood, Scotland (latitude 55.400132, longitude -3.3352) by Konrad Lohse, University of Edinburgh, who also identified the specimens. A second female P. napi specimen (ilPieNapi9, RNA-Seq) was collected by Alex Hayward, University of Exeter, who also identified the specimen. All specimens were caught with a handnet and were snap-frozen in liquid nitrogen.
DNA was extracted from the whole organisms of ilPieNapi1 and ilPieNapi4 at the Wellcome Sanger Institute (WSI) Scientific Operations core using the Qiagen MagAttract HMW DNA kit, according to the manufacturer's instructions. RNA (from the whole organisms of ilPieNapi6 and ilPieNapi9) was extracted in the Tree of Life Laboratory at the WSI using TRIzol, according to the manufacturer's instructions. RNA was then eluted in 50 μl RNAse-free water and its concentration assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit RNA Broad-Range (BR) Assay kit. RNA integrity was analysed using the Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay.

Sequencing
Pacific Biosciences HiFi circular consensus and 10X Genomics read cloud DNA sequencing libraries were constructed according to the manufacturers' instructions. Poly(A) RNA-Seq libraries were constructed using the NEB Ultra II RNA Library Prep kit. DNA and RNA sequencing was performed by the       HiSeq 4000 (RNA-Seq) instruments. Hi-C data were generated from the whole organism of ilPieNapi5 using the Arima v1.0 kit and sequenced on HiSeq X.

Genome assembly
Assembly of both genomes was carried out with HiCanu (Nurk et al., 2020). Haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). One round of polishing was performed by aligning 10X Genomics read data to the assembly with longranger align and calling variants with freebayes (Garrison & Marth, 2012). The assemblies were then scaffolded with Hi-C data (Rao et al., 2014) using SALSA2 (Ghurye et al., 2019). The assemblies were checked for contamination and corrected using the gEVAL system (Chow et al., 2016) as described previously (Howe et al., 2021). Manual curation was performed using gEVAL, HiGlass (Kerpedjiev et al., 2018) and Pretext. The mitochondrial genomes were assembled using MitoHiFi (Uliano-Silva et al., 2021). The genomes were analysed and BUSCO scores generated within the BlobToolKit environment (Challis et al., 2020). Table 3 contains a list of all software tool versions used, where appropriate.
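The steps described above can be summarised as a simple ordered data structure. This is only a reading aid: it lists the stages in the order they are mentioned in the text, records tool names as given there, invents no command-line options, and uses hypothetical variable names and stage labels.

```python
# Assembly stages in the order they are mentioned in the text.
# Each entry is (stage label, tools named in the text for that stage).
ASSEMBLY_STAGES = [
    ("primary assembly", ["HiCanu"]),
    ("haplotypic duplication removal", ["purge_dups"]),
    ("polishing with 10X Genomics reads", ["longranger align", "freebayes"]),
    ("Hi-C scaffolding", ["SALSA2"]),
    ("contamination check and correction", ["gEVAL"]),
    ("manual curation", ["gEVAL", "HiGlass", "Pretext"]),
    ("mitochondrial genome assembly", ["MitoHiFi"]),
    ("analysis and BUSCO scoring", ["BlobToolKit"]),
]

# Print a numbered summary of the workflow.
for step, (stage, tools) in enumerate(ASSEMBLY_STAGES, start=1):
    print(f"{step}. {stage}: {', '.join(tools)}")
```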

Ethics and compliance issues
The materials that have contributed to this genome note were supplied by a Tree of Life collaborator. The Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible.
The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material;
• Legality of collection, transfer and use (national and international).