The genome sequence of the Adonis blue, Lysandra bellargus (Rottemburg, 1775)

We present a genome assembly from an individual female Lysandra bellargus (the Adonis blue; Arthropoda; Insecta; Lepidoptera; Lycaenidae). The genome sequence is 529 megabases in span. The majority of the assembly (99.93%) is scaffolded into 46 chromosomal pseudomolecules, with both the W and Z sex chromosomes assembled. The complete mitochondrial genome was also assembled and is 15.6 kilobases in length. Gene annotation of this assembly on Ensembl has identified 13,249 protein-coding genes.


Background
The Adonis blue, Lysandra bellargus (Rottemburg, 1775), is a butterfly belonging to the gossamer-wing butterfly family Lycaenidae. It inhabits the western Palearctic region and is found most commonly in southern and central Europe, western Russia, Turkey, Transcaucasia, the Caucasus, northern Iraq and Iran. Its presence has been confirmed in northern Morocco, and it has occasionally been found as far north as southern Sweden (Raper, 2021). While L. bellargus is listed as a species of Least Concern on the IUCN Red List of Europe (van Swaay et al., 2010), it is considered vulnerable in the UK (Fox et al., 2022). Within the UK, the Adonis blue is at its northern range limit and occurs primarily in southern counties, such as Dorset, Wiltshire, Kent, Sussex and Surrey, as well as on the Isle of Wight. Populations in the UK have been in general decline since the 1950s (Thomas, 1983) and were severely reduced in the late 1970s after a drought caused widespread damage to the host plant (Harper et al., 2003). However, there is evidence of colonies of fewer than 50 individuals recovering to populations of 60,000 over five years (Bourn et al., 1998).
The Adonis blue is bivoltine, prefers calcareous grassland in sheltered, warm conditions and lays its eggs on horseshoe vetch (Hippocrepis comosa). The Adonis blue is named after the sky-blue wings of the males, which are surrounded by a black rim with a white margin. The female is chocolate brown with a small number of blue scales near the base of the wings (Still, 1996). The caterpillar is dark green with spines and has yellow stripes lining its back and sides (Carter & Hargreaves, 1986).
The karyotype of the Adonis blue is unusual in that it has a haploid chromosome number of 45, whereas most butterflies in the Lycaenidae have a conserved haploid chromosome number of either 23 or 24 (Lorkovic, 1990). The genome sequence will therefore offer further insight into lepidopteran genome evolution.

Genome sequence report
The genome was sequenced from a single female L. bellargus collected from El Brull, Catalunya, Spain (Figure 1). In total, 52-fold coverage of Pacific Biosciences single-molecule HiFi long reads and 65-fold coverage of 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 207 missing joins or misjoins and removed 13 haplotypic duplications. This reduced the assembly size by 0.53% and the scaffold number by 47.35%, and increased the scaffold N50 by 72.56%.
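The coverage and curation figures above are simple ratios. The following Python sketch shows how each is computed. The helper names are hypothetical, not code from the assembly pipeline.

- Fold coverage is total read bases divided by genome size.
- Scaffold N50 is the length L such that scaffolds of length L or longer cover at least half the assembly.
- The curation percentages are relative changes.

The scaffold lengths in the example are toy values, not the ilLysBell1 assembly.

```python
from __future__ import annotations

from typing import Iterable


def fold_coverage(total_read_bases: float, genome_size: float) -> float:
    """Mean sequencing depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_read_bases / genome_size


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return ordered[-1]  # not reached for non-empty input; keeps the return type explicit


def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after` in percent (negative means a reduction)."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    # HiFi bases implied by 52-fold coverage of a 529 Mb assembly, in Gb.
    print(round(52 * 529e6 / 1e9, 1))  # → 27.5
    # Hypothetical toy scaffold lengths, not the real assembly.
    print(n50([10, 8, 5, 3, 2, 1, 1]))  # → 8
    print(percent_change(200, 100))  # → -50.0
```

For example, assuming coverage is computed against the 529 Mb assembly length, 52-fold HiFi coverage corresponds to roughly 27.5 Gb of HiFi sequence.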
The final assembly has a total length of 529 Mb in 139 sequence scaffolds, with a scaffold N50 of 11.2 Mb (Table 1). The majority of the assembly sequence (99.93%) was assigned to 46 chromosomal-level scaffolds (Figures 2–5; Table 2). These represent 44 autosomes, numbered by sequence length, and the W and Z sex chromosomes. The putative W chromosome contains multiple separate regions with homology to microsporidian, insect and insect mitochondrial sequences. It also displays features consistent with a lepidopteran W chromosome: Hi-C interactions with lepidopteran chromosomes, telomeric repeats consistent with a lepidopteran origin, high repeat density and reduced coverage. We observed no evidence of extant infection with a microsporidian parasite. This suggests horizontal gene transfer from a microsporidian
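The "reduced coverage" criterion for the sex chromosomes follows from copy number. In a female (ZW) lepidopteran, each autosome is present in two copies, but the Z and W are present in one copy each. Reads mapped to a haploid assembly should therefore give roughly half the autosomal depth on Z and W. The Python sketch below assumes per-scaffold mean depths have already been computed. The scaffold names, depths and 0.15 tolerance are hypothetical.

```python
from __future__ import annotations

from statistics import median


def relative_depth(mean_depth: dict[str, float], autosomes: list[str]) -> dict[str, float]:
    """Divide each scaffold's mean read depth by the median autosomal depth."""
    baseline = median(mean_depth[name] for name in autosomes)
    return {name: depth / baseline for name, depth in mean_depth.items()}


def flag_single_copy(ratios: dict[str, float], expected: float = 0.5,
                     tolerance: float = 0.15) -> list[str]:
    """Return scaffolds whose depth ratio is close to the single-copy expectation."""
    return sorted(name for name, r in ratios.items() if abs(r - expected) <= tolerance)


if __name__ == "__main__":
    # Hypothetical depths chosen only to illustrate the ratio; not measured values.
    depths = {"chr1": 52.0, "chr2": 50.0, "chr3": 54.0, "Z": 26.5, "W": 24.0}
    ratios = relative_depth(depths, autosomes=["chr1", "chr2", "chr3"])
    print(flag_single_copy(ratios))  # → ['W', 'Z']
```

Repeat-rich W sequence can distort mapped depth in practice. A depth ratio like this is therefore one line of evidence alongside the Hi-C and repeat-density features listed above.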

Genome annotation report
The ilLysBell1.1 genome was annotated using the Ensembl rapid annotation pipeline (Table 1; Lysandra bellargus Ensembl page). The resulting annotation includes 24,348 transcribed mRNAs from 13,249 protein-coding and 2,895 non-coding genes. Canonical protein-coding transcripts have an average of 7.00 exons and 6.00 introns, with an average intron length of 2,238.35 bases.

… extraction kit. Low molecular weight DNA was removed from a 200-ng aliquot of extracted DNA using a 0.8X AMPure XP purification kit prior to 10X Chromium sequencing; a minimum of 50 ng DNA was submitted for 10X sequencing. HMW DNA was sheared to an average fragment size of 12–20 kb in a Megaruptor 3 system with speed setting 30. Sheared DNA was purified by solid-phase reversible immobilisation using AMPure PB beads at a 1.8X ratio of beads to sample, to remove the shorter fragments and concentrate the DNA sample. The concentration of the sheared …

Sequencing
Pacific Biosciences HiFi circular consensus and 10X Genomics Chromium read cloud sequencing libraries were constructed according to the manufacturers' instructions. Sequencing was performed by the Scientific Operations core at the Wellcome Sanger Institute on three instruments:

- Pacific Biosciences Sequel II (HiFi)
- Illumina HiSeq X Ten (10X)
- Illumina HiSeq 4000 (RNA-Seq)

Hi-C data were generated in the Tree of Life laboratory from the remaining whole-organism tissue of ilLysBell1 using the Arima v2 kit and sequenced on a NovaSeq 6000 instrument.

Genome assembly
Assembly was carried out with HiCanu (Nurk et al., 2020), and haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). One round of polishing was performed by aligning 10X Genomics read data to the assembly with longranger align and calling variants with freebayes (Garrison & Marth, 2012). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using SALSA2 (Ghurye et al., 2019). The assembly was checked for contamination and corrected using the gEVAL system (Chow et al., 2016) as described previously (Howe et al., 2021). Manual curation (Howe et al., 2021) was performed using gEVAL, HiGlass (Kerpedjiev et al., 2018) and Pretext. The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2021), which performs annotation using MitoFinder (Allio et al., 2020). The genome was analysed, and BUSCO scores were generated, within the BlobToolKit environment (Challis et al., 2020). Table 3 lists the versions of all software tools used, where appropriate.
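Conceptually, short-read polishing replaces base errors in the consensus with the alleles supported by the aligned reads. The Python sketch below illustrates only that final substitution step and is not the longranger/freebayes procedure itself. It rests on these assumptions:

- Variant calls, as freebayes would emit in VCF, have already been parsed into a hypothetical `Variant` record.
- Only homozygous-ALT genotypes are applied.
- Variants do not overlap.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical parsed variant call (not a freebayes data structure)."""
    pos: int       # 1-based position of REF on the contig, as in VCF
    ref: str
    alt: str
    genotype: str  # e.g. "1/1" (homozygous ALT) or "0/1" (heterozygous)


def polish(sequence: str, variants: list[Variant]) -> str:
    """Apply homozygous-ALT calls to a contig sequence.

    Edits are applied from the rightmost position leftwards, so indels do not
    shift the coordinates of edits still to be applied. Assumes non-overlapping
    variants.
    """
    seq = sequence
    for v in sorted(variants, key=lambda var: var.pos, reverse=True):
        if v.genotype.replace("|", "/") != "1/1":
            continue  # skip non-homozygous calls; only 1/1 is treated as a consensus error
        start = v.pos - 1
        if seq[start:start + len(v.ref)] != v.ref:
            raise ValueError(f"REF mismatch at position {v.pos}")
        seq = seq[:start] + v.alt + seq[start + len(v.ref):]
    return seq


if __name__ == "__main__":
    calls = [
        Variant(2, "C", "T", "1/1"),   # substitution, applied
        Variant(5, "A", "AG", "1/1"),  # insertion, applied
        Variant(7, "G", "C", "0/1"),   # heterozygous, skipped
    ]
    print(polish("ACGTACGT", calls))  # → ATGTAGCGT
```

Processing right to left is a simple way to keep VCF coordinates valid without tracking cumulative length offsets.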

Genome annotation
The Ensembl gene annotation system (Aken et al., 2016) was used to generate annotation for the Lysandra bellargus assembly (GCA_905333045.1). Annotation was created primarily through alignment of transcriptomic data to the genome, with gap filling via protein-to-genome alignments of a select set of proteins from UniProt (UniProt Consortium, 2019).
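The per-transcript statistics in the annotation report can be derived from exon coordinates. A transcript with n exons has n - 1 introns, so the reported means of 7.00 exons and 6.00 introns are mutually consistent. The Python sketch below is illustrative and makes two assumptions:

- The input maps a hypothetical transcript ID to 1-based, inclusive exon coordinates, with one canonical transcript per gene.
- Mean intron length is pooled over all introns, which may differ from Ensembl's exact definition.

```python
from __future__ import annotations

from statistics import mean


def intron_lengths(exons: list[tuple[int, int]]) -> list[int]:
    """Lengths of the gaps between consecutive exons (1-based, inclusive coordinates)."""
    ordered = sorted(exons)
    return [next_start - prev_end - 1
            for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])]


def summarise(transcripts: dict[str, list[tuple[int, int]]]) -> dict[str, float]:
    """Mean exon count, intron count and pooled mean intron length over transcripts."""
    all_introns = [length for exons in transcripts.values()
                   for length in intron_lengths(exons)]
    return {
        "mean_exons": mean(len(exons) for exons in transcripts.values()),
        "mean_introns": mean(len(intron_lengths(exons)) for exons in transcripts.values()),
        "mean_intron_length": mean(all_introns) if all_introns else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical toy transcripts, not ilLysBell1.1 gene models.
    toy = {
        "t1": [(1, 100), (201, 300), (451, 500)],
        "t2": [(1000, 1200), (1301, 1400)],
    }
    print(summarise(toy))
```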

Data availability
European Nucleotide Archive: Lysandra bellargus (Adonis blue). Accession number PRJEB43534; https://identifiers.org/ena.embl/PRJEB43534. The genome sequence is released openly for reuse. The L. bellargus genome sequencing initiative is part of the Darwin Tree of Life (DToL) project. All raw sequence data and the assembly have been deposited in INSDC databases. Raw data and assembly accession identifiers are reported in Table 1.

This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Maria de la Paz Celorio-Mancera
Stockholm University, Stockholm, Sweden

The authors present the genome assembly for the Adonis blue, Lysandra bellargus. The species is of interest because it is considered vulnerable in the UK, and because of two other observations. The first is its potential capacity for population recovery from a limited number of individuals. The second is its unusual number of chromosomes compared with that observed across the family Lycaenidae. In general, the article reports a well-conducted effort to obtain the genome of this butterfly species. My concerns relate to the rationale and the protocols used in the study.
First, if the species is of interest because of its chromosome number, have the authors observed, or would they expect, intraspecific variability in its karyotype? This exceptional, although possible, circumstance is not addressed. Unfortunately, since the study does not provide a chromosome-level genome (no linkage mapping was done), the authors cannot conclude the actual number of chromosomes in their specimen.
Second, there is a general effort in the community to provide genomes that are as homozygous as possible. I understand that inbreeding and laboratory breeding are difficult in some species. However, I miss an effort from the authors to provide a more homozygous genome. I also see nucleotide diversity as a potential concern when using this genome as a reference.
I found the BlobToolKit a very nice tool for the visualisation of genome data.

Is the rationale for creating the dataset(s) clearly described? Partly
Are the protocols appropriate and is the work technically sound? Partly

Are sufficient details of methods and materials provided to allow replication by others? Yes
Maybe it could be stated more clearly what reduced coverage means for the W (and Z) chromosomes. Are the differences shown in Figure 3?
○ How much did the 10x data polishing improve the genome? Did the HiFi reads or assembled contigs have worse base qualities than those obtained by Illumina sequencing? Are the 10x UMIs used in the process? Could they be used for scaffolding and phasing?
○ There is no mention of the use of RNA-Seq data.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Partly

Are sufficient details of methods and materials provided to allow replication by others? Partly
Are the datasets clearly presented in a useable and accessible format? Yes