The genome sequence of the Chalkhill Blue, Lysandra coridon (Poda, 1761)

We present a genome assembly from an individual male Lysandra coridon (the Chalkhill Blue; Arthropoda; Insecta; Lepidoptera; Lycaenidae). The genome sequence is 541 megabases in span. Most of the assembly is scaffolded into 90 chromosomal pseudomolecules, including the assembled Z sex chromosome. The mitochondrial genome has also been assembled and is 15.4 kilobases in length. Gene annotation of this assembly on Ensembl identified 13,334 protein-coding genes.


Background
The Chalkhill Blue (Lysandra coridon) is a species of butterfly that typically inhabits calcareous grasslands throughout Europe. In the UK, L. coridon is considered vulnerable (Fox et al., 2022); however, it is listed as Least Concern on the IUCN Red List (Europe) (Van Swaay et al., 2010).
Males have pale metallic blue upperside wings, while females are usually dark brown, although blue female forms exist (f. syngrapha). Both sexes have chequered wing fringes and a brown margin, with brown spots encircled with white that are most visible on the hindwings (Schmitt, 2015). The underside of the wings in both sexes has multiple black spots with a white margin and a row of submarginal orange markings, on a variable background ranging from whitish or grey to brownish. This species is sedentary, staying largely within local areas, where populations can reach high densities (Asher et al., 2001; Schmitt et al., 2006). A single brood flies between mid-June and September. Larvae feed primarily on horseshoe vetch (Hippocrepis comosa) and have a myrmecophilous relationship with ants (Fiedler & Maschwitz, 1988).
Allozyme and mitochondrial gene studies have demonstrated the existence of two major genetic lineages of L. coridon (Dapporto et al., 2022; Schmitt & Seitz, 2001; Talavera et al., 2013): a Western lineage that inhabits the UK, Spain, France, Italy, much of the Alps and most of Germany, and an Eastern lineage found in the Balkans, Poland, northern Germany and the rest of eastern Europe. Interestingly, L. coridon shows a west-to-east gradient of increasing chromosome number across its populations, ranging from 87 to 93 chromosomes (de Lesse, 1969). Here we present a chromosomally complete genome sequence for L. coridon, based on one male specimen from Săcel, Cluj, Romania (Figure 1).

Genome sequence report
The genome was sequenced from one male L. coridon specimen collected from Săcel, Cluj, Romania (latitude 46.61, longitude 23.46). A total of 47-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 60-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 242 missing joins or mis-joins and removed 10 haplotypic duplications, reducing the scaffold number by 62.21% and increasing the scaffold N50 by 91.5%.
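The coverage and curation figures above can be related by simple arithmetic. The sketch below is illustrative only and rests on assumptions not stated by the authors: that fold coverage means total sequenced bases divided by the 540,734,767 bp assembly span, and that the 99 final scaffolds and the 5,931,830 bp N50 are the post-curation values to which the reported 62.21% reduction and 91.5% increase apply. The function names are hypothetical.

```python
# Illustrative arithmetic relating the sequencing and curation figures.
# Assumptions (not stated by the authors): fold coverage = total bases /
# assembly span, and the reported percentage changes are relative to the
# pre-curation assembly, with the final assembly as the post-curation state.

ASSEMBLY_SPAN_BP = 540_734_767  # assembly length given in the Figure 2 caption


def implied_yield_gb(fold_coverage: float, span_bp: int = ASSEMBLY_SPAN_BP) -> float:
    """Total sequenced bases (in Gb) implied by a fold coverage."""
    return fold_coverage * span_bp / 1e9


def value_before_decrease(final: float, percent_decrease: float) -> float:
    """Invert a percentage decrease: final = before * (1 - decrease)."""
    return final / (1 - percent_decrease / 100)


def value_before_increase(final: float, percent_increase: float) -> float:
    """Invert a percentage increase: final = before * (1 + increase)."""
    return final / (1 + percent_increase / 100)


hifi_gb = implied_yield_gb(47)   # PacBio HiFi, 47-fold
tenx_gb = implied_yield_gb(60)   # 10X Genomics read clouds, 60-fold
scaffolds_before = value_before_decrease(99, 62.21)     # final scaffold count: 99
n50_before_bp = value_before_increase(5_931_830, 91.5)  # final scaffold N50 in bp

print(f"HiFi ~{hifi_gb:.1f} Gb; 10X ~{tenx_gb:.1f} Gb")
print(f"scaffolds before curation ~{scaffolds_before:.0f}; "
      f"N50 before ~{n50_before_bp / 1e6:.1f} Mb")
```

Under these assumptions, the HiFi data correspond to roughly 25.4 Gb and the 10X data to roughly 32.4 Gb, and the pre-curation assembly would have had about 262 scaffolds with an N50 of about 3.1 Mb.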
The final assembly has a total length of 540.7 Mb in 99 sequence scaffolds with a scaffold N50 of 5.9 Mb (Table 1). Most (99.92%) of the assembly sequence was assigned to 90 chromosomal-level scaffolds, representing 89 autosomes and the Z sex chromosome. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figures 2 to 5; Table 2). While not fully phased, the deposited assembly represents one haplotype. Contigs corresponding to the second haplotype have also been deposited.
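The scaffold N50 (and the N90 given in the Figure 2 caption) summarise assembly contiguity. A minimal sketch of the usual definition follows, assuming scaffold lengths are available as a list of integers (for example, from column 2 of a samtools .fai index); the toy lengths are made up for illustration and are not from this assembly.

```python
def nx(lengths, fraction=0.5):
    """Return the Nx length for an assembly.

    Scaffolds are sorted from longest to shortest and their lengths summed;
    Nx is the length of the scaffold at which the running total first reaches
    `fraction` of the total assembly span (0.5 gives N50, 0.9 gives N90).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")
    if not lengths:
        raise ValueError("no scaffold lengths supplied")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length
    # Unreachable for fraction <= 1, because the running total ends at `total`.
    raise AssertionError("running total never reached the threshold")


# Toy scaffold lengths in bp (hypothetical, not the ilLysCori1.1 scaffolds).
toy = [10_000, 8_000, 6_000, 4_000, 2_000]
print(nx(toy, 0.5), nx(toy, 0.9))  # 8000 4000
```

Applied to the real scaffold lengths, this definition would be expected to match the reported N50 and N90, although tools can differ in details such as whether gap (N) bases are counted towards the span.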

Genome annotation report
The L. coridon genome assembly GCA_905220515.1 was annotated using the Ensembl genome annotation pipeline (Table 1; Accession number GCA_905220515.1). The resulting annotation includes 13,334 protein-coding genes, with an average length of 16,708.51 bp and an average coding length of 1,435.28 bp, and 2,742 non-protein-coding genes. There is an average of 6.99 exons and 5.99 introns per canonical protein-coding transcript, with an average intron length of 2,285.81 bp. A total of 5,368 gene loci have more than one associated transcript.
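The exon and intron averages are linked: a transcript with k exons has k − 1 introns, so the mean intron count per transcript is the mean exon count minus one (6.99 − 1 = 5.99). The sketch below shows how such per-transcript figures could be derived from exon coordinates. It assumes 1-based inclusive coordinates and pools all introns when averaging intron length, which may differ from how Ensembl computes its statistics; the function name and the transcripts are hypothetical.

```python
from statistics import mean


def transcript_structure(exons):
    """Return (exon_count, intron_lengths) for one transcript.

    `exons` is a list of (start, end) tuples in 1-based, inclusive
    coordinates; introns are the gaps between consecutive exons after
    sorting by start position.
    """
    ordered = sorted(exons)
    intron_lengths = [
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]
    return len(ordered), intron_lengths


# Toy transcripts (hypothetical coordinates, not from the L. coridon annotation).
transcripts = {
    "tx1": [(1, 100), (201, 300), (451, 500)],
    "tx2": [(1000, 1200), (1501, 1600)],
}

exon_counts, intron_counts, all_introns = [], [], []
for exons in transcripts.values():
    n_exons, introns = transcript_structure(exons)
    exon_counts.append(n_exons)
    intron_counts.append(len(introns))
    all_introns.extend(introns)

mean_exons = mean(exon_counts)       # 2.5
mean_introns = mean(intron_counts)   # 1.5, i.e. mean_exons - 1
mean_intron_len = mean(all_introns)  # pooled over all introns
print(mean_exons, mean_introns, round(mean_intron_len, 2))
```

Pooling introns weights long, many-exon transcripts more heavily than averaging per transcript first would; which convention a given pipeline uses should be checked before comparing numbers.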

Sample acquisition and nucleic acid extraction
Two adult male L. coridon specimens were collected on 16 July 2018 using a hand net. The specimen that was used for 10X sequencing. High molecular weight (HMW) DNA was sheared into an average fragment size of 12-20 kb in a Megaruptor 3 system with speed setting 30. Sheared DNA was purified by solid-phase reversible immobilisation using AMPure PB beads with a 1.8X ratio of beads to sample to remove the shorter fragments and concentrate the DNA sample. The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit dsDNA High Sensitivity Assay kit. Fragment size distribution was evaluated by running the sample on the FemtoPulse system.
RNA was extracted from ilLysCori1 in the Tree of Life Laboratory at the WSI using TRIzol, according to the manufacturer's instructions. RNA was then eluted in 50 μL RNase-free water, and its concentration was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit RNA Broad-Range (BR) Assay kit. RNA integrity was analysed using the Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay.

Sequencing
Pacific Biosciences HiFi circular consensus and 10X Genomics read cloud DNA sequencing libraries were constructed according to the manufacturers' instructions. Poly(A) RNA-Seq libraries were constructed using the NEB Ultra II RNA Library Prep kit. DNA and RNA sequencing was performed by the Scientific Operations core at the WSI on Pacific Biosciences SEQUEL II (HiFi), Illumina HiSeq 4000 (RNA-Seq) and HiSeq X Ten (10X) instruments. Hi-C data were also generated from whole organism tissue of ilLysCori2 using the Arima v2 kit and sequenced on the HiSeq X Ten instrument.

Duncan Sivell
Natural History Museum, Cromwell Road, London, UK

Although the authors successfully present a genome for Lysandra coridon, they do not explain why they have sequenced a specimen that is genetically different from the target population. Romanian specimens have been used for genome sequencing even though the DToL project is focused on sequencing British material. I assume there may have been extenuating circumstances (e.g. COVID lockdowns?), but considering that Lysandra coridon is not a difficult species to find on chalk in the south of England, it does seem strange that UK material has not been processed. The authors note that populations in western and eastern Europe belong to different genetic lineages, which would seem to support the need to sequence UK butterflies over Romanian ones.
I have no issues with the text or methods in this paper, but the choice of specimens does not fit with the rationale of the project.

Is the rationale for creating the dataset(s) clearly described? Partly
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Entomology and ecology.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Yaohui Wang
Anhui Agricultural University, Shanghai Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China

Comments for author
The authors present their work on Lysandra coridon, a species of butterfly that typically inhabits calcareous grasslands throughout Europe. The importance of the subject is very high. Overall, it is a nice piece of work in Lycaenidae genome assembly and gene annotation, owing to both the major ecological importance of the species and its closeness to the butterfly community. I think this will be an example of the importance of gene content change analysis. In summary, it is an important genome resource that is critically needed. It is pretty short and well written.

Major issue
As currently written, some of the methods need to be clarified: they do not provide sufficient detail for other scientists who wish to reproduce them. I would suggest that the authors include a greatly expanded methods section in their Supplementary Material, which describes the parameter settings for each software tool used in their data analysis.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Partly
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Omics analysis, Genetic regulation, Gene editing

Figure 1. Forewings and hindwings of Lysandra coridon specimens from which the genome was sequenced. A. Dorsal (left) and ventral (right) surface view of wings from specimen RO_LC_853 (ilLysCori1) from Săcel, Cluj, Romania, used for genome sequencing. B. Dorsal (left) and ventral (right) surface view of wings from specimen RO_LS_903 (ilLysCori2) from Rimetea, Romania, used for Hi-C scaffolding.

Figure 2. Genome assembly of Lysandra coridon, ilLysCori1.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference, with each bin representing 0.1% of the 540,734,767 bp assembly. The distribution of scaffold lengths is shown in dark grey, with the plot radius scaled to the longest scaffold present in the assembly (34,005,801 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (5,931,830 and 4,666,103 bp, respectively). The pale grey spiral shows the cumulative scaffold count on a log scale, with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the lepidoptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ilLysCori1.1/dataset/CAJNAC01/snail.

Figure 5. Genome assembly of Lysandra coridon, ilLysCori1.1: Hi-C contact map. Hi-C contact map of the ilLysCori1.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=Vbiq6iyHTQC5rAsnRSQCCQ.

Peer Review Current Peer Review Status: Version 1
… (Challis et al., 2020). Assembly was carried out with Hifiasm (Cheng et al., 2021), and haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). One round of polishing was performed by aligning 10X Genomics read data to the assembly with Long Ranger ALIGN and calling variants with FreeBayes (Garrison & Marth, 2012). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using SALSA2 (Ghurye et al., 2019). The assembly was checked for contamination and corrected using the gEVAL system (Chow et al., 2016) as described previously (Howe et al., 2021). … (Rhie et al., 2020). The genome was analysed and BUSCO scores (Manni et al., 2021; Simão et al., 2015) were calculated within the BlobToolKit environment (Challis et al., 2020). Table 3 contains a list of software tool versions and sources.

… within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project. Each transfer of samples is further undertaken according to a Research Collaboration Agreement or Material Transfer Agreement entered into by the Darwin Tree of Life Partner, Genome Research Limited (operating as the Wellcome Sanger Institute), and in some circumstances other Darwin Tree of Life collaborators.

This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.