The genome sequence of the Arran brown, Erebia ligea (Linnaeus, 1758)

We present a genome assembly from an individual male Erebia ligea (Arran brown; Arthropoda; Insecta; Lepidoptera; Nymphalidae). The genome sequence is 506 megabases in span. The majority (99.92%) of the assembly is scaffolded into 29 chromosomal pseudomolecules, with the Z sex chromosome assembled. The complete mitochondrial genome was also assembled and is 15.2 kilobases in length.


Background
The Arran brown, Erebia ligea, is one of the most widespread species of the genus Erebia, occurring from the Russian Kamchatka Peninsula and Japan in eastern Asia (Dubatolov et al., 1998) to central and northern Europe (Kudrna et al., 2015). Although the species takes its common name from the Isle of Arran in Scotland, where it was first recorded in 1803, the current and historic presence of this butterfly in the British Isles remains disputed (Salmon, 1995). The intraspecific phenotypic diversity present throughout the distribution of E. ligea has prompted the description of several subspecies (Dubatolov et al., 1998; Warren, 1937; Zakharova & Tatarinov, 2016); however, a formal biogeographic assessment remains lacking.
E. ligea is characterised as a woodland species associated with clearings and meadows, and occurs at relatively low altitudes compared to most other Erebia butterflies (Kleckova et al., 2014). Recorded host plants include a variety of grasses (Poaceae) and sedges (Carex, Cyperaceae). It is univoltine and in some northern localities it is recorded only every second year (Tolman & Lewington, 2008). Although E. ligea is considered a species of Least Concern according to the IUCN Red List (Europe) (van Swaay et al., 2010), the species can be locally endangered (Fichefet et al., 2008).
While the first karyotypic analysis suggested that male Erebia ligea from Finland have 29 chromosomes (Federley, 1938), Japanese individuals from Hokkaido were found to have only 28 chromosomes (Saitoh & Abe, 1997). These values are close to the most common and putatively ancestral chromosomal state for Lepidoptera (n=31; Robinson, 1971), although Erebia is one of the most karyologically diverse known genera of butterflies (Robinson, 1971; de Vos et al., 2020).

Genome sequence report
The genome was sequenced from a single male E. ligea (Figure 1) collected from Borzont, Joseni, Harghita, Romania (latitude 46.664, longitude 25.317). A total of 34-fold coverage of Pacific Biosciences single-molecule circular consensus (HiFi) long reads and 63-fold coverage of 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 47 missing/misjoins and removed 10 haplotypic duplications, reducing the assembly length by 3.59% and the scaffold number by 39.39%, and increasing the scaffold N50 by 4.29%.
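Scaffold N50, one of the statistics tracked during curation above, is the length of the scaffold at which the cumulative length of scaffolds, sorted from longest to shortest, first reaches half of the total assembly length. The following is a minimal sketch of that calculation, assuming the scaffold lengths are already available as a list of integers; the function name `scaffold_n50` and the toy lengths are illustrative only and are not taken from the assembly pipeline or from this assembly.

```python
def scaffold_n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp)."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    # Sort from longest to shortest scaffold.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that brings the running sum to at least
        # half of the total length defines the N50.
        if running >= half_total:
            return length


# Hypothetical toy lengths, not values from this assembly.
toy_lengths = [50, 40, 30, 20, 10]
print(scaffold_n50(toy_lengths))  # prints 40
```

Because N50 depends only on the sorted length distribution, joining scaffolds or removing short duplicated ones during curation can raise it even while the total assembly length falls.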
The final assembly has a total length of 506 Mb in 40 sequence scaffolds, with a scaffold N50 of 19.1 Mb (Table 1). The majority, 99.92%, of assembly sequence was assigned to 40 chromosomal-level scaffolds, representing 28 autosomes (numbered by sequence length) and the Z sex chromosome (Figure 2-Figure 5; Table 2).
DNA was sheared into an average fragment size of 12-20 kb in a Megaruptor 3 system with speed setting 30. Sheared DNA was purified by solid-phase reversible immobilisation using AMPure PB beads with a 1.8X ratio of beads to sample to remove the shorter fragments and concentrate the DNA sample. The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit dsDNA High Sensitivity Assay kit. Fragment size distribution was evaluated by running the sample on the FemtoPulse system.

Sequencing
Pacific Biosciences HiFi circular consensus and 10X Genomics read cloud DNA sequencing libraries were constructed according to the manufacturers' instructions. Sequencing was performed by the Scientific Operations core at the WSI on Pacific Biosciences SEQUEL II (HiFi) and Illumina HiSeq X (10X) instruments. Hi-C data were also generated from remaining whole organism tissue of ilEreLige1 using the Arima v1 Hi-C kit and sequenced on an Illumina HiSeq X (10X) instrument.
Mitochondrial genome annotation was performed using MitoFinder (Allio et al., 2020). The genome was analysed and BUSCO scores generated within the BlobToolKit environment (Challis et al., 2020). Table 3 contains a list of all software tool versions used, where appropriate.
The genome sequence is released openly for reuse. The E. ligea genome sequencing initiative is part of the Darwin Tree of Life (DToL) project. All raw sequence data and the assembly have been deposited in INSDC databases. Raw data and assembly accession identifiers are reported in Table 1.

Manuela Lopez Villavicencio
Museum National d'Histoire Naturelle Reseau des bibliotheques du museum, Paris, Île-de-France, France
This article reports the genome assembly of a male individual of the very widespread butterfly species Erebia ligea. The assembly includes the autosomes, the Z sex chromosome and the mitochondrial genome.
The article is clearly written and pleasant to read. It shows convincing evidence for a high-quality assembly based on BUSCO scores. The methods for genome assembly, quality testing and Hi-C scaffolding are relevant and up to date. Sufficient details of methods and materials are provided to allow replication by others.
It is important to highlight that the assembly sequence was assigned to 40 chromosomal-level scaffolds, representing 28 autosomes plus the Z sex chromosome. This is relevant because previous karyotypic analyses have found different chromosome numbers for this species.
As I am interested in butterfly genomes, I would have loved to see other interesting information, such as the genome-wide level of heterozygosity (estimated with Jellyfish and GenomeScope).
Overall, I think the release of this well-assembled and annotated genome is a very useful contribution and I recommend the indexing of this article.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes