The genome sequence of the Brown Argus, Aricia agestis (Denis & Schiffermüller, 1775)

We present genome assemblies from two male Aricia agestis specimens (the Brown Argus; Arthropoda; Insecta; Lepidoptera; Lycaenidae). The genome sequences are 435.3 and 437.4 megabases in span. Each assembly is scaffolded into 23 chromosomal pseudomolecules, including the Z sex chromosome. The mitochondrial genomes were assembled and are 15.47 and 15.45 kilobases in length. Gene annotation of these assemblies on Ensembl identified 12,688 and 12,654 protein coding genes.


Background
The Brown Argus (Aricia agestis) is a small butterfly, 24-32 mm in wingspan (Aagaard et al., 2002). As its name implies, the upper wings of both the male and the female are brown, and orange spots run along all four white wing edges. These resemble the many eyes of Argus, the giant from Greek mythology, which provides the inspiration for its common name. The Brown Argus is a member of the 'blues' or Polyommatinae subfamily of Lycaenidae, so named because their furry bodies resemble those of wolves, but the species is not sexually dimorphic, unlike many of the blues. The Brown Argus is very similar in appearance to the Northern Brown Argus (Aricia artaxerxes), which in the UK occurs only in Scotland and northern England and has distinctive white spots on the upper forewing. F1 and F2 hybrids of these two species can be produced in the lab (Jarvis, 1959). A. montensis and A. cramera, also related and hardly distinguishable by external morphology, occur in the Iberian Peninsula, North Africa and, in the case of the latter, in the Balearic Islands and Sardinia. Aricia agestis has a parapatric distribution with all these taxa, and hybrids seem to occur at the contact zones (Sañudo-Restrepo et al., 2013; Vodă et al., 2015).
The range of the Brown Argus centres around the western Palaearctic, reaching southern Scandinavia in the north and Sicily and Crete in the south, but extends as far as Amur and Siberia in the eastern Palaearctic (Tolman & Lewington, 2008). The preferred habitat of the Brown Argus is flowery grasslands on calcareous soil, where its eggs are laid on the upper surface of leaves from Common Rock-rose (Helianthemum nummularium) or certain Geraniaceae species (Tolman & Lewington, 2008). A northward range expansion has been demonstrated in the UK, where populations at the range limit appear to have switched mostly to Geraniaceae host plants (Bridle et al., 2014; Buckley et al., 2012; de Jong et al., 2022; Pateman et al., 2012). Within the southern part of its range, the Brown Argus produces several generations per year, and is possibly trivoltine in the Mediterranean, whereas it is bivoltine or univoltine at higher latitudes (Bury, 2016).
The caterpillar of the Brown Argus feeds on the host plant where it hatches, with early larval stages cutting distinctive window-shaped holes in the lower leaf surface, while later stages consume the entire leaf blade. The caterpillar is also attended by ants, most often by Lasius spp. or Myrmica sabuleti (Bury, 2016; Tolman & Lewington, 2008). Symbiosis between ants and lycaenid butterflies is common, and can be obligate and species-specific, or facultative. In the Brown Argus it appears to be the latter (Fiedler & Hummel, 1995). The species diapauses as a larva during winter and then pupates the following spring (Tolman & Lewington, 2008). Pupae are brought underground by ants, but the degree of symbiosis remains unclear (Bury, 2016). The Brown Argus is a common butterfly where it occurs, and is not regarded as under threat by the IUCN.
The number of chromosome pairs in Aricia agestis has been recorded as 23 (Federley, 1938; Robinson, 1971) and 24 (Lorković, 1941), but note that some of these studies may refer to the closely related A. artaxerxes, especially Federley (1938). The genome of the Brown Argus will be useful for further studies on host plant shifts associated with climate change, as well as ecological specialisation and speciation in the genus Aricia.

Genome sequence report
The genome was sequenced from two male Aricia agestis specimens (Figure 1) collected from Romania (latitude 46.83, longitude 23.63).
For the assembly ilAriAges1.1, a total of 54-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 85-fold coverage in 10X Genomics read clouds was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected two missing joins or misjoins. The final ilAriAges1.1 assembly has a total length of 435.3 Mb in 25 sequence scaffolds with a scaffold N50 of 19.0 Mb (Table 1).
For the assembly ilAriAges2.1, a total of 56-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 80-fold coverage in 10X Genomics read clouds was generated. Manual assembly curation corrected 15 missing joins or misjoins and removed six haplotypic duplications, reducing the scaffold number by 27.5%, and increasing the scaffold N50 by 0.71%. The final ilAriAges2.1 assembly has a total length of 437.4 Mb in 29 sequence scaffolds with a scaffold N50 of 19.1 Mb (Table 1).
For both assemblies, most of the assembly sequence was assigned to 23 chromosomal-level scaffolds, representing 22 autosomes and the Z sex chromosome. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 2-Figure 5; Table 2 and Table 3). While not fully phased, each deposited assembly is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome for each specimen was also assembled and can be found as a contig within the multifasta file of the genome submission.
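Fold coverage is the total number of sequenced bases divided by the assembly span. As a rough illustration only, the relationship can be sketched as follows. The read totals are not reported in this section, so the yield below is back-calculated from the stated coverage and span, and the function names are hypothetical.

```python
def fold_coverage(total_bases: float, assembly_span: float) -> float:
    """Return mean fold coverage: sequenced bases per base of assembly."""
    if assembly_span <= 0:
        raise ValueError("assembly_span must be positive")
    return total_bases / assembly_span


def bases_for_coverage(coverage: float, assembly_span: float) -> float:
    """Return the sequencing yield implied by a given fold coverage."""
    return coverage * assembly_span


# Span reported for ilAriAges1.1 (435.3 Mb). The yield is derived here
# for illustration and is NOT a figure reported in the text.
span_bp = 435.3e6
hifi_yield_bp = bases_for_coverage(54, span_bp)
print(f"Implied HiFi yield: {hifi_yield_bp / 1e9:.1f} Gb")
```

Under these assumptions, 54-fold coverage of a 435.3 Mb assembly corresponds to roughly 23.5 Gb of HiFi sequence.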
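The scaffold N50 and the percentage changes quoted above follow standard definitions, which the minimal sketch below illustrates. The toy scaffold lengths and function names are hypothetical. The pre-curation scaffold count of 40 is an inference from the stated 27.5% reduction to 29 scaffolds and is not given in the text.

```python
def scaffold_n50(lengths):
    """Return the N50: the length L such that scaffolds of length >= L
    together cover at least half of the total assembly length."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


def percent_change(before, after):
    """Return the signed percentage change from `before` to `after`."""
    return (after - before) * 100 / before


# Hypothetical toy assembly (lengths in Mb), not the real scaffold set.
toy_lengths_mb = [30, 25, 20, 15, 10]
print("Toy N50 (Mb):", scaffold_n50(toy_lengths_mb))

# Assumed pre-curation count of 40 scaffolds (inferred, see lead-in).
print("Scaffold count change (%):", percent_change(40, 29))
```

In the toy example the two longest scaffolds (30 + 25 Mb) already exceed half of the 100 Mb total, so the N50 is 25 Mb.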
Metadata for specimens, spectral estimates, sequencing runs, contaminants and pre-curation assembly statistics can be found at https://links.tol.sanger.ac.uk/species/91739.

Sample acquisition and nucleic acid extraction
Two male Aricia agestis (ToLIDs ilAriAges1 and ilAriAges2) were collected from Cluj-Napoca, Romania (latitude 46.83, longitude 23.63) on 2018-07-15 during the daytime using a handnet. The collectors were Konrad Lohse (University of Edinburgh), Alex Hayward (University of Exeter), Dominik Laetsch (University of Edinburgh), and Roger Vila (Institut de Biologia Evolutiva). The specimens were identified by Roger Vila and then snap-frozen from live in a dry shipper.
DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute (WSI). The ilAriAges1 sample was weighed and dissected on dry ice with tissue set aside for Hi-C sequencing. Whole-organism tissue was disrupted using a Nippi Powermasher fitted with a Bio-Masher pestle. High molecular weight (HMW) DNA was extracted using the Qiagen MagAttract HMW DNA extraction kit. Low molecular weight DNA was removed from a 200 ng aliquot of extracted DNA using the 0.8X AMpure XP purification kit prior to 10X Chromium sequencing; a minimum of 50 ng DNA was submitted for 10X sequencing. HMW DNA was sheared into an average fragment size of 12-20 kb in a Megaruptor 3 system with speed setting 30. Sheared DNA was purified by solid-phase reversible immobilisation using AMPure PB beads with a 1.8X ratio of beads to sample to remove the shorter fragments and concentrate the DNA sample. The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit dsDNA High Sensitivity Assay kit. Fragment size distribution was evaluated by running the sample on the FemtoPulse system.

Sequencing
Pacific Biosciences HiFi circular consensus and 10X Genomics read cloud DNA sequencing libraries were constructed according to the manufacturers' instructions. DNA sequencing was performed by the Scientific Operations core at the WSI on Pacific Biosciences SEQUEL II (HiFi) and HiSeq X Ten (10X) instruments. Hi-C data were also generated from tissue of ilAriAges1 and ilAriAges2 using the Arima2 kit and sequenced on the HiSeq X Ten instrument.

Genome assembly, curation and evaluation
For ilAriAges1, assembly was carried out with Hifiasm (Cheng et al., 2021), while for ilAriAges2, assembly was performed using HiCanu (Nurk et al., 2020). The remaining steps were the same for both assemblies. Haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). One round of polishing was performed by aligning 10X Genomics read data to the assembly with Long Ranger ALIGN, calling variants with FreeBayes (Garrison & Marth, 2012). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using SALSA2 (Ghurye et al., 2019). The assembly was checked for contamination and corrected using the gEVAL system (Chow et al., 2016) as described [...] et al., 2020) and BUSCO scores (Manni et al., 2021; Simão et al., 2015) were calculated.
Table 4 contains a list of relevant software tool versions and sources.

Genome annotation
The Ensembl gene annotation system (Aken et al., 2016) was used to generate annotation for the Aricia agestis assemblies. Annotation was created primarily through alignment of transcriptomic data to the genome, with gap filling via

Xueyan Li
State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China

This manuscript provides two high-quality genome assemblies of two male specimens of the butterfly Aricia agestis, which is a very important genome resource of Lepidoptera, especially for further studies on host plant shifts associated with climate change, as well as ecological specialisation and speciation in the genus Aricia. Nevertheless, I have some questions as follows.
1. What is the aim of sequencing and assembling the genomes of two individuals of the same butterfly species?
2. Please give some explanations for the differences between the two assemblies (genome size, protein-coding gene number).
3. Why were two different software tools selected to assemble the two specimen genomes?

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes

Figure 1. Photograph of an Aricia agestis (ilAriAges1) specimen used for genome sequencing. (A) Dorsal surface view of wings. (B) Ventral surface view of wings.

Figure 2. Genome assembly of Aricia agestis: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the assembly. The distribution of sequence lengths is shown in dark grey with the plot radius scaled to the longest sequence present in the assembly. Orange and pale-orange arcs show the N50 and N90 sequence lengths respectively. The pale grey spiral shows the cumulative sequence count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the lepidoptera_odb10 set is shown in the top right of each figure. Interactive versions of these figures are available for ilAriAges1.1 and ilAriAges2.1.
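The size-ordered binning described in this caption can be sketched as follows. This is a simplified reading of the caption, not BlobToolKit's actual implementation. The function name and toy lengths are hypothetical, and ten bins are used instead of 1,000 to keep the example readable.

```python
from bisect import bisect_left
from itertools import accumulate


def binned_lengths(lengths, n_bins=1000):
    """For each size-ordered bin, return the length of the sequence that
    contains the bin's end position along the cumulative assembly."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    cumulative = list(accumulate(ordered))
    total = cumulative[-1]
    result = []
    for i in range(1, n_bins + 1):
        target = total * i / n_bins
        # First sequence whose cumulative length reaches the bin end.
        idx = min(bisect_left(cumulative, target), len(ordered) - 1)
        result.append(ordered[idx])
    return result


# Hypothetical toy assembly (lengths in Mb) with 10 bins of 10% each.
toy_bins = binned_lengths([30, 25, 20, 15, 10], n_bins=10)
print(toy_bins)
# The bin ending at 50% of the span gives the N50, the one at 90% the N90.
print("N50:", toy_bins[4], "N90:", toy_bins[8])
```

In this reading, the bins ending at 50% and 90% of the assembly span mark the N50 and N90 sequence lengths.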

Figure 3. Genome assembly of Aricia agestis: BlobToolKit GC-coverage plot. Scaffolds are coloured by phylum. Circles are sized in proportion to scaffold length. Histograms show the distribution of scaffold length sum along each axis. Interactive versions of these figures are available for ilAriAges1.1 and ilAriAges2.1.

Figure 4. Genome assembly of Aricia agestis: BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all scaffolds. Coloured lines show cumulative lengths of scaffolds assigned to each phylum using the buscogenes taxrule. Interactive versions of these figures are available for ilAriAges1.1 and ilAriAges2.1.

Figure 5. Genome assembly of Aricia agestis: Hi-C contact map of the ilAriAges1.1 and ilAriAges2.1 assemblies, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. Interactive versions of these figures may be viewed for ilAriAges1.1 and ilAriAges2.1.