The genome sequence of the high brown fritillary, Fabriciana adippe (Dennis & Schiffermüller, 1775)

We present a genome assembly from an individual female Fabriciana adippe (the high brown fritillary; Arthropoda; Insecta; Lepidoptera; Nymphalidae). The genome sequence is 485 megabases in span. Most of the assembly (99.98%) is scaffolded into 29 chromosomal pseudomolecules, including the assembled Z sex chromosome. The complete mitochondrial genome was also assembled and is 15.1 kilobases in length. Gene annotation of this assembly in Ensembl identified 13,536 protein-coding genes.


Background
The high brown fritillary, Fabriciana adippe (Dennis and Schiffermüller, 1775), is a widespread and common species in Eurasia, but is one of Britain's most endangered butterflies (Fox et al., 2015). It is found in open habitats such as forest clearings, which are often covered in bracken. F. adippe is a univoltine summer species that can be seen on the wing from May to the beginning of September. In the UK, larval host plants are violets (Viola sp.), for example the common dog-violet (Viola riviniana). The species was placed in the genus Argynnis in the past, but recent molecular work suggests that F. adippe and allies are more closely related to the genus Speyeria than to Argynnis, hence its placement in the genus Fabriciana (de Moya et al., 2017).
While F. adippe is considered a species of Least Concern according to the IUCN Red List for Europe (van Swaay et al., 2010), it is listed as endangered on the UK Red List (Fox et al., 2022), with only 37 populations remaining in the UK (Ellis et al., 2019). Decreased coppicing and local conservation efforts to implement changes in forest management have enjoyed some success in protecting this species (Ellis et al., 2019). Today, F. adippe is restricted to a few localities in western England and Wales. It is a large, fast-flying species and can be confused with several other similar-sized species.
Figure 1. Forewings and hindwings of the female F. adippe specimen from which the genome was sequenced. Dorsal (left) and ventral (right) surface view of wings from specimen RO_FA_930 (ilFabAdip1) from Lupşa, Alba, Romania, used to generate Pacific Biosciences, 10X Genomics and Hi-C data.
Fabriciana adippe has 29 chromosome pairs and a ZO sex-determination system (Federley, 1938; Lorković, 1941). The genome sequence of the high brown fritillary may help in conservation efforts and in understanding its population structure, which is notably marked: Iberian populations are strongly differentiated from other populations in both nuclear and mitochondrial genomes, and admixed populations seem to exist at the contact zone (Dapporto et al., 2022; Polic et al., 2022).

Genome sequence report
The genome was sequenced from a single female F. adippe collected from Lupşa, Apuseni Mountains, Alba, Romania (Figure 1). A total of 49-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 79-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 26 missing joins or misjoins and removed six haplotypic duplications, reducing the assembly size by 1.90% and the scaffold number by 9.62%.
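As a rough cross-check of the curation figures, the reported percentage reductions can be inverted to estimate the pre-curation assembly. The sketch below assumes that each percentage is relative to the pre-curation value and uses the final span (485 Mb) and scaffold count (94) reported for this assembly. The variable and function names are illustrative, and the outputs are approximations rather than values stated in the text.

```python
# Back-calculate approximate pre-curation statistics from the reported
# reductions. Assumption: each percentage is relative to the pre-curation value.

FINAL_SPAN_MB = 485.0        # final assembly span reported in the text
FINAL_SCAFFOLDS = 94         # final scaffold count reported in the text
SIZE_REDUCTION = 0.0190      # 1.90% reduction in assembly size
SCAFFOLD_REDUCTION = 0.0962  # 9.62% reduction in scaffold number


def before_reduction(final_value: float, fractional_reduction: float) -> float:
    """Invert final = before * (1 - reduction) to recover the starting value."""
    return final_value / (1.0 - fractional_reduction)


span_before_mb = before_reduction(FINAL_SPAN_MB, SIZE_REDUCTION)
scaffolds_before = before_reduction(FINAL_SCAFFOLDS, SCAFFOLD_REDUCTION)

print(f"Estimated pre-curation span: {span_before_mb:.1f} Mb")
print(f"Estimated pre-curation scaffolds: {round(scaffolds_before)}")
```

Under this assumption, curation removed about ten scaffolds (roughly 104 before, 94 after) and about 9 Mb of sequence.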
The final assembly has a total length of 485 Mb in 94 sequence scaffolds with a scaffold N50 of 16.7 Mb (Table 1). Most of the assembly sequence (99.98%) was assigned to 29 chromosomal-level scaffolds, representing 28 autosomes (numbered by sequence length) and the Z sex chromosome (Figure 2-Figure 5; Table 2). BUSCO completeness was assessed using the lepidoptera_odb10 reference set (n = 5,286). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
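For readers unfamiliar with the scaffold N50 statistic quoted above, the following is a minimal sketch of the usual definition: the length of the scaffold at which the cumulative sum of scaffold lengths, sorted from longest to shortest, first reaches half of the total assembly length. The function name and the toy lengths are illustrative assumptions and are not taken from this assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that pushes the running sum to half of the
        # total length defines the N50.
        if running >= half_total:
            return length
    return ordered[-1]  # only reached in degenerate cases


# Toy example (hypothetical lengths): total 28, half 14.
# The cumulative sums are 10, then 18, so the N50 is 8.
toy_scaffolds = [10, 8, 5, 3, 2]
print(n50(toy_scaffolds))
```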

Genome annotation report
The F. adippe genome was annotated using the Ensembl rapid annotation pipeline (Table 1; https://rapid.ensembl.org/Fabriciana_adippe_GCA_905404265.1/). The resulting annotation includes 35,064 transcribed mRNAs from 13,536 protein-coding and 8,725 non-coding genes.

Sample acquisition and nucleic acid extraction
The ilFabAdip1 specimen was dissected on dry ice with thorax tissue set aside for Hi-C sequencing. Abdomen tissue was disrupted by manual grinding with a disposable pestle. Fragment size analysis of 0.01-0.5 ng of DNA was then performed using an Agilent FemtoPulse. High molecular weight (HMW) DNA was extracted using the Qiagen MagAttract HMW DNA extraction kit. Low molecular weight DNA was removed from a 200 ng aliquot of extracted DNA using the 0.8X AMPure XP purification kit prior to 10X Chromium sequencing. A minimum of 50 ng DNA was submitted for 10X sequencing. HMW DNA was sheared into an average fragment size of 12-20 kb in a Megaruptor 3 system with speed setting 30. Sheared DNA was purified by solid-phase reversible immobilisation using AMPure PB beads with a 1.8X ratio of beads to sample to remove the shorter fragments and concentrate the DNA. The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit dsDNA High Sensitivity Assay kit. Fragment size distribution was evaluated by running the sample on the FemtoPulse system.
RNA was extracted from abdomen tissue of ilFabAdip2 in the Tree of Life Laboratory at the WSI using TRIzol, according to the manufacturer's instructions. RNA was then eluted in 50 μl RNAse-free water and its concentration assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit RNA Broad-Range (BR) Assay kit. RNA integrity was analysed using the Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay.

Sequencing
Pacific Biosciences HiFi circular consensus and 10X Genomics Chromium read cloud sequencing libraries were constructed according to the manufacturers' instructions. Sequencing was performed by the Scientific Operations core at the Wellcome Sanger Institute on Pacific Biosciences SEQUEL II (HiFi), Illumina HiSeq 10X and Illumina HiSeq 4000 (RNA-Seq) instruments. Hi-C data were generated in the Tree of Life laboratory from thorax tissue of ilFabAdip1 using the Arima v2 kit and sequenced on a HiSeq 10X instrument.

Genome assembly
Assembly was carried out with Hifiasm (Cheng et al., 2021); haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). One round of polishing was performed by aligning 10X Genomics read data to the assembly with longranger align and calling variants with freebayes (Garrison & Marth, 2012). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using SALSA2 (Ghurye et al., 2019). The assembly was checked for contamination and corrected using the gEVAL system (Chow et al., 2016) as described previously (Howe et al., 2021). Manual curation (Howe et al., 2021) was performed using gEVAL, HiGlass (Kerpedjiev et al., 2018) and Pretext (Harry, 2022). The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2021), which performs annotation using MitoFinder (Allio et al., 2020). The genome was analysed and BUSCO scores generated within the BlobToolKit environment (Challis et al., 2020). Table 3 contains a list of all software tool versions used, where appropriate.
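BUSCO scores are commonly reported in a compact one-line notation (complete, single-copy, duplicated, fragmented, missing, and the number of orthologues searched). The sketch below shows one way such a string could be parsed for downstream comparison. The function name is hypothetical, and the example string uses placeholder percentages that are not the scores for this assembly; only n = 5,286 (the lepidoptera_odb10 set size) comes from the text.

```python
import re
from typing import Dict

# Pattern for BUSCO's one-line summary notation, e.g.
# "C:90.0%[S:89.0%,D:1.0%],F:4.0%,M:6.0%,n:5286"
_BUSCO_PATTERN = re.compile(
    r"C:(?P<complete>[\d.]+)%"
    r"\[S:(?P<single>[\d.]+)%,D:(?P<duplicated>[\d.]+)%\],"
    r"F:(?P<fragmented>[\d.]+)%,"
    r"M:(?P<missing>[\d.]+)%,"
    r"n:(?P<n>\d+)"
)


def parse_busco_summary(summary: str) -> Dict[str, float]:
    """Parse a BUSCO one-line summary into a dictionary of values."""
    match = _BUSCO_PATTERN.search(summary)
    if match is None:
        raise ValueError(f"not a BUSCO summary string: {summary!r}")
    fields = {key: float(value) for key, value in match.groupdict().items()}
    fields["n"] = int(fields["n"])  # number of orthologues is a count
    return fields


# Placeholder values for illustration only (NOT this assembly's scores).
example = "C:90.0%[S:89.0%,D:1.0%],F:4.0%,M:6.0%,n:5286"
scores = parse_busco_summary(example)
print(scores["complete"], scores["n"])
```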

Genome annotation
The Ensembl gene annotation system (Aken et al., 2016) was used to generate annotation for the F. adippe assembly (GCA_905404265.1). Annotation was created primarily through alignment of transcriptomic data to the genome, with gap filling via protein-to-genome alignments of a select set of proteins from UniProt (UniProt Consortium, 2019).

Data availability