The genome sequence of the dotted bee-fly, Bombylius discolor (Mikan, 1796)

We present a genome assembly from an individual female Bombylius discolor (the dotted bee-fly; Arthropoda; Insecta; Diptera; Bombyliidae). The genome sequence is 280 megabases in span. Most of the assembly (99.93%) is scaffolded into six chromosomal pseudomolecules, with the X sex chromosome assembled. The mitochondrial genome has also been assembled and is 16.7 kilobases in length. Genome annotation identified 10,411 protein-coding genes.


Background
The dotted bee-fly, Bombylius discolor, is a charismatic fly of early spring with a mainly southern distribution in England and Wales, extending north into the Midlands (National Biodiversity Network (NBN) Atlas, no date), and it appears to be expanding its range. B. discolor resembles the more common B. major, but is darker, with spotted wings, and females have a distinctive line of fuzzy white spots down the mid-line of the abdomen (Stubbs & Drake, 2014). Excellent resources exist for the identification of B. discolor and other bombyliids, including Stubbs & Drake (2014), Steven Falk's Flickr pages and a photo ID guide associated with Bee-fly Watch, a recording initiative in Britain under the auspices of the Soldierflies and Allies Recording Scheme.
This species is widespread across southern and central Europe, and into central Asia. B. discolor is mainly a species of open ground, and its larvae are parasitoids of mining bees of the genus Andrena, particularly A. flavipes and A. cineraria (Ismay, 1999). Females flick their eggs backwards into the entrances of bee nest burrows, although they are not always accurate and frequently oviposit on non-target substrates (Boesi et al., 2009). Bee-flies are obligate flower visitors, as the females require pollen to mature their eggs.
The genome of B. discolor was sequenced as part of the Darwin Tree of Life Project, a collaborative effort to sequence all named eukaryotic species in the Atlantic Archipelago of Britain and Ireland.

Genome sequence report
The genome was sequenced from an individual female B. discolor (Figure 1) collected from a garden in Tonbridge, Kent, UK. A total of 24-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 47 missing joins or misjoins and removed six haplotypic duplications, reducing the assembly length by 0.07% and the scaffold number by 50%, and increasing the scaffold N50 by 31.1%.
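As a rough illustration of what 24-fold coverage means, the sketch below multiplies the fold coverage by the assembly span reported in this note (280 Mb). The function names are hypothetical, and the resulting read yield is an approximation derived from rounded figures, not a value reported by the authors.

```python
def expected_yield_bases(coverage_fold, genome_span_bases):
    """Approximate total read bases needed for a given fold coverage."""
    return coverage_fold * genome_span_bases


def coverage_from_yield(total_read_bases, genome_span_bases):
    """Fold coverage implied by a total read yield over a genome span."""
    return total_read_bases / genome_span_bases


# Values taken from the text: 24-fold HiFi coverage, 280 Mb assembly span.
# Using the assembly span as the genome size is an assumption.
approx_yield = expected_yield_bases(24, 280_000_000)
print(f"Approximate HiFi yield: {approx_yield / 1e9:.2f} Gb")
```

The same relationship can be inverted with `coverage_from_yield` when the read yield is known and the coverage is not.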
The final assembly has a total length of 280 Mb in 17 sequence scaffolds with a scaffold N50 of 53.2 Mb (Table 1). Most (99.93%) of the assembly sequence was assigned to six chromosomal-level scaffolds, representing five autosomes and the X sex chromosome (Figures 2-5; Table 2). The assembly has a BUSCO 5.3.2 (Manni et al., 2021) completeness of 94.6% using the diptera_odb10 reference set. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
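For readers unfamiliar with the scaffold N50 statistic quoted above, the following minimal sketch shows the standard definition: the length of the scaffold at which the cumulative length of scaffolds, sorted from longest to shortest, first reaches half of the total assembly length. The function names and the toy scaffold lengths are hypothetical and are not the lengths of the B. discolor scaffolds.

```python
def scaffold_n50(lengths):
    """Return the N50 of a collection of scaffold lengths."""
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    total = sum(lengths)
    running = 0
    # Walk scaffolds from longest to shortest until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def fraction_in_largest(lengths, n):
    """Fraction of the total length held by the n longest scaffolds."""
    ordered = sorted(lengths, reverse=True)
    return sum(ordered[:n]) / sum(ordered)


# Toy lengths (arbitrary units), chosen only to illustrate the calculation.
toy_lengths = [80, 70, 50, 30, 20, 10]
print(scaffold_n50(toy_lengths))
print(fraction_in_largest(toy_lengths, 3))
```

Applied to the real scaffold lengths, `fraction_in_largest` with `n=6` would correspond to the proportion of sequence assigned to the six chromosomal-level scaffolds, assuming those are the six longest.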

Methods
Sample acquisition and nucleic acid extraction
A female B. discolor (idBomDisc1) (Figure 1) was collected using a hand net from a garden in Tonbridge, Kent, UK (latitude 51.186304, longitude 0.286534) by Gavin Broad (Natural History Museum), who also identified the species. The sample was preserved by freezing at -80°C.
DNA was extracted from tissue of idBomDisc1 at the Wellcome Sanger Institute (WSI) Scientific Operations core using the Qiagen MagAttract HMW DNA kit, according to the manufacturer's instructions. Head tissue was set aside for Hi-C sequencing.
DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute. The idBomDisc1 sample was weighed and dissected on dry ice, with tissue set aside for Hi-C sequencing. Tissue was disrupted using a Nippi Powermasher fitted with a BioMasher pestle. Fragment size analysis of 0.01-0.5 ng of DNA was then performed using an Agilent FemtoPulse. High molecular weight (HMW) DNA was extracted using the Qiagen MagAttract HMW DNA extraction kit. Low molecular weight DNA was removed from a 20 ng aliquot of extracted DNA using a 0.8X AMPure XP purification kit. HMW DNA was sheared to an average fragment size of 12-20 kb in a Megaruptor 3 system with speed setting 30. Sheared DNA was purified by solid-phase reversible immobilisation using AMPure PB beads with a 1.8X ratio of beads to sample to remove the shorter fragments and concentrate the DNA sample. The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer and Qubit Fluorometer
[...] et al., 2021). Finally, the primary assembly was analysed and manually improved using gEVAL (Chow et al., 2016). Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size.
The genome was analysed and BUSCO scores were generated within the BlobToolKit environment (Challis et al., 2020). Table 3 contains a list of all software tool versions used, where appropriate.

Genome annotation
The Ensembl gene annotation system (Aken et al., 2016) was used to generate annotation for the B. discolor assembly (GCA_939192795.1). Annotation was created primarily through alignment of transcriptomic data to the genome, with gap filling via protein-to-genome alignments of a select set of proteins from UniProt (UniProt Consortium, 2019).

Data availability
European Nucleotide Archive: Bombylius discolor (dotted bee fly). Accession number PRJEB50790; https://identifiers.org/ena.embl/PRJEB50790. The genome sequence is released openly for reuse. The Bombylius discolor genome sequencing initiative is part of the Darwin Tree of Life (DToL) project. All raw sequence data and the assembly have been deposited in INSDC databases. Raw data and assembly accession identifiers are reported in Table 1.