The genome sequence of the Common Carder Bee, Bombus pascuorum (Scopoli, 1763)

We present a genome assembly from an individual female Bombus pascuorum (the Common Carder Bee; Arthropoda; Insecta; Hymenoptera; Apidae). The genome sequence is 307.5 megabases in span. Most of the assembly is scaffolded into 17 chromosomal pseudomolecules. The mitochondrial genome has also been assembled and is 21.9 kilobases in length. Gene annotation of this assembly on Ensembl identified 12,999 protein coding genes.


Background
The Common Carder Bee, Bombus pascuorum, is one of the seven most common and widespread bumblebee species in the UK and the only common and widespread species of the four all-brown carder bees. It appears to have replaced other carder bee species in northern Britain over recent decades (Plowright et al., 1997; Plowright & Plowright, 2009). It has a Palearctic distribution, with a range that includes all but the most northerly areas of western Europe through to China. It is found across a wide range of habitats, including gardens, grassland and woodlands.
It visits a wide range of flowers for nectar, especially species with longer corollae such as Fabaceae and Scrophulariaceae (Edwards & Jenner, 2005), and is able to outcompete shorter-tongued bee species (Balfour et al., 2013). It is broadly polylectic, but shows a preference for pollen from Fabaceae, Scrophulariaceae, Lamiaceae and red-flowered Asteraceae (Edwards & Jenner, 2005). It has been shown to be less consistent in floral choice than other species of bumblebee (Raine & Chittka, 2005). The presence of mass floral resources within the landscape may benefit this species by allowing increased colony sizes (Herrmann et al., 2007). In common with other species in the genus, the high degree of floral visitation undertaken by this species indicates its important role as a pollinator.
Bombus pascuorum is a eusocial species with reproductive queens and males, and non-reproductive workers. Queens, workers and males are all buff/brown all over, although the species is highly variable in appearance, with differing proportions of dark and pale hairs. It can be distinguished from other all-brown bumblebee species in the UK by the presence of at least some black hairs on the abdomen.
It has an exceptionally long flight period, with queens emerging from winter diapause from March, the first workers from April and reproductives from July. It is possibly bivoltine in the UK, often being the latest flying bumblebee species that does not remain active throughout the winter, with workers present into October.
Nests are constructed above ground, typically in grass tussocks, under plant litter or at the base of shrubs and trees. It is one of four carder bee species, a name that refers to the construction of a moss and dry grass covering over the nest. Fewer workers are produced than in other bumblebee species, with nests peaking at between 60 and 150 workers (Løken, 1973; von Hagen, 2003). In UK agricultural landscapes, nest density is estimated at 68 nests per km², and the minimum estimate of maximum foraging range is 450 m (Knight et al., 2005).
A complete genome sequence for this species will facilitate studies into the evolution of eusociality, conservation of important pollinator species, reproductive evolution and foraging behaviour.

Genome sequence report
The genome was sequenced from one female Bombus pascuorum specimen collected from Wytham Woods, Oxfordshire, UK (latitude 51.77, longitude -1.34). A total of 28-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 92-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 36 missing joins or mis-joins and removed three haplotypic duplications, reducing the assembly length by 0.98% and the scaffold number by 21.36%, and increasing the scaffold N50 by 45.58%.
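The curation percentages above are relative changes against the pre-curation assembly. The following is a minimal sketch of that arithmetic, not the authors' code. The pre-curation scaffold count of 103 is an assumption that is not reported in this article; it is used only because, together with the final count of 81 scaffolds, it reproduces the stated 21.36% reduction.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent.

    Negative values are reductions, positive values are increases.
    """
    if before == 0:
        raise ValueError("`before` must be non-zero")
    return (after - before) / before * 100


# ASSUMPTION: 103 pre-curation scaffolds is a back-calculated, hypothetical
# value, not a figure taken from the article. 81 is the reported final count.
assumed_scaffolds_before = 103
reported_scaffolds_after = 81

scaffold_change = percent_change(assumed_scaffolds_before, reported_scaffolds_after)
print(f"scaffold number change: {scaffold_change:.2f}%")
```

The same function applies to assembly length and scaffold N50 once the pre- and post-curation values are known.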
The final assembly has a total length of 307.5 Mb in 81 sequence scaffolds with a scaffold N50 of 17.6 Mb (Table 1). Most (87.82%) of the assembly sequence was assigned to 17 chromosomal-level scaffolds. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 1–Figure 4; Table 2). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
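For readers unfamiliar with the scaffold N50 statistic, the sketch below shows the standard definition: the scaffold length at which the cumulative length of scaffolds, sorted from longest to shortest, first reaches half of the total span. The scaffold lengths in the example are hypothetical toy values, not the B. pascuorum scaffolds; only the 307.5 Mb span and the 87.82% figure come from the text.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Scaffolds are sorted from longest to shortest; the N50 is the length
    of the scaffold at which the running total first reaches half of the
    total assembly span.
    """
    if not lengths:
        raise ValueError("no scaffold lengths given")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical toy lengths in Mb (NOT the real scaffold lengths).
toy_lengths = [20, 18, 15, 10, 5, 2]
toy_n50 = n50(toy_lengths)

# Span assigned to chromosomal scaffolds, from the reported figures.
total_span_mb = 307.5
chromosomal_percent = 87.82
chromosomal_mb = total_span_mb * chromosomal_percent / 100
print(f"toy N50: {toy_n50} Mb; chromosomal span: {chromosomal_mb:.1f} Mb")
```

On the reported figures, roughly 270 Mb of the 307.5 Mb assembly sits in the 17 chromosomal-level scaffolds.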

Genome annotation report
The B. pascuorum genome assembly GCA_905332965.1 was annotated using the Ensembl rapid annotation pipeline (Table 1; https://rapid.ensembl.org/Bombus_pascuorum_GCA_905332965.1/Info/Index/). The resulting annotation includes 12,999 protein coding genes with an average length of 12,765.73 bp and an average coding length of 1,417.90 bp, and 5,443 non-protein coding genes. There is an average of 6.13 exons and 5.13 introns per canonical protein coding transcript, with an average intron length of 1,646.99 bp. A total of 7,659 gene loci have more than one associated transcript.
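Two of the annotation figures can be sanity-checked with simple arithmetic. The sketch below is illustrative only and rests on two assumptions not stated in the text: that the reported lengths are in base pairs, and that the average coding length applies per protein coding gene. A transcript with n exons has n - 1 introns, so the mean intron count should be one less than the mean exon count.

```python
# Reported annotation statistics (from the text).
protein_coding_genes = 12_999
mean_coding_length_bp = 1_417.90   # ASSUMPTION: unit is bp, value is per gene
mean_exons_per_transcript = 6.13
genome_span_bp = 307.5e6

# A transcript with n exons contains n - 1 introns, so the means differ by 1.
expected_mean_introns = mean_exons_per_transcript - 1

# Rough total coding sequence and its share of the assembly span.
total_coding_bp = protein_coding_genes * mean_coding_length_bp
coding_percent = total_coding_bp / genome_span_bp * 100

print(f"expected introns per transcript: {expected_mean_introns:.2f}")
print(f"approximate coding fraction: {coding_percent:.1f}% of the assembly")
```

The expected intron count matches the reported 5.13, and under the stated assumptions coding sequence accounts for roughly 6% of the assembly.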

Sample acquisition and nucleic acid extraction
A female B. pascuorum specimen (iyBomPasc1) was collected from Wytham Woods, Oxfordshire (biological vice-county: Berkshire), UK (latitude 51.77, longitude -1.34) on 7 August 2019. The specimen was taken from woodland by Liam Crowley (University of Oxford) by netting. The specimen was identified by the collector and snap-frozen on dry ice. This specimen was used for genome sequencing and Hi-C scaffolding.
A second female B. pascuorum specimen (iyBomPasc2) was used for RNA sequencing. The iyBomPasc2 specimen was collected by Olga Sivell (Natural History Museum) from woodland edge in Luton, UK (latitude 51.88, longitude -0.37) on 5 May 2020. The specimen was identified by Duncan Sivell (Natural History Museum) and snap-frozen on dry ice. [...] HiSeq X Ten (10X) instruments. Hi-C data were also generated from tissue of iyBomPasc1 using the Arima v2 kit and sequenced on the HiSeq X Ten instrument.

Genome assembly, curation and evaluation
Assembly was carried out with Hifiasm (Cheng et al., 2021) and haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). One round of polishing was performed by aligning 10X Genomics read data to the assembly with Long Ranger ALIGN and calling variants with FreeBayes (Garrison & Marth, 2012). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using SALSA2 (Ghurye et al., 2019). The assembly was checked for contamination and corrected using the gEVAL system (Chow et al., 2016) as described previously (Howe et al., 2021). Manual curation was performed using gEVAL, HiGlass (Kerpedjiev et al., 2018) and Pretext (Harry, 2022). The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2022), which performed annotation using MitoFinder (Allio et al., 2020). To evaluate the assembly, MerquryFK was used to estimate consensus quality (QV) scores and k-mer completeness (Rhie et al., 2020). The genome was analysed and BUSCO scores (Manni et al., 2021; Simão et al., 2015) were calculated within the BlobToolKit environment (Challis et al., 2020). Table 3 contains a list of software tool versions and sources.
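As background to the consensus quality (QV) score, the sketch below shows the k-mer based estimate described by Rhie et al. (2020): k-mers found in the assembly but absent from the reads are treated as errors, and the per-base error rate is converted to a Phred-scaled score. This is a simplified illustration, not the MerquryFK implementation; the function names and the k-mer counts are hypothetical.

```python
import math


def kmer_qv(asm_only_kmers, total_asm_kmers, k):
    """Phred-scaled consensus quality from k-mer counts.

    asm_only_kmers:  k-mers present in the assembly but not in the reads
    total_asm_kmers: all k-mers in the assembly
    k:               k-mer size
    """
    if asm_only_kmers == 0:
        return math.inf  # no unsupported k-mers, so no estimable error
    # Probability that a single base is correct, given that a k-mer is
    # only supported when all k of its bases are correct.
    p_base_correct = (1 - asm_only_kmers / total_asm_kmers) ** (1 / k)
    error_rate = 1 - p_base_correct
    return -10 * math.log10(error_rate)


def qv_to_error_rate(qv):
    """Convert a Phred-scaled QV back to a per-base error probability."""
    return 10 ** (-qv / 10)


# Hypothetical counts for illustration only (not from this assembly).
example_qv = kmer_qv(asm_only_kmers=100, total_asm_kmers=1_000_000, k=21)
print(f"example QV: {example_qv:.1f}")
```

On this scale a QV of 40 corresponds to about one expected error per 10,000 bases, and each additional 10 points is a tenfold reduction in the error rate.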

Genome annotation
The Ensembl gene annotation system (Aken et al., 2016) was used to generate annotation for the Bombus pascuorum