The genome sequence of the Buff-tailed Bumblebee, Bombus terrestris (Linnaeus, 1758)

We present a genome assembly from an individual female Bombus terrestris (the Buff-tailed Bumblebee; Arthropoda; Insecta; Hymenoptera; Apidae). The genome sequence is 393.0 megabases in span. Most of the assembly is scaffolded into 18 chromosomal pseudomolecules. The mitochondrial genome has also been assembled and is 24.7 kilobases in length. Gene annotation of this assembly on Ensembl identified 14,435 protein coding genes.


Background
The Buff-tailed Bumblebee, Bombus terrestris, is one of the seven most common and widespread species of bumblebees in the UK.It is generally common across its range that includes Europe, north Africa and the Near East (Widmer et al., 1998).It has become an established non-native species in Tasmania, Argentina, Japan and Chile (Inoue et al., 2008;Rendoll-Carcamo et al., 2017;Semmens et al., 1993;Torretta et al., 2006).It is a eusocial species with reproductive queens and males, and non-reproductive workers.It is a large species with queens among the largest bumblebees in the UK reaching up to 22 mm in length.Workers exhibit a large degree of alloethism (Goulson et al., 2002).It is covered in black hairs, with bands of yellow hairs across the pronotum and second tergite, and white-to-buff hairs on the apical abdominal segments.It is part of the Bombus lucorum species aggregate, with workers being largely indistinguishable from those of B. lucorum, B. magnus and B. cryptarum.Queens and males can be distinguished from others in this group be their large size, deeper golden colour of the yellow hairs and buff, rather than white, colouration of the hairs on apical segments of the abdomen (Edwards & Jenner, 2005).Larger workers occasionally have a narrow band of brownish hairs at the base of the white haired apical abdominal segments.
Bombus terrestris is broadly polylectic, visiting a wide range of flowers for pollen and nectar.It is not uncommon for this species to 'nectar rob' from flower species with deeper corollae, whereby a slit is cut in the base of the corolla to access the nectaries without entering the main flower.Nests are constructed underground, typically in old small-mammal burrows.Colonies are large and may reach up to around 500 workers at their peak.In UK agricultural landscapes, nest density is estimated at 29 nests per km 2 , and minimum estimated maximum foraging range of 758 m (Knight et al., 2005).Typically, this species is univoltine with overwintered queens emerging from February to April, the first workers from March and males from July to October.In recent years, however, there have been increasing instances of winter active colonies of B. terrestris across southern UK cities.This may be in some part due to increasing winter temperatures and urban heat island effects, as well as the availability of winter blooming non-native flowers, such as Mahonia, in gardens.
In the UK wild populations are comprised of the subspecies Bombus terrestris audax (Rasmont et al., 2008).Due to its ability to be reared in captivity, this species has become a laboratory model species for bumblebees, and bees more generally.In particular, it has been extensively used in studies investigating the impacts of pesticides on pollinators (e.g.Gill & Raine, 2014) and insect pathology (e.g.Genersch et al., 2006).Furthermore, it is commercially produced for crop pollination, especially in greenhouse systems where it is an effective pollinator of crops such as tomato (Morandin et al., 2001).The subspecies Bombus terrestris dalmitinus is imported to the UK for commercial pollination, leading to some concern regarding the conservation of the native subspecies following escape or release of imported non-natives (Ings et al., 2006).
The genome for this important laboratory model species will have myriad applications for investigations into areas such as the evolution of eusociality, conservation of important pollinator species, reproductive evolution and foraging behaviour.We note that a draft genome sequence for B. terrestris was generated previously (Sadd et al., 2015).Here we present a chromosomally complete genome sequence for B. terrestris, based on one female specimen from Wytham Woods, Oxfordshire, UK.

Genome sequence report
The genome was sequenced from one female Bombus terrestris specimen collected from Wytham Woods, Oxfordshire, UK (latitude 51.77, longitude -1.34).A total of 57-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 71-fold coverage in 10X Genomics read clouds were generated.Primary assembly contigs were scaffolded with chromosome conformation Hi-C data.Manual assembly curation corrected 33 missing joins or mis-joins and removed two haplotypic duplications, reducing the assembly length by 2.03% and the scaffold number by 4.96%, and increasing the scaffold N50 by 13.14%.
The final assembly has a total length of 393.0 Mb in 249 sequence scaffolds with a scaffold N50 of 14.6 Mb (Table 1).Most (73.99%) of the assembly sequence was assigned to 18 chromosomal-level scaffolds.Chromosome-scale scaffolds are named by synteny based on the B. terrestris genome assembly GCA_000214255.1.Many unplaced scaffolds are composed entirely of a 10mer repeat which we note occurs in other Bombus assemblies (Figure 1-Figure 4; Table 2).While not fully phased, the assembly deposited is of one haplotype.Contigs corresponding to the second haplotype have also been deposited.The estimated Quality Value (QV) of the final assembly is 50.1 with k-mer completeness of 99.96%, and the assembly has a BUSCO v5.3.2 (Manni et al., 2021) completeness of 97.5% (single 97.2%, duplicated 0.3%) using the hymenoptera_odb10 reference set (n = 5,991).

Sample acquisition and nucleic acid extraction
A female Bombus terrestris (iyBomTerr1) was collected from Wytham Woods, Oxfordshire (biological vice-county Berkshire), UK (latitude 51.77, longitude -1.34) on 7 August 2019.The specimen was taken from woodland habitat by Liam Crowley (University of Oxford) by netting.The specimen was identified by the collector and snap-frozen on dry ice.
A second female B. terrestris specimen (iyBomTerr2) was used for RNA sequencing.The iyBomTerr2 specimen was collected by Olga Sivell (Natural History Museum) from  into an average fragment size of 12-20 kb in a Megaruptor 3 system with speed setting 30.Sheared DNA was purified by solid-phase reversible immobilisation using AMPure PB beads with a 1.8X ratio of beads to sample to remove the shorter fragments and concentrate the DNA sample.The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer and Qubit Fluorometer and Qubit dsDNA High Sensitivity Assay kit.Fragment size distribution was evaluated by running the sample on the FemtoPulse system.
RNA was extracted from thorax tissue of iyBomTerr2 in the Tree of Life Laboratory at the WSI using TRIzol, according to the manufacturer's instructions.RNA was then eluted in 50 μl RNAse-free water and its concentration assessed using  a Nanodrop spectrophotometer and Qubit Fluorometer using the Qubit RNA Broad-Range (BR) Assay kit.Analysis of the integrity of the RNA was done using Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay.

Sequencing
Pacific Biosciences HiFi circular consensus and 10X Genomics read cloud DNA sequencing libraries were constructed according to the manufacturers' instructions.Poly(A) RNA-Seq libraries were constructed using the NEB Ultra II RNA Library Prep kit.DNA: and RNA sequencing were performed by the Scientific Operations core at the WSI on Pacific Biosciences SEQUEL II (HiFi), Illumina HiSeq 4000 (RNA-Seq) and HiSeq X Ten (10X) instruments.Hi-C data were also generated from tissue of iyBomTerr1 using the Arima v2 kit and sequenced on the HiSeq X Ten instrument.et al., 2020).The genome was analysed and BUSCO scores (Manni et al., 2021;Simão et al., 2015) were generated within the BlobToolKit environment (Challis et al., 2020).Table 3 contains a list of software tool versions and sources.

Genome annotation
The Ensembl gene annotation system (Aken et al., 2016) was used to generate annotation for the Bombus terrestris assembly (GCA_910591885.2).Annotation was created primarily through alignment of transcriptomic data to the genome, with gap filling via protein-to-genome alignments of a select set of proteins from UniProt (UniProt Consortium, 2019).

Ethics and compliance issues
The materials that have contributed to this genome note have been supplied by a Darwin Both nuclear and mitochondrial genomes reported are larger than the previous assemblies for the species.Notably, the nuclear genome has had an increase of over 100Mb of genomic data.This seems like a good addition to the community.Interestingly though, about 100Mb of data was unplaced in chromossomes.So, this increase could be explained by the 10mer repeats of high GC content observed, which is likely composed of repetitive DNA and could be variable across sampled specimens.Maybe this should be mentioned more clearly in the manuscript body, or even better, a brief comparison between the previous assembly and the new one reported should be added so the community can better evaluate how this new assembly is an improvement from the old one and how it could impact future genomic analyses.
I also believe the report could be enhanced by including a results section about the mitochondrial genome assembly, as the total size obtained 24.7kb greatly deviates from the standard bee mitochondrial genome size (~17kb).Is this only due to duplications?Has it increased repetitive content?
Is the rationale for creating the dataset(s) clearly described?Yes Reviewer Expertise: pollinators conservation genetics I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 1 .
Figure 1.Genome assembly of Bombus terrestris, iyBomTerr1.2:metrics.The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness.The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 392,986,749 bp assembly.The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (25,524,254 bp, shown in red).Orange and pale-orange arcs show the N50 and N90 scaffold lengths (14,593,050 and 1,035,857 bp), respectively.The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude.The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot.A summary of complete, fragmented, duplicated and missing BUSCO genes in the hymenoptera_odb10 set is shown in the top right.An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/iyBomTerr1.2/dataset/CAJUYY02/snail.

Figure 2 .
Figure 2. Genome assembly of Bombus terrestris, iyBomTerr1.2:GC coverage.BlobToolKit GC-coverage plot.Scaffolds are coloured by phylum.Circles are sized in proportion to scaffold length.Histograms show the distribution of scaffold length sum along each axis.An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/iyBomTerr1.2/dataset/CAJUYY02/blob.

Figure 3 .
Figure 3. Genome assembly of Bombus terrestris, iyBomTerr1.2:cumulative sequence.BlobToolKit cumulative sequence plot.The grey line shows cumulative length for all scaffolds.Coloured lines show cumulative lengths of scaffolds assigned to each phylum using the buscogenes taxrule.An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/iyBomTerr1.2/dataset/ CAJUYY02/cumulative.

Figure 4 .
Figure 4. Genome assembly of Bombus terrestris, iyBomTerr1.2:Hi-C contact map.Hi-C contact map of the iyBomTerr1.2assembly, visualised using HiGlass.Chromosomes are shown in order of size from left to right and top to bottom.An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=ZQM9C3stRCyoai0oc0YK3w.
Are the protocols appropriate and is the work technically sound?YesAre sufficient details of methods and materials provided to allow replication by others?YesAre the datasets clearly presented in a useable and accessible format?YesCompeting Interests: No competing interests were disclosed.Reviewer Expertise: Bee genomics I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.NoAre the protocols appropriate and is the work technically sound?YesAre sufficient details of methods and materials provided to allow replication by others?Yes Are the datasets clearly presented in a useable and accessible format?Yes Competing Interests: No competing interests were disclosed.Reviewer Expertise: bee health and genomics I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.Reviewer Report 23 December 2023 https://doi.org/10.21956/wellcomeopenres.21331.r70837Are the protocols appropriate and is the work technically sound?Yes Are sufficient details of methods and materials provided to allow replication by others?Yes Are the datasets clearly presented in a useable and accessible format?Yes Competing Interests: No competing interests were disclosed.

INSDC accession Chromosome Size (Mb) GC%
(Guan et al., 2020)d out withHifiasm (Cheng et al., 2021)and haplotypic duplication was identified and removed with purge_dups(Guan et al., 2020).One round of polishing was performed by aligning 10X Genomics read data to the assembly with Long Ranger ALIGN, calling variants with FreeBayes (Allio et al., 2020)012).The assembly was then scaffolded with Hi-C data(Rao et al., 2014) using SALSA2 (Ghurye  et al., 2019).The assembly was checked for contamination and corrected using the gEVAL system(Chow et al., 2016)as described previously(Howe et al., 2021).Manual curation was performed using gEVAL, HiGlass(Kerpedjiev et al.,  2018)and Pretext (Harry, 2022).The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2022), which performed annotation using MitoFinder(Allio et al., 2020).To evaluate the assembly, MerquryFK was used to estimate consensus quality (QV) scores and k-mer completeness (Rhie Tree of Life Partner.The submission of materials by a Darwin Tree of Life Partner is subject to the Darwin Tree of Life Project Sampling Code of Practice.By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project.All efforts Members of the Natural History Museum Genome Acquisition Lab are listed here: https://doi.org/10.5281/zenodo.4790042.Members of the Darwin Tree of Life Barcoding collective are listed here: https://doi.org/10.5281/zenodo.4893703.Members of Wellcome Sanger Institute Scientific Operations: DNA Pipelines collective are listed here: https://doi.org/10.5281/zenodo.4790455.Members of the Tree of Life Core Informatics collective are listed here: https://doi.org/10.5281/zenodo.5013541.Members of the Darwin Tree of Life Consortium are listed here: https://doi.org/10.5281/zenodo.4783558.