The genome sequence of the plain dark bee, Stelis phaeoptera (Kirby, 1802)

We present a genome assembly from an individual female Stelis phaeoptera (the plain dark bee; Arthropoda; Insecta; Hymenoptera; Megachilidae). The genome sequence is 301 megabases in span. Most of the assembly is scaffolded into 17 chromosomal pseudomolecules. The mitochondrial genome has also been assembled and is 18.6 kilobases in length. Gene annotation of this assembly on Ensembl identified 9,850 protein-coding genes.


Background
Stelis phaeoptera (plain dark bee) is a rare megachilid bee, found across the western Palaearctic from Spain and North Africa to Kazakhstan. In Britain, the species is found in scattered locations across Wales and the south of England. The species, which was never common, has declined in the last 30 years, particularly in the south of England (Else & Edwards, 2018). However, it has a stronghold in the Welsh and Shropshire Marches (Jones & Cheeseborough, 2012).
As its common name suggests, the appearance of this small bee is unremarkable, with a dark, strongly punctate body (Falk & Lewington, 2015). It has one generation each year, with adults on the wing from late May to mid-August, and occasionally into September (Else & Edwards, 2018). Bees in the genus Stelis are cleptoparasites and lay their eggs in the nests of other species of megachilid bee. They are most often seen around garden 'bee hotels', as these are favoured by their host species. In the UK, confirmed hosts are the mason bees Osmia leaiana (Jones & Cheeseborough, 2012) and Osmia bicornis (Boyes, 2021; Boyes, 2022). In Britain, O. leaiana appears to be the usual host, as its flight period coincides with that of S. phaeoptera. The flight period of O. bicornis is earlier, overlapping by only about a week, which leaves fewer opportunities for S. phaeoptera to use this species. Interestingly, O. leaiana sometimes exhibits a two-year life cycle (Boyes, 2020), which may be a strategy to reduce parasitism, as the Stelis hatch after one year, leaving fewer hosts available.
The genome of S. phaeoptera was sequenced as part of the Darwin Tree of Life Project, a collaborative effort to sequence all named eukaryotic species in the Atlantic Archipelago of Britain and Ireland. Here we present a chromosomally complete genome sequence for S. phaeoptera, based on one female specimen from a garden in Powys, UK.

Genome sequence report
The genome was sequenced from one female Stelis phaeoptera (Figure 1) collected from a garden in Middletown, Powys, UK (latitude 52.70, longitude -3.03). A total of 79-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected six missing joins or misjoins, reducing the scaffold number by 2.15% and increasing the scaffold N50 by 2.25%.
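To give a sense of the sequencing effort implied by a fold-coverage figure, the following minimal sketch multiplies coverage by assembly span. It assumes that coverage was calculated against the final 301.0 Mb assembly span, which the text does not state explicitly; the function name is hypothetical, and the result is a rough estimate, not a reported read yield.

```python
def estimated_yield_bases(coverage_fold: float, assembly_span_bases: float) -> float:
    """Approximate total read bases implied by a fold-coverage figure.

    Assumes coverage = total read bases / assembly span (an assumption;
    the denominator actually used is not given in the text).
    """
    if coverage_fold < 0 or assembly_span_bases <= 0:
        raise ValueError("coverage must be >= 0 and span must be > 0")
    return coverage_fold * assembly_span_bases


# Values taken from the text: 79-fold HiFi coverage, 301.0 Mb assembly span.
hifi_yield = estimated_yield_bases(79, 301.0e6)
print(f"Approximate HiFi yield: {hifi_yield / 1e9:.1f} Gb")
```

Under that assumption, 79-fold coverage of a 301.0 Mb assembly corresponds to roughly 23.8 Gb of HiFi read data.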
The final assembly has a total length of 301.0 Mb in 182 sequence scaffolds, with a scaffold N50 of 14.1 Mb (Table 1). Most (81.34%) of the assembly sequence was assigned to 17 chromosomal-level scaffolds. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Table 2). A large number of heterochromatic contigs remain unlocalised (Figure 5). The assembly has a BUSCO v5.3.2 (Manni et al., 2021) completeness of 97.2% (single 97.1%, duplicated 0.1%) using the OrthoDB v10 Hymenoptera reference set (n = 5,991). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
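For readers unfamiliar with the scaffold N50 statistic quoted above, the sketch below shows how it is conventionally defined: the length of the scaffold at which the running total of lengths, sorted from longest to shortest, first reaches half of the total assembly length. The function name and the toy lengths are illustrative only and are not the scaffold lengths of this assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that pushes the running total to at least
        # half of the assembly length defines the N50.
        if running >= half_total:
            return length
    raise AssertionError("unreachable for a non-empty input")


# Toy example (hypothetical lengths, arbitrary units): total = 30, half = 15.
# Running totals are 10, then 18, so the N50 is 8.
toy_lengths = [2, 10, 4, 8, 6]
print(n50(toy_lengths))
```

Because N50 depends only on the sorted lengths, joining two scaffolds during curation can raise it even when the total assembly length is unchanged.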

Methods
Sample acquisition and nucleic acid extraction
A female Stelis phaeoptera (iyStePhae1) was collected from a garden in Middletown, Powys, UK (latitude 52.70, longitude -3.03) by potting, on 4 July 2021. The specimen was collected and identified by Clare Boyes. The specimen was snap-frozen on dry ice.
DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute (WSI). The iyStePhae1 sample was weighed and dissected on dry ice with head tissue set aside for Hi-C sequencing. Abdomen tissue was disrupted using a Nippi Powermasher fitted with a BioMasher pestle. High molecular weight (HMW) DNA was extracted using the Qiagen MagAttract HMW DNA extraction kit. Low molecular weight DNA was removed from a 20 ng aliquot of extracted DNA using 0.8X AMPure XP purification kit prior to 10X Chromium sequencing; a minimum of 50 ng DNA was submitted for 10X sequencing. HMW DNA was sheared into an average fragment size of 12-20 kb in a Megaruptor 3 system with speed setting 30. Sheared DNA was purified by solid-phase reversible immobilisation using AMPure PB beads with a 1.8X ratio of beads to sample to remove the shorter fragments and concentrate the DNA sample. The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit dsDNA High Sensitivity Assay kit. Fragment size distribution was evaluated by running the sample on the FemtoPulse system.
RNA was extracted from thorax tissue of iyStePhae1 in the Tree of Life Laboratory at the WSI using TRIzol, according to the manufacturer's instructions. RNA was then eluted in 50 μl RNAse-free water and its concentration was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit RNA Broad-Range (BR) Assay kit. RNA integrity was analysed using the Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay.

Sequencing
Pacific Biosciences HiFi circular consensus and 10X Genomics read cloud DNA sequencing libraries were constructed. Poly(A) RNA-Seq libraries were constructed using the NEB Ultra II RNA Library Prep kit. DNA and RNA sequencing were performed by the Scientific Operations core at the WSI on Pacific Biosciences SEQUEL II (HiFi) and Illumina NovaSeq 6000 (RNA-Seq) instruments. Hi-C data were also generated from head tissue of iyStePhae1 using the Arima v2 kit and sequenced on the NovaSeq 6000 instrument.

Genome assembly
Assembly was carried out with Hifiasm (Cheng et al., 2021) and haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). The assembly was scaffolded with Hi-C data (Rao et al., 2014) using YaHS (Zhou et al., 2022). The assembly was checked for contamination as described previously (Howe et al., 2021). Manual curation was performed using HiGlass (Kerpedjiev et al., 2018) and Pretext (Harry, 2022). The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2022), which performed annotation using MitoFinder (Allio et al., 2020). The genome was analysed and BUSCO scores were generated within the BlobToolKit environment (Challis et al., 2020). Table 3 contains a list of all software tool versions used, where appropriate.

Genome annotation
The Ensembl gene annotation system (Aken et al., 2016) was used to generate annotation for the Stelis phaeoptera assembly (GCA_943735885.1). Annotation was created primarily through

The authors report the sequencing of a new bee genome. All looks fine and in line with high standards.
Only a few minor comments:
○ "Manual assembly curation corrected six missing joins or misjoins, reducing the scaffold number" - how was this done? No explanation of manual criteria or references are provided.
○ "2,15%" - unusual to have commas.
○ "named in order of size". Decreasing?
○ Table 1 "Raw data" - indicates NCBI project numbers. It would be great to see how much sequencing effort there is for each in this table.
○ Fig 2: the unplaced scaffolds seem to have 10% higher GC. Is this usual? Are they potentially spread throughout the genome? Or represent one or few other chromosomes? Or contamination? Some interpretation would be welcome.

Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others?