The genome sequence of Fabricius’ Nomad Bee, Nomada fabriciana (Linnaeus, 1767)

We present a genome assembly from an individual female Nomada fabriciana (Fabricius’ Nomad Bee; Arthropoda; Insecta; Hymenoptera; Apidae). The genome sequence is 233.6 megabases in span. Most of the assembly is scaffolded into 12 chromosomal pseudomolecules. The mitochondrial genome has also been assembled and is 19.4 kilobases in length. Gene annotation of this assembly on Ensembl identified 9,700 protein-coding genes.


Background
Fabricius' Nomad Bee, Nomada fabriciana, is a small nomad bee in the family Apidae. It occurs across the Palaearctic and is common and widespread across the south of the UK, becoming scarcer further north, with few records in Scotland and Ireland. Nomada are relatively hairless bees, so inexperienced observers may mistake them for wasps. Identification of Nomada can be challenging, although N. fabriciana is easily recognised by its small size, dark overall appearance, reddish gaster and the unique combination of a black labrum and bidentate mandibles. Furthermore, females have antennae with a red base and tip, separated by a band of black segments in the middle. The typical form has yellow spots on tergite two, although dark forms occur that lack these spots and have almost entirely dark legs and antennae.
Nomada are brood parasites of other bees, mainly attacking Andrena, although some species are kleptoparasites of other bee genera (Odanaka et al., 2022). Female nomad bees can often be seen patrolling and investigating the nesting sites of their hosts. They enter the host nest and lay their own eggs on the walls of unsealed cells. The larvae hatch and kill the host egg or larva using large, well-developed mandibles, before consuming the pollen provisions (Rozen Jr, 1991). The main host of N. fabriciana is Andrena bicolor, although other hosts are used, including A. angustior, A. chrysosceles, A. flavipes and A. nigroaenea (Falk & Lewington, 2019). The large size variation in the species is likely to be determined by the size of the host.
It is associated with a wide range of habitats and can be found anywhere that its hosts occur. In the UK it is largely bivoltine, with the first flight period from March to June and the second from June to August (Else & Edwards, 2018). Adults visit a wide range of flowers for nectar, including dandelion (Taraxacum sp.), daisy (Bellis perennis), ragwort (Senecio jacobaea), field scabious (Knautia arvensis), speedwell (Veronica spp.), spurge (Euphorbia spp.), stitchwort (Stellaria spp.), strawberry (Fragaria vesca) and willow (Salix spp.) (Else & Edwards, 2018).
This complete genome represents one of the first for the genus, which has important biological differences from other genera within the Apidae, a family that includes ecologically and economically important species such as the honeybee (Apis mellifera) and bumblebees (Bombus spp.). As such, this genome will facilitate research into areas such as the evolution of kleptoparasitism, sociality and reproductive systems, as well as providing useful data for resolving hymenopteran taxonomy.

Genome sequence report
The genome was sequenced from one female Nomada fabriciana (Figure 1) collected from Wytham Woods, Oxfordshire, UK (51.77, …). A total of 99-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 154-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 11 missing joins or mis-joins, reducing the scaffold number by 3.5% and increasing the scaffold N50 by 22.04%. The final assembly has a total length of 233.6 Mb in 193 sequence scaffolds, with a scaffold N50 of 18.6 Mb (Table 1). A summary of the assembly statistics is shown in Figure 2, while the distribution of assembly scaffolds on GC proportion and coverage is shown in Figure 3. The cumulative assembly plot in Figure 4 shows curves for subsets of scaffolds assigned to different phyla. Most (95.24%) of the assembly sequence was assigned to 12 chromosomal-level scaffolds. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 5; Table 2). The specimen is a diploid female. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission.
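To make the headline numbers above concrete, the short Python sketch below turns them into approximate base counts. It assumes that "N-fold coverage" means total sequenced bases divided by the assembly span (233,603,770 bp, from the Figure 2 caption); coverage is sometimes quoted against a k-mer genome-size estimate instead, so the yields printed here are implied approximations, not reported values. The helper names are hypothetical.

```python
# Back-of-envelope arithmetic for figures quoted in the genome sequence report.
# Assumption: "N-fold coverage" = total sequenced bases / assembly span.

ASSEMBLY_SPAN_BP = 233_603_770  # total assembly length from the Figure 2 caption


def implied_yield_bp(fold_coverage: float, genome_bp: int = ASSEMBLY_SPAN_BP) -> float:
    """Approximate number of sequenced bases implied by a fold coverage."""
    return fold_coverage * genome_bp


def bp_in_chromosomes(fraction: float, genome_bp: int = ASSEMBLY_SPAN_BP) -> float:
    """Bases placed in chromosome-level scaffolds, given the reported fraction."""
    return fraction * genome_bp


hifi_gb = implied_yield_bp(99) / 1e9   # PacBio HiFi, 99-fold
tenx_gb = implied_yield_bp(154) / 1e9  # 10X read clouds, 154-fold
chrom_mb = bp_in_chromosomes(0.9524) / 1e6  # 95.24% in 12 chromosomal scaffolds

print(f"HiFi: ~{hifi_gb:.1f} Gb, 10X: ~{tenx_gb:.1f} Gb")
# HiFi: ~23.1 Gb, 10X: ~36.0 Gb
print(f"Chromosome-level sequence: ~{chrom_mb:.1f} Mb of {ASSEMBLY_SPAN_BP / 1e6:.1f} Mb")
# Chromosome-level sequence: ~222.5 Mb of 233.6 Mb
```

In other words, roughly 11 Mb of the assembly remains in the unplaced scaffolds under these assumptions.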
Metadata for specimens, spectral estimates, sequencing runs, contaminants and pre-curation assembly statistics can be found at https://links.tol.sanger.ac.uk/species/601510.

Genome annotation report
The Nomada fabriciana genome assembly (GCA_907165295.1) was annotated using the Ensembl rapid annotation pipeline. … Ten (10X) instruments. Hi-C data were also generated from remaining tissue of iyNomFabr1 using the Arima2 kit and sequenced on the HiSeq X Ten instrument. Table 3 contains a list of relevant software tool versions and sources.

Genome annotation
The Ensembl gene annotation system (Aken et al., 2016) was used to generate annotation for the Nomada fabriciana assembly (GCA_907165295.1). Annotation was created primarily through alignment of transcriptomic data to the genome, with gap filling via protein-to-genome alignments of a select set of proteins from UniProt (UniProt Consortium, 2019).

Wellcome Sanger Institute - Legal and Governance
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the 'Darwin Tree of Life Project Sampling Code of Practice', which can be found in full on the Darwin Tree of Life website. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project. Further, the Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been or are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material

All methods used are appropriate for invertebrate genome assembly and result in a high-quality, well-annotated genome that is clearly presented, with interactive figures and tables throughout the report. Importantly, the methods for every step (sample acquisition, DNA extraction, sequencing, assembly, and annotation) are provided in detail, allowing other researchers to replicate each step if necessary. The software versions for all bioinformatic tools, along with links to protocols and pipelines for each assembly and annotation step, are provided, improving reproducibility. This is something that I think these genome reports have fallen short on in the past, so it's good to see it addressed.
Finally, all datasets generated are presented in an easily accessible manner, allowing other researchers to use these genomic resources. It's great to see that the raw read data and pre-curation assembly statistics are now linked within these reports, which further improves transparency and reproducibility.
Is the rationale for creating the dataset(s) clearly described? Yes

Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes

Are the datasets clearly presented in a useable and accessible format? Yes

Competing Interests: No competing interests were disclosed.
Reviewer Expertise: insect evolution; population genetics; behavioural ecology; plant-insect interactions

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

The manuscript also provides sufficient details of methods and materials, enabling replication by others in the scientific community. It delineates the procedures for sample collection, DNA extraction, library preparation, and sequencing, along with the software tools and pipelines used for genome assembly, curation, and annotation. The inclusion of specific laboratory protocols, sequencing parameters, and software versions enhances the transparency and reproducibility of the research findings.
The datasets are presented in a usable and accessible format, facilitating further analysis and exploration by other researchers. The assembly statistics, genome annotation results, and quality metrics are clearly presented, providing comprehensive insights into the genomic characteristics of Nomada fabriciana. Additionally, the deposition of genomic data in public repositories ensures broad accessibility and promotes collaboration within the scientific community. The authors' thorough approach to experimental design, execution, and data dissemination contributes to the advancement of knowledge in bee biology and genomics.
Is the rationale for creating the dataset(s) clearly described? Yes

Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes

Are the datasets clearly presented in a useable and accessible format? Yes

Competing Interests: No competing interests were disclosed.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 2. Genome assembly of Nomada fabriciana, iyNomFabr1.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference, with each bin representing 0.1% of the 233,603,770 bp assembly. The distribution of scaffold lengths is shown in dark grey, with the plot radius scaled to the longest scaffold present in the assembly (26,420,868 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (18,567,632 and 10,996,985 bp, respectively). The pale grey spiral shows the cumulative scaffold count on a log scale, with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the hymenoptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Nomada%20fabriciana/dataset/CAJRBH01/snail.
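The N50 and N90 values in the caption follow the usual Nx definition: sort scaffolds from longest to shortest and report the length of the scaffold at which the running total first reaches x% of the assembly span. The Python sketch below illustrates that definition and the snail-plot bin width. Because per-scaffold lengths are not listed here, it uses hypothetical toy lengths, and `nx` and `snail_bin_size` are illustrative helpers, not BlobToolKit functions.

```python
from typing import Sequence


def nx(lengths: Sequence[int], x: float) -> int:
    """Return the Nx length of an assembly.

    Scaffolds are sorted from longest to shortest and their lengths summed;
    Nx is the length of the scaffold at which the running total first reaches
    x% of the total assembly span (x=50 gives N50, x=90 gives N90).
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    target = sum(lengths) * x / 100
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    # Unreachable for x <= 100, because the full sum always reaches the target.
    raise AssertionError("target not reached")


def snail_bin_size(total_bp: int, n_bins: int = 1000) -> float:
    """Span (bp) represented by each size-ordered bin around the snail plot."""
    return total_bp / n_bins


# Toy scaffold lengths (hypothetical, not the real iyNomFabr1.1 scaffolds).
toy = [10, 8, 6, 4, 2]
print(nx(toy, 50), nx(toy, 90))     # 8 4
print(snail_bin_size(233_603_770))  # bp per bin for the caption's 1,000 bins
```

With the caption's figures, each of the 1,000 bins covers roughly 233.6 kb of sequence.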
Assembly was carried out with Hifiasm (Cheng et al., 2021), and haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). One round of polishing was performed by aligning 10X Genomics read data to the assembly with Long Ranger ALIGN and calling variants with FreeBayes (Garrison & Marth, 2012). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using SALSA2 (Ghurye et al., 2019). The assembly was checked for contamination and corrected using the gEVAL system (Chow et al., 2016) as described previously (Howe et al., 2021). Manual curation was performed using gEVAL, HiGlass (Kerpedjiev et al., 2018) and Pretext (Harry, 2022). The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2023), which runs MitoFinder (Allio et al., 2020) or MITOS (Bernt et al., 2013) and uses these annotations to select the final mitochondrial contig and to ensure the general quality of the sequence. A Hi-C map for the final assembly was produced using bwa-mem2 (Vasimuddin et al., 2019) in the Cooler file format.
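Conceptually, a Hi-C contact map such as the one in Figure 5 is a matrix of read-pair counts between fixed-size genomic bins. The plain-Python sketch below illustrates only that binning idea on a toy sequence; it does not reproduce read mapping with bwa-mem2, read-pair filtering, normalisation, or the Cooler storage format, and `contact_matrix` and its input layout are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical input: each Hi-C read pair is represented by the two mapped
# 0-based positions of its reads on a single sequence. Real pipelines also
# handle inter-chromosomal pairs, mapping-quality filters and duplicates.
ReadPair = Tuple[int, int]


def contact_matrix(pairs: Iterable[ReadPair], seq_len: int, bin_size: int) -> List[List[int]]:
    """Count read pairs falling into each pair of fixed-size bins.

    The result is symmetric: a contact between bins i and j is stored in
    both matrix[i][j] and matrix[j][i].
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    n_bins = (seq_len + bin_size - 1) // bin_size  # ceiling division
    counts: Counter = Counter()
    for pos1, pos2 in pairs:
        if not (0 <= pos1 < seq_len and 0 <= pos2 < seq_len):
            raise ValueError(f"position out of range: {(pos1, pos2)}")
        # Order the bin pair so (i, j) and (j, i) are counted together.
        i, j = sorted((pos1 // bin_size, pos2 // bin_size))
        counts[(i, j)] += 1
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for (i, j), c in counts.items():
        matrix[i][j] = c
        matrix[j][i] = c
    return matrix


# Toy example: a 300 bp sequence split into three 100 bp bins.
toy_pairs = [(10, 20), (50, 150), (160, 90), (250, 260)]
for row in contact_matrix(toy_pairs, seq_len=300, bin_size=100):
    print(row)
# [1, 2, 0]
# [2, 0, 0]
# [0, 0, 1]
```

Storing each unordered bin pair once and mirroring it keeps the matrix symmetric, matching how a contact map is read from either axis.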

Figure 5. Genome assembly of Nomada fabriciana, iyNomFabr1.1: Hi-C contact map of the iyNomFabr1.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=Xd136OeBQwyuHs1mVJaVWg.

Reviewer Report 14 March 2024
https://doi.org/10.21956/wellcomeopenres.22347.r75149
© 2024 Palmieri L. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Luciano Palmieri
1 Department of Integrative Biology, University of Wisconsin-Madison, Madison, Wisconsin, USA
2 Faculty of Agricultural, Free University of Bolzen, Bolzano State, Bolzano, 39100, Italy

The manuscript presents a clear rationale for creating the datasets associated with Nomada fabriciana. It elucidates the significance of understanding the genomic composition of this species, highlighting its importance in studying evolutionary dynamics, social behavior, and taxonomy within the Apidae family. The authors underscore the biological and ecological relevance of Nomada fabriciana, particularly in relation to its role as a brood parasite and its interactions with host species. The methods described in the manuscript are appropriate for the objectives of the study. The authors clearly describe the sample acquisition process, DNA extraction, and sequencing, including the use of PacBio single-molecule HiFi long reads, 10X Genomics read clouds, and Hi-C data for assembly and scaffolding. The integration of multiple sequencing technologies and manual curation steps enhances the accuracy and completeness of the genomic dataset.