The genome sequence of the ichneumon wasp Buathra laborator (Thunberg, 1822)

We present a genome assembly from an individual Buathra laborator (Arthropoda; Insecta; Hymenoptera; Ichneumonidae). The genome sequence is 330 megabases in span. Over 60% of the assembly is scaffolded into 11 chromosomal pseudomolecules. The mitochondrial genome has also been assembled and is 35.8 kilobases in length.


Background
In Britain, Buathra laborator is a common and widespread Darwin wasp species. Although most ichneumonid wasps are not systematically recorded, data from museum collections (e.g., Schwarz & Shaw, 1998) demonstrate its wide distribution and relatively long flight period, from April to August. It is one of the largest British species of the ichneumonid subfamily Cryptinae, and is particularly conspicuous because it occurs in rather open areas and often on flowers. Both sexes are predominantly black with red legs; females have a rather long ovipositor, and males have bright white markings on the face and hind tarsus. The species can be confused with some species of Cryptus (which are usually smaller) and with the very similar Buathra tarsoleucos (Schrank); separating it from B. tarsoleucos requires close examination of the tip of the female ovipositor or, in males, details of the propodeum (Schwarz, 1990). In Britain, B. tarsoleucos seems to be rare (Schwarz & Shaw, 1998).
Despite its conspicuousness, B. laborator has never been reared in Europe. There is an American rearing record from the geometrid Western Carpet moth Melanolophia imitata (Walker) (Townes & Townes, 1962), which pupates in leaf litter. Other host records in the literature (summarised by Yu et al., 2016) appear to stem from misidentifications. Buathra tarsoleucos is a parasitoid of the pupae of sand wasps (Hymenoptera: Sphecidae) (Casiraghi et al., 2001). As with B. tarsoleucos and species of the closely related genus Cryptus (Santos, 2017), the hosts of B. laborator are almost certainly cocooned pupae hidden in the soil; most Cryptinae are idiobiont ectoparasitoids, i.e., the host is permanently paralysed and the wasp larva feeds externally (summarised by Broad et al., 2018).
Outside Britain, B. laborator has a very wide range across Europe, temperate Asia and North America (summarised by Yu et al., 2016). Several genes have previously been sequenced, placing B. laborator within a phylogenetic framework for Cryptinae (Santos, 2017) and Ichneumonidae (Quicke et al., 2009), but this is only the second complete genome for an ichneumonid, or Darwin wasp (see Klopfstein et al., 2019); both B. laborator and Ichneumon xanthorius Forster (Broad, 2022) are classified within the informal ichneumoniformes grouping of subfamilies (e.g., Quicke et al., 2009).
The genome of the wasp B. laborator was sequenced as part of the Darwin Tree of Life Project, a collaborative effort to sequence all named eukaryotic species in the Atlantic Archipelago of Britain and Ireland.

Genome sequence report
The genome was sequenced from a female B. laborator collected from Hartslock, UK (latitude 51.51, longitude -1.11) (Wellcome Sanger Institute, 2022). A total of 49-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 119-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation capture (Hi-C) data. Manual assembly curation corrected 23 missing joins or misjoins, reducing the scaffold number by 14.84% and increasing the scaffold N50 by 33.3%.
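The coverage figures above follow from a simple ratio: fold coverage equals total sequenced bases divided by genome size. The minimal Python sketch below runs that arithmetic in both directions and adds a percentage-change helper of the kind behind the curation statistics. It assumes the 330 Mb assembly span stands in for the genome size (coverage is often calculated against an independent genome-size estimate, which the text does not report); the function names are illustrative, and the scaffold counts in the example are hypothetical.

```python
# Back-of-envelope sequencing arithmetic (illustrative sketch).
# Assumption: the 330 Mb assembly span stands in for the genome size;
# published coverage values may instead use an independent size estimate.
GENOME_SIZE_BP = 330_000_000


def implied_bases(fold: float, genome_size_bp: int = GENOME_SIZE_BP) -> float:
    """Total sequenced bases implied by a given fold coverage."""
    return fold * genome_size_bp


def fold_coverage(total_bases: float, genome_size_bp: int = GENOME_SIZE_BP) -> float:
    """Fold coverage = total sequenced bases / genome size."""
    return total_bases / genome_size_bp


def percent_change(before: float, after: float) -> float:
    """Signed percentage change from `before` to `after`."""
    return 100.0 * (after - before) / before


if __name__ == "__main__":
    print(f"HiFi bases at 49x: {implied_bases(49) / 1e9:.2f} Gb")
    print(f"10X bases at 119x: {implied_bases(119) / 1e9:.2f} Gb")
    # Hypothetical scaffold counts, chosen only to illustrate the formula.
    print(f"{percent_change(128, 109):.2f}%")
```

Under the stated assumption, 49-fold HiFi coverage corresponds to roughly 16.17 Gb of read sequence and 119-fold 10X coverage to roughly 39.27 Gb.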
The final assembly has a total length of 330 Mb in 109 sequence scaffolds with a scaffold N50 of 17 Mb (Table 1). Most (68.92%) of the assembly sequence was assigned to 11 chromosome-level scaffolds (Figures 2–5; Table 2). Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size. Several repeat types cluster together in the Hi-C data yet show no association with the defined chromosomes; these repeats make up a large proportion of the assembly (Figure 5). While not fully phased, the deposited assembly represents one haplotype; contigs corresponding to the second haplotype have also been deposited. The assembly has a BUSCO v5.3.2 (Manni et al., 2021) completeness of 95.9% (single-copy 95.7%, duplicated 0.3%) using the hymenoptera_odb10 reference set.
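The scaffold N50 quoted above is a length statistic: sort the scaffolds longest first and report the length at which the running total first reaches half of the total assembly length. A minimal Python sketch of that definition follows, together with a helper for the share of sequence held by a subset of scaffolds (the quantity reported as 68.92% for the chromosome-level scaffolds). Function names are illustrative, and the toy lengths are hypothetical, not taken from this assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)  # longest first
    total = sum(ordered)
    if total <= 0:
        raise ValueError("n50 requires at least one sequence of positive length")
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # running total has reached 50% of the assembly
            return length
    raise AssertionError("unreachable: running sum always reaches total")


def fraction_in(selected: Iterable[int], all_lengths: Iterable[int]) -> float:
    """Share of total assembly length held by a subset of sequences
    (e.g. the chromosome-level scaffolds)."""
    return sum(selected) / sum(all_lengths)


if __name__ == "__main__":
    toy = [10, 8, 6, 4, 2]  # hypothetical scaffold lengths, in Mb
    print(n50(toy))  # 8, because 10 + 8 = 18 is at least half of 30
    print(fraction_in(toy[:2], toy))  # 0.6
```

Because N50 depends on the largest scaffolds, joining fragments during curation can raise it substantially even when total assembly length is unchanged.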

Sample acquisition and nucleic acid extraction
An individual B. laborator (iyBuaLabo1) (Figure 1) was collected from Hartslock, UK (latitude 51.51, longitude -1.11) and identified by Matt Smith. The specimen was caught using an aerial net and preserved in liquid nitrogen.
DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute. The iyBuaLabo1 sample was weighed and dissected on dry ice, with head tissue set aside for Hi-C sequencing. The thorax tissue was disrupted using a Nippi Powermasher fitted with a BioMasher pestle. High molecular weight (HMW) DNA was extracted using the Qiagen MagAttract HMW DNA extraction kit. Low molecular weight DNA was removed from a 20 ng aliquot of extracted DNA using a 0.8X AMPure XP purification kit prior to 10X Chromium sequencing; a minimum of 50 ng DNA was submitted for 10X sequencing. HMW DNA was sheared to an average fragment size of 12–20 kb in a Megaruptor 3 system at speed setting 30. Sheared DNA was purified by solid-phase reversible immobilisation using AMPure PB beads at a 1.8X ratio of beads to sample to remove the shorter fragments and concentrate the DNA sample. The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit dsDNA High Sensitivity Assay kit. Fragment size distribution was evaluated by running the sample on the FemtoPulse system.
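The "0.8X" and "1.8X" figures are volume ratios of bead suspension to DNA sample. A minimal Python sketch of that arithmetic follows; the helper name is illustrative, and the 50 µL sample volume is hypothetical, since the text does not state the volumes used. In solid-phase reversible immobilisation clean-ups, a lower ratio generally leaves more of the short fragments unbound, so they are discarded with the supernatant.

```python
def bead_volume_ul(sample_volume_ul: float, ratio: float) -> float:
    """Volume of bead suspension to add for a beads:sample ratio such as 0.8X or 1.8X."""
    if sample_volume_ul <= 0 or ratio <= 0:
        raise ValueError("sample volume and ratio must both be positive")
    return sample_volume_ul * ratio


if __name__ == "__main__":
    sample_ul = 50.0  # hypothetical sample volume, not stated in the text
    for label, ratio in [("0.8X AMPure XP clean-up", 0.8), ("1.8X AMPure PB clean-up", 1.8)]:
        print(f"{label}: add {bead_volume_ul(sample_ul, ratio):.1f} uL beads")
```

With the assumed 50 µL sample, this prints 40.0 µL of beads for the 0.8X clean-up and 90.0 µL for the 1.8X clean-up.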

Sequencing
Pacific Biosciences HiFi circular consensus and 10X Genomics read cloud DNA sequencing libraries were constructed according to the manufacturers' instructions. DNA sequencing was performed by the Scientific Operations core at the Wellcome Sanger Institute (WSI) on Pacific Biosciences SEQUEL II (HiFi) and Illumina NovaSeq 6000 (10X) instruments. Hi-C data were also generated from head tissue of iyBuaLabo1 using the Arima v2 kit and sequenced on the Illumina NovaSeq 6000 instrument.

The genome sequence is released openly for reuse. The Buathra laborator genome sequencing initiative is part of the Darwin Tree of Life (DToL) project. All raw sequence data and the assembly have been deposited in INSDC databases. The genome will be annotated using available RNA-Seq data and presented through the Ensembl pipeline at the European Bioinformatics Institute. Raw data and assembly accession identifiers are reported in Table 1.

Author information
Members of the Natural History Museum Genome Acquisition Lab are listed here: https://doi.org/10.5281/zenodo.4790042.