The genome sequence of the small wasp-sawfly, Tenthredo distinguenda (R. Stein, 1885)

We present a genome assembly from an individual male Tenthredo distinguenda (the small wasp-sawfly; Arthropoda; Insecta; Hymenoptera; Tenthredinidae). The genome sequence is 229.4 megabases in span. Most of the assembly is scaffolded into 9 chromosomal pseudomolecules. The mitochondrial genome has also been assembled and is 31.6 kilobases in length. Gene annotation of this assembly on Ensembl identified 11,332 protein-coding genes.


Background
The genus Tenthredo is large, with over one thousand species distributed across the Holarctic and Oriental regions. There are 30 species present in Britain. Within this genus, numerous species groups and subspecies have been identified. Tenthredo distinguenda (R. Stein, 1885) falls within the subgenus Zonuledo, which in Britain also includes Tenthredo amoena (Gravenhorst, 1807).
The subspecies Tenthredo distinguenda distinguenda (R. Stein, 1885) is distributed across Europe and is a comparatively small (8.5 to 9.5 mm), black Tenthredo, marked with yellow on the head, antennae, pronotum, tegulae, abdomen and legs. The subspecies Tenthredo distinguenda hyrcana (Benson, 1968) occurs in Eastern Europe, Turkey and Iran. However, the status of hyrcana is still under discussion. In Britain, adults can be distinguished from the similar T. amoena by a combination of the densely punctured and dull mesepisternum, entirely yellow tegulae and antennal scapes marked with black on the outer face. Little is known about the ecology of the species, but many Tenthredo are predatory species that contribute to pest control. The larvae feed on Hypericum perforatum, Perforate St John's Wort, and as such are not considered to be a pest of agricultural or horticultural significance (Macek et al., 2020). The species is univoltine, with adults on the wing from May to July.
Although the species boundaries remain unclear, there are currently eight named species in the subgenus in Europe and North Africa. These species are morphologically very similar, with high levels of intra-species character variability. Indeed, the genital structures of both males and females exhibit variability and are not considered reliable identification characteristics (Taeger, 1991). Phylogenetic classification of the Zonuledo subgenus based on morphology is problematic due to the lack of obvious synapomorphic features. In BOLD, COI barcoding produces four well-defined clusters, namely Tenthredo flavipennis (Brullé, 1832), Tenthredo zonula (Klug, 1817), T. distinguenda and T. amoena. One specimen of T. distinguenda hyrcana from Armenia falls into a separate BIN, ABU8418. A further specimen from Iran, which appears from images to be T. distinguenda, is close to T. zonula (in this case, contamination cannot be excluded). The remaining T. distinguenda specimens all fall within BIN ABU8417 (pers. comm. Taeger, 2023). In the southern parts of its range, T. distinguenda appears variable in colour, and the currently recognised species Tenthredo lacourti (Taeger, 1991), Tenthredo kervillei (Konow, 1907) and Tenthredo berberensis (Lacourt, 1986) could be forms or subspecies of T. distinguenda (pers. comm. Taeger, 2023).
There are no previously barcoded specimens from Britain. Knowledge of sawfly evolution will benefit from the comparative analysis of genomes from closely and distantly related species. This male specimen from Wytham Woods, England matches the description of T. distinguenda distinguenda using the characteristics in Benson's key (Benson, 1952), and the publication of the complete genome sequence will help our understanding of the phylogeny of this group.

Genome sequence report
The genome was sequenced from one male Tenthredo distinguenda (Figure 1) collected from Wytham Woods, Oxfordshire, UK (51.76, -1.33). A total of 81-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 14 missing joins or mis-joins, reducing the scaffold number by 8.65% and increasing the scaffold N50 by 54.96%. The final assembly has a total length of 229.4 Mb in 95 sequence scaffolds with a scaffold N50 of 25.5 Mb (Table 1). Most (98.81%) of the assembly sequence was assigned to 9 chromosomal-level scaffolds. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figures 2 to 5; Table 2). The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission.
Metadata for specimens, spectral estimates, sequencing runs, contaminants and pre-curation assembly statistics can be found at https://links.tol.sanger.ac.uk/species/1385102.
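The scaffold N50 and N90 statistics quoted above follow a standard definition: the length L such that scaffolds of length L or greater together cover at least 50% (or 90%) of the total assembly span. A minimal sketch of that calculation is given below. The function name `nx` and the toy scaffold lengths are illustrative assumptions and are not taken from the assembly or from the pipelines used for this genome note.

```python
def nx(lengths, fraction=0.5):
    """Return the Nx length for a list of scaffold lengths.

    Nx is the length of the shortest scaffold in the smallest set of
    longest scaffolds whose combined length reaches `fraction` of the
    total span (fraction=0.5 gives N50, fraction=0.9 gives N90).
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold downwards, accumulating span
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length


# Hypothetical toy scaffold lengths (arbitrary units), not the real assembly
toy_scaffolds = [40, 30, 25, 20, 10, 5]
toy_n50 = nx(toy_scaffolds)        # half of the 130-unit span is reached at the 30-unit scaffold
toy_n90 = nx(toy_scaffolds, 0.9)   # 90% of the span is reached at the 10-unit scaffold
```

Applied to the real scaffold lengths of an assembly, the same function would be expected to reproduce the N50 and N90 values reported by standard assembly statistics tools, provided the same set of sequences is included.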

Sample acquisition and nucleic acid extraction
Two Tenthredo distinguenda specimens were collected from Wytham Woods, Oxfordshire (biological vice-county Berkshire), UK (latitude 51.76, longitude -1.33) on 2021-05-31. The specimens were netted in a woodland habitat by Steven Falk (independent researcher), and identified by the same person. The specimens were preserved on dry ice. The specimen used for genome sequencing was specimen number Ox001514, iyTenDist1 (Figure 1), while the second specimen, Ox001520, iyTenDist2, was used for Hi-C scaffolding.
DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute (WSI). The iyTenDist1 sample was weighed and dissected on dry ice. Whole organism tissue was disrupted using a Nippi Powermasher fitted with a BioMasher pestle. High molecular weight (HMW) DNA was extracted using the Qiagen MagAttract HMW DNA extraction kit. HMW DNA was sheared into an average fragment size of 12-20 kb in a Megaruptor 3 system with speed setting 30. Sheared DNA was purified by solid-phase reversible immobilisation using AMPure PB beads with a 1.8X ratio of beads to sample to [...]
A Hi-C map for the final assembly was produced using bwa-mem2 (Vasimuddin et al., 2019) in the Cooler file format (Abdennur & Mirny, 2020). To assess the assembly metrics, the k-mer completeness and QV consensus quality values were calculated in Merqury (Rhie et al., 2020). This work was done using Nextflow (Di Tommaso et al., 2017) DSL2 pipelines "sanger-tol/readmapping" (Surana et al., 2023a) and "sanger-tol/genomenote" (Surana et al., 2023b). The genome was analysed within the BlobToolKit environment (Challis et al., 2020) and BUSCO scores (Manni et al., 2021; Simão et al., 2015) were calculated.
Table 3 contains a list of relevant software tool versions and sources.
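The QV consensus quality value mentioned above is a Phred-scaled score, QV = -10 log10(E), where E is the estimated per-base error rate. The short sketch below converts between the two scales. The function names and the example value of QV 60 are illustrative assumptions; they are not part of Merqury and do not represent the value obtained for this assembly.

```python
import math


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error rate."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate):
    """Convert a per-base error rate to a Phred-scaled quality value."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in (0, 1]")
    return -10 * math.log10(error_rate)


# Illustrative value only: QV 60 corresponds to about one error per million bases
example_rate = qv_to_error_rate(60)
example_qv = error_rate_to_qv(1e-6)
```

Because the scale is logarithmic, each increase of 10 in QV corresponds to a tenfold reduction in the estimated error rate.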

Genome annotation
The Ensembl gene annotation system (Aken et al., 2016) was used to generate annotation for the Tenthredo distinguenda assembly (GCA_947538915.1). Annotation was created primarily through alignment of transcriptomic data to the genome, with gap filling via protein-to-genome alignments of a select set of proteins from UniProt (UniProt Consortium, 2019).

Wellcome Sanger Institute - Legal and Governance
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the 'Darwin Tree of Life Project Sampling Code of Practice', which can be found in full on the Darwin Tree of Life website. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project.
Further, the Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material
• Legality of collection, transfer and use (national and international)
Each transfer of samples is further undertaken according to a Research Collaboration Agreement or Material Transfer Agreement entered into by the Darwin Tree of Life Partner, Genome Research Limited (operating as the Wellcome Sanger Institute), and in some circumstances other Darwin Tree of Life collaborators.

Table 3. Software tools: versions and sources (columns: Software tool, Version, Source).

The Introduction has some focus on comparing the target species to T. amoena, probably because it is the closest relative of T. distinguenda in the UK. However, I believe the closest relative is T. zonula.
Fortunately, this species is also considered and the barcode differences of the subgenus are compared.
The produced genome shows metrics that, to my (slightly limited) understanding, suggest that the new genome is of high quality. For example, BUSCO gene completeness is high. I cannot assess all technical details of this work, yet I am confident that the genome is overall of very high quality, for which reason I warmly recommend this genome work.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: I am an insect taxonomist with species expertise especially on Lepidoptera and sawflies. I am predominantly using genetic and genomic methods in my research. While I have experience of various genomic tools, my experience in building full genomes is limited.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard. Nevertheless, I have a few comments regarding the methods and background.

BACKGROUND:
- It is mentioned that T. distinguenda is a predatory species that contributes to pest control, but it is not explained in more detail which pests would be controlled.
- The structure of the background section is rather confusing, jumping from subspecies details to related species. Although I appreciate the taxonomic details, the background would benefit from re-ordering the paragraphs.

METHODS:
- For better reproducibility, the specific settings of the software would be good to know, in addition to the versions (Table 3).
- I assume that the males of this species are haploid, as is the case for most Hymenoptera. I am wondering how much the purge_dups step after initial assembly improves the assembly. This could be easily checked by calculating (and providing) the usual assembly stats and completeness measures (QV, BUSCO).
- Please provide more details on the transcriptome data used for annotation (source of the data and accession numbers) and the protein set from UniProt.
Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Partly
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Evolutionary genomics, phylogenomics, Hymenoptera evolution
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Figure 2. Genome assembly of Tenthredo distinguenda, iyTenDist1.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 229,413,372 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (40,074,581 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (25,528,205 and 18,198,451 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the hymenoptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/iyTenDist1.1/dataset/CANNYR01/snail.

Figure 5. Genome assembly of Tenthredo distinguenda, iyTenDist1.1: Hi-C contact map of the iyTenDist1.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=GTiO0K6pQQubqIdEMilHsQ.

Reviewer Report 11 May 2024
https://doi.org/10.21956/wellcomeopenres.21626.r72612
© 2024 Lopes D. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Denilce Lopes
Departamento de Biologia Geral, Universidade Federal de Viçosa, Viçosa, State of Minas Gerais, Brazil
The authors present genomic sequencing, nuclear and mitochondrial data from the wasp-sawfly, Tenthredo distinguenda. The analysis methods are adequate, similar to those presented by other works within the Darwin Tree of Life Barcoding collective project. The indicators show high data quality.
Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Cytogenetic and genomic
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer Report 05 April 2024
https://doi.org/10.21956/wellcomeopenres.21626.r76757
© 2024 Wutke S. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Saskia Wutke
Department of Environmental and Biological Sciences, University of Eastern Finland, Joensuu, Finland
This data note describes the genome of the sawfly Tenthredo distinguenda. The genome was assembled using PacBio HiFi data and scaffolded with Hi-C data. Annotation was performed based on transcriptome data. The methods are state-of-the-art and follow the usual DToL procedure. The final chromosome-level genome assembly is 229.4 Mb long. With a scaffold N50 of 25.5 Mb and a total number of 95 scaffolds, the assembly is of good quality and provides a valuable addition to the growing list of sawfly genomes.