The genome sequence of a tachinid fly, Thelaira solivaga (Harris, 1780)

We present a genome assembly from an individual male Thelaira solivaga (a tachinid fly; Arthropoda; Insecta; Diptera; Tachinidae). The genome sequence is 429.3 megabases in span. Most of the assembly is scaffolded into 7 chromosomal pseudomolecules, including the X and Y sex chromosomes. The mitochondrial genome has also been assembled and is 21.09 kilobases in length.


Background
Thelaira solivaga (Diptera, Tachinidae) is a medium-sized tachinid fly. The long-legged adults are most frequently encountered basking on sunlit leaves of low-growing vegetation along the edges of woodlands or areas of scrub. Females are mostly dark with some pale dusting on the thorax and abdomen; males are much brighter in colour and usually have extensive orange markings on the sides of the upper segments of the abdomen. It is very similar in appearance to the closely related Thelaira nigrina (Fallén), and separating the two species may require examination of a voucher specimen (Raper, 2012).
The larvae are internal parasites of various species of Tiger Moths (Lepidoptera: Erebidae). Eggs are laid directly onto the outside of the host, with multiple larvae developing within a single host. The caterpillars of many Tiger Moth species overwinter as hibernating larvae, and it appears likely that Thelaira solivaga overwinters as early-stage larvae within the hibernating host. Recorded hosts include the Cream Spot Tiger Arctia villica (Belshaw, 1993), and the Ruby Tiger Phragmatobia fuliginosa and Garden Tiger Arctia caja (Tschorsnig & Herting, 1994).
Thelaira solivaga is recorded from across southern and central Britain, north to the Tyneside region. There are no records from Ireland. The species is probably double-brooded, with adults on the wing from late April or early May until mid-September.

Genome sequence report
The genome was sequenced from one male Thelaira solivaga (Figure 1) collected from Wytham Woods, Oxfordshire, UK (51.77, -1.33). Pacific Biosciences single-molecule HiFi long reads were generated to a total of 46-fold coverage. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 45 missing joins or mis-joins and removed one haplotypic duplication, reducing the scaffold number by 32.5% and increasing the scaffold N50 by 0.48%.
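Fold coverage is the total number of sequenced bases divided by the genome size. As a rough illustration, assuming the 46-fold figure is relative to the 429,359,894 bp assembly span (the note does not say whether the assembly length or a k-mer-based genome-size estimate was used), the implied HiFi yield is about 19.75 Gb. The helper names below are hypothetical and chosen only for this sketch.

```python
# Illustrative arithmetic only; function names are hypothetical.

def implied_yield_bp(coverage: float, genome_size_bp: int) -> float:
    """Total sequenced bases implied by a mean fold-coverage figure."""
    return coverage * genome_size_bp


def fold_coverage(total_bases: float, genome_size_bp: int) -> float:
    """Mean fold coverage: total sequenced bases divided by genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return total_bases / genome_size_bp


# Assumption: coverage is measured against the final assembly span.
ASSEMBLY_SPAN_BP = 429_359_894
HIFI_COVERAGE = 46

yield_bp = implied_yield_bp(HIFI_COVERAGE, ASSEMBLY_SPAN_BP)
print(f"Implied HiFi yield: {yield_bp / 1e9:.2f} Gb")  # Implied HiFi yield: 19.75 Gb
```

If a genome-size estimate other than the assembly span was used, substituting it for ASSEMBLY_SPAN_BP changes the implied yield proportionally.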
The final assembly has a total length of 429.3 Mb in 26 sequence scaffolds with a scaffold N50 of 77.2 Mb (Table 1). Most (99.86%) of the assembly sequence was assigned to 7 chromosomal-level scaffolds, representing 5 autosomes and the X and Y sex chromosomes. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figures 2–5; Table 2). While not fully phased, the deposited assembly represents one haplotype; contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission.
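The scaffold N50 is the length L such that scaffolds of length L or longer together contain at least half of the assembled bases; N90 uses a 90% threshold instead. A minimal sketch of the calculation in plain Python follows, together with the share of sequence held in the largest scaffolds. It runs on a made-up list of scaffold lengths (not the real idTheSoli1.1 values, which are given in Table 2), and the function names are hypothetical.

```python
from typing import Sequence


def nx(lengths: Sequence[int], fraction: float) -> int:
    """Nx statistic (fraction=0.5 gives N50): sort scaffolds longest
    first, add up lengths, and return the length of the scaffold at which
    the running total first reaches `fraction` of the assembly span."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    target = fraction * sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    raise RuntimeError("unreachable: running total equals the full span")


def top_n_fraction(lengths: Sequence[int], n: int) -> float:
    """Share of assembled bases in the n longest scaffolds. Simplification:
    assumes chromosome-level scaffolds are the longest ones, whereas the
    real assignment comes from Hi-C evidence during curation."""
    ordered = sorted(lengths, reverse=True)
    return sum(ordered[:n]) / sum(ordered)


# Hypothetical scaffold lengths in Mb (NOT the idTheSoli1.1 values):
# seven large scaffolds plus three small unplaced ones.
toy_lengths_mb = [90, 80, 70, 60, 50, 40, 30, 5, 3, 2]

print(nx(toy_lengths_mb, 0.5))  # 70
print(nx(toy_lengths_mb, 0.9))  # 40
print(f"{top_n_fraction(toy_lengths_mb, 7):.2%}")  # 97.67%
```

Sorting longest-first is what makes Nx reflect the largest scaffolds and remain largely insensitive to how many small unplaced fragments are left in the assembly.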
Metadata for specimens, spectral estimates, sequencing runs, contaminants and pre-curation assembly statistics can be found at https://links.tol.sanger.ac.uk/species/1918187.

Sample acquisition and nucleic acid extraction
A male Thelaira solivaga (specimen ID Ox002161, individual idTheSoli1) was collected by netting from Wytham Woods, Oxfordshire (biological vice-county Berkshire), UK (latitude 51.77, longitude -1.33) on 2022-05-19. The specimen was collected and identified by Steven Falk (University of Oxford) and was preserved on dry ice.
The sample was prepared for DNA extraction at the Tree of Life laboratory, Wellcome Sanger Institute (WSI). The idTheSoli1 sample was weighed and dissected on dry ice, with tissue set aside for Hi-C sequencing. Head and thorax tissue was disrupted using a Nippi Powermasher fitted with a BioMasher pestle. DNA was extracted at the WSI Scientific Operations core using the Qiagen MagAttract HMW DNA kit, according to the manufacturer's instructions.

Sequencing
Pacific Biosciences HiFi circular consensus DNA sequencing libraries were constructed according to the manufacturers' instructions. DNA sequencing was performed by the Scientific Operations core at the WSI on a Pacific Biosciences SEQUEL II (HiFi) instrument. Hi-C data were also generated from head and thorax tissue of idTheSoli1 using the Arima2 kit and sequenced on an Illumina NovaSeq 6000 instrument.

Genome assembly, curation and evaluation
Assembly was carried out with Hifiasm (Cheng et al., 2021), and haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). The assembly was scaffolded with Hi-C data (Rao et al., 2014) using YaHS (Zhou et al., 2023). The assembly was checked for contamination and corrected as described previously (Howe et al., 2021). Manual curation was performed using HiGlass (Kerpedjiev et al., 2018) and Pretext (Harry, 2022). The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2022), which runs MitoFinder (Allio et al., 2020) or MITOS (Bernt et al., 2013) and uses these annotations to select the final mitochondrial contig.
… agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project.
Further, the Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use.
The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material
• Legality of collection, transfer and use (national and international)

Darren J Obbard
The University of Edinburgh, Edinburgh, Scotland, UK
This data note reports the sequencing and assembly of the genome of Thelaira solivaga as part of the "Darwin Tree of Life" programme. In common with other data notes from this research effort, the reporting is standardized and quite brief. As such, I have very few comments to make.
The approach is state-of-the-art, the raw data appear to be of suitably high quality, and the assembly methods are appropriate. The public availability of the raw data and genome assembly is appropriate. The resulting genome is likely to be of very high quality, and I have no doubt that it will be of great value to any researchers working on tachinids and other Diptera, or on the comparative or evolutionary genomics of insects more generally.
My minor suggestions are: (1) In addition to the sequenced specimen itself, it would be nice to have an 'in life' photograph, ideally of both sexes, given the described morphological differences. CC-BY images do seem to be available from Wikimedia Commons and iNaturalist (but may be misidentified Thelaira nigrina?).
(2) I think a reference (or two) is required for the statement "Thelaira solivaga is recorded from across southern and central Britain …. The species is probably double brooded, with adults on the wing from late April or early May until mid-September." Or, a "Pers Obs" to make clear that the information is not otherwise available in the literature.
(3) I am surprised that no statements are made regarding the following: (a) global distribution and abundance, (b) conservation status, (c) value or utility of the sequences provided. I think there may be a boilerplate paragraph missing from the end of the background? It usually reads: "Here we present a chromosomally complete genome sequence for …. This project is a collaborative effort to sequence all named eukaryotic species in the Atlantic Archipelago of Britain and Ireland."
(4) I notice that RNAseq data have also been generated from the abdomen of the same specimen (ERR11641108), and I therefore think this should be mentioned in the paper.

Is the rationale for creating the dataset(s) clearly described? No
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes

Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Evolutionary genetics and genomics of invertebrates (Drosophila) and their pathogens (viruses).
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
… distribution (perhaps also outside of Britain?) included.
2. No annotation has been performed on the genome. If this is due to difficulties in obtaining RNAseq data, I suggest either that a preliminary annotation be performed using existing protein data from related species, or that the authors include some reasoning as to why there is no annotation.
Minor comments:
1. Sufficient details are provided in the Methods, except that there is no mention of the amount of Hi-C data generated (and its coverage).
2. For the mitochondrial assembly using MitoHiFi, it is not clear which algorithm was used: MitoFinder or MITOS?
Is the rationale for creating the dataset(s) clearly described? Partly
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Partly
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: population genomics, transcriptomics, genome assembly and annotation of non-model species.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Ravikumar Dodiya
Sardarkrushinagar Dantiwada Agricultural University, Sardarkrushinagar, Dantiwada, India
The study presents a comprehensive genome assembly of an individual male Thelaira solivaga, a tachinid fly belonging to the order Diptera, with a genome sequence spanning 429.3 megabases. The assembly is scaffolded into 7 chromosomal pseudomolecules, including the X and Y sex chromosomes, and the mitochondrial genome is also assembled, with a length of 21.09 kilobases. Thelaira solivaga is described as a medium-sized tachinid fly, with distinct characteristics in males and females. The larvae of this species are internal parasites of various species of Tiger Moths (Lepidoptera: Erebidae), with eggs laid externally onto the host and multiple larvae developing within a single host. The species is found across southern and central Britain and is possibly double-brooded, with adults observed from late April or early May until mid-September. The genome was sequenced from a male specimen collected from Wytham Woods, Oxfordshire, UK, with sequencing data generated using Pacific Biosciences HiFi circular consensus DNA sequencing and Hi-C data. The assembly process involved scaffolding with Hi-C data and manual curation to correct assembly errors and remove haplotypic duplications. The final assembly comprises 26 sequence scaffolds, with 99.86% assigned to 7 chromosomal-level scaffolds, including autosomes and sex chromosomes. Quality assessment indicates a high-quality assembly with a QV of 63.8, 100% k-mer completeness, and 99.0% BUSCO completeness using the diptera_odb10 reference set. Software tools such as Hifiasm, purge_dups, YaHS, and Merqury were used for assembly and evaluation, and the genome was analyzed within the BlobToolKit environment. The study provides valuable insights into the genomics of Thelaira solivaga, which could contribute to further research on its biology, ecology, and interactions with host species.

In Background information:
Write about the superparasitism or multiple parasitism behavior seen in Thelaira solivaga instead of the sentence "Eggs are laid externally directly onto the host, with multiple larvae developing within a single host."
Provide the exact latitude and longitude of the location from which the sample was collected.
Why was only a male Thelaira solivaga used? Is there a lack of available samples?
Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Biological control
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Figure 2. Genome assembly of Thelaira solivaga, idTheSoli1.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference, with each bin representing 0.1% of the 429,359,894 bp assembly. The distribution of scaffold lengths is shown in dark grey, with the plot radius scaled to the longest scaffold present in the assembly (92,503,287 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (77,173,402 and 69,333,079 bp, respectively). The pale grey spiral shows the cumulative scaffold count on a log scale, with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the diptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/idTheSoli1.1/dataset/CANDYK01/snail.

Figure 5. Genome assembly of Thelaira solivaga, idTheSoli1.1: Hi-C contact map of the idTheSoli1.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=bUwmRY-QTn2S-xCjTLlU4Q.

Reviewer Report 08 May 2024
https://doi.org/10.21956/wellcomeopenres.21755.r81667
© 2024 Dodiya R. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Table 3. Software tools: versions and sources.
Each transfer of samples is further undertaken according to a Research Collaboration Agreement or Material Transfer Agreement entered into by the Darwin Tree of Life Partner, Genome Research Limited (operating as the Wellcome Sanger Institute), and in some circumstances other Darwin Tree of Life collaborators.