The genome sequence of a conopid fly, Myopa testacea (Linnaeus, 1767)

We present a genome assembly from an individual male Myopa testacea (conopid fly; Arthropoda; Insecta; Diptera; Conopidae). The genome sequence is 243.3 megabases in span. Most of the assembly is scaffolded into 5 chromosomal pseudomolecules, including the X and Y sex chromosomes. The mitochondrial genome has also been assembled and is 17.61 kilobases in length. Gene annotation of this assembly on Ensembl identified 25,472 protein coding genes.


Background
Myopa testacea (Diptera: Conopidae, Myopinae) is a common Palaearctic fly from the 'bee-grabber' family. The family name reflects the endoparasitic nature of its members, which are thought to prey on other insects, particularly aculeate Hymenoptera (Stuke, 2017). The distinguishing features of Myopa species are their reddish-brown colour, large white inflated faces, and curling abdomens used by females to deliver their eggs to unsuspecting prey (Smith, 1969). Although M. testacea is typically 7-10 mm long, individuals vary greatly in size, ranging from 4 to 11 mm (Stuke & Clements, 2008).
Myopa testacea is univoltine and has a spring flight period coinciding with the emergence of mating solitary bees. This flight period can be extended to July or later where higher altitudes and northerly latitudes delay prey emergence (Stuke & Clements, 2008). The larval development of this species has been recorded in adults of the mining bees Andrena scotica and Andrena vaga; however, further research into its biology is required to understand the true extent and flexibility of host choice (Stuke & Clements, 2008). As M. testacea is the most common Myopa species in the UK, its role as a pollinator could be significant, since it can be found feeding on the same catkins and flowers that attract the solitary bees which it parasitises. Recent application of DNA barcoding in Myopa host-parasite interactions has successfully identified a fourth host species in this genus (Smit et al., 2018). Here we add to the growing database of chromosomal DNA for Myopa testacea in the hope that it will be used to help untangle the complex taxonomy in this genus and aid further investigation into the biology and ecology of this species.

Genome sequence report
The genome was sequenced from one male Myopa testacea (Figure 1) collected from Wytham Woods, Oxfordshire, UK (51.76, -1.34). A total of 117-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 201 missing joins or mis-joins and removed 28 haplotypic duplications, reducing the assembly length by 0.71% and the scaffold number by 76.15%, and increasing the scaffold N50 by 3.64%.
The final assembly has a total length of 243.3 Mb in 30 sequence scaffolds with a scaffold N50 of 63.2 Mb (Table 1). The snailplot in Figure 2 provides a summary of the assembly statistics, while the distribution of assembly scaffolds on GC proportion and coverage is shown in Figure 3.
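For readers unfamiliar with the scaffold N50 statistic quoted above, the following is a minimal sketch of how it is conventionally computed. The function name `n50` and the example lengths are illustrative assumptions and are not the real idMyoTest1.1 scaffold lengths.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths.

    The N50 is the length L such that sequences of length >= L
    together cover at least half of the total span.
    """
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards, accumulating span.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical scaffold lengths in bp (toy values for illustration only).
example_lengths = [80, 70, 50, 30, 10, 5, 5]
example_n50 = n50(example_lengths)
```

In this toy case the total span is 250 bp, and the two longest scaffolds (80 + 70 = 150 bp) are the first to reach half of it, so the N50 is the length of the second scaffold.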
The cumulative assembly plot in Figure 4 shows curves for subsets of scaffolds assigned to different phyla. Most (99.78%) of the assembly sequence was assigned to 5 chromosomal-level scaffolds, representing 3 autosomes and the X and Y sex chromosomes. The sex chromosomes were identified by read depth. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 5; Table 2). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission.
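The text does not describe how read depth was used to identify the sex chromosomes. As a general illustration only, in an XY male each sex chromosome is present in a single copy, so its read depth is expected to be roughly half that of the autosomes. The sketch below flags such scaffolds; the function name, the tolerance, the scaffold names and the depth values are all hypothetical and do not come from this assembly.

```python
from statistics import median


def flag_half_depth(coverage, tolerance=0.15):
    """Return names of scaffolds whose depth is close to half the median depth.

    `coverage` maps scaffold name -> mean read depth. The median is used as
    the autosomal baseline, which assumes most scaffolds are autosomal.
    """
    baseline = median(coverage.values())
    flagged = []
    for name, depth in coverage.items():
        ratio = depth / baseline
        if abs(ratio - 0.5) <= tolerance:
            flagged.append(name)
    return flagged


# Hypothetical mean depths per scaffold (illustrative values only).
example_coverage = {
    "chr_1": 118.0,
    "chr_2": 116.0,
    "chr_3": 119.0,
    "chr_X": 60.0,
    "chr_Y": 55.0,
}
candidates = flag_half_depth(example_coverage)
```

Depth alone only marks candidate single-copy scaffolds; telling X from Y would need further evidence, which this sketch does not attempt.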

Sample acquisition and nucleic acid extraction
A male Myopa testacea (specimen ID Ox001290, ToLID idMyoTest1) was netted in Wytham Woods, Oxfordshire, UK (latitude 51.76, longitude -1.34) on 2021-04-23. The specimen was collected and identified by Steven Falk (independent researcher) and snap-frozen on dry ice. A Hi-C map for the final assembly was produced using bwa-mem2 (Vasimuddin et al., 2019) in the Cooler file format (Abdennur & Mirny, 2020). To assess the assembly metrics, the k-mer completeness and QV consensus quality values were calculated in Merqury (Rhie et al., 2020). This work was done [...] et al., 2021; Simão et al., 2015) were calculated.
Table 3 contains a list of relevant software tool versions and sources.

Genome annotation
The BRAKER2 pipeline (Brůna et al., 2021) was used in the default protein mode to generate annotation for the Myopa testacea assembly (GCA_949629155.1) in Ensembl Rapid Release.

Wellcome Sanger Institute - Legal and Governance
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the 'Darwin Tree of Life Project Sampling Code of Practice', which can be found in full on the Darwin Tree of Life website here. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project.
Further, the Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material

This article reports the assembly of the chromosome-level genome of Myopa testacea through sequencing of a male specimen, with a genome size of 243.3 Mb and a contig N50 of 0.7 Mb. Additionally, 25,472 protein-coding genes were annotated. In addition to the genome, the mitochondrial genome of Myopa testacea was also assembled. All data presented in this study have been released and are available for public access, and detailed methods have been provided.
The data provided in this study serve as important reference data for future genomic and phylogenetic studies of Diptera. However, I still have some comments on this article.

Abstract
The completeness of the mitochondrial genome, whether it forms a complete circle, is a crucial aspect that should be clarified.

Figure 1
The quality of the insect images is poor, making it difficult to discern their primary morphological features. It is recommended to provide high-quality specimen photographs.

Genome sequence report
1. Now that a chromosome-level genome is available, I suggest conducting a synteny analysis with the model organism Drosophila to better show readers the size and structural features of the chromosomes.
2. The manuscript mentions the assembly of the mitochondrial genome, yet lacks elaboration on its size, structure, and whether it forms a complete circle. It is suggested to provide a brief overview of these aspects.
3. The article annotates 26,236 protein-coding genes in the genome. It is well known that repetitive sequences and non-coding RNAs play important roles in organisms. However, the annotation status of repetitive sequences and non-coding RNAs is not mentioned in the text. If possible, it is recommended to add this information to provide readers with a more comprehensive understanding of the genome.
Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Systematic entomology
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Ravikumar D Dodiya
Sardarkrushinagar Dantiwada Agricultural University, Sardarkrushinagar, Dantiwada, India
Summary: The article presents the genome assembly of Myopa testacea, a species of conopid fly. The genome sequence spans 243.3 megabases and includes chromosomal pseudomolecules, including the X and Y sex chromosomes. Additionally, the mitochondrial genome, 17.61 kilobases in length, has been assembled. Gene annotation identified 25,472 protein-coding genes using Ensembl.
Myopa testacea is described as a common Palaearctic fly from the Conopidae family known for its endoparasitic nature, particularly preying on other insects like solitary bees.The species exhibits distinctive features such as a reddish-brown color, large white faces, and curling abdomens in females for laying eggs on prey.
The genome was sequenced from a male specimen collected in Wytham Woods, Oxfordshire, UK, with extensive coverage using Pacific Biosciences HiFi long reads and scaffolded with chromosome conformation Hi-C data. Manual curation corrects assembly errors and improves scaffold quality.
Methods for sample acquisition, nucleic acid extraction, and sequencing are detailed, including the use of high molecular weight DNA extraction, Hi-C data generation, and quality assessment of the assembly.
Overall, the genome assembly provides valuable genetic information for further research into the biology, ecology, and taxonomy of Myopa testacea and contributes to understanding the complex interactions within its ecosystem.
Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Biological control, Molecular Research
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 2. Genome assembly of Myopa testacea, idMyoTest1.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 243,347,938 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (74,748,633 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (63,215,029 and 39,529,147 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the diptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Myopa%20testacea/dataset/CATIVO01/snail.
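The bin arithmetic in the Figure 2 caption can be checked with a short calculation. The sketch below uses only the figures quoted in the caption; the variable names are illustrative, and it does not reproduce how BlobToolKit itself builds the plot.

```python
ASSEMBLY_SPAN_BP = 243_347_938      # total assembly span from the caption
N_BINS = 1_000                      # number of size-ordered bins from the caption
LONGEST_SCAFFOLD_BP = 74_748_633    # longest scaffold from the caption

# Each bin covers an equal share of the assembly span.
bin_span_bp = ASSEMBLY_SPAN_BP / N_BINS
bin_fraction_percent = 100 / N_BINS

# Approximate number of bins spanned by the longest scaffold.
longest_scaffold_bins = LONGEST_SCAFFOLD_BP / bin_span_bp

print(f"bin span: {bin_span_bp:,.0f} bp ({bin_fraction_percent}% of assembly)")
print(f"longest scaffold spans about {longest_scaffold_bins:.0f} bins")
```

Each bin therefore corresponds to roughly 243 kb of sequence, and the longest scaffold alone accounts for a little under a third of the bins.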

Figure 5. Genome assembly of Myopa testacea, idMyoTest1.1: Hi-C contact map of the idMyoTest1.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=Rylu9Mn1RYGQo-JpaIBmig.