The genome sequence of a hoverfly, Syrphus vitripennis (Meigen, 1822)

We present a genome assembly from an individual male Syrphus vitripennis (a hoverfly; Arthropoda; Insecta; Diptera; Syrphidae). The genome sequence is 388.8 megabases in span. Most of the assembly is scaffolded into 5 chromosomal pseudomolecules, including the X and Y sex chromosomes. The mitochondrial genome has also been assembled and is 18.33 kilobases in length.


Background
Syrphus vitripennis Meigen, 1822 is a medium-sized (body length 8-11 mm, wing length 7-10 mm) yellow and black wasp-mimicking hoverfly. It is one of the five species of the genus Syrphus occurring in Britain, three of which are common but difficult to separate. The genus is characterised by hairs on the upper side of the lower calypter. The males of S. vitripennis are indistinguishable from those of S. rectus Osten Sacken, 1875, a North American species that has been found in Europe (including Britain) on a few occasions (Ball & Morris, 2015; van Veen, 2014). The females can be separated by the colour of the hind femur, which is black on the basal two-thirds in S. vitripennis and yellow on the basal half in S. rectus. The second basal cell of S. vitripennis is only partially covered in microtrichia, which distinguishes this species from S. ribesii (Linnaeus, 1758) and S. torvus Osten Sacken, 1875, whose second basal cell is entirely covered with microtrichia (Ball & Morris, 2015; van Veen, 2014).
Syrphus vitripennis is a Holarctic species; in North America it occurs from Alaska to California (Speight, 2017). This species is highly migratory, moving seasonally to higher latitudes in spring and to lower latitudes in autumn. Migrants have been observed in June on the Isles of Scilly and in mainland Cornwall, moving northwards into the rest of Britain (Hawkes et al., 2022). The species is common and widely distributed in Britain and Ireland, occurring in a wide range of lowland habitats including woodland, clearings, gardens, parks and hedgerows. Adults are on the wing from March to November, with a peak in July and August (Ball et al., 2011; Ball & Morris, 2015; Speight, 2017).
The complete mitochondrial genome of Syrphus vitripennis was reported by Liu et al. (2020). The high-quality genome of Syrphus vitripennis presented here was sequenced as part of the Darwin Tree of Life Project, a collaborative effort to sequence all named eukaryotic species in the Atlantic Archipelago of Britain and Ireland. It will aid research into the taxonomy, biology and phylogeny of the species. This chromosomally complete genome sequence is based on a male specimen from Luton.

Genome sequence report
The genome was sequenced from one male Syrphus vitripennis (Figure 1) collected from Wigmore, Luton, England, UK (51.88, -0.37). A total of 45-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 158 missing joins or mis-joins and removed one haplotypic duplication, reducing the scaffold number by 18.53% and increasing the scaffold N50 by 87.19%.
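As a rough illustration of the coverage and curation figures above, fold coverage is total sequenced bases divided by genome size, and the curation effects are percentage changes between pre- and post-curation values. The minimal Python sketch below assumes that the 388,834,270 bp assembly span approximates genome size; under that assumption, 45-fold coverage corresponds to roughly 17.5 Gb of HiFi bases. The helper names are hypothetical, and the pre-curation values passed to `percent_change` are invented for illustration only.

```python
def fold_coverage(total_read_bases: int, genome_size: int) -> float:
    """Mean fold coverage: total sequenced bases divided by genome size."""
    return total_read_bases / genome_size


def bases_for_coverage(target_fold: int, genome_size: int) -> int:
    """Read bases needed to reach a target fold coverage."""
    return target_fold * genome_size


def percent_change(before: float, after: float) -> float:
    """Signed percentage change from `before` to `after`."""
    return (after - before) * 100.0 / before


# Assumption: the assembly span stands in for the true genome size.
ASSEMBLY_SPAN_BP = 388_834_270

hifi_bases = bases_for_coverage(45, ASSEMBLY_SPAN_BP)
print(f"~{hifi_bases / 1e9:.1f} Gb of HiFi bases for 45-fold coverage")

# Hypothetical pre/post-curation scaffold counts, for illustration only.
print(percent_change(before=100, after=80))  # -20.0
```

A negative value indicates a reduction (as for the scaffold count), and a positive value indicates an increase (as for the scaffold N50).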
The final assembly has a total length of 388.8 Mb in 399 sequence scaffolds with a scaffold N50 of 110.9 Mb (Table 1). The snail plot in Figure 2 provides a summary of the assembly statistics, while the distribution of assembly scaffolds on GC proportion and coverage is shown in Figure 3. The cumulative assembly plot in Figure 4 shows curves for subsets of scaffolds assigned to different phyla. Most (97.72%) of the assembly sequence was assigned to 5 chromosomal-level scaffolds, representing 3 autosomes and the X and Y sex chromosomes. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 5; Table 2). While not fully phased, the deposited assembly represents one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission.
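Scaffold N50 is the length L such that scaffolds of length L or longer together contain at least half of the assembly span; the same calculation at 90% gives N90. The sketch below computes Nx, and also the share of the span held by a flagged subset of scaffolds (the kind of figure behind the 97.72% assigned to chromosomes). It is a generic illustration on a hypothetical toy list, not the pipeline used to produce Table 1.

```python
from typing import Sequence


def nx(lengths: Sequence[int], x: float = 50.0) -> int:
    """Nx length: the length L such that scaffolds of length >= L together
    contain at least x% of the total assembly span."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * x / 100.0
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return ordered[-1]  # guard against floating-point rounding at x = 100


def fraction_in(lengths: Sequence[int], assigned: Sequence[bool]) -> float:
    """Share of the assembly span held by scaffolds flagged as assigned."""
    total = sum(lengths)
    kept = sum(length for length, flag in zip(lengths, assigned) if flag)
    return kept / total


# Hypothetical toy assembly (NOT the idSyrVitr1.1 scaffold lengths).
toy = [50, 30, 10, 5, 5]
print(nx(toy, 50))  # 50
print(nx(toy, 90))  # 10
print(fraction_in(toy, [True, True, True, False, False]))  # 0.9
```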

Sample acquisition and nucleic acid extraction
A male Syrphus vitripennis (specimen ID NHMUK014111101, ToLID idSyrVitr1) was collected by netting from Wigmore Park, Luton, England, UK (latitude 51.88, longitude -0.37) on 2020-07-07. The specimen was collected by Olga Sivell (Natural History Museum, London), identified by Duncan Sivell (Natural History Museum) and preserved on dry ice.
The specimen used for RNA sequencing (specimen ID Ox001241, ToLID idSyrVitr4) was collected from Wytham Woods, Oxfordshire (biological vice-county Berkshire), UK. […] High molecular weight (HMW) DNA was extracted using the Manual MagAttract v1 protocol (Strickland et al., 2023b). DNA was sheared to an average fragment size of 12-20 kb in a Megaruptor 3 system with speed setting 30 (Todorovic et al., 2023). Sheared DNA was purified by solid-phase reversible immobilisation (Strickland et al., 2023a): in brief, the method employs a 1.8X ratio of AMPure PB beads to sample to eliminate shorter […] (Bernt et al., 2013) and uses these annotations to select the final mitochondrial contig and to ensure the general quality of the sequence.
Table 3 contains a list of relevant software tool versions and sources.

Wellcome Sanger Institute - Legal and Governance
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the 'Darwin Tree of Life Project Sampling Code of Practice', which can be found in full on the Darwin Tree of Life website. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees to meet the legal and ethical requirements and standards set out within that document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project.
Further, the Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material

It is modern, the raw data is high quality, and the methodology is appropriate and understandable. Additionally, the authors have made the raw data and genome publicly available.
Based on my own investigation of the data, the resulting genome is high quality and has excellent coverage.
I appreciate the level of detail the report provides on the background of this species and on the difficulty of identifying it.
One idea that might prove helpful is extracting the COI barcode region and testing it against the BOLD database. Syrphids in particular are extremely well sampled, and this would serve as an additional check on the species identity of the specimen for difficult groups.
Additionally, it would be helpful if the literature used to determine the identity of this species were cited in the text. Perhaps this was implied somewhere; however, it should be stated outright.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Phylogenetics, Syrphid Taxonomy, Bioinformatics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

University of Trieste, Trieste, Italy
This manuscript reports the genome assembly of the hoverfly Syrphus vitripennis. This genome was sequenced within the Darwin Tree of Life project, and both sequencing and assembly were carried out using well-established protocols. Hence, I have no technical concerns. All reported metrics indicate that the quality of this resource is high and in line with expectations.
As often happens with Darwin Tree of Life genomes, however, no gene annotation track was available. Although the authors mention that the genome will be annotated using available RNA-seq resources, an assembled genome lacking annotation is of only partial usefulness to the scientific community, and I would urge the authors to include a link to this important resource as soon as possible. I know for a fact that several other DToL genomes have been annotated using the Ensembl Rapid pipeline. Using this pipeline would provide the scientific community with a standardized, albeit imperfect, preliminary gene annotation track.

Jerome Hui
The Chinese University of Hong Kong, Hong Kong, Hong Kong
Sivell, Sivell, Falk and colleagues report the genome sequence of the hoverfly Syrphus vitripennis (Meigen, 1822). This species is common and widespread in Britain. Prior to this report, the molecular data available for this species were mainly ribosomal and mitochondrial sequences. This new genome resource will be useful for further studies, for example on cryptic species or subpopulations, and on ecological, evolutionary and genomic questions related to insects.
This genome resource is excellent judging from the summary statistics, with excellent BUSCO scores, high sequence continuity (scaffold N50), and the majority of sequences contained in the 3 pseudochromosomes (plus the mitochondrion and sex chromosomes). To sum up, this is a valuable contribution from the Darwin Tree of Life Consortium.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: I published with Peter Holland more than three years ago, and confirm that this potential conflict of interest did not affect my ability to write an objective and unbiased review of the article.

Figure 2. Genome assembly of Syrphus vitripennis, idSyrVitr1.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference, with each bin representing 0.1% of the 388,834,270 bp assembly. The distribution of scaffold lengths is shown in dark grey, with the plot radius scaled to the longest scaffold present in the assembly (148,792,233 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (110,905,286 and 110,522,704 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale, with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the diptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/idSyrVitr1_1/dataset/idSyrVitr1_1/snail.
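The snail plot's binning can be sketched as follows: scaffolds are sorted by length, the concatenated span is cut into 1,000 equal bins (about 388,834 bp each for this 388,834,270 bp assembly), and each bin is given the length of the scaffold that covers its first base. This is a simplified model assumed for illustration; BlobToolKit's own implementation may differ in detail. In this model, the bin just before the 50% mark approximates N50 and the bin just before the 90% mark approximates N90.

```python
import bisect
from itertools import accumulate
from typing import List, Sequence


def snail_bins(lengths: Sequence[int], n_bins: int = 1000) -> List[int]:
    """Split the size-ordered assembly into n_bins equal slices and return, for
    each slice, the length of the scaffold covering its first base
    (a simplified model of the snail plot's radial values)."""
    ordered = sorted(lengths, reverse=True)
    ends = list(accumulate(ordered))  # cumulative end coordinate of each scaffold
    total = ends[-1]
    bins = []
    for i in range(n_bins):
        start = i * total // n_bins  # 0-based first base of bin i
        # Index of the first scaffold whose cumulative end lies beyond `start`.
        idx = bisect.bisect_right(ends, start)
        bins.append(ordered[idx])
    return bins


SPAN_BP = 388_834_270
print(f"each of 1,000 bins spans ~{SPAN_BP / 1000:,.0f} bp")

toy = [50, 30, 10, 5, 5]  # hypothetical lengths, not the real assembly
bins = snail_bins(toy, n_bins=10)
print(bins)  # [50, 50, 50, 50, 50, 30, 30, 30, 10, 5]
# Bins just before the 50% and 90% marks approximate N50 and N90.
print(bins[10 // 2 - 1], bins[10 * 9 // 10 - 1])  # 50 10
```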

Figure 3. Genome assembly of Syrphus vitripennis, idSyrVitr1.1: BlobToolKit GC-coverage plot. Scaffolds are coloured by phylum. Circles are sized in proportion to scaffold length. Histograms show the distribution of scaffold length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/idSyrVitr1_1/dataset/idSyrVitr1_1/blob.
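In the GC-coverage plot, each scaffold is positioned by its GC proportion and its read coverage. A minimal sketch of the GC side follows. It reports GC, AT and N percentages over all positions (as in the outer ring of the snail plot) and, separately, a GC proportion over unambiguous A/C/G/T bases only. How BlobToolKit treats Ns is not stated here, so treat either convention as an assumption; the sequence is a hypothetical toy.

```python
from collections import Counter
from typing import Dict


def composition(seq: str) -> Dict[str, float]:
    """GC, AT and N percentages over all positions of a sequence."""
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq.upper())
    n = len(seq)
    return {
        "GC%": 100.0 * (counts["G"] + counts["C"]) / n,
        "AT%": 100.0 * (counts["A"] + counts["T"]) / n,
        "N%": 100.0 * counts["N"] / n,
    }


def gc_proportion(seq: str) -> float:
    """GC proportion over unambiguous bases only (Ns and other codes excluded)."""
    counts = Counter(seq.upper())
    acgt = sum(counts[base] for base in "ACGT")
    return (counts["G"] + counts["C"]) / acgt if acgt else float("nan")


toy_scaffold = "ATGCNNGGCCAT"  # hypothetical sequence
print(composition(toy_scaffold))
print(gc_proportion(toy_scaffold))  # 0.6
```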

Figure 4. Genome assembly of Syrphus vitripennis, idSyrVitr1.1: BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all scaffolds. Coloured lines show cumulative lengths of scaffolds assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/idSyrVitr1_1/dataset/idSyrVitr1_1/cumulative.
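A cumulative sequence plot can be sketched as running totals of scaffold lengths sorted longest first: one curve for all scaffolds and one per taxonomic label. In the sketch below, the phylum labels are taken as given inputs. The buscogenes taxrule that assigns them in BlobToolKit is not reimplemented, and the (length, phylum) pairs are hypothetical.

```python
from collections import defaultdict
from itertools import accumulate
from typing import Dict, List, Sequence, Tuple


def cumulative_curves(scaffolds: Sequence[Tuple[int, str]]) -> Dict[str, List[int]]:
    """Running totals of scaffold lengths (longest first): one curve for all
    scaffolds under the key "all", plus one curve per assigned label."""
    ordered = sorted(scaffolds, key=lambda s: s[0], reverse=True)
    curves: Dict[str, List[int]] = {
        "all": list(accumulate(length for length, _ in ordered))
    }
    by_label: Dict[str, List[int]] = defaultdict(list)
    for length, label in ordered:
        by_label[label].append(length)
    for label, lengths in by_label.items():
        curves[label] = list(accumulate(lengths))
    return curves


# Hypothetical (length, phylum) pairs, not the real assembly assignments.
toy = [(120, "Arthropoda"), (5, "no-hit"), (80, "Arthropoda"), (2, "Proteobacteria")]
for label, curve in cumulative_curves(toy).items():
    print(label, curve)
```

The "all" curve corresponds to the grey line in the figure, and each per-label curve corresponds to one coloured line.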

Figure 5. Genome assembly of Syrphus vitripennis, idSyrVitr1.1: Hi-C contact map of the idSyrVitr1.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=DwdApaIqR-eOeeDXHwDppQ.
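A Hi-C contact map is a matrix of read-pair counts between genomic bins. The sketch below shows only the basic binning step, with chromosomes laid out largest first as in Figure 5. The chromosome names, bin size and contacts are hypothetical. Real maps are built from millions of read pairs and typically involve further processing, such as normalisation, that is not shown here.

```python
from typing import Dict, List, Sequence, Tuple

Contact = Tuple[str, int, str, int]  # (chrom1, pos1, chrom2, pos2), 0-based


def contact_matrix(
    chrom_lengths: Dict[str, int], contacts: Sequence[Contact], bin_size: int
) -> List[List[int]]:
    """Bin read-pair contacts into a symmetric genome-wide matrix, with
    chromosomes laid out in decreasing order of length. Positions are
    assumed to lie within their chromosome."""
    order = sorted(chrom_lengths, key=chrom_lengths.get, reverse=True)
    offsets: Dict[str, int] = {}
    n_bins = 0
    for chrom in order:
        offsets[chrom] = n_bins
        n_bins += -(-chrom_lengths[chrom] // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for c1, p1, c2, p2 in contacts:
        i = offsets[c1] + p1 // bin_size
        j = offsets[c2] + p2 // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # keep the map symmetric
    return matrix


m = contact_matrix(
    {"chrA": 50, "chrB": 30},  # hypothetical chromosome lengths
    [("chrA", 3, "chrA", 45), ("chrB", 12, "chrA", 7), ("chrB", 25, "chrB", 21)],
    bin_size=10,
)
print(len(m))  # 8 bins: 5 for chrA, then 3 for chrB
```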

Table 2. Chromosomal pseudomolecules in the genome assembly of Syrphus vitripennis, idSyrVitr1.
Columns: INSDC accession, Chromosome, Length (Mb), GC%
This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Partly
Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard; however, I have significant reservations, as outlined above.