The genome sequence of the Thick-legged Hoverfly, Syritta pipiens (Linnaeus, 1758)

We present a genome assembly from an individual female Syritta pipiens (the Thick-legged Hoverfly; Arthropoda; Insecta; Diptera; Syrphidae). The genome sequence is 318.5 megabases in span. Most of the assembly is scaffolded into 5 chromosomal pseudomolecules. The mitochondrial genome has also been assembled and is 15.76 kilobases in length. Gene annotation of this assembly on Ensembl identified 18,405 protein coding genes.


Background
Syritta pipiens is the only representative of this genus of hoverflies in Britain and Ireland. It can be distinguished from other hoverflies in the region by the enlarged hind femora and ash-grey/silver dusting of the lateral thorax (Ball & Morris, 2015). It is a small, narrow hoverfly with paired orange or grey spots on tergites two and three and a row of small spines on the ventral surface of the swollen hind femur.
It is a widespread and common species, and adults have been recorded in all months of the year visiting a huge variety of flowers (Ball & Morris, 2015). The larvae are detritivores, feeding on damp decaying vegetable matter such as leaves and compost, but have also been recorded damaging daffodil bulbs (Hodson, 1931) and feeding on human corpses, and thus may have a use in forensic pathology (Magni et al., 2013). In flight it is an effective mimic of small crabronid wasps.
Males possess large eyes with enlarged anterior facets, which are believed to confer enhanced binocular vision (Stubbs & Falk, 2002). This may contribute to the males' very efficient visual system for tracking females and remaining 5 to 15 cm away until they are ready to catch the female (Collett & Land, 1975). This system has inspired a flying robot which chases in a similar manner (Colonnier et al., 2019). This is the first full genome sequence to be published for Syritta pipiens, but a complete mitochondrial sequence has been published (Shi et al., 2021). We present a chromosomally complete genome sequence for S. pipiens, based on one female specimen from Wytham Woods, as part of the Darwin Tree of Life Project. This project is a collaborative effort to sequence all named eukaryotic species in the Atlantic Archipelago of Britain and Ireland.

Genome sequence report
The genome was sequenced from one female Syritta pipiens (Figure 1) collected from Wytham Woods, Oxfordshire (51.77, -1.34). A total of 44-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 123-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 15 missing joins or mis-joins and removed one haplotypic duplication, reducing the assembly length by 0.95% and the scaffold number by 70%, and increasing the scaffold N50 by 206.63%. The final assembly has a total length of 318.5 Mb in 6 sequence scaffolds with a scaffold N50 of 86.5 Mb (Table 1). Most (99.98%) of the assembly sequence was assigned to 5 chromosomal-level scaffolds, representing 4 autosomes and the X sex chromosome. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 2-Figure 5; Table 2). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission.
Metadata for specimens, spectral estimates, sequencing runs, contaminants and pre-curation assembly statistics can be found at https://links.tol.sanger.ac.uk/species/34682.
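The curation percentages reported above can be inverted to give a rough idea of the pre-curation assembly. The following is a minimal sketch, not part of the published pipeline: it assumes the percentages are relative to the pre-curation values, and it assumes that fold coverage is expressed relative to the final assembly span. The helper names are hypothetical. Because the reported figures are rounded, the results are approximations only (about 20 scaffolds and a scaffold N50 of roughly 28 Mb before curation).

```python
# Post-curation values reported in the text for the idSyrPipi1.1 assembly.
FINAL_LENGTH_MB = 318.5
FINAL_SCAFFOLDS = 6
FINAL_N50_MB = 86.5


def before_decrease(after: float, pct_decrease: float) -> float:
    """Value before a percentage decrease, given the value after it."""
    return after / (1 - pct_decrease / 100)


def before_increase(after: float, pct_increase: float) -> float:
    """Value before a percentage increase, given the value after it."""
    return after / (1 + pct_increase / 100)


# Assumption: each percentage is measured against the pre-curation value.
pre_length_mb = before_decrease(FINAL_LENGTH_MB, 0.95)
pre_scaffolds = before_decrease(FINAL_SCAFFOLDS, 70)
pre_n50_mb = before_increase(FINAL_N50_MB, 206.63)

# Assumption: 44-fold HiFi coverage is relative to the final assembly span.
hifi_yield_gb = 44 * FINAL_LENGTH_MB / 1000

print(f"Estimated pre-curation length: {pre_length_mb:.2f} Mb")
print(f"Estimated pre-curation scaffolds: {round(pre_scaffolds)}")
print(f"Estimated pre-curation scaffold N50: {pre_n50_mb:.2f} Mb")
print(f"Approximate HiFi yield: {hifi_yield_gb:.1f} Gb")
```

The authoritative pre-curation statistics are those at the metadata link above; this sketch only shows how the reported percentages relate to the final figures.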

Sample acquisition and nucleic acid extraction
A female Syritta pipiens (specimen ID Ox000241, ToLID idSyrPipi1) was collected from Rough Common in Wytham Woods, Oxfordshire (biological vice-county Berkshire), UK (latitude 51.77, longitude -1.34) on 2019-09-03 by netting. The specimen was collected and identified by Liam Crowley (University of Oxford) and preserved on dry ice.
The specimen used for RNA sequencing (specimen ID NHMUK014111601, ToLID idSyrPipi3) was collected from Orchard House, England (50.97, -2.67) by netting on 2020-07-23. The specimen was collected and identified by Michael Ashworth for the Natural History Museum. The specimen was preserved in liquid nitrogen.
DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute (WSI). The idSyrPipi1 sample was weighed and dissected on dry ice with tissue set aside for Hi-C sequencing. Head and thorax tissue was disrupted using a Nippi Powermasher fitted with a BioMasher pestle. High molecular weight (HMW) DNA was extracted using the Qiagen MagAttract HMW DNA extraction kit. Low molecular weight DNA was removed from a 20 ng aliquot of extracted DNA using the 0.8X AMPure XP purification kit prior to 10X Chromium sequencing; a minimum of 50 ng DNA was submitted for 10X sequencing. HMW DNA was sheared into an average fragment size of 12-20 kb in a Megaruptor 3 system with speed setting 30. Sheared DNA was purified by solid-phase reversible immobilisation.
RNA was extracted from thorax tissue of idSyrPipi3 in the Tree of Life Laboratory at the WSI using TRIzol, according to the manufacturer's instructions. RNA was then eluted in 50 μl RNAse-free water and its concentration assessed using a Nanodrop spectrophotometer and Qubit Fluorometer using the Qubit RNA Broad-Range (BR) Assay kit. RNA integrity was analysed using the Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay.
Table 3 contains a list of relevant software tool versions and sources.

Genome annotation
The Ensembl gene annotation system (Aken et al., 2016) was used to generate annotation for the Syritta pipiens assembly (GCA_905187475.1). Annotation was created primarily through alignment of transcriptomic data to the genome, with gap filling via protein-to-genome alignments of a select set of proteins from UniProt (UniProt Consortium, 2019).

Wellcome Sanger Institute - Legal and Governance
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the 'Darwin Tree of Life Project Sampling Code of Practice', which can be found in full on the Darwin Tree of Life website. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project.
Further, the Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible.

Muzafar Riyaz
Xavier Research Foundation, St Xavier's College (Ringgold ID: 29983), Palayamkottai, Tamil Nadu, India
The genome assembly of Syritta pipiens is a valuable addition to the Darwin Tree of Life Project, providing a comprehensive genetic resource for future research. The data note effectively presents the genome sequence, assembly metrics, and gene annotation, demonstrating thoroughness in data collection and processing. However, a few aspects could enhance the utility and clarity of the data note:
○ Data Accessibility: The inclusion of direct links to the interactive visualizations and raw data repositories, such as BlobToolKit and European Nucleotide Archive, is commendable. To further improve accessibility, consider providing a step-by-step guide or a brief tutorial on how to navigate and utilize these resources effectively.
○ Detailed Metadata: While the data note includes metadata for the specimen and sequencing runs, more detailed information about the environmental conditions and specific collection methods could be beneficial for researchers attempting to replicate or build upon this study.
○ Quality Control Metrics: The assembly quality metrics (e.g., N50, BUSCO scores) are well documented, but a comparative table listing these metrics alongside those from related species could provide additional context for evaluating the assembly's quality.
○ Future Research Directions: Although the primary focus is on data presentation, a brief section suggesting potential research applications and implications of this genome assembly could guide researchers in leveraging this data for various scientific inquiries.

Overall, the data note is well-structured and provides a comprehensive overview of the genome assembly for Syritta pipiens. Addressing the suggestions above could further enhance its clarity, accessibility, and utility for the research community.
Is the rationale for creating the dataset(s) clearly described?

Annabel Whibley
The University of Auckland, Auckland, Auckland, New Zealand
Crowley, Ashworth, Wawman and colleagues present a reference genome assembly and annotation of the Thick-legged Hoverfly (Syritta pipiens). The reference is high-quality, and has been constructed using appropriate tools and with comprehensive reporting of the sample collection, data generation and analysis and all associated metadata. Links to data accessions are functional.
That the gene annotation was informed by RNAseq is worth highlighting, perhaps even in the abstract.
A comment on the sequence identity of the published mtDNA sequence to your assembled one here would be good to include.

Minor comments:
○ Typo in background, "It is a widespread and common species…"
○ Lingering italics in "This may contribute to the males' very efficient visual system for …"
○ "Rough" should be capitalised in "A female Syritta pipiens (specimen ID Ox000241, ToLID idSyrPipi1) was collected from rough Common in Wytham Woods,"
○ I will continue to query whether this templated detail is correct: "in brief, the method employs a 1.8X ratio of AMPure PB beads to sample to eliminate shorter fragments and concentrate the DNA." This ratio of beads is not size-selective; I believe this should be 0.6x, as in the protocols.io Guidelines.
Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Genomics, Bioinformatics, Evolution
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 2. Genome assembly of Syritta pipiens, idSyrPipi1.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 318,522,517 bp assembly. The distribution of sequence lengths is shown in dark grey with the plot radius scaled to the longest sequence present in the assembly (108,597,361 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 sequence lengths (86,509,480 and 57,582,887 bp), respectively. The pale grey spiral shows the cumulative sequence count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the diptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Syrittapipiens/dataset/CAJJIO01/snail.
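For readers unfamiliar with the N50 and N90 statistics shown in the Snailplot, the following is a minimal sketch of how such values are conventionally computed: sequences are sorted from longest to shortest, and the statistic is the length of the sequence at which the running total first reaches the given fraction of the assembly span. The function name `n_stat` and the toy lengths are hypothetical and are not the scaffold lengths of this assembly; only the 318,522,517 bp span and the 1,000-bin layout are taken from the caption.

```python
from typing import Iterable


def n_stat(lengths: Iterable[int], fraction: float) -> int:
    """Length of the sequence at which the cumulative total, taken from
    longest to shortest, first reaches `fraction` of the summed length."""
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * fraction
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical toy scaffold lengths, for illustration only.
toy_lengths = [5, 4, 3, 2, 1]
toy_n50 = n_stat(toy_lengths, 0.5)
toy_n90 = n_stat(toy_lengths, 0.9)

# Span represented by each of the 1,000 Snailplot bins (0.1% of the assembly).
ASSEMBLY_SPAN_BP = 318_522_517
bin_span_bp = ASSEMBLY_SPAN_BP / 1000

print(f"Toy N50: {toy_n50}, toy N90: {toy_n90}")
print(f"Each Snailplot bin covers about {bin_span_bp:,.0f} bp")
```

With only six scaffolds in the final assembly, the N50 and N90 values in the caption each correspond to the length of a single chromosome-scale scaffold.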

Figure 3. Genome assembly of Syritta pipiens, idSyrPipi1.1: BlobToolKit GC-coverage plot. Scaffolds are coloured by phylum. Circles are sized in proportion to scaffold length. Histograms show the distribution of scaffold length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Syritta%20pipiens/dataset/CAJJIO01/blob.

Figure 4. Genome assembly of Syritta pipiens, idSyrPipi1.1: BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all scaffolds. Coloured lines show cumulative lengths of scaffolds assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Syritta%20pipiens/dataset/CAJJIO01/cumulative.

Figure 5. Genome assembly of Syritta pipiens, idSyrPipi1.1: Hi-C contact map of the idSyrPipi1.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=WbDLb3N3SHuZ-PBlaziUYA.

Peer Review Current Peer Review Status: Version 1
Research Collaboration Agreement or Material Transfer Agreement entered into by the Darwin Tree of Life Partner, Genome Research Limited (operating as the Wellcome Sanger Institute), and in some circumstances other Darwin Tree of Life collaborators. Members of the Tree of Life Core Informatics collective are listed here: https://doi.org/10.5281/zenodo.5013541. Members of the Darwin Tree of Life Consortium are listed here: https://doi.org/10.5281/zenodo.4783558. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
https://doi.org/10.21956/wellcomeopenres.21981.r88663
© 2024 Whibley A. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.