The genome sequence of the Vagrant Hoverfly, Eupeodes corollae (Fabricius, 1794)

We present a genome assembly from an individual female Eupeodes corollae (the Vagrant Hoverfly; Arthropoda; Insecta; Diptera; Syrphidae). The genome sequence is 648.2 megabases in span. Most of the assembly is scaffolded into four chromosomal pseudomolecules, including the X sex chromosome. The mitochondrial genome has also been assembled and is 18.3 kilobases in length.


Background
The Vagrant or Migrant Hoverfly, Eupeodes corollae, is a common and widespread, medium-sized hoverfly (wing length 5-8 mm) with a long flight season. This hoverfly occurs throughout Britain and can be found in a variety of open or boundary habitats in urban and rural settings, such as parks, gardens, brownfield sites, agricultural land, woodland edge and road verges (Ball et al., 2011; Ball & Morris, 2015; Stubbs & Falk, 2002). Adults can be seen from March to November and visit a variety of flowers but are usually most frequent in July and August when they can be common on umbellifers (Ball et al., 2011; Morris, 1998).
Like many Syrphinae hoverflies, E. corollae has yellow markings on a dark abdomen, but in this species the pattern is both variable and sexually dimorphic. In males the yellow markings are larger and more square-shaped, whereas in females they are smaller and narrower. The extent to which these yellow markings reach the lateral edge of the abdomen and the colour of setae along this edge are important characters for identification. The male genital capsule of E. corollae is much larger than that of other Eupeodes species, which is another useful identification feature (Ball & Morris, 2015; Stubbs & Falk, 2002).
The species is found across Europe, Asia, and North Africa and, as the common name suggests, is a long-distance migrant. During the springtime, the hoverfly has been documented flying at least 105 km across open ocean from the Middle East to Cyprus (Hawkes et al., 2022), and has been recorded heading south in the autumn through the Alps and the Pyrenees (Aubert et al., 1976; Williams et al., 1956). In addition, entomological radar studies above southern Britain have confirmed seasonally directed movements, revealed their huge abundance and investigated their responses to wind currents (Gao et al., 2020; Wotton et al., 2019).
Eupeodes corollae has been widely investigated as a predator of aphid crop pests or as a dual-purpose pollinator and biocontrol agent (Lillo et al., 2021; van Oystaeyen et al., 2022). In southern Britain, 1 to 4 trillion aphids a year are consumed by the offspring of E. corollae and the marmalade hoverfly Episyrphus balteatus, following their mass arrivals, making this species an important biocontrol agent in agricultural ecosystems (Wotton et al., 2019). Reflecting this role, E. corollae is commercially available for the biocontrol of aphids.
This species is also one of the few Syrphids with existing genomic resources, including a previous genome assembly, a transcriptomic analysis of chemosensory genes from the antennae, a genome-wide analysis of population genetic structure, and mutant lines created to investigate the molecular basis of aphid detection (Liu et al., 2019; Wang et al., 2017; Wang et al., 2022; Yuan et al., 2022). The production of the high-quality Eupeodes corollae genome described here, generated as part of the Darwin Tree of Life project, will further aid understanding of the biology and ecology of this hoverfly.

Genome sequence report
The genome was sequenced from one female E. corollae specimen (Figure 1b and c) collected from Luton, UK (latitude 51.89, longitude -0.39). A total of 34-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 34-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 670 missing joins or mis-joins and removed 33 haplotypic duplications, reducing the assembly length by 1.53% and the scaffold number by 36.24%, and increasing the scaffold N50 by 6.66%.
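The curation percentages can be related to the final figures reported in this note by inverting each percentage change. The sketch below is illustrative only: it assumes that the percentages are expressed relative to the pre-curation assembly and that the final values (648.2 Mb, 783 scaffolds, scaffold N50 of 158.6 Mb) are the post-curation ones. The function name `before_curation` is hypothetical, and the results are approximations derived from rounded inputs, not values reported by the authors.

```python
def before_curation(final_value, percent_change):
    """Invert a percentage change: final = before * (1 + percent_change / 100).

    percent_change is negative for a reduction and positive for an increase.
    """
    return final_value / (1 + percent_change / 100)


# Assumption: the reported final values are post-curation, and the
# percentages are relative to the pre-curation assembly.
scaffolds_before = before_curation(783, -36.24)    # scaffold count fell by 36.24%
length_before_mb = before_curation(648.2, -1.53)   # assembly length fell by 1.53%
n50_before_mb = before_curation(158.6, 6.66)       # scaffold N50 rose by 6.66%

print(f"Approximate scaffolds before curation: {scaffolds_before:.0f}")
print(f"Approximate length before curation: {length_before_mb:.1f} Mb")
print(f"Approximate scaffold N50 before curation: {n50_before_mb:.1f} Mb")
```

Under these assumptions the pre-curation assembly would have had roughly 1,200 scaffolds and been about 10 Mb longer, which gives a sense of how much curation consolidated the scaffolds.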
The final assembly has a total length of 648.2 Mb in 783 sequence scaffolds with a scaffold N50 of 158.6 Mb (Table 1). Most (95.29%) of the assembly sequence was assigned to four chromosomal-level scaffolds, representing three autosomes and the X sex chromosome. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 2-Figure 5; Table 2). The assembly has a BUSCO v5 completeness of 96.7% (single 96.0%, duplicated 0.7%) using the diptera_odb10 reference set. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
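For readers unfamiliar with the scaffold N50 statistic quoted above, the following minimal sketch shows the usual definition: the length of the scaffold at which the cumulative length of scaffolds, sorted from longest to shortest, first reaches 50% of the assembly. The function name `nx` and the toy scaffold lengths are hypothetical and are not taken from the idEupCoro1.1 assembly; the assembly tools used in this work may differ in detail.

```python
def nx(lengths, x=50):
    """Return the Nx statistic for a list of scaffold lengths.

    Scaffolds are sorted from longest to shortest; Nx is the length of the
    scaffold at which the running total first reaches x% of the total length.
    """
    threshold = sum(lengths) * x / 100
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    raise ValueError("lengths must contain at least one scaffold")


# Toy example (hypothetical lengths, not real data): total length is 100.
toy_scaffolds = [10, 40, 20, 30]
toy_n50 = nx(toy_scaffolds, 50)  # 40 + 30 = 70 >= 50, so N50 is 30
toy_n90 = nx(toy_scaffolds, 90)  # 40 + 30 + 20 = 90 >= 90, so N90 is 20
print(toy_n50, toy_n90)
```

The same function with `x=90` gives the N90 value, which is the other length statistic shown in the snail plot.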

Sample acquisition and nucleic acid extraction
Two Eupeodes corollae specimens (idEupCoro1 and idEupCoro2) were collected by netting in an urban garden in Luton, UK (latitude 51.89, longitude -0.39) on 26 April 2020 and 16 June 2020. The specimens were collected by Olga and Duncan Sivell (Natural History Museum) and were preserved on dry ice. Both specimens were identified by Duncan Sivell using Stubbs & Falk (2002) and Ball & Morris (2015).
DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute (WSI). The idEupCoro1 sample was weighed and dissected on dry ice. Tissue was disrupted using a Nippi Powermasher fitted with a BioMasher pestle. High molecular weight (HMW) DNA was extracted using the Qiagen [...] on Pacific Biosciences SEQUEL II (HiFi) and HiSeq X Ten (10X) instruments. Hi-C data were also generated from specimen idEupCoro2 using the Arima v2 kit and sequenced on the Illumina NovaSeq 6000 instrument.
Manual curation was performed using gEVAL, HiGlass (Kerpedjiev et al., 2018) and Pretext (Harry, 2022). The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2022), which performed annotation using MitoFinder (Allio et al., 2020). The genome was analysed and BUSCO scores were generated within the BlobToolKit environment (Challis et al., 2020). Table 3 contains a list of all software tool versions used, where appropriate.

Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Jaakko L.O. Pohjoismäki
Department of Environmental and Biological Sciences, University of Eastern Finland, Joensuu, Finland
Background: It may be worth noting that the coloration within the species is variable and that the females in particular might be difficult to determine. It would also be informative to mention that there are 8 other Eupeodes species in the UK, plus one whose taxonomic status is uncertain. See the species list at: http://hoverfly.uk/
There is not much to comment on regarding the genome sequencing; it nicely follows the DToL pipeline, quality standards and presentation.
Methods: The sex of the specimens is not specified?
Otherwise, I have the same complaint as I have had with other similar reports. Although the mitochondrial genome was retrieved, the Cox1 DNA barcode has not been checked. I did this and it seems to confirm the species identity.
DNA barcodes work quite poorly for some Syrphid genera, but for this species the identity should be ok.
I also noted that the mitochondrial genome produced here that is available in GenBank has not been annotated and is in reverse orientation? https://www.ncbi.nlm.nih.gov/nuccore/OX244027.1?report=genbank

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Mitochondrial DNA, molecular biology, genetics, taxonomy
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Shigeyuki Koshikawa
Graduate School of Environmental Science, Hokkaido University, Sapporo, Japan
Takumi Karasawa
Graduate School of Environmental Science, Hokkaido University, Sapporo, Hokkaido Prefecture, Japan
This article reports a high-quality genome sequence of the Vagrant Hoverfly, Eupeodes corollae.
The taxonomic position and ecology of this species are briefly presented. The methods used to obtain the genome sequence are explained in an appropriate amount of detail. We believe this article will promote an understanding of the nature of this species, regional biodiversity, and insect biology in general.
We have a few minor comments below.
The final letter "e" of the species name in the article title should be in italics.
In the introduction, the authors wrote that a genome assembly of this species has been published before. We think the authors should describe how much better this genome assembly is compared to that earlier assembly.
It is not clear what the 311M in the legend in the lower-left corner of Figure 2 indicates.

Reviewer Expertise: Evolutionary Developmental Biology
We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Josephine Reinhardt
State University of New York at Geneseo, Geneseo, NY, USA
This report is of the genome of the Vagrant hoverfly Eupeodes corollae, produced as part of the Darwin Tree of Life (DToL) consortium. As such, it uses standardized methods and reports standardized results in line with genome reports of other arthropod species produced by the consortium. The brief introduction to the species and its interest to the scientific community and broader public appears appropriate. The approach DToL is taking with their genome reports has great benefits, because it makes it a simple matter to compare genome assembly quality across taxa. The genome is of high quality based on the metrics presented. I also checked that the data were available as reported (they were).
As a reader that hasn't personally worked with DToL genomes, there are a few places that I think could be improved in terms of clarifying for a broader audience referencing these reports.
Figure 2: a "pale grey spiral" is referenced. I think this is meant to be the small light grey blob in the middle. This does not appear to be a spiral here, but is more spiral-like in similar reports (e.g. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10349268/figure/f2/). Someone only looking at this report without seeing others could have trouble interpreting it. There are also light grey (white?) dashed lines that are barely visible in the figure, especially over the orange area. What are these and could the color be adjusted for visibility?
Figure 3: The Y-axis label is "ERR9904177_cov" and this ID does not appear anywhere else in the report, so perhaps it should be changed. This appears to be sequencing coverage, but multiple types of sequencing were done: both PacBio HiFi (34x) and 10X Genomics (34x). Which coverage is presented here?
Methods: the amount of DNA that was submitted for 10X sequencing (50 ng) is mentioned but not the amount of DNA that was used in producing the HiFi and Hi-C libraries. Readers attempting to sequence genomes using multiple technologies from single small-bodied individuals might be interested to know this information for planning their studies.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Evolution, Genomics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
Figure 1. a. Photograph of live Eupeodes corollae by Will Hawkes. b. Ventral view of specimen idEupCoro1. c. Lateral view of idEupCoro1.

Figure 2. Genome assembly of Eupeodes corollae, idEupCoro1.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 648,213,917 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (311,186,714 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (158,619,773 and 139,491,442 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the diptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/idEupCoro1.1/dataset/CAMAOT01/snail.
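The bin arithmetic in the Figure 2 legend can be checked with a few lines of Python. This is only an illustrative sketch using the numbers quoted in the legend; the variable names are hypothetical and the code does not reproduce how BlobToolKit draws the plot.

```python
# Values quoted in the Figure 2 legend.
ASSEMBLY_BP = 648_213_917   # total assembly span
N_BINS = 1_000              # size-ordered bins around the circumference
LONGEST_BP = 311_186_714    # longest scaffold (shown in red)

# Each bin represents 0.1% of the assembly.
bin_bp = ASSEMBLY_BP / N_BINS

# Share of the assembly, and number of bins, covered by the longest scaffold.
longest_fraction = LONGEST_BP / ASSEMBLY_BP
bins_spanned = LONGEST_BP / bin_bp

print(f"Each bin covers about {bin_bp:,.0f} bp")
print(f"Longest scaffold: {longest_fraction:.1%} of the assembly, "
      f"about {bins_spanned:.0f} of {N_BINS} bins")
```

Because the longest scaffold alone covers slightly less than half of the assembly, the N50 length is that of a shorter scaffold, which is consistent with the 158,619,773 bp N50 given in the legend.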
Assembly was carried out with Hifiasm (Cheng et al., 2021) and haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). One round of polishing was performed by aligning 10X Genomics read data to the assembly with Long Ranger ALIGN, calling variants with FreeBayes (Garrison & Marth, 2012). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using YaHS (Zhou et al., 2023). The assembly was checked for contamination and corrected using the gEVAL system (Chow et al., 2016) as described previously (Howe et al., 2021). Manual curation was performed.

Figure 5. Genome assembly of Eupeodes corollae, idEupCoro1.1: Hi-C contact map. Hi-C contact map of the idEupCoro1.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=RLyscKYrT8GuIgKEvXVwiQ.

Reviewer Report 05 October 2023
https://doi.org/10.21956/wellcomeopenres.21175.r67741
© 2023 Koshikawa S et al. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
It would be easier to understand if there was a diagram showing how the size of the circles in Figure 3 corresponds to scaffold length. Also, it is difficult to understand what the Y-axis in Figure 3, which says ERR9904177_cov, indicates. The blue letters indicating chromosomes 1-3 in Figure 5 cover the figure. It would be better to move them outside the figure.
Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.