The genome sequence of a hoverfly, Merodon equestris (Fabricius, 1794)

We present a genome assembly from an individual female Merodon equestris (hoverfly; Arthropoda; Insecta; Diptera; Syrphidae). The genome sequence is 873.0 megabases in span. Most of the assembly is scaffolded into 6 chromosomal pseudomolecules. The mitochondrial genome has also been assembled and is 15.95 kilobases in length.


Background
Merodon equestris (Fabricius, 1794) is a large, hairy, bumblebee-mimicking hoverfly with a small head and a somewhat elongate abdomen compared to Bombus species. The species is commonly called the Large Bulb Fly or Large Narcissus Fly, as the larvae are pests of flower bulbs. It is the only species of the genus Merodon occurring in Britain (Ball & Morris, 2015; Chandler, 2023). It can be easily identified by the clear wing, a loop in vein R4+5, and black hind legs with swollen femora bearing a large triangular ventral projection (Ball & Morris, 2015; Stubbs & Falk, 2002; van Veen, 2014). There are 34 colour types, e.g. thorax black or tawny haired, or tawny anteriorly and black posteriorly, and abdomen with a grey, yellow, orange, whitish, buff, or red tail; or abdomen tawny-haired with a transverse black band on tergite 3 (Conn, 1972; Stubbs & Falk, 2002).
Merodon equestris is a Palaearctic species occurring from Fennoscandia to the Mediterranean, European parts of Russia, North Africa, Japan and North America; it has been introduced to New Zealand (Speight, 2010) and was recently reported from South Korea (Han et al., 2018). It has been recorded from Britain since 1869. It is believed to have been introduced from The Netherlands with narcissus bulbs (Hodson, 1932). It is now widely distributed in Britain and Ireland, and can be found in gardens, urban sites and countryside, with adults on the wing from April to September (Ball et al., 2011; Ball & Morris, 2015; Stubbs & Falk, 2002).
The larvae feed on a number of bulb-producing plant species such as Narcissus, Hyacinth, Tulipa (rarely), Amaryllis, Habranthus, Vallota, Galtonia, Scilla, Leucojum, Eurycles, the wild snowdrop Galanthus and wild bluebell Hyacinthoides non-scripta (Fryer, 1914; Hodson, 1932). The eggs are laid on a leaf close to the ground or on a flower bulb (Hodson, 1932; Stubbs & Falk, 2002). They hatch after approximately 10 to 15 days. The newly hatched larva is distinctly different in appearance from later stages. It usually enters the bulb through the thinner base and begins feeding. Typically, one larva is found in a bulb, and it can migrate to another bulb if the food is depleted and the bulbs are grown closely together. The larval stage lasts approximately 300 days; the larva vacates the bulb in early spring, digs through the soil and pupates at the surface level or just below it. The pupal stage lasts approximately 35 to 40 days. The observed adult lifespan was between 5 and 24 days for females and 6 to 18 days for males (Hodson, 1932). The males are aggressive in copulation and will attempt to engage with anything resembling a female, including bumblebees and other males. If successful, the flies will land (if flying) and copulate on the ground (Conn, 1971). Adults fly low, often rest on the ground, and can be encountered feeding on umbellifers (Speight, 2017).
There is one generation per year and the species overwinters as a larva inside the plant bulb (Conn, 1971; Hodson, 1932). The egg, larva and pupa were described by Hodson (1932); the larva was also described by Heiss (1938) and was later illustrated by Rotheray (1986). The genetics of Merodon equestris colour polymorphism have been studied by Conn (1971; 1972; 1976).
The high-quality genome sequence described here is the first one reported for Merodon equestris. It is based on one specimen from the Eden Project, England. It will aid research on the biology, phylogeny and ecology of the species. This genome has been generated as part of the Darwin Tree of Life Project, a collaborative effort to sequence all named eukaryotic species in the Atlantic Archipelago of Britain and Ireland.

Genome sequence report
The genome was sequenced from one Merodon equestris (Figure 1) collected from The Eden Project in Cornwall (latitude 50.36, longitude -4.74). A total of 33-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 44 missing joins or mis-joins and removed 5 haplotypic duplications, reducing the scaffold number by 6.21% and increasing the scaffold N50 by 3.69%. The final assembly has a total length of 873.0 Mb in 271 sequence scaffolds with a scaffold N50 of 129.8 Mb (Table 1). The snailplot in Figure 2 provides a summary of the assembly statistics, while the distribution of assembly scaffolds on GC proportion and coverage is shown in Figure 3. The cumulative assembly plot in Figure 4 shows curves for subsets of scaffolds assigned to different phyla. Most (92.4%) of the assembly sequence was assigned to 6 chromosomal-level scaffolds. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 5; Table 2). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission.
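The scaffold N50 and N90 statistics quoted in this report follow the usual definition: with scaffolds sorted from longest to shortest, Nxx is the length of the scaffold at which the running total first reaches xx% of the assembly span. The following is a minimal sketch of that calculation in Python. The function name `nxx` and the toy scaffold lengths are illustrative assumptions and are not taken from the assembly pipeline or the deposited data.

```python
def nxx(scaffold_lengths, fraction=0.5):
    """Return the Nxx length for a list of scaffold lengths.

    Scaffolds are sorted from longest to shortest; the result is the
    length of the scaffold at which the running sum first reaches
    `fraction` of the total span (fraction=0.5 gives N50, 0.9 gives N90).
    """
    if not scaffold_lengths:
        raise ValueError("scaffold_lengths must not be empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")

    threshold = fraction * sum(scaffold_lengths)
    running_total = 0
    for length in sorted(scaffold_lengths, reverse=True):
        running_total += length
        if running_total >= threshold:
            return length


# Hypothetical toy lengths (arbitrary units), not the real scaffolds.
toy_lengths = [10, 8, 6, 4, 2]
toy_n50 = nxx(toy_lengths, 0.5)
toy_n90 = nxx(toy_lengths, 0.9)
print("N50:", toy_n50, "N90:", toy_n90)
```

Applied to the real scaffold lengths, for example read from a FASTA index of the deposited assembly, the same function would be expected to reproduce the N50 and N90 values reported in Table 1 and Figure 2, provided the same set of scaffolds is used.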

Sample acquisition and nucleic acid extraction
The Merodon equestris specimen used for DNA sequencing (specimen ID NHMUK014452805, ToLID idMerEque2) and a second specimen used for RNA sequencing (NHMUK014452803, idMerEque3) were collected at the Eden Project (latitude 50.36, longitude -4.74) on 2021-06-28 using an aerial net. In sample preparation, the idMerEque2 sample was weighed and dissected on dry ice (Jay et al., 2023). Tissue from the thorax was homogenised using a PowerMasher II tissue disruptor (Denton et al., 2023a). HMW DNA was extracted. RNA was extracted from abdomen tissue of idMerEque3 in the Tree of Life Laboratory at the WSI using the RNA Extraction: Automated MagMax™ mirVana protocol (do Amaral et al., 2023). The RNA concentration was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer using the Qubit RNA Broad-Range Assay kit. The integrity of the RNA was analysed using the Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay.
Protocols developed by the WSI Tree of Life core laboratory are publicly available on protocols.io (Denton et al., 2023b).

Sequencing
Pacific Biosciences HiFi circular consensus DNA sequencing libraries were constructed according to the manufacturers' instructions. Poly(A) RNA-Seq libraries were constructed using the NEB Ultra II RNA Library Prep kit. DNA and RNA sequencing was performed by the Scientific Operations core at the WSI on Pacific Biosciences SEQUEL II (HiFi) and Illumina NovaSeq 6000 (RNA-Seq) instruments. Hi-C data were also generated from head and thorax tissue of idMerEque1 using the Arima2 kit and sequenced on the HiSeq X Ten instrument.
MitoFinder (Allio et al., 2020) or MITOS (Bernt et al., 2013) and uses these annotations to select the final mitochondrial contig and to ensure the general quality of the sequence.

Arka Mukherjee
Zoological Survey of India, Kolkata, West Bengal, India
The manuscript describes the complete genome of the bumble-mimicking hoverfly Merodon equestris. In an ecological context, authors may include the lacunae about the taxonomic diversity of the genus.
The materials and methods are sufficient and a detailed explanation is mentioned in the manuscript. The assembly and annotation of the genome are perfectly presented in the graphical representation. As this work is part of the Darwin Tree of Life project, this work is adequate to assist global biodiversity genomics.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Taxonomy, DNA barcoding, Mitochondrial genome, Phylogeny, Diptera
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 2. Genome assembly of Merodon equestris, idMerEque2.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 872,988,745 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (262,796,939 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (129,753,689 and 123,427,599 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the diptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/idMerEque2_1/dataset/idMerEque2_1/snail.
These specimens were collected and identified by Olga Sivell (Natural History Museum, London) and preserved by dry freezing at -80°C. The specimen used for Hi-C sequencing (specimen ID Ox000426, ToLID idMerEque1) was netted in Wytham Woods, Oxfordshire (biological vice-county Berkshire), UK (latitude 51.77, longitude -1.34) on 2020-05-22. The specimen was collected and identified by Liam Crowley (University of Oxford) and preserved on dry ice. The workflow for high molecular weight (HMW) DNA extraction at the Wellcome Sanger Institute (WSI) includes a sequence of core procedures: sample preparation, sample homogenisation, DNA extraction, fragmentation, and clean-up.

Figure 3. Genome assembly of Merodon equestris, idMerEque2.1: BlobToolKit GC-coverage plot. Scaffolds are coloured by phylum. Circles are sized in proportion to scaffold length. Histograms show the distribution of scaffold length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/idMerEque2_1/dataset/idMerEque2_1/blob.

Figure 4. Genome assembly of Merodon equestris, idMerEque2.1: BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all scaffolds. Coloured lines show cumulative lengths of scaffolds assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/idMerEque2_1/dataset/idMerEque2_1/cumulative.

Figure 5. Genome assembly of Merodon equestris, idMerEque2.1: Hi-C contact map of the idMerEque2.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=GJMrxZniSRaNqvLTHHNs2w.

Darwin Tree of Life Project Sampling Code of Practice', which can be found in full on the Darwin Tree of Life website here. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and