The genome sequence of a snail-killing fly, Pherbina coryleti (Scopoli, 1763)

We present a genome assembly from an individual male Pherbina coryleti (snail-killing fly; Arthropoda; Insecta; Diptera; Sciomyzidae). The genome sequence is 863 megabases in span. Most of the assembly is scaffolded into six chromosomal pseudomolecules, including the assembled X sex chromosome. The mitochondrial genome has also been assembled and is 20.9 kilobases in length. Gene annotation of this assembly on Ensembl identified 32,619 protein-coding genes.


Background
Pherbina coryleti is a snail-killing fly from the family Sciomyzidae (Diptera), tribe Tetanocerini. It is a yellowish, medium-sized fly (8.6-9.3 mm) with infuscated, distinctly patterned wings. It closely resembles Pherbina intermedia (a northern European species, not recorded in Britain). Several features allow the two species to be separated, such as the number of strong setae on the mesopleuron (three in P. coryleti, one in P. intermedia), the shape of the antennae (stout and with convex upper and lower margins in P. coryleti, slender and with parallel upper and lower margins in P. intermedia) and the patterning of cell cu (Rozkošný, 1984; Rozkošný, 1987). Larvae and pupae were described by Knutson, Rozkošný and Berg (1975), and keys allowing for their differentiation from P. intermedia were given by Rozkošný (2002).
Pherbina coryleti is univoltine (with one generation per year). Mating occurs in spring/early summer and oviposition is delayed for several months (Knutson & Vala, 2011). The eggs are laid in batches, on plant stems or leaves in moist conditions (Beaver, 1973). The larvae are semi-aquatic, but weak swimmers. They are predators and saprophages of a wide range of non-operculate (mainly freshwater) snails encountered on moist and exposed surfaces (e.g., stranded snails). The larva ruptures the haemocoel of a freshwater snail and feeds on its flesh and occasionally on haemolymph. The snail dies within minutes. Each larva consumes between 10 and 20 snails in its lifetime (Knutson & Vala, 2011). Pherbina coryleti is a wasteful feeder, killing many more snails than needed for its development (Beaver, 1974). The species overwinters as a third instar larva. The fly pupates away from the host. Floating puparia and mature larvae may be found in spring amongst marginal vegetation (Rozkošný, 1984; Speight & Knutson, 2012). This Eurasian species is common and widely distributed in Britain. It can be found in wetland habitats such as inland lakes, marshes (including coastal), fens, seasonally-flooded unimproved grassland, reeds and tall sedge beds. The flight period is from May to September (Ball, 2017; Speight & Knutson, 2012).
The high-quality genome sequence described here is the first one reported for P. coryleti and has been generated as part of the Darwin Tree of Life project. It will aid in the study of the species as well as the evolution and phylogenetics of the group.

Genome sequence report
The genome was sequenced from one male P. coryleti specimen (Figure 1) collected from Parsonage Moor, UK (latitude 51.69, longitude -1.33). A total of 23-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 417 missing joins or mis-joins and removed 13 haplotypic duplications, reducing the assembly length by 5.83% and the scaffold number by 92.41%, and increasing the scaffold N50 by 3.25%.
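The curation percentages can be related to the final assembly figures reported in this section (863.0 Mb, 17 scaffolds, scaffold N50 of 178.9 Mb) by inverting each percentage change. The following is a minimal sketch of that arithmetic; the helper name `value_before` is hypothetical, and because the published percentages and totals are rounded, the pre-curation values it yields are approximations rather than reported results.

```python
def value_before(after: float, percent_change: float) -> float:
    """Invert a percentage change: after = before * (1 + percent_change / 100)."""
    return after / (1.0 + percent_change / 100.0)


# Final values and signed percentage changes as stated in the text.
scaffolds_before = value_before(17, -92.41)      # scaffold number fell by 92.41%
length_before_mb = value_before(863.0, -5.83)    # assembly length fell by 5.83%
n50_before_mb = value_before(178.9, +3.25)       # scaffold N50 rose by 3.25%

print(f"approx. scaffolds before curation: {scaffolds_before:.0f}")
print(f"approx. length before curation:    {length_before_mb:.1f} Mb")
print(f"approx. N50 before curation:       {n50_before_mb:.1f} Mb")
```

Under these assumptions the pre-curation assembly would have contained roughly 224 scaffolds.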
The final assembly has a total length of 863.0 Mb in 17 sequence scaffolds with a scaffold N50 of 178.9 Mb (Table 1). Most (99.79%) of the assembly sequence was assigned to six chromosomal-level scaffolds, representing five autosomes and the X sex chromosome. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size. Scaffolds 72, 42, 127, 59, 153, 102, 159 and 29 are unlocalised scaffolds. One or more of these could be the Y chromosome, or could belong to the X chromosome (Figure 2-Figure 5; Table 2). The assembly has a BUSCO v5.3.2 (Manni et al., 2021) completeness of 97.4% (single 96.3%, duplicated 1.1%) using the OrthoDB v10 diptera reference set (n = 3,285). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
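For readers unfamiliar with the summary statistics quoted above, the sketch below shows the standard definition of scaffold N50 and N90, and how BUSCO completeness is the sum of the single-copy and duplicated percentages. The function name `nx` and the toy scaffold lengths are illustrative assumptions and are not taken from the P. coryleti assembly; this is not the code used to produce the reported values.

```python
def nx(lengths, fraction=0.5):
    """Return the Nx length: the length of the scaffold at which the running
    total of size-ordered scaffolds first reaches `fraction` of the assembly."""
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    raise ValueError("lengths must be non-empty")


# Toy scaffold lengths (illustrative only, arbitrary units).
toy_lengths = [50, 40, 30, 20, 10]
n50 = nx(toy_lengths, 0.5)   # running total reaches 75 of 150 at the 40-unit scaffold
n90 = nx(toy_lengths, 0.9)   # running total reaches 135 of 150 at the 20-unit scaffold

# BUSCO completeness as reported: single-copy plus duplicated percentages.
busco_complete = round(96.3 + 1.1, 1)

print(n50, n90, busco_complete)
```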

Sample acquisition and nucleic acid extraction
A live male P. coryleti specimen (idPheCory1) was collected from vegetation at Parsonage Moor, SU459998, Abington, UK (latitude 51.69, longitude -1.33) on 19 June 2021, by Olga Sivell and Ryan Mitchell (Natural History Museum, London) using an insect net. It was identified by Duncan Sivell, Natural History Museum, London, following Rozkošný (1984), Rozkošný (1987) and Ball (2017). The specimen (NHMUK014036906, Figure 1) was snap-frozen on dry ice. The tissue samples taken from it were stored in a CoolRack prior to genome sequencing.
DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute (WSI). The idPheCory1 sample was weighed and dissected on dry ice with tissue set aside for Hi-C sequencing. Whole organism tissue was disrupted using a Nippi Powermasher fitted with a BioMasher pestle. High molecular weight (HMW) DNA was extracted using the Qiagen MagAttract HMW DNA extraction kit. Low molecular weight DNA was removed from a 20 ng aliquot of extracted DNA using 0.8X AMPure XP purification kit prior to 10X Chromium sequencing; a minimum of 50 ng DNA was submitted for 10X sequencing. HMW DNA was sheared into an average fragment size of 12-20 kb in a Megaruptor 3 system with speed setting 30. Sheared DNA was purified by solid-phase reversible immobilisation using AMPure PB beads with a 1.8X ratio of beads to sample to remove the shorter fragments and concentrate the DNA sample. The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit dsDNA High Sensitivity Assay kit. Fragment size distribution was evaluated by running the sample on the FemtoPulse system.

Sequencing
Pacific Biosciences HiFi circular consensus and 10X Genomics read cloud DNA sequencing libraries were constructed according to the manufacturers' instructions. DNA sequencing was performed by the Scientific Operations core at the WSI on a Pacific Biosciences SEQUEL II (HiFi) instrument. Hi-C data were also generated from tissue of idPheCory1 using the Arima v2 kit and sequenced on the Illumina NovaSeq 6000 instrument.

Genome annotation
The BRAKER2 pipeline (Brůna et al., 2021) was used in Ensembl to generate draft annotation for the P. coryleti assembly (GCA_943735915.1).

Ethics/compliance issues
The

Liping Yan
Beijing Forestry University, Beijing, China
The manuscript is well prepared, with the methodology described in detail and the results clearly reported. It seems to me that the Hi-C contact map could be improved, and that the sequencing depth is not moderate but high enough, given that the DNA was extracted from a single fly individual.
The mitogenome of Pherbina coryleti has been assembled; its accession should also be documented.
Is the rationale for creating the dataset(s) clearly described?

Sanjay Kumar Pradhan
Department of Agricultural Entomology, College of Agriculture, University of Agricultural Sciences, Bengaluru, Karnataka, India
The manuscript is well structured, and the analysis addresses the core research question raised. Some technical and typographical errors are marked for correction in the PDF version (in track-changes mode for reference). The authors can look into the suggestions for further improvements. I approve the manuscript with some minor revisions.
Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Viral genomics in insects, genetic engineering
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 1. Photographs of the Pherbina coryleti (idPheCory1) specimen used for genome sequencing. a) A male habitus in dorsal view, b) A male habitus in lateral view. Photographs by Olga Sivell.

Figure 2. Genome assembly of Pherbina coryleti, idPheCory1.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 862,972,526 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest sequence present in the assembly (199,727,623 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (178,934,347 and 136,521,607 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the diptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/idPheCory1.1/dataset/CALSEO01/snail.
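The size-ordered binning described in this caption can be illustrated with a simplified sketch. It is not BlobToolKit's implementation; the function name `snail_bins` and the toy lengths are assumptions made for illustration. Scaffolds are laid end to end from longest to shortest, the total span is cut into equal bins, and each bin records the length of the scaffold covering its end position, so the value at the halfway bin corresponds to the N50.

```python
def snail_bins(lengths, n_bins=1000):
    """Return, for each equal-width bin of the size-ordered assembly, the
    length of the scaffold that covers the end position of that bin."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    values = []
    idx = 0
    cumulative = ordered[0]
    for b in range(1, n_bins + 1):
        position = total * b / n_bins
        # Advance to the scaffold whose cumulative span reaches this position.
        while cumulative < position and idx < len(ordered) - 1:
            idx += 1
            cumulative += ordered[idx]
        values.append(ordered[idx])
    return values


# Toy scaffold lengths (illustrative only), binned into 10 bins instead of 1,000.
toy_bins = snail_bins([50, 40, 30, 20, 10], n_bins=10)
print(toy_bins)

# Bin width for the assembly span given in the caption (0.1% per bin).
bin_width_bp = 862_972_526 / 1000
print(bin_width_bp)
```

In the toy case the fifth of ten bins holds the 40-unit scaffold, which is the N50 of those lengths.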

Figure 5. Genome assembly of Pherbina coryleti, idPheCory1.1: Hi-C contact map. Hi-C contact map of the idPheCory1.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=Z4Bj68n2TiuM4jGCb-bNNw.

Table 2. Chromosomal pseudomolecules in the genome assembly of Pherbina coryleti, idPheCory1.
Columns: INSDC accession; Chromosome; Size (Mb); GC%
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the Darwin Tree of Life Project Sampling Code of Practice. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project. Each transfer of samples is further undertaken according to a Research Collaboration Agreement or Material Transfer Agreement entered into by the Darwin Tree of Life Partner, Genome Research Limited (operating as the Wellcome Sanger Institute), and in some circumstances other Darwin Tree of Life collaborators.
Members of the Tree of Life Core Informatics collective are listed here: https://doi.org/10.5281/zenodo.5013541. Members of the Darwin Tree of Life Consortium are listed here: https://doi.org/10.5281/zenodo.4783558.

Open Peer Review Current Peer Review Status: Version 1
This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.