The genome sequence of the Stripe-backed Dasysyrphus, Dasysyrphus albostriatus (Fallén, 1817)

We present a genome assembly from an individual female Dasysyrphus albostriatus (the Stripe-backed Dasysyrphus; Arthropoda; Insecta; Diptera; Syrphidae). The genome sequence is 662.5 megabases in span. Most of the assembly is scaffolded into 5 chromosomal pseudomolecules, including the X sex chromosome. The mitochondrial genome has also been assembled and is 17.55 kilobases in length. Gene annotation of this assembly on Ensembl identified 12,259 protein coding genes.


Background
The Stripe-backed Dasysyrphus, Dasysyrphus albostriatus, is a medium-sized black and yellow striped hoverfly in the subfamily Syrphinae. It occurs across the Palearctic and is widespread throughout Great Britain and Ireland in woodland clearings and edges (Stubbs & Falk, 2002). Dasysyrphus species can be distinguished from hoverflies in other genera by their hairy eyes and a long black wing stigma, a dark marking along the leading edge of the wing. Dasysyrphus albostriatus has yellow bars on the second, third and fourth abdominal segments that are angled downwards and often fused to form an inverted 'v' (Ball & Morris, 2013). There is a pair of grey stripes on the thorax, from which the common name is derived.
It is bivoltine, with the first flight period from April to June and a second from August to September (Ball & Morris, 2013). Both sexes visit a wide range of flowers. The larvae are aphidophagous, feeding on a range of arboreal aphid species at night and resting in the trees near the aphid colony during the day. They are protected by camouflage: their dark colours and the projections from their bodies help them to blend in with twigs and the bark of branches (Hoverfly Recording Scheme, 2022).
We present a chromosomally complete genome sequence for Dasysyrphus albostriatus based on one adult female specimen, sequenced as part of the Darwin Tree of Life Project. This project is a collaborative effort to sequence all named eukaryotic species in the Atlantic Archipelago of Britain and Ireland.

Genome sequence report
The genome was sequenced from one female Dasysyrphus albostriatus (Figure 1) collected from Tubney House, Oxfordshire, UK (51.69, -1.37). A total of 29-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 130 missing joins or mis-joins and removed 15 haplotypic duplications, reducing the assembly length by 1.35% and the scaffold number by 72.15%.
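The coverage and curation figures quoted above are simple ratios, and the following minimal sketch shows how such values are typically derived. The function names are illustrative, and the read yield and the pre-curation scaffold count of 79 are assumptions chosen only to be consistent with the reported 29-fold coverage and 72.15% reduction; neither value is stated in this report.

```python
def fold_coverage(total_read_bases: int, assembly_span_bp: int) -> float:
    """Mean sequencing depth: total sequenced bases divided by assembly span."""
    return total_read_bases / assembly_span_bp


def percent_reduction(before: float, after: float) -> float:
    """Percentage by which a quantity shrank between two states."""
    return 100.0 * (before - after) / before


# Assembly span as given in the Figure 2 caption.
ASSEMBLY_SPAN_BP = 662_557_047

# ASSUMPTION: a hypothetical HiFi yield chosen so that coverage equals 29x.
hypothetical_read_bases = 29 * ASSEMBLY_SPAN_BP
print(fold_coverage(hypothetical_read_bases, ASSEMBLY_SPAN_BP))  # 29.0

# ASSUMPTION: 79 scaffolds before curation (not reported); 22 after (Table 1).
print(round(percent_reduction(79, 22), 2))  # 72.15
```

Under these assumptions, a drop from 79 to 22 scaffolds reproduces the reported 72.15% reduction, which suggests the statistic is computed relative to the pre-curation count.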
The final assembly has a total length of 662.5 Mb in 22 sequence scaffolds with a scaffold N50 of 154.0 Mb (Table 1). The snailplot in Figure 2 provides a summary of the assembly statistics, while the distribution of assembly scaffolds on GC proportion and coverage is shown in Figure 3. The cumulative assembly plot in Figure 4 shows curves for subsets of scaffolds assigned to different phyla. Most (99.98%) of the assembly sequence was assigned to 5 chromosomal-level scaffolds, representing 4 autosomes and the X sex chromosome. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 5; Table 2). The scaffold order and orientation are uncertain on chromosome X in the region of 15.31-18.96 Mb. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission.
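For readers unfamiliar with the scaffold N50 statistic reported above, the sketch below illustrates the standard definition: the length of the scaffold at which the cumulative sum of size-ordered scaffolds first reaches half of the total assembly length. The function name and the toy scaffold lengths are illustrative assumptions and are not taken from this assembly.

```python
from typing import Iterable


def nx_length(lengths: Iterable[int], fraction: float = 0.5) -> int:
    """Return the Nx scaffold length (N50 when fraction=0.5, N90 when 0.9).

    Scaffolds are sorted from longest to shortest; the result is the length
    of the scaffold at which the running total first reaches `fraction`
    of the summed length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one scaffold length is required")
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return ordered[-1]  # only reached if fraction > 1


# Toy example (hypothetical lengths, total = 300).
toy_scaffolds = [100, 80, 60, 40, 20]
print(nx_length(toy_scaffolds))        # 80
print(nx_length(toy_scaffolds, 0.9))   # 40
```

Applied to the real scaffold lengths, the same calculation would be expected to give the N50 and N90 values shown in the snailplot (Figure 2).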

Sample acquisition and nucleic acid extraction
A female Dasysyrphus albostriatus (specimen ID Ox001343, ToLID idDasAlbo1) was netted at Tubney House, Oxfordshire, UK (latitude 51.69, longitude -1.37) on 2021-05-10. The specimen was collected and identified by Liam Crowley (University of Oxford) and preserved on dry ice.
Protocols developed by the Wellcome Sanger Institute (WSI) Tree of Life core laboratory have been deposited on [...] Qubit Fluorometer and Qubit dsDNA High Sensitivity Assay kit. Fragment size distribution was evaluated by running the sample on the FemtoPulse system.
RNA was extracted from abdomen tissue of idDasAlbo1 in the Tree of Life Laboratory at the WSI using the RNA Extraction: Automated MagMax™ mirVana protocol (do Amaral et al., 2023). The RNA concentration was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit RNA Broad-Range Assay kit. RNA integrity was assessed using the Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay.

Sequencing
Pacific Biosciences HiFi circular consensus DNA sequencing libraries were constructed according to the manufacturers' instructions. Poly(A) RNA-Seq libraries were constructed using the NEB Ultra II RNA Library Prep kit. DNA and RNA sequencing was performed by the Scientific Operations core at the WSI on Pacific Biosciences SEQUEL II (HiFi) and Illumina NovaSeq 6000 (RNA-Seq) instruments. Hi-C data were also generated from head tissue of idDasAlbo1 using the Arima2 kit and sequenced on the Illumina NovaSeq 6000 instrument. [...] et al., 2020) or MITOS (Bernt et al., 2013) and uses these annotations to select the final mitochondrial contig and to ensure the general quality of the sequence.

Genome assembly, curation and evaluation
A Hi-C map for the final assembly was produced using bwa-mem2 (Vasimuddin et al., 2019) in the Cooler file format

Figure 2. Genome assembly of Dasysyrphus albostriatus, idDasAlbo1.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 662,557,047 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (249,865,697 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (154,042,049 and 88,412,347 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the diptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/CAMIUA01/dataset/CAMIUA01/snail.

Figure 3. Genome assembly of Dasysyrphus albostriatus, idDasAlbo1.1: BlobToolKit GC-coverage plot. Scaffolds are coloured by phylum. Circles are sized in proportion to scaffold length. Histograms show the distribution of scaffold length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/CAMIUA01/dataset/CAMIUA01/blob.
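As an illustration of the GC axis in such a plot, the sketch below computes the GC proportion of a single sequence. It is a minimal example under stated assumptions and is not the BlobToolKit implementation: the function name is hypothetical, and the choice to exclude N and other ambiguous bases from the denominator is an assumption made here for clarity.

```python
def gc_proportion(sequence: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C.

    ASSUMPTION: ambiguous bases such as N are excluded from the denominator.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / (gc + at)


# Toy sequence: 4 of the 6 unambiguous bases are G or C.
print(round(gc_proportion("ACGTNNGG"), 4))  # 0.6667
```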

Figure 4. Genome assembly of Dasysyrphus albostriatus, idDasAlbo1.1: BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all scaffolds. Coloured lines show cumulative lengths of scaffolds assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/CAMIUA01/dataset/CAMIUA01/cumulative.

Figure 5. Genome assembly of Dasysyrphus albostriatus, idDasAlbo1.1: Hi-C contact map of the idDasAlbo1.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=G3xi3rX8S4y1rbDptjgNRw.

'Darwin Tree of Life Project Sampling Code of Practice', which can be found in full on the Darwin Tree of Life website. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner

Table 3. Software tools: versions and sources.