The genome sequence of the Golden-tailed Leafwalker, Xylota sylvarum (Linnaeus, 1758)

We present a genome assembly from an individual male Xylota sylvarum (the Golden-tailed Leafwalker; Arthropoda; Insecta; Diptera; Syrphidae). The genome sequence is 534.8 megabases in span. Most of the assembly is scaffolded into five chromosomal pseudomolecules, including the assembled X sex chromosome. The mitochondrial genome has also been assembled and is 16.0 kilobases in length. Gene annotation of this assembly on Ensembl identified 11,993 protein-coding genes.


Background
Xylota sylvarum (Linnaeus, 1758) is a large, black hoverfly with an elongate body and a characteristic covering of golden-yellow hairs on tergite 4, at the tip of its abdomen (Stubbs & Falk, 2002). In the field, it can be readily confused with X. xanthocnema (Collin, 1939) (Ball & Morris, 2015; Stubbs & Falk, 2002). The two species are separated by the colour of the hind tibiae: those of X. sylvarum are dark at the distal end (Ball & Morris, 2015; Stubbs & Falk, 2002; van Veen, 2004), whereas those of X. xanthocnema are yellow. Care is required, as X. sylvarum can be misidentified if viewed from above. According to Rotheray (1993; 2004), the larvae can also be separated if examined in detail.
Xylota sylvarum is abundant across the UK and Europe (GBIF Secretariat, 2022) and is listed as 'Least Concern' (Vujić et al., 2022). It is widespread in England and Wales, but scarcer in Scotland (Ball et al., 2011). Larvae have been reared from a wide range of rotting tree material from a variety of species, both broadleaved and coniferous (Hartley, 1961; Hartley, 1963; Rotheray, 2004). Adults, which fly from May to October, are mostly seen on foliage (Ball & Morris, 2015; Stubbs & Falk, 2002). They feed on windblown pollen stuck to aphid honeydew on leaf surfaces (Ssymank & Gilbert, 1993), although they will also visit flowers (Ball & Morris, 2015). The chromosome-level genome assembly presented here is, to our knowledge, the first high-quality resource developed for a member of the genus Xylota.

Genome sequence report
The genome was sequenced from one male Xylota sylvarum (Figure 1) collected from Wytham Woods, Oxfordshire (biological vice-county: Berkshire), UK (latitude 51.78, longitude -1.33). A total of 34-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 48-fold coverage in 10X Genomics read clouds was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 37 missing joins or mis-joins and removed six haplotypic duplications, reducing the assembly length by 0.86% and the scaffold number by 24.2%, and increasing the scaffold N50 by 327.13%.
The final assembly has a total length of 534.8 Mb in 119 sequence scaffolds with a scaffold N50 of 124.8 Mb (Table 1). Most (87.26%) of the assembly sequence was assigned to five chromosomal-level scaffolds, representing four autosomes and the X sex chromosome. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 2-Figure 5; Table 2). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The estimated k-mer-based Quality Value (QV) of the final assembly is 56.1, with k-mer-based completeness of 99.99%, and the assembly has a BUSCO v5.3.2 (Manni et al., 2021) completeness of 97.0% (single = 96.5%, duplicated = 0.5%), using the diptera_odb10 reference set (n = 3,285).
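For readers unfamiliar with the metric, the scaffold N50 quoted above can be illustrated with a minimal Python sketch. It assumes the common definition (the length of the scaffold at which the cumulative length of scaffolds, sorted from longest to shortest, first reaches half of the total assembly length). The function name and the toy lengths are hypothetical and are not taken from the assembly itself.

```python
def scaffold_n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp).

    Assumes the common definition: sort scaffolds from longest to
    shortest and report the length at which the running total first
    reaches half of the summed length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy scaffold lengths (hypothetical, NOT the real scaffold table).
toy_lengths = [150, 125, 100, 60, 40, 25]
print(scaffold_n50(toy_lengths))  # prints 125
```

In the toy example the total is 500, so the running total passes 250 at the second-longest scaffold. Applied to the real scaffold lengths, the same calculation would be expected to reproduce the N50 reported in Table 1.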

Sample acquisition and nucleic acid extraction
Two male Xylota sylvarum specimens (idXylSylv1 and idXylSylv2) were collected from Wytham Woods, Oxfordshire (biological vice-county: Berkshire), UK (latitude 51.78, longitude -1.33) on 8 August 2019 and 20 August 2019. The specimens were taken from woodland habitat by Liam Crowley (University of Oxford) by netting. The specimens were identified by the collector and snap frozen on dry ice.
DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute (WSI). The idXylSylv2 sample was weighed and dissected on dry ice, with tissue set aside for Hi-C sequencing. Thorax tissue was disrupted using a Nippi Powermasher fitted with a BioMasher pestle. High molecular weight (HMW) DNA was extracted using the Qiagen MagAttract HMW DNA extraction kit. Low molecular weight DNA was removed from a 20 ng aliquot of extracted DNA using the 0.8X AMPure XP purification kit prior to 10X Chromium sequencing; a minimum of 50 ng DNA was submitted for 10X sequencing.
HMW DNA was sheared into an average fragment size of 12-20 kb in a Megaruptor 3 system with speed setting 30. Sheared DNA was purified by solid-phase reversible immobilisation using AMPure PB beads with a 1.8X ratio of beads to sample to remove the shorter fragments and concentrate the DNA sample. The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit dsDNA High Sensitivity Assay kit. Fragment size distribution was evaluated by running the sample on the FemtoPulse system.
RNA was extracted from head and thorax tissue of idXylSylv1 in the Tree of Life Laboratory at the WSI using TRIzol, according to the manufacturer's instructions. RNA was then eluted in 50 μl RNAse-free water and its concentration assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit RNA Broad-Range (BR) Assay kit. RNA integrity was analysed using the Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay.

Sequencing
Pacific Biosciences HiFi circular consensus and 10X Genomics read cloud DNA sequencing libraries were constructed according to the manufacturers' instructions. Poly(A) RNA-Seq libraries were constructed using the NEB Ultra II RNA Library Prep kit. DNA and RNA sequencing was performed by the Scientific Operations core at the WSI on Pacific Biosciences SEQUEL II (HiFi), Illumina HiSeq 4000 (RNA-Seq) and HiSeq X Ten (10X) instruments. Hi-C data were also generated from abdomen tissue of idXylSylv2 using the Arima v1 kit and sequenced on the HiSeq X Ten instrument.

Genome assembly
Assembly was carried out with Hifiasm (Cheng et al., 2021) and haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). One round of polishing was performed by aligning 10X Genomics read data to the assembly with Long Ranger ALIGN, calling variants with FreeBayes (Garrison & Marth, 2012).
To evaluate the assembly, MerquryFK was used to estimate k-mer completeness and consensus quality (QV) (Rhie et al., 2020). The genome was analysed and BUSCO scores (Simão et al., 2015) were generated within the BlobToolKit environment (Challis et al., 2020). Table 3 contains a list of software tool versions and sources.
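As an aid to interpreting the consensus quality value, the short Python sketch below converts a QV into a per-base error probability. It assumes the QV follows the usual Phred convention, QV = -10 log10(P), where P is the estimated probability that a base in the consensus is wrong. The function names are hypothetical, and the sketch does not reproduce how MerquryFK derives the estimate from k-mers.

```python
import math


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate):
    """Convert a per-base error probability to a Phred-scaled quality value."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in the interval (0, 1]")
    return -10 * math.log10(error_rate)


# QV reported for the final assembly in the genome sequence report.
rate = qv_to_error_rate(56.1)
print(f"{rate:.2e} errors per base")
print(f"roughly one error per {1 / rate:,.0f} bases")
```

Under this assumption, a QV of 56.1 corresponds to an error probability of about 2.5 x 10^-6, or on the order of one consensus error per 400,000 bases.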

Genome annotation
The Ensembl gene annotation system (Aken et al., 2016) was used to generate annotation for the Xylota sylvarum assembly (GCA_905220385.1). Annotation was created primarily through alignment of transcriptomic data to the genome, with gap filling via protein-to-genome alignments of a select set of proteins from UniProt (UniProt Consortium, 2019).
The assembly was scaffolded with Hi-C data (Rao et al., 2014) using SALSA2 (Ghurye et al., 2019). The assembly was checked for contamination and corrected using the gEVAL system (Chow et al., 2016) as described previously (Howe et al., 2021). Manual curation [...]

The manuscript reports the genome assembly from an individual male Xylota sylvarum (the Golden-tailed Leafwalker; Arthropoda; Insecta; Diptera; Syrphidae), resolved into five chromosomal pseudomolecules including the assembled X sex chromosome. The mitochondrial genome of Xylota sylvarum was also assembled. The manuscript provides useful morphological descriptions of Xylota sylvarum as well as its geographical distribution in Europe. The genome sequence is released openly for reuse.
As a minor revision, I suggest updating the manuscript, as other Xylota genomes are now available: Xylota segnis (ID138115), GCA_963583995.1 and GCA_963583945.1. I also suggest that the authors explain what a fully phased genome assembly is.
Once the revision is done, since the authors provide a clear description of the insect as well as of the methodology followed to obtain the genome, I approve the publication. This work is an important addition to the genome database and will be useful to many studies, on insect phylogeny for instance.

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format?
The authors report here the genome assembly for Xylota sylvarum (Diptera). The total genome size reached 534.8 Mb and is organized in 119 scaffolds (N50 of 124.8 Mb). The Hi-C data made it possible to assign those scaffolds at chromosomal level (4 autosomes and the X sex chromosome). Through the Ensembl rapid annotation pipeline it was possible to identify 19,577 transcripts from 11,993 protein-coding regions. The note is written according to the template format of the Journal and the data will be valuable for further studies. As a minor revision, I suggest clarifying the difference between the number of predicted genes (12,000) and the number estimated using the transcriptome data. Are you referring to possible isoforms? Also, the mitochondrial genome is reported as 16 kb in the Abstract, but in Table 2 (MT) the size reported is 20 kb. Please clarify the size of the mitochondrial genome.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? No
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

Figure 2. Genome assembly of Xylota sylvarum, idXylSylv2.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 534,835,910 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (153,043,062 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (124,801,819 and 2,404,465 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the diptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Xylota%20sylvarum/dataset/CAJMZQ01/snail.

Figure 5. Genome assembly of Xylota sylvarum, idXylSylv2.1: Hi-C contact map. Hi-C contact map of the idXylSylv2.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=RfSk_MelQaWkABlgTka5NA.

© 2024 Hilliou F. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Frederique Hilliou
1 Universite Cote d'Azur, Nice, Provence-Alpes-Côte d'Azur, France
2 Institut Sophia Agrobiotech, Institut national de la recherche pour l'agriculture, l'alimentation et l'environnement, Sophia Antipolis, France