The genome sequence of the hawkweed Cheilosia, Cheilosia urbana (Meigen, 1822)

We present a genome assembly from an individual female Cheilosia urbana (the hawkweed Cheilosia; Arthropoda; Insecta; Diptera; Syrphidae). The genome sequence is 546.9 megabases in span. Most of the assembly is scaffolded into 5 chromosomal pseudomolecules, including the X sex chromosome. The mitochondrial genome has also been assembled and is 17.08 kilobases in length.


Background
Cheilosia urbana (Meigen, 1822) is a common and widespread Palaearctic spring-flying species that inhabits grasslands as well as open areas in both coniferous and deciduous forests and scrublands (Speight, 2020). This hoverfly occurs from the United Kingdom eastward through central and southern Europe to the Balkans and Turkey. The flight season extends from April to June, and even to July at higher altitudes and/or more northern latitudes. Adults have been found to visit a variety of flowers, including Acer pseudoplatanus, Anemone nemorosa, Prunus spinosa and species of the genera Salix, Taraxacum, Euphorbia and Potentilla.
In addition to providing pollination services, individuals of C. urbana also have a role as biological control agents. The English common name 'hawkweed Cheilosia' derives from its association with mouse-ear hawkweed, Hieracium pilosella (Doczkal, 1996). The females oviposit on leaf axils; the larvae then migrate into the soil and feed externally on the roots of the plant, in which they make small holes. Moreover, Grosskopf et al. (2002) showed that neonate C. urbana larvae fed and completed development to the adult stage on eight different Hieracium spp., supporting the potential of this hoverfly in weed management strategies. This is of particular importance in New Zealand, as well as in North and South America, where Hieracium species of Eurasian origin are invasive alien plants that threaten to cause great economic and environmental harm (Grosskopf, 2005; Grosskopf et al., 2002). This is the first high-quality C. urbana genome to be produced, and we believe that the sequence described here, generated as part of the Darwin Tree of Life project, will further aid understanding of the biology and ecology of this hoverfly.

Genome sequence report
The genome was sequenced from one female Cheilosia urbana (Figure 1) collected from Wytham Woods, Oxfordshire, UK (51.76, -1.34). A total of 32-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 48 missing joins or mis-joins and removed 6 haplotypic duplications, reducing the assembly length by 0.51% and the scaffold number by 41.94%, and increasing the scaffold N50 by 1.65%.
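As a rough illustration of what 32-fold coverage implies, the sketch below multiplies the reported coverage by the final assembly span. This is only an approximation: it assumes coverage was computed against the assembly span (it may have been computed against a separate genome size estimate), and the function and variable names are illustrative rather than part of any pipeline used here.

```python
def approx_yield_bp(fold_coverage: float, genome_span_bp: float) -> float:
    """Approximate total sequenced bases as coverage multiplied by genome span."""
    return fold_coverage * genome_span_bp


# Values taken from this note; using the assembly span as the denominator
# is an assumption made for illustration.
ASSEMBLY_SPAN_BP = 546.9e6  # final assembly span (546.9 Mb)
HIFI_COVERAGE = 32          # reported HiFi fold coverage

yield_bp = approx_yield_bp(HIFI_COVERAGE, ASSEMBLY_SPAN_BP)
print(f"Approximate HiFi yield: {yield_bp / 1e9:.1f} Gb")
```

Under these assumptions the HiFi data would amount to roughly 17.5 Gb of sequence.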
The final assembly has a total length of 546.9 Mb in 35 sequence scaffolds with a scaffold N50 of 172.0 Mb (Table 1). Most (99.7%) of the assembly sequence was assigned to 5 chromosomal-level scaffolds, representing 4 autosomes and the X sex chromosome. The assemblies used as comparators to identify the X chromosome were Cheilosia pagana (GCA_936431705.1) and Cheilosia vulpina (GCA_916610125.1). Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 2-Figure 5; Table 2). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission.
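For readers unfamiliar with the scaffold N50 statistic quoted above, the following is a minimal sketch of the standard definition: the length of the scaffold at which the running total of size-sorted scaffolds first reaches half of the assembly span. The function name `nx` and the toy scaffold lengths are hypothetical; this is not the code used to produce Table 1.

```python
def nx(lengths, x=50):
    """Return the Nx value: the length L such that scaffolds of length >= L
    together cover at least x percent of the total span."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length
    raise ValueError("lengths must be a non-empty collection of positive values")


# Toy scaffold lengths (hypothetical, not the real assembly).
toy_lengths = [100, 80, 60, 40, 20]
print("N50:", nx(toy_lengths, 50))
print("N90:", nx(toy_lengths, 90))
```

In the toy example the total span is 300, so the N50 is reached at the second-largest scaffold and the N90 at the fourth.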
Metadata for specimens, spectral estimates, sequencing runs, contaminants and pre-curation assembly statistics can be found at https://links.tol.sanger.ac.uk/species/173985.

Sample acquisition and nucleic acid extraction
The specimen selected for genome sequencing was a female Cheilosia urbana (specimen ID Ox001288, individual idCheUrba1), while the specimen used for Hi-C scaffolding was a male C. urbana (specimen ID Ox001310, individual idCheUrba2). Both specimens were netted in Wytham Woods, Oxfordshire (biological vice-county Berkshire), UK (latitude 51.76, longitude -1.34) on 2021-04-23. Steven Falk (independent researcher) collected and identified the specimens. The specimens were snap-frozen on dry ice.
The specimen was prepared for DNA extraction at the Tree of Life laboratory, Wellcome Sanger Institute (WSI). The idCheUrba1 sample was weighed and dissected on dry ice. Whole organism tissue was disrupted using a Nippi Powermasher fitted with a BioMasher pestle. DNA was extracted at the WSI Scientific Operations core using the Qiagen MagAttract HMW DNA kit, according to the manufacturer's instructions.

Sequencing
Pacific Biosciences HiFi circular consensus DNA sequencing libraries were constructed according to the manufacturers' instructions. DNA sequencing was performed by the Scientific Operations core at the WSI on a Pacific Biosciences SEQUEL II (HiFi) instrument. Hi-C data were also generated from whole organism tissue of idCheUrba2 using the Arima2 kit and sequenced on the Illumina NovaSeq 6000 instrument.
2013) and uses these annotations to select the final mitochondrial contig and to ensure the general quality of the sequence.
A Hi-C map for the final assembly was produced using bwa-mem2 (Vasimuddin et al., 2019) in the Cooler file format (Abdennur & Mirny, 2020). To assess the assembly metrics, the k-mer completeness and QV consensus quality values were calculated in Merqury (Rhie et al., 2020). This work was done using Nextflow (Di Tommaso et al., 2017) DSL2 pipelines "sanger-tol/readmapping" (Surana et al., 2023a) and "sanger-tol/genomenote" (Surana et al., 2023b). The genome was analysed within the BlobToolKit environment (Challis et al., 2020) and BUSCO scores (Manni et al., 2021; Simão et al., 2015) were calculated.

Further, the Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material

The authors clearly mention the "why" behind this genome assembly. European hawkweeds are one of the troublesome weeds that have become invasive outside of their native range. Thus, biological control of such invasive species is crucial, especially when mechanical control methods are difficult and ineffective. The introduction of natural enemies or predators is one of the ways in which we can recreate/ensure such balance.
Using the 3C criterion (contiguity, completeness, and correctness) for genome assembly evaluation, the authors present a highly contiguous assembly with a 97.1% score on the BUSCO v5.3.2 completeness scale, with minimal duplicated sequences (0.7%). They have also assembled the X sex chromosome and mitochondrial genome. The sections and paragraphs are coherent, and figures and tables are appropriately labeled and aligned with the text. We found the manuscript itself to be very informative, with sufficient detail. The genome will be a great resource to the Diptera research community and will surely aid in understanding this hoverfly and its ecology.
We do, however, have a few suggestions and comments that would help improve this resource:
1. While the assembly, annotation, and quality assessment methods are well described and the metadata link is provided, it would greatly help the community if the authors were to share their code/scripts/pipeline via a GitHub/Zenodo page. This would also ensure the reproducibility of the methods.
2. Since Cheilosia pagana and Cheilosia vulpina assemblies were used as comparators to identify the X chromosome of C. urbana, we agree with Iván Ballester-Torres's comment that the manuscript should mention the taxonomic complexity of Cheilosia, which stems from morphological variability and the resemblance among species.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Partly
Are the datasets clearly presented in a useable and accessible format? Partly

Competing Interests: No competing interests were disclosed.
We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Arun Arumugaperumal
Department of Biotechnology, Rajalakshmi Engineering College, Chennai, Tamilnadu, India

This is the first report of the whole genome of the hawkweed Cheilosia, Cheilosia urbana. The genome size reported was 546.9 Mbp. The assembly consisted of 5 pseudochromosomes including one X chromosome. A closely related species, C. vernalis, had its genome sequenced and was reported to be 441.6 Mbp with 6 pseudochromosomes including one X chromosome. Although the technologies used for sequencing were similar except for the fold coverage, there was a large difference between the genome assemblies in terms of size and number of chromosomes. This opens the way to comparative studies between the species using the genomic data. The sequencing data are very useful for gaining insights in this direction.
The assembly consisted of 152 contigs with an N50 value of 172 Mb. Long-read sequencing technology was used, and the samples were sequenced on the Pacific Biosciences SEQUEL II machine at 32X coverage. BUSCO analysis indicates a completeness of 97 percent. Taken together, these data provide a valuable resource for further research on the insect.
The hawkweed plant is an alien invasive species, and this insect feeds on the plant in its larval stage and controls the spread. I think the genome data reported herewith will help to study the relationship between the plant and the animal in greater depth. However, the genome of the plant is yet to be sequenced.
The accession numbers presented here are working fine and the methods used are technically sound. Hence the data article can be accepted for indexing.

Reviewer Expertise: Entomology, Taxonomy and Systematics in Syrphidae, specifically in the genus Cheilosia
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Figure 2. Genome assembly of Cheilosia urbana, idCheUrba1.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 546,942,332 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (181,276,847 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (171,959,956 and 80,646,311 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the diptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/idCheUrba1.1/dataset/CAMLCR01/snails.
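The binning described in the Figure 2 caption can be checked with simple arithmetic. The sketch below only reproduces the numbers stated in the caption (1,000 bins over a 546,942,332 bp assembly, longest scaffold 181,276,847 bp); the variable names are illustrative, and it does not reimplement BlobToolKit.

```python
# Values quoted in the Figure 2 caption.
ASSEMBLY_SPAN_BP = 546_942_332
LONGEST_SCAFFOLD_BP = 181_276_847
N_BINS = 1_000

# Each bin represents 0.1% of the assembly span.
bin_size_bp = ASSEMBLY_SPAN_BP / N_BINS

# Number of bins the longest scaffold would span around the circumference.
bins_for_longest = LONGEST_SCAFFOLD_BP / bin_size_bp

print(f"Bin size: {bin_size_bp:,.3f} bp")
print(f"Longest scaffold spans about {bins_for_longest:.0f} of {N_BINS} bins")
```

Each bin therefore corresponds to roughly 547 kb, and the longest scaffold alone accounts for about a third of the plot circumference.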

Figure 3. Genome assembly of Cheilosia urbana, idCheUrba1.1: BlobToolKit GC-coverage plot. Scaffolds are coloured by phylum. Circles are sized in proportion to scaffold length. Histograms show the distribution of scaffold length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/idCheUrba1.1/dataset/CAMLCR01/blob.

Figure 5. Genome assembly of Cheilosia urbana, idCheUrba1.1: Hi-C contact map of the idCheUrba1.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=LxrU8VG5TYuxrLOcPeXiKg.

Reviewer Report 26 July 2024
https://doi.org/10.21956/wellcomeopenres.21679.r87185
© 2024 Arumugaperumal A. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

© 2024 Brown T. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Thomas Brown
Leibniz IZW (Berlin) & ERGA/BGP, Leibniz, Germany

Falk et al. present the genome assembly of the hawkweed Cheilosia based on PacBio HiFi and Illumina HiC reads. The genome itself is of high quality and will certainly aid researchers in their genetic explorations of this phylum, however I feel there are a number of issues that need to be

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes

Competing Interests: No competing interests were disclosed.

Table 3 contains a list of relevant software tool versions and sources.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes

Competing Interests: No competing interests were disclosed.

References
1. Falk S, Woodcock K, University of Oxford and Wytham Woods Genome Acquisition Lab, Darwin Tree of Life Barcoding collective, et al.: The genome sequence of the yarrow Cheilosia, Cheilosia vernalis (Fallén, 1817). Wellcome Open Research. 2024; 9. Publisher Full Text

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
https://doi.org/10.21956/wellcomeopenres.21679.r87194