The genome sequence of a heleomyzid fly, Suillia variegata (Loew, 1862)

We present a genome assembly from an individual male Suillia variegata (a heleomyzid fly; Arthropoda; Insecta; Diptera; Heleomyzidae). The genome sequence is 264.0 megabases in span. Most of the assembly is scaffolded into 7 chromosomal pseudomolecules, including the X and Y sex chromosomes. The mitochondrial genome has also been assembled and is 16.17 kilobases in length.


Background
Suillia variegata (Loew, 1862) is a species of fly in the family Heleomyzidae. This species is found all year round throughout the Palearctic region, peaking in density in April/May and then again in July/August. The distribution of S. variegata extends across all of Britain and Ireland, with the highest recorded occurrence in England and Wales (GBIF Secretariat, 2022).
Suillia variegata are mycophagous and have been regularly and successfully reared from fungi (Buxton, 1960; Papp, 1998). However, they have also been successfully reared from decaying flowers, roots and birds' nests (Rotheray, 2012). This suggests a mixed feeding strategy around decaying plant matter across a wide range of situations. As a result, S. variegata are present in various habitats but are most frequently found in shaded areas near fungi and decaying plant matter. With this in mind, beer traps have been useful in gathering S. variegata specimens (Preisler & Roháček, 2012).
Suillia variegata adults have an earthy brown thorax with sparse hairs and a striped abdomen that is pale laterally. The wings are mostly clear with a small pale area at the apex and a darker area just above. Adults can be distinguished from other Heleomyzidae by the regularly spaced spines on the fore edge of the wings. The puparium of Suillia variegata can be distinguished from those of other Heleomyzidae by its red-brown ground colour, a wide band of dorsal spicules on abdominal segments 5 to 7 and a greater distribution of dorsal spicules posteriorly (Rotheray, 2012).
The genome of Suillia variegata was sequenced as part of the Darwin Tree of Life Project, a collaborative effort to sequence all named eukaryotic species in the Atlantic Archipelago of Britain and Ireland.

Genome sequence report
The genome was sequenced from one male Suillia variegata (Figure 1) collected from Wytham Woods, Oxfordshire, UK (51.77, -1.33). A total of 88-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 34 missing joins or mis-joins, reducing the scaffold number by 24.32%.
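As a rough illustration of what 88-fold coverage implies, the sketch below multiplies the reported coverage by the reported assembly span. The function names are hypothetical, and using the final assembly span (rather than an independent genome size estimate) as the denominator is an assumption made only for this example.

```python
def implied_yield_bp(coverage_fold: float, span_bp: float) -> float:
    """Approximate total sequenced bases implied by a fold-coverage figure."""
    return coverage_fold * span_bp


def fold_coverage(total_bases: float, span_bp: float) -> float:
    """Inverse calculation: fold coverage from total bases and span."""
    return total_bases / span_bp


# Values taken from the text: 88-fold HiFi coverage, 264.0 Mb assembly span.
hifi_yield = implied_yield_bp(88, 264.0e6)
print(f"Implied HiFi yield: {hifi_yield / 1e9:.1f} Gb")
```

Under these assumptions the reads would total roughly 23 Gb; the actual yield is reported with the sequencing run metadata rather than in this section.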
The final assembly has a total length of 264.0 Mb in 27 sequence scaffolds with a scaffold N50 of 49.5 Mb (Table 1). Most (99.57%) of the assembly sequence was assigned to seven chromosomal-level scaffolds, representing 5 autosomes and the X and Y sex chromosomes. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figures 2 to 5; Table 2). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission.
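For readers unfamiliar with the scaffold N50 statistic quoted above, a minimal sketch of the standard calculation follows. The function name `nx` and the toy scaffold lengths are illustrative assumptions; they are not the lengths of the idSuiVari3.1 scaffolds, which are listed in Table 2.

```python
def nx(lengths, fraction=0.5):
    """Return the Nx length: the largest length L such that scaffolds of
    length >= L together contain at least `fraction` of all bases."""
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length


# Toy lengths for illustration only (hypothetical, arbitrary units).
toy_lengths = [50, 40, 30, 20, 10]
n50 = nx(toy_lengths)        # half of the 150 total is reached at the 2nd scaffold
n90 = nx(toy_lengths, 0.9)   # 90% of the total is reached at the 4th scaffold
print(n50, n90)
```

Applied to the real scaffold lengths, the same calculation should correspond to the N50 and N90 values shown as arcs in the snail plot (Figure 2).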
Metadata for specimens, spectral estimates, sequencing runs, contaminants and pre-curation assembly statistics can be found at https://links.tol.sanger.ac.uk/species/1230148.

Sample acquisition and nucleic acid extraction
A male Suillia variegata (specimen ID Ox002179, ToLID idSuiVari3) was collected from Wytham Woods, Oxfordshire (biological vice-county Berkshire), UK (latitude 51.77, longitude -1.33) on 2022-05-19. The specimen was collected and identified by Steven Falk (independent researcher) and preserved on dry ice.
The specimen used for Hi-C scaffolding was a male S. variegata (specimen ID NHMUK014449032, ToLID idSuiVari2), collected and identified by Duncan Sivell (Natural History Museum) from the Natural History Museum Wildlife Garden on 2021-04-21. This specimen was dry frozen at -80°C.
The idSuiVari3 sample was prepared for DNA extraction at the Tree of Life laboratory, Wellcome Sanger Institute (WSI). The sample was weighed and dissected on dry ice with tissue set aside for Hi-C sequencing. Tissue from the whole organism was disrupted using a Nippi Powermasher fitted with a BioMasher pestle. DNA was extracted at the WSI Scientific Operations core using the Qiagen MagAttract HMW DNA kit, according to the manufacturer's instructions.

Sequencing
Pacific Biosciences HiFi circular consensus DNA sequencing libraries were constructed according to the manufacturers' instructions. DNA sequencing was performed by the Scientific Operations core at the WSI on a Pacific Biosciences SEQUEL II (HiFi) instrument. Hi-C data were also generated from head and thorax tissue of idSuiVari2 using the Arima2 kit and sequenced on the Illumina NovaSeq 6000 instrument.
Further, the Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are:

Genome assembly, curation and evaluation
• Ethical review of provenance and sourcing of the material

The data note by Falk et al. presents a genome assembly from a male of the heleomyzid fly, Suillia variegata (Loew, 1862). The final assembly is 264 Mb. This genome information will be of interest for comparative genomics and for evolution and conservation studies. The methodology is thorough and well explained, the figures are well constructed and the data note is well written.
I have no notes to add.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Darren J Obbard
University of Edinburgh, Edinburgh EH, UK

This data note reports the sequencing and assembly of the genome of Suillia variegata as part of the "Darwin Tree of Life" programme. In common with other data notes from this research effort, the reporting is standardised and quite brief. As such, I have very few comments to make.
The approach is state-of-the-art, the raw data appear to be of a suitably high quality, and the assembly methods are appropriate. The public availability of raw data and genome assembly is appropriate. The resulting genome is likely to be of very high quality, and I have no doubt that it will be of great value to any researchers working on the comparative or evolutionary genomics of insects.
I really have almost no suggestions for improvement: I usually suggest cross-references to the research literature, but I really cannot find any that were missed.
1. I suggest including some better photos of the species to provide context, ideally male and female in life. I'm sure the first author must have some available.
2. It would be good to say something about the global distribution, to put the UK sample in perspective.
3. The percentage sign is duplicated in "…of the final assembly is 60.3 with k-mer completeness of 100%% …"

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

Figure 2. Genome assembly of Suillia variegata, idSuiVari3.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 263,984,139 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (62,750,091 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (49,452,834 and 43,836,460 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the diptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/idSuiVari3.1/dataset/CASBRG01/snail.

Figure 3. Genome assembly of Suillia variegata, idSuiVari3.1: BlobToolKit GC-coverage plot. Scaffolds are coloured by phylum. Circles are sized in proportion to scaffold length. Histograms show the distribution of scaffold length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/idSuiVari3.1/dataset/CASBRG01/blob.

Figure 4. Genome assembly of Suillia variegata, idSuiVari3.1: BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all scaffolds. Coloured lines show cumulative lengths of scaffolds assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/idSuiVari3.1/dataset/CASBRG01/cumulative.

Figure 5. Genome assembly of Suillia variegata, idSuiVari3.1: Hi-C contact map of the idSuiVari3.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=ZA72LrlPSAOQgCpfERxzXw.
nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure

Rodolpho S.T Menezes
Department of Biological Sciences, State University of Santa Cruz, Ilhéus, Brazil

Reviewer Report 08 August 2024
https://doi.org/10.21956/wellcomeopenres.21746.r89661
© 2024 Obbard D. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Table 2. Chromosomal pseudomolecules in the genome assembly of Suillia variegata, idSuiVari3.
Table 3. Software tools: versions and sources.
The genome sequence is released openly for reuse. The Suillia variegata genome sequencing initiative is part of the Darwin Tree of Life (DToL) project. All raw sequence data and the assembly have been deposited in INSDC databases. The genome will be annotated using available RNA-Seq data and presented through the Ensembl pipeline at the European Bioinformatics Institute. Raw data and assembly accession identifiers are reported in Table 1. Members of the Darwin Tree of Life Consortium are listed here: https://doi.org/10.5281/zenodo.4783558.