The genome sequence of a druid fly, Clusia tigrina (Fallén, 1820)

We present a genome assembly from an individual male Clusia tigrina (a druid fly; Arthropoda; Insecta; Diptera; Clusiidae). The genome sequence is 1,216.4 megabases in span. Most of the assembly is scaffolded into 5 chromosomal pseudomolecules, including the X and Y sex chromosomes. The mitochondrial genome has also been assembled and is 17.68 kilobases in length.


Background
Clusia tigrina is a member of the Clusiidae family, subfamily Clusiinae (Halliday, 1838), commonly known as druid flies (Lonsdale, 2017). Druid flies have characteristic antennae, in which the second segment has a triangular projection over the third segment when viewed laterally. Flies of this family have dark brown to pale yellow narrow bodies (2.5-6.0 mm long) and variable anterodistal wing infuscations (Fu et al., 2010; Kazerani et al., 2020). Their species-specific morphologies include different patterns of spots and brown stripes (Lonsdale, 2017). C. tigrina can be distinguished from other closely related druid flies, such as C. flava, by the three prominent dark brown marks on its wings (Kazerani et al., 2020).
Clusiidae are distributed worldwide, but most species occur in tropical regions and only 15 species have been identified from Europe (Hellqvist, 2018). Although the family is more abundant in the tropics, it remains likely that more species await discovery in temperate biomes (Lonsdale, 2017). Clusia tigrina has been recorded mainly in western Europe and Scandinavia, with sparse records from Serbia and Russia (GBIF Secretariat, 2022). It is associated with forested habitats with plenty of large, mature trees, since its saproxylic larvae develop in deadwood (Roháček et al., 2017). C. tigrina is a rare fly in Britain and Ireland (Falk, 1991), although recently there has been an increase in records.
Male C. tigrina (and other Clusiidae) have been observed in competitive courtship displays called 'lekking', in which they gather in one place for the purpose of attracting females to the area (Rathore et al., 2023).
Studies on the phylogeny and evolution of druid flies tend to rely on morphological data (Lonsdale, 2017), which may be limited; integrating molecular data may therefore provide a more comprehensive understanding. The availability of high-quality genome data could help reconstruct the phylogeny and evolutionary history of C. tigrina. Here, we present a chromosomally complete genome sequence for C. tigrina based on one male specimen from Wytham Woods, Oxfordshire. This is the first whole genome sequence for a member of Clusiidae, and it is anticipated that it will provide a foundation for understanding biodiversity, evolutionary history and the genetic variation underlying the different morphological traits of this group.

Genome sequence report
The genome was sequenced from one male Clusia tigrina (Figure 1) collected from Wytham Woods, Oxfordshire, UK (latitude 51.76, longitude -1.32). A total of 31-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 30-fold coverage in 10X Genomics read clouds was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 271 missing joins or mis-joins and removed 5 haplotypic duplications, reducing the assembly length by 0.17% and the scaffold number by 24.06%, and increasing the scaffold N50 by 464.14%.
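As a rough illustration of what the curation percentages imply, the sketch below inverts a percentage change to estimate the corresponding pre-curation value. The final values (1,216.4 Mb span, 665 scaffolds, scaffold N50 of 230.2 Mb) and the percentage changes are those quoted in this report; the function name `value_before` and the resulting estimates are assumptions made for illustration only. The pre-curation figures are back-calculated, not reported, and rounding in the quoted percentages means they are approximate.

```python
def value_before(after: float, percent_change: float) -> float:
    """Invert a percentage change, assuming after = before * (1 + percent_change / 100)."""
    factor = 1.0 + percent_change / 100.0
    if factor <= 0:
        raise ValueError("percent_change must be greater than -100")
    return after / factor


# Final values and percentage changes as quoted in the text.
# Negative changes are reductions, positive changes are increases.
reported = {
    "assembly length (Mb)": (1216.4, -0.17),
    "scaffold count": (665, -24.06),
    "scaffold N50 (Mb)": (230.2, 464.14),
}

# Back-calculated (approximate) values before manual curation.
estimates = {name: value_before(after, pct) for name, (after, pct) in reported.items()}

for name, estimate in estimates.items():
    print(f"{name}: roughly {estimate:.1f} before curation")
```

The estimates should be read only as an order-of-magnitude check on the reported changes, since the published percentages may have been computed from unrounded values.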
The final assembly has a total length of 1,216.4 Mb in 665 sequence scaffolds with a scaffold N50 of 230.2 Mb (Table 1). A summary of the assembly statistics is shown in Figure 2, while the distribution of assembly scaffolds on GC proportion and coverage is shown in Figure 3. The cumulative assembly plot in Figure 4 shows curves for subsets of scaffolds assigned to different phyla. Most (92.58%) of the assembly sequence was assigned to 5 chromosomal-level scaffolds, representing 3 autosomes and the X and Y sex chromosomes. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 5; Table 2). Chromosome X contains a large region of low confidence from approximately 58.61-118.49 Mb. This block consists of numerous scaffolds with relatively high repeat content where the Hi-C signal is ambiguous in terms of being able to provide a clear order and orientation for the affected scaffolds. In addition, there is a repetitive region of low confidence on Chromosome 2 from approximately 60.32-92.32 Mb, and it was not possible to achieve an accurate order and orientation for the scaffolds in this location. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission.
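For readers less familiar with the N50 and N90 statistics quoted for this assembly, the following is a minimal sketch of how such a value can be computed from a list of scaffold lengths. It uses the common definition (the length L such that scaffolds of length L or greater together cover at least the given percentage of the total span). The function name `nx` and the toy lengths are hypothetical and are not taken from the assembly or from the pipeline used here.

```python
from typing import Iterable


def nx(lengths: Iterable[int], percent: int = 50) -> int:
    """Return the Nx length: the largest L such that sequences of
    length >= L together cover at least `percent` % of the total span."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * percent:
            return length
    return ordered[-1]  # only reached if percent > 100


# Hypothetical toy scaffold lengths (not taken from the assembly).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = nx(toy_lengths, 50)
toy_n90 = nx(toy_lengths, 90)
print(toy_n50, toy_n90)  # prints "70 30"
```

Applied to the real scaffold lengths (for example, read from the deposited FASTA index), the same function would be expected to reproduce the reported scaffold N50, provided the same definition was used.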

Sample acquisition and nucleic acid extraction
One male (specimen ID Ox000706, idCluTigr1) and one female (specimen ID Ox000707, idCluTigr2) of Clusia tigrina were collected from Wytham Woods, Oxfordshire (biological vice-county Berkshire), UK (latitude 51.76, longitude -1.32) on 2020-07-24. Liam Crowley (University of Oxford) collected and identified the specimens, which were then preserved on dry ice. The male specimen (idCluTigr1) was used for genome sequencing, while the female (idCluTigr2) was used for Hi-C scaffolding.
DNA was extracted at the Tree of Life Laboratory, Wellcome Sanger Institute (WSI). The idCluTigr1 sample was weighed [...] MitoFinder (Allio et al., 2020) or MITOS (Bernt et al., 2013) and uses these annotations to select the final mitochondrial contig and to ensure the general quality of the sequence.
Table 3 contains a list of relevant software tool versions and sources.

Wellcome Sanger Institute - Legal and Governance
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the 'Darwin Tree of Life Project Sampling Code of Practice', which can be found in full on the Darwin Tree of Life website here. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project.
Further, the Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material

This is yet another high-quality genome assembly from the Wellcome Sanger Institute. It is a very nice illustration of how good today's technology has become with respect to both the physical and digital machinery. This is perhaps best illustrated by the manuscript describing this genome assembly having only two authors, as compared to the very long author lists of earlier publications.
In order to review a genome assembly paper, there are two things to look at: the actual genome assembly, and the manuscript describing it. I spent some time looking in this assembly for genes I am interested in. It is really a good and nice assembly. As to the manuscript itself, I don't see any missing details.

Tomas N. Generalovic, Zoology, University of Cambridge, Cambridge, England, UK

This article presents a genome assembly for the species Clusia tigrina, which appears to be of primary interest in biodiversity monitoring and morphological analysis. This genome assembly is of high quality and adds a significant and relatively complete resource to the public domain. Whilst mitochondrial and nuclear markers have been reported for Clusiidae species [1], this appears to be the first whole genome resource available for the Clusiidae. The 1.2 Gb genome of the druid fly is assembled into three autosomes and two sex chromosomes. A typical XY sex determination system is identified. This reference is generated from a male individual and an array of sequencing technologies, allowing (pseudo-)chromosomal scaffolds to be assembled. This resource will likely be of benefit for studying the evolutionary trajectory of the species and wider genera, and appears of particular interest in behavioural investigations, but seemingly the most beneficial insights will be in monitoring biodiversity. This fly was sampled in the UK (Wytham Woods, Oxford), where it appears to be a rare observation.
The genome appears to have several scaffolds orientated with low confidence; overall, however, the assembly appears to exceed several of the general benchmarks suggested by [2], with the exception of "Percentage of assembly mapped to chromosomes". The second (incomplete) haplotype was also only assembled to contig level, with only 540.9 Mb of sequence assembled.
The rationale for this work is clear and the methodological approach uses up-to-date technology. All interactive links appear to be in working order and the data is available in the public domain as stated.
Overall, this article is scientifically sound as a data note, with only minor comments related to grammar etc., which are provided below:
Several instances of values less than 10 being given in numerical form, e.g. "5" instead of "five". Advise changes throughout.
Under the Data availability statement, the Wellcome Sanger Institute, 2021 reference is incorrectly placed after the full stop.

Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Evolutionary Biology
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 2. Genome assembly of Clusia tigrina, idCluTigr1.2: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 1,216,395,172 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (429,819,325 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (230,177,572 and 1,864,154 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the diptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/idCluTigr1.1/dataset/CAKKTE01/snail.

Figure 5. Genome assembly of Clusia tigrina, idCluTigr1.2: Hi-C contact map of the idCluTigr1.2 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=NWiOFk6uQqSFmLU8Voirqw.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Jan Veenstra, INCIA, UMR 5287, CNRS, University Bordeaux, Bordeaux, Gironde, France

The Rao et al., 2014 reference in the text is missing its hyperlink.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.