The genome sequence of the spotted cranefly, Nephrotoma appendiculata (Pierre, 1919)

We present a genome assembly from an individual male Nephrotoma appendiculata (the spotted cranefly; Arthropoda; Insecta; Diptera; Tipulidae). The genome sequence is 1,138.0 megabases in span. Most of the assembly is scaffolded into 4 chromosomal pseudomolecules, including the X sex chromosome. The mitochondrial genome has also been assembled and is 17.42 kilobases in length. Gene annotation of this assembly on Ensembl identified 17,753 protein-coding genes.


Background
The Spotted or Inverted-U Tiger Cranefly Nephrotoma appendiculata is a member of the family Tipulidae, or long-palped craneflies, and as such has the fairly typical "Daddy Long Legs" shape of these Diptera. Members of the genus Nephrotoma have black stripes on a yellow background, earning them the name tiger craneflies, and a distinctive pattern of wing venation that distinguishes them from other genera (Stubbs, 2021).
Nephrotoma appendiculata is a moderate-sized cranefly with a wing length of 12-15 mm. It usually has a pale stigma, but this can be dark in a few specimens. There is a wide, dull black stripe on the dorsal abdomen reaching across to the yellow sides. The diagnostic feature is an upside-down U-shaped black mark around the base of the halteres (Stubbs, 2021).
Nephrotoma appendiculata is a common grassland species, with adults flying from April to early June. It tolerates a range of pH and moisture levels, preferring unimproved grassland with medium to long grass on better soils, and avoiding short turf, impoverished grassland and shade (Stubbs, 2021).
The structure of the spermatozoa of Nephrotoma appendiculata was found to be similar to that of several craneflies in the family Limoniidae, and this has been used to support the idea that the families Tipulidae and Limoniidae should be combined (Dallai et al., 2008). However, they currently remain classified as two families (Chandler, 2023), and the final decision is likely to be based on phylogenetic analyses of DNA sequences.
The genome of the spotted cranefly, Nephrotoma appendiculata, was sequenced as part of the Darwin Tree of Life Project, a collaborative effort to sequence all named eukaryotic species in the Atlantic Archipelago of Britain and Ireland.
Here we present a chromosomally complete genome sequence for Nephrotoma appendiculata, based on one male specimen from Wytham Woods, Oxfordshire, UK.

Genome sequence report
The genome was sequenced from one male Nephrotoma appendiculata (Figure 1) collected from Wytham Woods, Oxfordshire, UK (latitude 51.76, longitude -1.34). A total of 30-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 70 missing joins or mis-joins and removed 17 haplotypic duplications, reducing the assembly length by 0.25% and the scaffold number by 12.06%, and decreasing the scaffold N50 by 45.60%. The final assembly has a total length of 1,138.0 Mb in 422 sequence scaffolds with a scaffold N50 of 375.9 Mb (Table 1).

The snailplot in Figure 2 provides a summary of the assembly statistics, while the distribution of assembly scaffolds on GC proportion and coverage is shown in Figure 3. The cumulative assembly plot in Figure 4 shows curves for subsets of scaffolds assigned to different phyla. Most (99.7%) of the assembly sequence was assigned to 4 chromosomal-level scaffolds, representing 3 autosomes and the X sex chromosome. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 5; Table 2). While not fully phased, the deposited assembly is of one haplotype; contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission.
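The N50 values above follow the usual Nx definition: sort scaffolds longest first and report the length at which the running total first reaches x% of the assembly. The Python sketch below implements that definition and, using hypothetical toy lengths rather than the real idNepAppe1 scaffolds, shows how breaking a mis-joined scaffold during curation can lower the scaffold N50. The function names `nx` and `percent_change` are illustrative only.

```python
from typing import Sequence


def nx(lengths: Sequence[int], x: float) -> int:
    """Nx: the scaffold length L such that scaffolds of length >= L
    together contain at least x% of the total assembly length."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):  # longest first
        running += length
        if running * 100 >= total * x:  # reached x% of the assembly
            return length
    raise AssertionError("unreachable: the running sum always reaches the total")


def percent_change(before: float, after: float) -> float:
    """Signed percentage change; a negative value means a decrease."""
    return (after - before) / before * 100


# Hypothetical toy scaffold lengths in Mb, NOT the idNepAppe1 data.
# Before curation, two chromosomes are mis-joined into one 700 Mb scaffold;
# curation breaks that mis-join into two 350 Mb scaffolds.
before = [700, 300, 50, 10, 5]
after = [350, 350, 300, 50, 10, 5]

print(nx(before, 50), nx(after, 50))  # 700 350
print(percent_change(nx(before, 50), nx(after, 50)))  # -50.0
```

The total length is unchanged in the toy example; only the split of one long scaffold moves the N50 down, and calling `nx(lengths, 90)` gives the N90 in the same way.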

Sample acquisition and nucleic acid extraction
A male Nephrotoma appendiculata (specimen ID Ox001277, ToLID idNepAppe1) was netted in Wytham Woods, Oxfordshire (biological vice-county Berkshire), UK (latitude 51.76, longitude -1.34).

In sample preparation, the idNepAppe1 sample was weighed and dissected on dry ice (Jay et al., 2023). Tissue from the head and thorax was homogenised using a PowerMasher II tissue disruptor (Denton et al., 2023a). High molecular weight (HMW) DNA was extracted in the Wellcome Sanger Institute (WSI) Scientific Operations core using the Automated MagAttract v2 protocol (Oatley et al., 2023). HMW DNA was sheared to an average fragment size of 12-20 kb in a Megaruptor 3 system with speed setting 31 (Bates et al., 2023). Sheared DNA was purified by solid-phase reversible immobilisation (SPRI) (Strickland et al., 2023); in brief, the method uses a 1.8X ratio of AMPure PB beads to sample to eliminate shorter fragments and concentrate the DNA. The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer and Qubit Fluorometer and Qubit
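The 1.8X bead ratio is a volume ratio of bead suspension to DNA sample. As a minimal illustration, assuming a hypothetical 50 µL sample (the actual input volume is not stated in this note), the bead volume works out as follows; `spri_bead_volume` is an illustrative helper, not part of any protocol software.

```python
def spri_bead_volume(sample_ul: float, ratio: float = 1.8) -> float:
    """Bead suspension volume (µL) for a bead:sample volume ratio (1.8X by default)."""
    if sample_ul <= 0 or ratio <= 0:
        raise ValueError("sample volume and ratio must be positive")
    return sample_ul * ratio


# Hypothetical 50 µL sheared-DNA sample; the real input volume is not given here.
print(spri_bead_volume(50))  # 90.0
```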

Genome annotation
The Ensembl gene annotation system (Aken et al., 2016) was used to generate annotation for the Nephrotoma appendiculata assembly (GCA_947310385.1). Annotation was created primarily through alignment of transcriptomic data to the genome, with gap filling via protein-to-genome alignments of a select set of proteins from UniProt (UniProt Consortium, 2019).

Wellcome Sanger Institute - Legal and Governance
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the 'Darwin Tree of Life Project Sampling Code of Practice', which can be found in full on the Darwin Tree of Life website. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project.
Further, the Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use.
The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that

Adam James Reid
University of Cambridge, Cambridge, UK

This genome note represents a very high-quality genome assembly of the spotted cranefly, Nephrotoma appendiculata. The standards of methodological reporting and the availability of data and code are excellent. The authors have taken pains to acknowledge the large number of people involved in the (Darwin) Tree of Life endeavour. There are minor issues with some of the presentation, which could be improved in places.
In the section Genome sequence report, I assume that "(51.76,-1.34)" is a geographical reference, but the frame of reference is not given. Perhaps it could say "latitude = 51.76, longitude = -1.34".
Also in the section Genome sequence report, I wanted to check that this phrase is correct: "and decreasing the scaffold N50 by 45.60%", i.e. that after manual assembly curation the scaffold N50 decreased, presumably due to the breaking of very large scaffolds. The data were not available for me to confirm this.
I'm afraid that even though I have seen them multiple times before, I do not find the snail plots at all straightforward to understand. While I understand the desire to capture diverse and complicated information in a plot that can be automatically generated, I don't think the majority of readers will get much from this. Why are there three different blue colours for the GC content on the smaller contigs? Why is there not more red representing the largest contig, and dark orange representing the N50? The assembly for the Tiger Cranefly (Sivell O, et al., 2023 [Ref 1]) is quite similar, but the plots look very different. Is it possible to interpret the pale grey spiral, and which orders of magnitude are represented by the white lines? The immediate solution to these problems would be to add more information to the legend to better guide the reader through what they are seeing.
In Table 1, the benchmark column is generally very useful for giving the reader a good idea of what is expected of a good assembly; however, the values in the Sex Chromosomes and Organelles rows don't seem to make sense. We at least need a better explanation of what the possible values might be. Are the benchmarks met in these cases?

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Bioinformatics, genomics, transcriptomics, epigenomics, parasitology, developmental biology
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 2. Genome assembly of Nephrotoma appendiculata, idNepAppe1.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference, with each bin representing 0.1% of the 1,138,061,071 bp assembly. The distribution of scaffold lengths is shown in dark grey, with the plot radius scaled to the longest scaffold present in the assembly (384,599,746 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (375,927,842 and 317,755,181 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale, with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the diptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/CAMZJQ01/dataset/CAMZJQ01/snail.
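The caption's binning rule can be checked with simple arithmetic. The Python sketch below uses only the totals stated in this caption: it computes the sequence represented by each bin and how much of the size-ordered assembly the longest scaffold accounts for. It makes no claim about how BlobToolKit draws or colours those bins.

```python
# Values taken from the Figure 2 caption.
TOTAL_BP = 1_138_061_071    # assembly span
LONGEST_BP = 384_599_746    # longest scaffold
N_BINS = 1_000              # size-ordered bins around the circumference

bin_bp = TOTAL_BP / N_BINS  # sequence represented by one 0.1% bin
# Scaffolds are ordered by size, so the longest scaffold's sequence
# corresponds to the first LONGEST_BP / bin_bp bins.
bins_for_longest = LONGEST_BP / bin_bp
share_of_assembly = LONGEST_BP / TOTAL_BP

print(f"{bin_bp:,.0f} bp per bin")  # 1,138,061 bp per bin
print(f"longest scaffold: {bins_for_longest:.0f} bins, {share_of_assembly:.1%} of the assembly")
# longest scaffold: 338 bins, 33.8% of the assembly
```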

Figure 3. Genome assembly of Nephrotoma appendiculata, idNepAppe1.1: BlobToolKit GC-coverage plot. Scaffolds are coloured by phylum. Circles are sized in proportion to scaffold length. Histograms show the distribution of scaffold length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/CAMZJQ01/dataset/CAMZJQ01/blob.
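One axis of the GC-coverage plot is each scaffold's GC proportion. A minimal sketch of that quantity follows, assuming that ambiguous bases (N) are excluded from the denominator; this is a common convention but is not stated in the caption, and `gc_proportion` is an illustrative name.

```python
def gc_proportion(seq: str) -> float:
    """Fraction of G+C among unambiguous A/C/G/T bases; NaN if there are none."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    acgt = gc + s.count("A") + s.count("T")
    # Assumption: N and other ambiguity codes are ignored rather than counted.
    return gc / acgt if acgt else float("nan")


print(gc_proportion("ATGCGCNNAT"))  # 0.5
```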

Figure 4. Genome assembly of Nephrotoma appendiculata, idNepAppe1.1: BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all scaffolds. Coloured lines show cumulative lengths of scaffolds assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/CAMZJQ01/dataset/CAMZJQ01/cumulative.
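The curves in this plot are running totals of scaffold length, computed once over all scaffolds and once per phylum. The Python sketch below reproduces that idea on hypothetical toy data; it assumes scaffolds are taken longest first and takes the phylum labels as given, rather than implementing the buscogenes taxrule itself.

```python
from collections import defaultdict
from itertools import accumulate


def cumulative_curves(scaffolds):
    """Given (length, phylum) pairs, return cumulative-length curves.

    'all' is the running total over every scaffold; each phylum key is the
    running total over only the scaffolds assigned to that phylum.
    Scaffolds are taken longest first (an assumption about plot ordering).
    """
    ordered = sorted(scaffolds, key=lambda s: s[0], reverse=True)
    by_phylum = defaultdict(list)
    for length, phylum in ordered:
        by_phylum[phylum].append(length)
    curves = {"all": list(accumulate(length for length, _ in ordered))}
    for phylum, lengths in by_phylum.items():
        curves[phylum] = list(accumulate(lengths))
    return curves


# Hypothetical toy data: (scaffold length in bp, assigned phylum).
toy = [(500, "Arthropoda"), (40, "no-hit"), (300, "Arthropoda"), (10, "Proteobacteria")]
curves = cumulative_curves(toy)
print(curves["all"])         # [500, 800, 840, 850]
print(curves["Arthropoda"])  # [500, 800]
```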

Figure 5. Genome assembly of Nephrotoma appendiculata, idNepAppe1.1: Hi-C contact map of the idNepAppe1.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=ENODhlutRSaWfOvEvECH7A.

Table 3 contains a list of relevant software tool versions and sources.

Table 3. Software tools: versions and sources.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.