The genome sequence of a Tiger Cranefly, Nephrotoma flavescens (Linnaeus, 1758)

We present a genome assembly from an individual male Nephrotoma flavescens (a Tiger Cranefly; Arthropoda; Insecta; Diptera; Tipulidae). The genome sequence is 1,051.3 megabases in span. Most of the assembly is scaffolded into four chromosomal pseudomolecules, including a partial X sex chromosome. The mitochondrial genome has also been assembled and is 18.9 kilobases in length. Gene annotation of this assembly on Ensembl identified 11,276 protein-coding genes.


Background
Nephrotoma flavescens is a large, elongate fly (wing length 11–14 mm) with long, thin legs. It belongs to the family Tipulidae (Diptera), commonly called craneflies. Species of the genus Nephrotoma are yellow and black and are often referred to as Tiger craneflies. Nephrotoma flavescens can be distinguished from other Nephrotoma species by its pale brown stigma, narrow black dorsal stripe on the abdomen, broad black patch on the back of the head and shining black stripe on the prescutum (Stubbs, 2021).
The species is common in eastern England, but is less common and has a mainly coastal distribution in western England and Wales. In Scotland, N. flavescens is widespread in the south and east, but is uncommon in the west Highlands. It prefers dry conditions and can be found on calcareous grassland, dry neutral grassland, richer types of sandy heath or grassland, and also in rough verges or field edges. The adults can be found from June to July or August, rarely in September (Stubbs, 2021).
The female oviposits into the soil. Under laboratory conditions in Lithuania, the eggs hatch after 23 days at 18°C to 25°C (Podeniene et al., 2014). The larvae, commonly called leatherjackets, are greyish-brown. They feed on plants, mainly the underground parts (Colyer & Hammond, 1968), potentially causing commercial losses (Colyer & Hammond, 1968; Hofsvang, 2010). The economic impact of this species is questionable, however, because according to Stubbs (2021) it avoids improved soils. The last instar larvae have been described by Podeniene (2003) and the first instar larvae by Podeniene et al. (2014).
The genome of Nephrotoma flavescens was sequenced as part of the Darwin Tree of Life Project, a collaborative effort to sequence all named eukaryotic species in the Atlantic Archipelago of Britain and Ireland. Here we present a chromosomally complete genome sequence based on one male specimen from an urban garden in Luton. This genome note will aid research on the phylogeny, taxonomy, biology and ecology of the species.

Genome sequence report
The genome was sequenced from one male Nephrotoma flavescens specimen (Figure 1) collected from Luton, UK (latitude 51.89, longitude -0.39). A total of 35-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 214 missing joins or mis-joins and removed eight haplotypic duplications, reducing the assembly length by 0.26% and the scaffold number by 7.7%, and increasing the scaffold N50 by 1.35%.
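The curation percentages above can be related to the final assembly statistics with a small calculation. The sketch below is illustrative only: it assumes that each percentage is expressed relative to the pre-curation value, and it uses the rounded figures reported in this note, so the results are approximate estimates rather than reported measurements. The function name `value_before` is hypothetical.

```python
def value_before(after, percent_change):
    """Invert a percentage change.

    Assumes after = before * (1 + percent_change / 100), i.e. the change
    is expressed relative to the pre-curation value (an assumption).
    """
    return after / (1 + percent_change / 100)


# Final (post-curation) values and percentage changes as reported in the text.
length_before_mb = value_before(1051.3, -0.26)   # assembly length, Mb
scaffolds_before = value_before(1103, -7.7)      # scaffold count
n50_before_mb = value_before(328.0, 1.35)        # scaffold N50, Mb

print(f"Estimated pre-curation length: {length_before_mb:.1f} Mb")
print(f"Estimated pre-curation scaffold count: {scaffolds_before:.0f}")
print(f"Estimated pre-curation scaffold N50: {n50_before_mb:.1f} Mb")
```

Because the published percentages are rounded, these back-calculated values should be read as rough estimates only.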
The final assembly has a total length of 1,051.3 Mb in 1,103 sequence scaffolds with a scaffold N50 of 328.0 Mb (Table 1). Most (90.97%) of the assembly sequence was assigned to four chromosomal-level scaffolds, representing three autosomes and a partial X sex chromosome. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 2–Figure 5; Table 2). It is likely that there is a Y chromosome, and that X and Y are highly repetitive, fragmented and for the most part indistinguishable.
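For readers unfamiliar with the scaffold N50 statistic quoted above, it is the length of the scaffold at which the cumulative length, counted from the longest scaffold downwards, first reaches half of the total assembly length. The following is a minimal sketch of that definition, not the tooling used for this assembly; the scaffold lengths are toy values chosen for illustration and are not taken from the real assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    summed length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in Mb (toy data, not the real assembly).
toy_scaffolds = [400, 328, 200, 50, 10, 5]
print(n50(toy_scaffolds))  # prints 328
```

In an assembly dominated by a few chromosome-scale scaffolds, the N50 falls on one of those large scaffolds even when many small unplaced scaffolds remain.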
The assembly has a BUSCO v5.3.2 (Manni et al., 2021) completeness of 94.6% (single 93.0%, duplicated 1.6%) using the diptera_odb10 reference set. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.

Genome annotation report
The Nephrotoma flavescens (idNepFlae1.1) genome assembly was annotated using the Ensembl rapid annotation pipeline (Table 1; Ensembl Accession number GCA_932526605.1). The resulting annotation includes 19,886 transcribed mRNAs from 11,276 protein-coding and 2,990 non-coding genes.

Sample acquisition and nucleic acid extraction
One male Nephrotoma flavescens specimen (Figure 1) was collected by Olga Sivell (Natural History Museum, London) on 2 June 2020 by netting in a private urban garden in Luton (51.89, -0.39). This specimen (NHMUK014111048; idNepFlae1) was used for genome sequencing. Another N. flavescens specimen (NHMUK014111058; idNepFlae2) was collected from the same place on 16 June 2020. Hi-C data were generated from head tissue of idNepFlae2 using the Arima v2 kit and sequenced on the Illumina NovaSeq 6000 instrument.

Genome assembly
Assembly was carried out with Hifiasm (Cheng et al., 2021) and haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). The assembly was scaffolded with Hi-C data (Rao et al., 2014) using YaHS (Zhou et al., 2023). The assembly was checked for contamination as described previously (Howe et al., 2021). Manual curation was performed using HiGlass (Kerpedjiev et al., 2018) and Pretext (Harry, 2022). The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2022), which performed annotation using MitoFinder (Allio et al., 2020). The genome was analysed, and BUSCO scores were generated, within the BlobToolKit environment (Challis et al., 2020). Table 3 contains a list of all software tool versions used, where appropriate.

Genome annotation
The Ensembl gene annotation system (Aken et al., 2016) was used to generate annotation for the Nephrotoma flavescens assembly (GCA_932526605.1). Annotation was created primarily through alignment of transcriptomic data to the genome, with gap filling via protein-to-genome alignments of a select set of proteins from UniProt (UniProt Consortium, 2019).

Ethics and compliance issues
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the Darwin Tree of Life Project Sampling Code of Practice. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired.
The genome sequence is released openly for reuse. The Nephrotoma flavescens genome sequencing initiative is part of the Darwin Tree of Life (DToL) project. All raw sequence data and the assembly have been deposited in INSDC databases. Raw data and assembly accession identifiers are reported in Table 1.