The genome sequence of a stonefly, Nemoura dubitans (Morton, 1894)

We present a genome assembly from an individual female Nemoura dubitans (a stonefly; Arthropoda; Insecta; Plecoptera; Nemouridae). The genome sequence is 321.0 megabases in span. Most of the assembly is scaffolded into 6 chromosomal pseudomolecules, including the X sex chromosome. The mitochondrial genome has also been assembled and is 15.73 kilobases in length.


Background
Nemoura dubitans is a western Palaearctic species found across central Europe from France to Romania and north to Fennoscandia. It is absent from Wales, Scotland and Ireland, and has a highly localised distribution in the south of England.
It is predominantly found in shallow, heavily vegetated marshes, often where groundwater springs emerge at the surface. The water is often stagnant with large quantities of dead leaves and other rotting material present (Koese, 2008). There appears to be an association with peat soils, with several records from England being found in fens (Bratton, 1990). In the western Carpathians N. dubitans is found exclusively in fens overgrown with sedges and Sphagnum (Bojková & Helešic, 2009). However, in Slovakia this species is also found in sandy lowland streams and small rivers (Krno, 2004). In Finland, sites with N. dubitans were mainly first-order streams with low levels of disturbance by human activities (Vuori et al., 2006). This species has also been found in reed-lined bays of small lakes in Germany (Zwick, 2004).
Very little is known about the life history of Nemoura dubitans.
It is thought to have a univoltine life cycle with larvae present in the winter and spring (Graf et al., 2009; Krno, 2004). Adults emerge between April and June in England (Bratton, 1990; Hynes, 1977).
The genome of the stonefly, Nemoura dubitans, was sequenced as part of the Darwin Tree of Life Project, a collaborative effort to sequence all named eukaryotic species in the Atlantic Archipelago of Britain and Ireland. Here we present a chromosomally complete genome sequence for Nemoura dubitans, based on one female specimen from Surlingham, UK.

Genome sequence report
The genome was sequenced from one female Nemoura dubitans (Figure 1) collected from Surlingham, UK (latitude 52.61, longitude 1.41). A total of 62-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 192-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 54 missing joins or mis-joins and removed one haplotypic duplication, reducing the scaffold number by 28.57% and increasing the scaffold N50 by 15.84%.
The final assembly has a total length of 321.0 Mb in 40 sequence scaffolds with a scaffold N50 of 53.1 Mb (Table 1). Most (99.12%) of the assembly sequence was assigned to 6 chromosomal-level scaffolds, representing 5 autosomes and the X sex chromosome. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 2-Figure 5; Table 2). The assignment of chromosome X was based on alignments to the assembly of Nemurella pictetii (Macadam et al., 2022). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission.
Metadata for specimens, spectral estimates, sequencing runs, contaminants and pre-curation assembly statistics can be found at https://links.tol.sanger.ac.uk/species/2014036.
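The scaffold N50 and the percentage change in scaffold number quoted in this section follow from simple arithmetic on scaffold lengths and counts. The Python sketch below illustrates the usual definition of Nx (the length of the shortest scaffold in the smallest set of longest scaffolds that covers at least a fraction x of the assembly span). The function names and the toy lengths are hypothetical and are not taken from the assembly. The pre-curation count of 56 scaffolds is not stated in the text; it is inferred here only as the value consistent with 40 final scaffolds and a 28.57% reduction.

```python
def nx(lengths, fraction=0.5):
    """Return the Nx length for a list of scaffold lengths.

    Nx is the length of the scaffold at which the running total of
    lengths, taken from longest to shortest, first reaches `fraction`
    of the total assembly span (fraction=0.5 gives N50, 0.9 gives N90).
    """
    if not lengths:
        raise ValueError("no scaffold lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length


def percent_reduction(before, after):
    """Percentage decrease going from `before` to `after`."""
    return 100.0 * (before - after) / before


# Hypothetical toy scaffold lengths (arbitrary units), not real data.
toy_lengths = [80, 60, 50, 40, 30, 20, 10]
toy_n50 = nx(toy_lengths, 0.5)
toy_n90 = nx(toy_lengths, 0.9)

# Assumed pre-curation scaffold count (56), inferred rather than reported.
reduction = round(percent_reduction(56, 40), 2)
```

With real data, the list of lengths would be read from the assembly FASTA index or a similar summary rather than typed in by hand.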

Sample acquisition and nucleic acid extraction
The specimen used for genome sequencing was a female Nemoura dubitans (specimen ID NHMUK014361789, ToLID ipNemDubi1), collected from a river in Surlingham, UK (latitude 52.61, longitude 1.41) on 2019-04-11. A second female Nemoura dubitans was used for RNA sequencing (specimen ID NHMUK014361797, ToLID ipNemDubi2). Both specimens were collected and identified by Andrew Farr (Riverfly Recording Scheme) and dry frozen.
DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute (WSI). The ipNemDubi1 sample was weighed and dissected on dry ice with tissue set aside for Hi-C  spectrophotometer and Qubit Fluorometer and Qubit dsDNA High Sensitivity Assay kit. Fragment size distribution was evaluated by running the sample on the FemtoPulse system.
RNA was extracted from whole organism tissue of ipNemDubi2 in the Tree of Life Laboratory at the WSI using TRIzol, according to the manufacturer's instructions. RNA was then eluted in 50 μl RNase-free water and its concentration assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit RNA Broad-Range (BR) Assay kit. Analysis of the integrity of the RNA was done using the Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay.
A Hi-C map for the final assembly was produced using bwa-mem2 (Vasimuddin et al., 2019) in the Cooler file format (Abdennur & Mirny, 2020). To assess the assembly metrics, the k-mer completeness and QV consensus quality values were calculated in Merqury (Rhie et al., 2020). This work was done using Nextflow (Di Tommaso et al., 2017) DSL2 pipelines "sanger-tol/readmapping" (Surana et al., 2023a) and "sanger-tol/genomenote" (Surana et al., 2023b). The genome was analysed within the BlobToolKit environment (Challis et al., 2020) and BUSCO scores (Manni et al., 2021; Simão et al., 2015) were calculated.
Table 3 contains a list of relevant software tool versions and sources.
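The QV consensus quality value mentioned above is reported on the Phred scale, where QV = -10 log10(P) and P is the estimated per-base error probability. The sketch below shows only this conversion; it does not reproduce how Merqury estimates P from k-mers. The function names and the example values are hypothetical, and no QV for this assembly is given in the text.

```python
import math


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate):
    """Convert a per-base error probability to a Phred-scaled quality value."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in the interval (0, 1]")
    return -10 * math.log10(error_rate)


# Illustrative values only (not results for this assembly):
# QV 40 corresponds to roughly one error per 10,000 bases.
example_rate = qv_to_error_rate(40)
example_qv = error_rate_to_qv(1e-3)
```

On this scale, each increase of 10 in QV corresponds to a tenfold lower estimated error rate.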

Wellcome Sanger Institute - Legal and Governance
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission Further, the Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material

Small comments
- RNA-seq was also generated for this species; however, this is not mentioned in the Results section.
- The article could benefit from expanding the background with a couple of sentences on the objectives and applications, besides being sequenced as part of a consortium. If I'm not mistaken, this is the first genome available for this species; this could be mentioned too.
- Mentioning the sex determination of this species could be useful.
- Was the second female also collected in the same location and on the same date?
- Specify that the numbers after the collection site refer to latitude and longitude: (52.61, 1.41).
- Nemoura dubitans could be shortened to N. dubitans after the first mention in each section.
- Inconsistency between the tool name in the text and in Table 3: Pretext vs Pretext View, SALSA vs SALSA 2.
- "bwa-mem 2" is missing from Table 3.
Overall, this new genome resource by the Darwin Tree of Life Consortium is excellent.

Is the rationale for creating the dataset(s) clearly described? Partly
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: bioinformatics, population genetics, epigenetics, transposable elements, adaptation
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 2. Genome assembly of Nemoura dubitans, ipNemDubi1.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 321,017,140 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (81,771,504 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (53,077,913 and 43,408,134 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the insecta_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ipNemDubi1.1/dataset/CAKLCQ01/snail.

Figure 5. Genome assembly of Nemoura dubitans, ipNemDubi1.1: Hi-C contact map of the ipNemDubi1.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=R0xG0doeQM-knjc_MG2W5A.

Darwin Tree of Life Project Sampling Code of Practice', which can be found in full on the Darwin Tree of Life website here. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project.