The genome sequence of the segmented worm, Sthenelais limicola (Ehlers, 1864)

We present a genome assembly from an individual Sthenelais limicola (the segmented worm; Annelida; Polychaeta; Phyllodocida; Sigalionidae). The genome sequence is 1,131 megabases in span. Most of the assembly is scaffolded into nine chromosomal pseudomolecules. The mitochondrial genome has also been assembled and is 16.7 kilobases in length.

Sthenelais limicola can be distinguished from other European species by its elytra, which have a smooth, bifurcate or notched margin and a smooth surface with only a few microtubercles close to the point of attachment. Animals are generally colourless or white, with transparent elytra that often bear a brownish patch on the posterior half, forming a V-shape or horseshoe pattern together with the patch on the opposite elytron. They can reach up to 100 mm in length (Hartmann-Schröder, 1996). Jumars et al. (2015) classified all Sigalionidae as carnivores, although the method of hunting and the prey are unknown for S. limicola. Few data are available on reproduction in Sigalionidae, but they are gonochoric as far as is known (Rouse et al., 2022). The species is neither under threat nor considered non-native anywhere in the world.
The genome of the segmented worm, Sthenelais limicola, was sequenced as part of the Darwin Tree of Life Project, a collaborative effort to sequence all named eukaryotic species in the Atlantic Archipelago of Britain and Ireland.

Genome sequence report
The genome was sequenced from one S. limicola (Figure 1) collected from East Breakwater, Plymouth Sound, UK (latitude 50.34, longitude -4.14). A total of 45-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 444 missing joins or mis-joins and removed 210 haplotypic duplications, reducing the assembly length by 3.71% and the scaffold number by 72.31%, and decreasing the scaffold N50 by 2.18%. The final assembly has a total length of 1,131.1 Mb in 201 sequence scaffolds with a scaffold N50 of 137.0 Mb (Table 1). Most (99.59%) of the assembly sequence was assigned to nine chromosomal-level scaffolds. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 2 to Figure 5; Table 2). The scaffold order and orientation are uncertain on chromosome 6 (0.39-8.86 Mb). A heterozygous inversion was observed on chromosome 7 (76.84-95.76 Mb). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The assembly has a BUSCO v5.3.2 (Manni et al., 2021) completeness of 95.4% (single 94.3%, duplicated 1.0%) using the metazoa_odb10 reference set.
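For readers less familiar with assembly statistics, the following is a minimal sketch of how a scaffold N50 is conventionally computed and of how a quoted percentage reduction relates a post-curation value to its pre-curation value. It is an illustration only, not the software used for this assembly: the function names `n50` and `value_before_reduction` and the toy scaffold lengths are hypothetical, and the back-calculation assumes the reported percentages are expressed relative to the pre-curation values.

```python
def n50(lengths):
    """Return the N50 of a list of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L together
    contain at least half of the total assembly span.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold downwards until half the span is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def value_before_reduction(after, percent_reduction):
    """Back-calculate a pre-curation value from the post-curation value,
    assuming the percentage is relative to the pre-curation value."""
    return after / (1 - percent_reduction / 100)


# Toy scaffold lengths in Mb (hypothetical, NOT the real wpSthLimi1 scaffolds).
toy_lengths_mb = [150, 140, 137, 130, 120, 110, 100, 90, 80, 5, 3, 2]
toy_n50 = n50(toy_lengths_mb)

# Scaffold count implied before curation, from the 201 final scaffolds and
# the 72.31% reduction quoted in the text (under the assumption stated above).
implied_scaffolds = round(value_before_reduction(201, 72.31))
```

Under the stated assumption, the quoted figures would imply roughly 726 scaffolds before curation; this number is derived here for illustration and is not reported in the source.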

Methods
Sample acquisition and nucleic acid extraction
An individual S. limicola (wpSthLimi1; Figure 1) was collected from East Breakwater, Plymouth Sound, UK (latitude 50.34, longitude -4.14) by Teresa Darbyshire (National Museum Wales) and Mitchell Brennan and Sean McTierney (Marine Biological Association) and identified by Teresa Darbyshire. The sample was collected from muddy substrate using a grab sampler (MV Sepia) and preserved in liquid nitrogen.
DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute. The wpSthLimi1 sample was weighed and dissected on dry ice with tissue set aside for Hi-C sequencing. Mid-body tissue was cryogenically disrupted to a fine powder using a Covaris cryoPREP Automated Dry Pulveriser, receiving multiple impacts. High molecular weight (HMW)  instruments. Hi-C data were also generated from wpSthLimi1 using the Arima v2 kit and sequenced on the Illumina NovaSeq 6000 instrument.

Genome assembly
Assembly was carried out with Hifiasm (Cheng et al., 2021) and haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using YaHS (Zhou et al., 2022). The assembly was checked for contamination and corrected using the gEVAL system (Chow et al., 2016) as described previously (Howe et al., 2021). Manual curation was performed using gEVAL, HiGlass (Kerpedjiev et al., 2018) and Pretext (Harry, 2022). The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2022), which performed annotation using MitoFinder (Allio et al., 2020). The genome was analysed and BUSCO scores generated within the BlobToolKit environment (Challis et al., 2020). Table 3 contains a list of all software tool versions used, where appropriate.

Torsten H Struck
University of Oslo, Oslo, Oslo, Norway
The paper by Darbyshire et al. presents the nuclear and mitochondrial genomes of an annelid scale worm at reference level and, to my knowledge, it is the first for the annelid group Aphroditiformia. All in all, it is very well described and includes all essential statistics. However, I would like to make a few suggestions that will help to increase the repeatability of the study and allow researchers less familiar with genomics to use this study as a starting point for their own genome projects. Annelid researchers in particular might use such a paper as a starting point. At present, however, this is not possible, as a lot of information describing the bioinformatic parts of this project is lacking.
The description of the genome assembly does not give the settings used for the different programs. Moreover, any custom-made scripts used for the analysis are not mentioned, although, given the different data sources, it has to be assumed that such scripts exist. The settings and scripts should be made publicly available to allow the analyses to be repeated as the authors performed them.
Additionally, while the lab methods have been described in sufficient detail, several details of the bioinformatic analyses are missing. Which programs were used for the mapping, BLAST/DIAMOND and BUSCO analyses needed for the BlobTools analysis? BlobTools does not perform these itself. Which programs were used to assess the different quality parameters such as N50, k-mer completeness, consensus quality and so forth? Also, for the Hi-C analyses, crucial steps in the preparation of the data for the YaHS analysis are not described. For someone new to the field, it will be very challenging to repeat the analyses. If these methods have been published in a different paper, the authors should refer to that paper and indicate any deviations from it.
Other points to consider are: The members of the barcoding team are listed as authors. However, the barcoding approach is not described in the paper at all, nor are any of its results. Did the barcode confirm the morphological species identification? This is to be assumed, as the policy of DToL is to sequence only species with a matching barcode. Nonetheless, a statement should be made in the paper. Moreover,