The genome sequence of the scale worm, Lepidonotus clava (Montagu, 1808)

We present a genome assembly from an individual Lepidonotus clava (scale worm; Annelida; Polychaeta; Phyllodocida; Polynoidae). The genome sequence is 1,044 megabases in span. Most of the assembly is scaffolded into 18 chromosomal pseudomolecules. The mitochondrial genome has also been assembled and is 15.6 kilobases in length.


Background
Lepidonotus clava (Montagu, 1808) is a large-bodied (up to 45 mm), robust scale worm in the subfamily Lepidonotinae of the family Polynoidae. It was first described from the South Devon coast in England and is distributed around the UK and Ireland. Further afield, it is widely recorded along other European North Sea coasts, the Mediterranean, Red Sea and the west coast of Africa as well as the North Pacific (Barnich & Fiege, 2003; Chambers & Muir, 1997; Parapar et al., 2015). It is typically found on hard substrates (Chambers & Muir, 1997), but is also known from a variety of other habitats including muddy and sandy substrates, algal holdfasts, seagrass beds and mussel beds (Parapar et al., 2015), and is found from the littoral down to 160 m (Fauvel, 1936). It is not considered to be under threat, nor is it regarded as an invasive non-native species anywhere in the world.
Lepidonotus species all have lateral antennae inserted terminally, 12 pairs of elytra and a smooth body. The 12 pairs of leathery elytra on L. clava are attached firmly to the body and are not shed easily as in some other scale worms. L. clava can be distinguished from other UK and European Lepidonotus by the elytra (surface covered with conical macro- and microtubercles and margin with no fringing papillae) and by the neurochaetae all having unidentate tips. L. clava are carnivores, utilising a muscular evertible pharynx armed with a pair of jaws (Jumars et al., 2015), and are known to feed on other polynoids (Chambers & Muir, 1997). They are gonochoric, and females with eggs have been reported from the west coast of Scotland in February and July (Chambers & Muir, 1997).
The genome of L. clava was sequenced as part of the Darwin Tree of Life Project, a collaborative effort to sequence all named eukaryotic species in the Atlantic Archipelago of Britain and Ireland.

Genome sequence report
The genome was sequenced from a single L. clava (Figure 1) collected from Batten Bay (latitude 50.3554, longitude -4.1251). A total of 48-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 32-fold coverage in 10X Genomics read clouds was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 300 missing joins or misjoins and removed 41 haplotypic duplications, reducing the assembly length by 3.27% and the scaffold number by 33.88%, and increasing the scaffold N50 by 3.14%.
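The curation percentages above can be inverted to estimate the pre-curation figures. The following is a minimal sketch only: it assumes each percentage is expressed relative to the pre-curation value, and it takes the final values (1,010 Mb, 400 scaffolds, scaffold N50 of 55 Mb) from the final assembly statistics in this report. The helper name `value_before_change` is hypothetical, and the outputs are illustrative estimates, not figures reported by the authors.

```python
def value_before_change(final_value: float, percent_change: float) -> float:
    """Invert a relative change, assuming final = before * (1 + percent_change / 100)."""
    return final_value / (1.0 + percent_change / 100.0)


# Final assembly figures as reported in the text.
final_length_mb = 1010.0
final_scaffolds = 400
final_n50_mb = 55.0

# Relative changes attributed to manual curation (negative = reduction).
pre_length_mb = value_before_change(final_length_mb, -3.27)
pre_scaffolds = value_before_change(final_scaffolds, -33.88)
pre_n50_mb = value_before_change(final_n50_mb, +3.14)

print(f"Estimated pre-curation length:    {pre_length_mb:.1f} Mb")
print(f"Estimated pre-curation scaffolds: {pre_scaffolds:.0f}")
print(f"Estimated pre-curation N50:       {pre_n50_mb:.1f} Mb")
```

Because the reported percentages are rounded, the back-calculated values are approximate.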
The final assembly has a total length of 1,010 Mb in 400 sequence scaffolds with a scaffold N50 of 55 Mb (Table 1). Most (98.94%) of the assembly sequence was assigned to 18 chromosomal-level scaffolds, named in order of size (Figure 2–Figure 5; Table 2). Repetitive scaffolds have been inserted into chromosome 15 (~13.7 Mb), and the order and orientation of these scaffolds are unclear. The assembly has a BUSCO v5.3.2 (Manni et al., 2021) completeness of 95.7% (single 95.3%, duplicated 0.4%) using the metazoa_odb10 reference set. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
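For readers unfamiliar with the scaffold N50 statistic quoted above, it is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the total assembly length. The sketch below illustrates that standard definition; the function name `n50` and the toy scaffold lengths are assumptions for illustration and are not taken from this assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths.

    Lengths are sorted from longest to shortest and accumulated until the
    running total reaches at least half of the summed length; the length of
    the sequence that crosses that threshold is the N50.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # unreachable for positive lengths; kept for safety


# Toy example (hypothetical lengths in Mb, not the L. clava scaffolds).
toy_scaffolds_mb = [80, 70, 50, 40, 30, 20]
print(n50(toy_scaffolds_mb))
```

In the toy example the total is 290 Mb, so the threshold is 145 Mb; the two longest scaffolds (80 + 70 = 150 Mb) cross it, and the N50 is therefore 70 Mb.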

Sample acquisition and nucleic acid extraction
An individual adult L. clava (wpLepClav1) was collected from Batten Bay, Mount Batten, Devon, UK (latitude 50.3554, longitude -4.1251) by John Bishop, Nova Mieszkowska and Patrick Adkins (all Marine Biological Association), and Teresa Derbyshire and Anna Holmes (both National Museum Wales). The specimen was identified by Teresa Derbyshire by means of macroscopic and microscopic morphology and DNA barcoding (COI). The sample was taken by hand from a rock crevice and preserved by freezing in liquid nitrogen.
DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute. The wpLepClav1 sample was weighed and dissected on dry ice with tissue set aside for Hi-C sequencing. Muscle tissue was disrupted using a Nippi Powermasher fitted with a BioMasher pestle. Fragment size analysis of 0.01-0.5 ng of DNA was then performed using an Agilent FemtoPulse. High molecular weight (HMW) DNA was extracted using the Qiagen MagAttract HMW DNA extraction kit. Low molecular weight DNA was removed from a 20-ng aliquot of extracted DNA using 0.8X AMPure XP purification kit prior to 10X Chromium sequencing; a minimum of [...] Biosciences SEQUEL II (HiFi), Illumina NovaSeq 6000 (DNA 10X and RNA-Seq) instruments. Hi-C data were also generated from mid-body tissue of wpLepClav1 using the Arima v2 kit and sequenced on the Illumina NovaSeq 6000 instrument.

Genome assembly
Assembly was carried out with Hifiasm (Cheng et al., 2021) and haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). One round of polishing was performed by aligning 10X Genomics read data to the assembly with Long Ranger ALIGN and calling variants with freebayes (Garrison & Marth, 2012). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using YaHS (Zhou et al., 2022). The assembly was checked for contamination and corrected using the gEVAL system (Chow et al., 2016) as described previously (Howe et al., 2021). Manual curation (Howe et al., 2021) was performed using gEVAL, HiGlass (Kerpedjiev et al., 2018) and Pretext (Harry, 2022). The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2021), which performed annotation using MitoFinder (Allio et al., 2020). The genome was analysed and BUSCO scores generated within the BlobToolKit environment (Challis et al.). Table 3 contains a list of all software tool versions used, where appropriate.
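The sequence of steps described above can be summarised as an ordered list of stage and tool pairs. This is only a descriptive sketch: the stage labels, the `Stage` type and the `describe` helper are hypothetical names introduced here, the order follows the order of presentation in the text rather than a confirmed execution order, and no command lines or parameters are implied.

```python
from typing import List, NamedTuple


class Stage(NamedTuple):
    step: str  # descriptive label chosen for this sketch
    tool: str  # tool name(s) as given in the text


# Stages in the order they are described in the text.
PIPELINE: List[Stage] = [
    Stage("primary assembly", "Hifiasm"),
    Stage("haplotypic duplication removal", "purge_dups"),
    Stage("polishing: 10X read alignment", "Long Ranger ALIGN"),
    Stage("polishing: variant calling", "freebayes"),
    Stage("Hi-C scaffolding", "YaHS"),
    Stage("contamination check and correction", "gEVAL"),
    Stage("manual curation", "gEVAL, HiGlass, Pretext"),
    Stage("mitochondrial genome assembly", "MitoHiFi"),
    Stage("mitochondrial genome annotation", "MitoFinder"),
    Stage("analysis and BUSCO scoring", "BlobToolKit"),
]


def describe(pipeline: List[Stage]) -> List[str]:
    """Format each stage as a numbered 'step: tool' line."""
    return [f"{i}. {s.step}: {s.tool}" for i, s in enumerate(pipeline, start=1)]


for line in describe(PIPELINE):
    print(line)
```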