The genome sequence of the turban top shell, Gibbula magus (Linnaeus, 1758)

We present a genome assembly from an individual Gibbula magus (the turban top shell; Mollusca; Gastropoda; Trochida; Trochidae). The genome sequence is 1,470 megabases in span. Most of the assembly is scaffolded into 18 chromosomal pseudomolecules. The mitochondrial genome has also been assembled and is 16.1 kilobases in length. Gene annotation of this assembly on Ensembl identified 41,167 protein coding genes.


Background
Gibbula magus is a marine gastropod mollusc in the family Trochidae, known as topshells (NBN Atlas, 2021). They are usually sublittoral and are found on muddy sand or gravel, on algae or under stones, where they feed on microphytes (Smith, 2015). They can be found in the intertidal zone at extreme low spring tides and down to depths of 70 m (Wilson, 2017). In Great Britain, G. magus is seldom found on the east coast, occurring almost exclusively on the south and west coasts, and on all coasts in Ireland (de Kluijver et al., 2000; NBN Atlas, 2021). The breeding times of G. magus vary geographically: June at Roscoff, and spring and autumn at Plymouth. Fertilisation happens externally and there is a brief free-living trochophore larval stage; however, little else is known about these early life stages (Smith, 2015). The fringe present on the body of G. magus is thought to protect against detritus and may be able to sense poor water conditions (Smith, 2015).
Species of the genus Gibbula can be difficult to identify because of high variability in shell morphology (the main identifying feature), unspecific or missing type material, and vague original descriptions (Affenzeller et al., 2017). Adult G. magus are the largest of the trochid species found in UK waters. They have a flattened spire, a large umbilicus, and diagonal lines of pink dots along the whorls of the shell. The taxonomic divisions of the family Trochidae have also been investigated, with changes to the groupings and names of several species (Affenzeller et al., 2017; Anistratenko, 2005). High-quality genomic data can inform these decisions in more detail and support the work of taxonomists (Coates et al., 2018). Here we present a chromosomally complete genome sequence for G. magus, based on a specimen from Gann Bay, Pembrokeshire, UK.

Genome sequence report
The genome was sequenced from a Gibbula magus specimen (Figure 1) collected from Gann Bay, Pembrokeshire, UK (latitude 51.71, longitude -5.17). A total of 35-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 36-fold coverage in 10X Genomics read clouds was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 477 missing joins or misjoins and removed 93 haplotypic duplications, reducing the assembly length by 4.9% and the scaffold number by 54.52%, and increasing the scaffold N50 by 3.38%.
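As a rough guide to the scale of the data, a fold-coverage figure can be converted into an approximate sequencing yield by multiplying it by the genome span. The sketch below assumes that coverage was calculated against the final assembly span of 1,470.4 Mb; the text does not state the denominator used, so the resulting values are indicative only. The function name `approx_yield_gb` is introduced here for illustration.

```python
def approx_yield_gb(fold_coverage, assembly_span_mb):
    """Approximate sequenced bases (in Gb) implied by a fold-coverage value.

    Assumes coverage = sequenced bases / assembly span.
    """
    return fold_coverage * assembly_span_mb / 1000


# Assumption: coverage is relative to the final assembly span (1,470.4 Mb).
ASSEMBLY_SPAN_MB = 1470.4

hifi_gb = approx_yield_gb(35, ASSEMBLY_SPAN_MB)  # PacBio HiFi, 35-fold
tenx_gb = approx_yield_gb(36, ASSEMBLY_SPAN_MB)  # 10X read clouds, 36-fold

print(f"HiFi: ~{hifi_gb:.1f} Gb; 10X: ~{tenx_gb:.1f} Gb")
```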
The final assembly has a total length of 1,470.4 Mb in 151 sequence scaffolds with a scaffold N50 of 80.5 Mb (Table 1). Most (99.51%) of the assembly sequence was assigned to 18 chromosomal-scale scaffolds. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 2–Figure 5; Table 2). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled.
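For readers unfamiliar with the scaffold N50 statistic, it is the length of the scaffold at which the cumulative length of scaffolds, sorted from longest to shortest, first reaches half of the total assembly length. The following is a minimal sketch of that definition; the function name `scaffold_n50` and the toy lengths are illustrative and are not the G. magus scaffold lengths, and the value reported in Table 1 may have been computed with a different tool.

```python
def scaffold_n50(lengths):
    """Return the N50 of a collection of scaffold lengths."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that takes the running sum to half of the
        # total length defines the N50.
        if running >= half_total:
            return length


# Toy example (arbitrary units), not data from this assembly.
toy_lengths = [100, 80, 60, 40, 20]
print(scaffold_n50(toy_lengths))  # 80
```

In the toy example the total is 300, so the threshold of 150 is crossed by the second-longest scaffold.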
The assembly has a BUSCO v5.3.2 (Manni et al., 2021) completeness of 84.2% (single 83.4%, duplicated 0.8%) using the OrthoDB-v10 mollusca reference set. BUSCO loci identified as fragmented accounted for a further 4.8% of loci tested. This low BUSCO score may be due to low conservation of orthologues between G. magus and the molluscan species in the reference set, or underperformance of the BUSCO gene finder given the particular gene structures in this species. The assembly is validated by the other assembly quality metrics (k-mer completeness 99.98%, consensus quality (QV) 52.2) shown in Table 1.
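The quality figures above can be put on a more intuitive scale with a little arithmetic. The sketch below assumes that the consensus QV is Phred-scaled (QV = -10 log10 of the per-base error probability) and that the BUSCO categories (complete, fragmented, missing) sum to 100%; the missing fraction is therefore derived here rather than taken from the text. The function names are illustrative.

```python
def qv_to_error_rate(qv):
    """Convert a Phred-scaled consensus quality value to a per-base error rate."""
    return 10 ** (-qv / 10)


def busco_summary(single, duplicated, fragmented):
    """Return (complete %, missing %) from BUSCO category percentages.

    Assumes complete + fragmented + missing = 100%.
    """
    complete = single + duplicated
    missing = 100.0 - complete - fragmented
    return round(complete, 1), round(missing, 1)


error_rate = qv_to_error_rate(52.2)
complete, missing = busco_summary(single=83.4, duplicated=0.8, fragmented=4.8)

print(f"Per-base error rate at QV 52.2: {error_rate:.2e}")
print(f"BUSCO complete: {complete}%; derived missing: {missing}%")
```

Under these assumptions, a QV of 52.2 corresponds to roughly six consensus errors per million bases, and about 11% of the BUSCO loci tested would be classed as missing.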
The resulting annotation includes 41,235 transcribed mRNAs from 41,167 protein-coding genes.

Sample acquisition and nucleic acid extraction
The G. magus specimen (xgGibMagu1) used for genome sequencing was collected by Patrick Adkins and Joanna Harley (Marine Biological Association) and Teresa Darbyshire and Anna Holmes (Amgueddfa Cymru), and was identified by Patrick Adkins and Joanna Harley. The  AMpure XP purification kit prior to 10X Chromium sequencing; a minimum of 50 ng DNA was submitted for 10X sequencing. High molecular weight (HMW) DNA was sheared to an average fragment size of 12–20 kb in a Megaruptor 3 system with speed setting 30. Sheared DNA was purified by solid-phase reversible immobilisation using AMPure PB beads with a 1.8X ratio of beads to sample to remove the shorter fragments and concentrate the DNA sample. The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit dsDNA High Sensitivity Assay kit. Fragment size distribution was evaluated by running the sample on the FemtoPulse system. RNA was extracted from muscle tissue of xgGibMagu1 in the Tree of Life Laboratory at the WSI using TRIzol, according to the manufacturer's instructions. RNA was then eluted in 50 μl RNAse-free water and its concentration was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit RNA Broad-Range (BR) Assay kit. RNA integrity was analysed using the Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay.

Sequencing
Pacific Biosciences HiFi circular consensus and 10X Genomics read cloud DNA sequencing libraries were constructed according to the manufacturers' instructions. Poly(A) RNA-Seq libraries were constructed using the NEB Ultra II RNA Library Prep kit. DNA and RNA sequencing was performed by the Scientific Operations core at the WSI on Pacific Biosciences SEQUEL II (HiFi) and Illumina NovaSeq 6000 (RNA-Seq and 10X) instruments. Hi-C data were also generated from muscle tissue of xgGibMagu1 using the Arima v2 kit and sequenced on the Illumina NovaSeq 6000 instrument.

Genome assembly
Assembly was carried out with Hifiasm (Cheng et al., 2021) and haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). One round of polishing was performed by aligning 10X Genomics read data to the assembly with Long Ranger ALIGN, calling variants with freebayes (Garrison & Marth, 2012). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using YaHS (Zhou et al., 2022). The assembly was checked for contamination and corrected as described previously (Howe et al., 2021). Manual curation was performed using HiGlass (Kerpedjiev et al., 2018) and Pretext (Harry, 2022). The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2022), which performed annotation using MitoFinder (Allio et al., 2020). The genome was analysed and BUSCO scores were generated within the BlobToolKit environment (Challis et al., 2020). Table 3 contains a list of all software tool versions used, where appropriate.

Genome annotation
The Ensembl gene annotation system (Aken et al., 2016) was used to generate annotation for the G. magus assembly (GCA_936450465.1). Annotation was created primarily through alignment of transcriptomic data to the genome, with