The genome sequence of the grey top shell, Steromphala cineraria (Linnaeus, 1758)

We present a genome assembly from an individual Steromphala cineraria (the grey topshell; Mollusca; Gastropoda; Trochida; Trochidae). The genome sequence is 1,270 megabases in span. Most of the assembly (99.23%) is scaffolded into 18 chromosomal pseudomolecules.


Background
Steromphala cineraria (Linnaeus, 1758), commonly called the grey topshell, is a gastropod common on rocky shores in the UK. It typically occurs among boulders and cobbles on the low shore and sub-tidally, where it grazes among Fucus and Laminaria species. Intertidally, it is most common on the lower shore, but can also be found in pools higher on the shore. Sub-tidally it extends to depths of 130 m, although it is most common in the kelp forests between 30 m and low water spring tide (Fretter & Graham, 1976). Its geographical distribution ranges from northern Norway to southern Portugal, and it becomes rarer at its range edges as its thermal limits are approached (Høisaeter, 2009; Nekhaev, 2013).
An important grazing species, S. cineraria is distinguished from other trochids by its bluntly conical shell and its finely striped grey/yellowish patterning. In smaller shells, the umbilicus is large, becoming smaller and elliptical with age; in large shells it is sometimes overgrown by the columellar lip (Fretter & Graham, 1976).
As S. cineraria is found across a large range of latitudes, it is exposed to a wide range of thermal environments, both seasonally and geographically. It is important to understand how populations may change in response to climate change, especially at the southern and northern range limits, and the knock-on effects this may have on macroalgae due to changes in grazing populations (Mieszkowska et al., 2007). A high quality genome sequence for this species will allow future studies to understand more about the mechanisms driving the observed response of this species to a changing climate.

Genome sequence report
The genome was sequenced from a single S. cineraria (Figure 1) collected from Mount Batten, Devon, UK (latitude 50.36084, longitude -4.12833). A total of 42-fold coverage in Pacific Biosciences single-molecule long reads and 35-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 408 missing joins or misjoins and removed 70 haplotypic duplications, reducing the assembly size by 2.51% and the scaffold number by 53.835%, and increasing the scaffold N50 by 123.17%.
The final assembly has a total length of 1,270 Mb in 283 sequence scaffolds with a scaffold N50 of 70.7 Mb (Table 1). Of the assembly sequence, 99.23% was assigned to 18 chromosomal-level scaffolds (numbered by sequence length) (Figure 2-Figure 5; Table 2). In the high-resolution Pretext map, large inversions between sister chromatids can be seen on chromosome 5 at 29.7-60.7 Mb and chromosome 11 at 17.7-39.7 Mb. Possible inversions are also seen on chromosome 11 at 3.4-39.4 Mb and 18-66 Mb. The assembly has a BUSCO v5.1.2 (Manni et al., 2021) completeness of 85.4% (single 84.6%, duplicated 0.8%) using the mollusca_odb10 reference set (n=5295). However, we believe that this relatively low BUSCO score results from limitations of the current mollusca_odb10 gene set. Using the metazoa_odb10 reference set (n=954), the assembly has a completeness of 97.6% (single 97.0%, duplicated 0.6%), which we take as evidence of high completeness. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
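The two summary statistics quoted above can be illustrated with a minimal sketch. It is not the pipeline used for this assembly: the function names and the toy scaffold lengths are hypothetical, and the only values taken from the text are the BUSCO percentages. The sketch assumes the usual definition of N50 (the length of the scaffold at which the cumulative span of scaffolds, sorted longest first, reaches half of the assembly) and that BUSCO completeness is the sum of the single-copy and duplicated percentages.

```python
def scaffold_n50(lengths):
    """Return the N50 of a list of scaffold lengths.

    Scaffolds are sorted longest first and their lengths accumulated;
    the N50 is the length of the scaffold at which the running total
    first reaches half of the total span.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    half_span = sum(lengths) / 2
    running_total = 0
    for length in sorted(lengths, reverse=True):
        running_total += length
        if running_total >= half_span:
            return length


def busco_complete(single_percent, duplicated_percent):
    """Complete BUSCOs (%) as single-copy plus duplicated, to one decimal."""
    return round(single_percent + duplicated_percent, 1)


# Hypothetical toy lengths (arbitrary units), NOT taken from the assembly.
toy_lengths = [90, 70, 60, 30, 10]
toy_n50 = scaffold_n50(toy_lengths)

# BUSCO percentages as reported in the text.
mollusca_complete = busco_complete(84.6, 0.8)
metazoa_complete = busco_complete(97.0, 0.6)
```

With the reported values, the two sums reproduce the completeness figures given in the text (85.4% and 97.6%).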

Amendments from Version 2
We have responded to the reviewers' comments on the geographic range of the species and the visibility of lettering in Figure 1, and we have included a link to the TOLQC page, which supplies the required information about the genome sequencing and assembly, including k-mer frequency spectra analyses.

Sample acquisition and nucleic acid extraction
A single S. cineraria specimen (xgSteCine2) was collected from Mount Batten, Devon, UK (latitude 50.36084, longitude -4.12833) by Rob Mrowicki (Natural History Museum), Patrick Adkins and Joanna Harley (both Marine Biological Association), by hand. The samples were identified by the same individual and snap-frozen in liquid nitrogen.
DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute. The xgSteCine2 sample was weighed and dissected on dry ice with tissue set aside for Hi-C and RNA sequencing. Muscle tissue was cryogenically disrupted to a fine powder using a Covaris cryoPREP Automated Dry Pulveriser, receiving multiple impacts. Fragment size analysis of 0.01-0.5 ng of DNA was then performed using an Agilent FemtoPulse. High molecular weight (HMW) DNA was extracted using the Qiagen MagAttract HMW DNA extraction kit. Low molecular weight DNA was removed from a 200-ng aliquot of extracted DNA using the 0.8X AMPure XP purification kit prior to 10X Chromium sequencing; a minimum of 50 ng DNA was submitted for 10X sequencing. HMW DNA was sheared to an average fragment size of 12-20 kb in a Megaruptor 3 system with speed setting 30. Sheared DNA was purified by solid-phase reversible immobilisation using AMPure PB beads with a 1.8X ratio of beads to sample to remove the shorter fragments and concentrate the DNA sample. The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit dsDNA High Sensitivity Assay kit.
Fragment size distribution was evaluated by running the sample on the FemtoPulse system.
RNA was extracted from muscle tissue in the Tree of Life Laboratory at the WSI using TRIzol, according to the manufacturer's instructions. RNA was then eluted in 50 μl RNAse-free water and its concentration assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit RNA Broad-Range (BR) Assay kit. The integrity of the RNA was analysed using the Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay.

Sequencing
Pacific Biosciences HiFi circular consensus and 10X Genomics read cloud sequencing libraries were constructed according

Taro Maeda
Institute for Advanced Biosciences, Keio University, Mizukami, Japan
"A total of 42-fold coverage in Pacific Biosciences". Please clarify the predicted genome size of this species and the prediction method.
What kind of "-m" options were used for the gene model prediction for the BUSCO analysis? Did you use RNA-seq data during the BUSCO analysis?
Please describe the number of gap regions (N-base) on the chromosomal-level scaffolds.
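A gap count of the kind requested above can be obtained directly from the scaffold sequences. The following is a minimal sketch under stated assumptions, not the authors' method: the function name and the toy sequence are hypothetical, and a gap region is assumed to be any maximal run of N (or n) characters.

```python
import re


def count_gap_regions(sequence):
    """Return (number of gap regions, total gap bases) for one sequence.

    A gap region is taken to be a maximal run of 'N' or 'n' characters.
    """
    runs = re.findall(r"[Nn]+", sequence)
    return len(runs), sum(len(run) for run in runs)


# Hypothetical toy scaffold, NOT taken from the assembly.
toy_scaffold = "ACGTNNNNACGTNNACGT"
n_gaps, gap_bases = count_gap_regions(toy_scaffold)
```

For a real assembly, the same function would be applied to each chromosomal-level scaffold read from the FASTA file and the per-scaffold counts tabulated.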
Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Partly
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Sacoglossa, Genome, Symbiosis, Kleptoplasty
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

André Gomes-dos-Santos
Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Porto, Portugal
Thank you for the opportunity to review this Data Note. In the manuscript entitled "The genome sequence of the grey top shell, Steromphala cineraria (Linnaeus, 1758)", the authors, Adkins et al., produced a high-quality, chromosome-level assembly of the grey top shell, using the de novo approach of the Darwin Tree of Life Project. The genome presented here has been made publicly available and is undoubtedly an important genomic resource. However, as detailed below, I have some concerns regarding possible technical issues. Thus, I am in favour of indexing, but only after the issues raised are answered.

Abstract
Please also include the overall stats of the mtDNA assembly in the abstract.

Background
Regarding the species distribution, the authors report "Its geographical distribution ranges from southern Portugal and north to the White Sea in northern Russia, becoming rarer at its range edges as thermal limits are approached (Nekhaev, 2013)." However, in Nekhaev (2013), the following is stated (please note that Gibbula cineraria is a synonym of Steromphala cineraria):
○ "Distribution: In Atlantic G. cineraria is distributed from Morocco to Northern Norway [Fretter, Graham, 1977; Poppe, Goto, 1991]. It is common along the Norway coast, but live animals were not found in East Finnmark [Høisoeter, 2009]."
○ "Remarks: This species was previously reported from Ura Bay [Knipowitsch, 1900] but not mentioned in recent Russian faunistic and taxonomic literature [Galkin, 1955; Golikov, 1995; Golikov et al., 2001; Kantor, Sysoev, 2006]."
It is important that the authors revise this information according to the citations they provided, or to any other recent citation that may support their claim.
Regarding the last sentence of the Background: "A high quality genome sequence for this species will allow future studies to understand more about the mechanisms driving the observed response of this species to a changing climate." Although I recognize the fundamental importance of having a high-quality genome assembly to study such adaptive responses, by itself it is not enough, especially given that no genome annotation is provided (which I understand is not a requirement). I would therefore recommend changing the statement so that it is understood that the genome is important because it serves as a basal tool for future studies on the subject.
Regarding sequencing coverage, i.e., 42-fold in PacBio and 35-fold in 10X Genomics: were these estimates based on any previous expectation of genome size? Have the authors produced a k-mer frequency spectrum analysis with the 10X Genomics reads? How did the authors decide the amount of sequencing output to produce? Molluscan genome sizes are highly variable, so it is customary to have guided assumptions prior to sequencing.
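The two quantities raised in these questions can be illustrated with a minimal sketch. It is a simplification under stated assumptions, not the analysis behind the reported figures: the function names and toy reads are hypothetical, k-mers are counted on the forward strand only (dedicated k-mer counters normally use canonical k-mers over both strands), and fold coverage is taken as total sequenced bases divided by genome size.

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Return a k-mer frequency spectrum as {multiplicity: number of distinct k-mers}.

    Simplification: forward-strand k-mers only; k-mers containing N are skipped.
    """
    kmer_counts = Counter()
    for read in reads:
        read = read.upper()
        for start in range(len(read) - k + 1):
            kmer = read[start:start + k]
            if "N" not in kmer:
                kmer_counts[kmer] += 1
    # Histogram of the counts: how many distinct k-mers occur once, twice, ...
    return Counter(kmer_counts.values())


def fold_coverage(total_bases, genome_size):
    """Fold coverage as total sequenced bases divided by genome size."""
    return total_bases / genome_size


# Hypothetical toy reads, NOT real sequencing data.
toy_reads = ["ACGTAC", "CGTACG"]
toy_spectrum = kmer_spectrum(toy_reads, k=3)

# Hypothetical numbers chosen only to illustrate the ratio.
toy_coverage = fold_coverage(total_bases=420, genome_size=10)
```

In a real spectrum, the position of the main peak approximates the k-mer coverage, which is why such a plot supports a genome size estimate made before or after sequencing.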

Genome sequence report
Regarding the chromosome level scaffolding, given that SALSA does not require prior knowledge of the number of chromosomes and the high percentage of scaffolds assigned to chromosomes, the results seem highly reliable. However, I wonder if any prior estimation of the number of chromosomes for this group is available, that may further support the results.
Regarding the statement "Large inversions between sister chromatids can be seen on chromosome 5 at 29.7-60.7 Mb and chromosome 11 at 17.7-39.7 Mb. Possible inversions are also seen on chromosome 11 at Mb 3.4-39.4 and 18-66 Mb." This is not evident from the Hi-C contact map. How did the authors determine this?
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Bioinformatics, Phylogenomics, Genomics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Figure 1. Image of the xgSteCine2 specimen. (A) Image taken of the specimen prior to collection. (B-D) Image of the shell of the specimen following preservation and processing.

Figure 2. Genome assembly of Steromphala cineraria, xgSteCine2.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 1,270,504,078 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (98,775,408 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (70,748,747 and 57,769,559 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the mollusca_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Steromphala%20cineraria/dataset/CAKAJN01/snail.
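The binning described in the Figure 2 caption reduces to simple arithmetic, sketched below. The variable names are hypothetical; the assembly span, bin count and longest scaffold length are the values given in the caption. This is only an illustration of the caption's numbers, not BlobToolKit code.

```python
# Values taken from the Figure 2 caption.
ASSEMBLY_SPAN_BP = 1_270_504_078
NUMBER_OF_BINS = 1_000
LONGEST_SCAFFOLD_BP = 98_775_408

# Each bin covers an equal share of the assembly span.
bin_size_bp = ASSEMBLY_SPAN_BP / NUMBER_OF_BINS
bin_share_percent = 100 / NUMBER_OF_BINS

# Number of size-ordered bins occupied by the longest scaffold alone.
bins_for_longest_scaffold = LONGEST_SCAFFOLD_BP / bin_size_bp
```

Each bin therefore spans roughly 1.27 Mb, and the longest scaffold alone fills between 77 and 78 of the 1,000 bins.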

Figure 3. Genome assembly of Steromphala cineraria, xgSteCine2.1: GC coverage. BlobToolKit GC-coverage plot. Scaffolds are coloured by phylum. Circles are sized in proportion to scaffold length. Histograms show the distribution of scaffold length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Steromphala%20cineraria/dataset/CAKAJN01/blob.

Figure 4. Genome assembly of Steromphala cineraria, xgSteCine2.1: cumulative sequence. BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all scaffolds. Coloured lines show cumulative lengths of scaffolds assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Steromphala%20cineraria/dataset/CAKAJN01/cumulative.

Figure 5. Genome assembly of Steromphala cineraria, xgSteCine2.1: Hi-C contact map. Hi-C contact map of the xgSteCine2.1 assembly, visualised in HiGlass. Chromosomes are shown in size order from left to right and top to bottom.

Figure 1 - Images B-D have very poor resolution; the text on the labels is very hard to read. I recommend new photos.

Table 3. Software tools used.
the Darwin Tree of Life Project Sampling Code of Practice. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project. The genome sequence is released openly for reuse. The S. cineraria genome sequencing initiative is part of the Darwin Tree of Life (DToL) project. All raw sequence data and the assembly have been deposited in INSDC databases. The genome will be annotated with the RNA-Seq data and presented through the Ensembl pipeline at the European Bioinformatics Institute. Raw data and assembly accession identifiers are reported in Table 1.