The genome sequence of the grey gurnard, Eutrigla gurnardus (Linnaeus, 1758)

We present a genome assembly from an individual Eutrigla gurnardus (the grey gurnard; Chordata; Actinopteri; Scorpaeniformes; Triglidae). The genome sequence is 680.5 megabases in span. Most of the assembly is scaffolded into 24 chromosomal pseudomolecules. The mitochondrial genome has also been assembled and is 16.51 kilobases in length.


Background
Eutrigla gurnardus, commonly called the grey gurnard, is a demersal teleost fish belonging to the family Triglidae (Figure 1). It is distributed in shelf seas and coastal waters from Norway, Greenland and Iceland in the north, to as far south as Mauritania and the Azores, including the Mediterranean and Black Sea (Blanc & Hureau, 1979; Froese & Pauly, 2023; Hureau, 1986; Neudecker & Stein, 2011). E. gurnardus is commonly found on sandy and muddy-sand bottoms around the whole of the UK. It has traditionally been more common at southern sites in the UK. However, its distribution has been shown to be changing in response to climatic changes (Perry et al., 2005).
E. gurnardus is typically grey or brownish in colour (occasionally dull red) and whitish ventrally. As with other members of the Triglidae family, it has modified pectoral fins, the first three rays of which form slender tactile processes. It can be distinguished from other members of UK Triglidae by having short pectoral fins which do not reach as far back as the anal fin and by having spines along its lateral line.
A significant predator around much of the UK (Daan et al., 1990), E. gurnardus feeds on a range of invertebrate species (mostly crustaceans) and small fish (Weinert et al., 2010). During competition for food, it has been shown to make vocalisations in the form of grunts, growls and knocks (Amorim et al., 2004).
In the past, gurnards have not been split into their respective species when landed and have instead been treated as a single group. It has therefore been difficult to interpret fisheries data for individual gurnard species. Even today, due to the low economic value of this species, it is likely that landing data do not accurately reflect the actual numbers of individuals caught, with the majority of gurnards being discarded after being caught alongside target demersal species such as flatfish (Enever et al., 2007; Enever et al., 2009; ICES, 2006; ICES, 2012). This genome represents the first of its kind for this Eutrigla gurnardus. It was sequenced as part of the Darwin Tree of Life Project, a collaborative effort to sequence all named eukaryotic species in the Atlantic Archipelago of Britain and Ireland.

Genome sequence report
The genome was sequenced from one Eutrigla gurnardus collected using a trawl in Whitsand Bay, English Channel, UK (latitude 50.26, longitude -3.95). A total of 34-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 24 missing joins or mis-joins and removed 3 haplotypic duplications, reducing the scaffold number by 4.20%.
The final assembly has a total length of 680.5 Mb in 318 sequence scaffolds with a scaffold N50 of 29.2 Mb (Table 1). The snail plot in Figure 2 provides a summary of the assembly statistics, while the distribution of assembly scaffolds on GC proportion and coverage is shown in Figure 3. The cumulative assembly plot in Figure 4 shows curves for subsets of scaffolds assigned to different phyla. Most (96.33%) of the assembly sequence was assigned to 24 chromosomal-level scaffolds. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 5; Table 2). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission.
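For readers unfamiliar with the scaffold N50 statistic quoted above, the following is a minimal sketch of how such a value is conventionally defined. The function name `n50` and the toy scaffold lengths are illustrative assumptions; they are not taken from the fEutGur1.1 assembly or from the pipelines used in this note.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    The N50 is the length L such that sequences of length >= L
    together cover at least half of the total span.
    """
    ordered = sorted(lengths, reverse=True)  # longest first
    half_span = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_span:
            return length
    raise ValueError("lengths must not be empty")


# Hypothetical toy scaffold lengths (arbitrary units), not real data.
toy_lengths = [40, 30, 20, 5, 3, 2]
print(n50(toy_lengths))
```

With real data, the input would be the list of scaffold lengths read from the assembly FASTA index or a similar source.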

Sample acquisition and nucleic acid extraction
A Eutrigla gurnardus specimen (specimen ID MBA-211118-001A, ToLID fEutGur1) was collected from Whitsand Bay, English Channel, UK (latitude 50.26, longitude -3.95) on 29 April 2022. The specimen was taken from its habitat of sand and broken shell using an otter trawl deployed from RV Sepia. The specimen was identified by Rachel Brittain, Patrick Adkins and Joanna Harley (Marine Biological Association) based on gross morphology. The fish was first anaesthetised and then overdosed using Aquased (2-phenoxyethanol). Destruction of the brain was used as a secondary method to ensure the animal was deceased before tissue sampling took place, in accordance with Schedule 1 methodology under the Home Office licence. Samples taken from the animal were preserved on dry ice. HMW DNA was extracted in the WSI Scientific Operations core using the Automated MagAttract v2 protocol (Oatley et al., 2023). The DNA was sheared into an average fragment size of 12-20 kb in a Megaruptor 3 system with speed [...] 2013) and uses these annotations to select the final mitochondrial contig and to ensure the general quality of the sequence.

Final assembly evaluation
The final assembly was post-processed and evaluated with the three Nextflow (Di Tommaso et al., 2017) DSL2 pipelines "sanger-tol/readmapping" (Surana et al., 2023a), "sanger-tol/genomenote" (Surana et al., 2023b) and "sanger-tol/blobtoolkit". The sanger-tol/blobtoolkit pipeline is a Nextflow port of the previous Snakemake Blobtoolkit pipeline (Challis et al., 2020). It aligns the PacBio reads with SAMtools and minimap2 (Li, 2018) and generates coverage tracks for regions of fixed size. In parallel, it queries the GoaT database (Challis et al., 2023) to identify all matching BUSCO lineages to run BUSCO (Manni et al., 2021). For the three domain-level BUSCO lineages, the pipeline aligns the BUSCO genes to the Uniprot Reference Proteomes database (Bateman et al., 2023) with DIAMOND (Buchfink et al., 2021) blastp. The genome is also split into chunks according to the density of the BUSCO genes from the closest taxonomic lineage, and each chunk is aligned to the Uniprot Reference Proteomes database with DIAMOND blastx. Genome sequences that have no hit are then chunked with seqtk and aligned to the NT database with blastn (Altschul et al., 1990). All those outputs are combined with the blobtools suite into a blobdir for visualisation.
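The idea of a coverage track over regions of fixed size can be illustrated with a short sketch. This is not the sanger-tol/blobtoolkit implementation; the function names `fixed_windows` and `mean_coverage`, the window size and the toy depth values are all assumptions made for illustration only.

```python
def fixed_windows(seq_length, window_size):
    """Yield half-open (start, end) intervals that tile a sequence.

    The last window is shorter when seq_length is not a multiple
    of window_size.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    for start in range(0, seq_length, window_size):
        yield start, min(start + window_size, seq_length)


def mean_coverage(depths, window_size):
    """Average a per-base depth list over fixed-size windows."""
    return [
        sum(depths[start:end]) / (end - start)
        for start, end in fixed_windows(len(depths), window_size)
    ]


# Hypothetical per-base read depths for a 5 bp sequence, not real data.
toy_depths = [2, 2, 4, 4, 6]
print(list(fixed_windows(len(toy_depths), 2)))
print(mean_coverage(toy_depths, 2))
```

In practice the per-base depths would come from the read alignments, and the windows would be far larger than in this toy case.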
Table 3 contains a list of relevant software tool versions and sources.

Wellcome Sanger Institute - Legal and Governance
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the 'Darwin Tree of Life Project Sampling Code of Practice', which can be found in full on the Darwin Tree of Life website. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project.
Further, the Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material

Fernando Cruz
Centro Nacional de Análisis Genómico (CNAG), Barcelona, Spain
Congratulations to the authors. After Chelidonichthys spinosus, this is the second chromosome-level reference genome available for the gurnards (Family Triglidae) to date. The fEutGur1.1 assembly constitutes a valuable reference genome for this group and for fish genomics in general. The assembly comfortably meets the quality standards of the EBP and VGP.

I have three comments and recommendations about this note:
1. Please specify in the text that by gurnards you refer to Family Triglidae. Non-experts in fish taxonomy will appreciate this.
2. Please distinguish or split the total of 24 missing joins or mis-joins into missing joins (to my knowledge, "translocations") and mis-joins (to my knowledge, mis-assemblies).

Giacomo Bernardi
University of California Santa Cruz, Santa Cruz, California, USA
This is a very straightforward report of a genome assembly.
I have two very minor comments that pertain to the introduction:
The reported completeness and contiguity statistics support the excellent quality of this new genomic resource. I only have one main concern: the presented Hi-C contact map (Figure 5) does not show the correct scaffolding of contigs into chromosomes. On this contact map, no chromosomes are visible, only the diagonal. Inspecting the complete set of supplemental QC figures available online (https://tolqc.cog.sanger.ac.uk/darwin/fish/Eutrigla_gurnardus/), the post-scaffolding Hi-C map "Juicebox YaHS Hi-C map: fEutGur1" appears more informative, with chromosomes clearly visible. However, a small mis-assembly can be suspected on this map: the first third or so of chr1 seems to have more contact with chr24 than with the rest of chr1. This might have been corrected during the described manual curation rounds, but it cannot be verified on the presented contact map.
The methods are overall comprehensive. The Hi-C scaffolding steps could perhaps be explained in more detail: before scaffolding with YaHS, how were the Hi-C reads mapped and filtered? Finally, regarding the Quality Value (QV), it might be worth specifying that this relates to base accuracy (as opposed to structural accuracy), if I understood correctly.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Partly
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
The workflow for high molecular weight (HMW) DNA extraction at the Wellcome Sanger Institute (WSI) Tree of Life Core Laboratory includes a sequence of core procedures: sample preparation, sample homogenisation, DNA extraction, fragmentation, and clean-up. The sample was prepared for extraction at the Tree of Life Core Laboratory: tissue from the gills of the fEutGur1 sample was weighed and dissected on dry ice (Jay et al., 2023) and homogenised using a PowerMasher II tissue disruptor (Denton et al., 2023a).

Figure 2. Genome assembly of Eutrigla gurnardus, fEutGur1.1: metrics. The BlobToolKit snail plot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 680,483,561 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (37,750,515 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (29,198,404 and 20,162,171 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the actinopterygii_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/CAUPSS01/dataset/CAUPSS01/snail.

Figure 3. Genome assembly of Eutrigla gurnardus, fEutGur1.1: BlobToolKit GC-coverage plot. Sequences are coloured by phylum. Circles are sized in proportion to sequence length. Histograms show the distribution of sequence length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/CAUPSS01/dataset/CAUPSS01/blob.

Figure 4. Genome assembly of Eutrigla gurnardus, fEutGur1.1: BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all sequences. Coloured lines show cumulative lengths of sequences assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/CAUPSS01/dataset/CAUPSS01/cumulative.

Figure 5. Genome assembly of Eutrigla gurnardus, fEutGur1.1: Hi-C contact map of the fEutGur1.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=T0CMcjkMSfGbGvPjIDJGGg.
○ Fig 1. Is the picture of the fish used for the genome not available?
○ What does 'this' mean in the sentence: "This genome represents the first of its kind for this Eutrigla gurnardus"? My interpretation is that there might be cryptic species that are expected and 'this' is the genome of 'this' cryptic species?
Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Fish genomics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
Reviewer Report 31 July 2024
https://doi.org/10.21956/wellcomeopenres.24740.r89419
© 2024 Parey E. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Elise Parey
University College London, London, England, UK
This data note from the Darwin Tree of Life Consortium presents a high-quality chromosome-scale genome assembly for the grey gurnard, as part of the consortium's efforts to comprehensively sequence the UK eukaryotic biodiversity.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.