The genome sequence of the Pacific oyster, Magallana gigas (Thunberg, 1793)

We present a genome assembly from an individual Magallana gigas (the Pacific oyster; Mollusca; Bivalvia; Ostreida; Ostreidae). The genome sequence is 564.0 megabases in span. Most of the assembly is scaffolded into 10 chromosomal pseudomolecules. The mitochondrial genome has also been assembled and is 18.23 kilobases in length.


Background
The Pacific oyster Magallana gigas, formerly known as Crassostrea gigas, is an invasive species now commonly found across Europe. Its native range is in Northeast and Southeast Asia. M. gigas has a deeply cupped left valve and a flat right valve, both with corresponding deep ridges along their margins (Hughes, 2008). It settles on hard surfaces and can form reef structures, where it filter-feeds at high filtration rates (Troost, 2010).
The oyster was first introduced to Europe after a decline in the European oyster (Ostrea edulis) due to disease and severe winters. Since its introduction in 1964, it has been introduced to 66 countries, with wild populations found in 17 of these (Wood et al., 2021). Many studies have investigated competition between the Pacific and the European oysters, with some suggesting that the similar feeding preferences of the two oysters make them highly competitive, although others suggest that their co-habitation can allow niche partitioning within different tidal zones (Ezgeta-Balić et al., 2020; Zwerschke et al., 2018). This may be further explored by comparing the now-published genomes of both species (Adkins et al., 2023).
M. gigas has the highest annual production of all aquaculture organisms in the world, credited to its plasticity in highly exploited environments full of anthropogenic stressors (Hedgecock et al., 2005; Tran et al., 2022). With new genomic data becoming available, studies have used the chromosome-level genome of M. gigas to examine its plasticity in response to environmental stressors and disease (Li et al., 2021; Qi et al., 2021; Yao et al., 2022). A complete genome sequence will thereby provide further information to elucidate these functions, which will be beneficial in understanding its role in aquaculture and its threat to wild populations.

Genome sequence report
The genome was sequenced from a specimen of Magallana gigas (Figure 1) collected from Noss Mayo, Devon, UK (50.31, ...). A total of 48-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 108 missing joins or mis-joins and removed 68 haplotypic duplications, reducing the assembly length by 13.48% and the scaffold number by 55.22%, and decreasing the scaffold N50 by 5.73%. The final assembly has a total length of 564.0 Mb in 29 sequence scaffolds with a scaffold N50 of 57.3 Mb (Table 1). The snail plot in Figure 2 provides a summary of the assembly statistics, while the distribution of assembly scaffolds on GC proportion and coverage is shown in Figure 3. The cumulative assembly plot in Figure 4 shows curves for subsets of scaffolds assigned to different phyla. Most (99.85%) of the assembly sequence was assigned to 10 chromosomal-level scaffolds. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 5; Table 2). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission.
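The scaffold N50 and N90 statistics quoted in this report can be illustrated with a minimal Python sketch. It assumes the conventional definition (the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least the given fraction of the assembly). The function name `nx` and the toy lengths are hypothetical and are not the lengths of the xbMagGiga1.1 scaffolds.

```python
def nx(lengths, fraction=0.5):
    """Return the Nx statistic for a list of scaffold lengths.

    Scaffolds are sorted from longest to shortest and accumulated until
    they cover at least `fraction` of the total assembly length; the
    length of the last scaffold added is the Nx value (N50 for 0.5).
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length
    raise ValueError("lengths must be a non-empty list of positive values")


# Hypothetical toy lengths (arbitrary units), NOT the real scaffold lengths.
toy_lengths = [80, 70, 60, 50, 40, 30, 20, 10]

toy_n50 = nx(toy_lengths, 0.5)
toy_n90 = nx(toy_lengths, 0.9)
print("toy N50:", toy_n50)
print("toy N90:", toy_n90)
```

Applied to the real scaffold lengths of an assembly, the same function would be expected to reproduce the N50 and N90 values reported by tools such as BlobToolKit, provided those tools use the same definition.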
The estimated Quality Value (QV) of the final assembly is 59.7 with k-mer completeness of 100.0%, and the assembly has a BUSCO v completeness of 96.5% (single = 96.2%, duplicated = 0.3%), using the mollusca_odb10 reference set (n = 5,295).
HMW DNA was extracted using the Manual MagAttract v1 protocol (Strickland et al., 2023b). DNA was sheared into an average fragment size of 12-20 kb in a Megaruptor 3 system
The sanger-tol/blobtoolkit pipeline is a Nextflow port of the previous Snakemake Blobtoolkit pipeline (Challis et al., 2020). It aligns the PacBio reads with SAMtools and minimap2 (Li, 2018) and generates coverage tracks for regions of fixed size. In parallel, it queries the GoaT database (Challis et al., 2023) to identify all matching BUSCO lineages to run BUSCO (Manni et al., 2021).
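The Quality Value of 59.7 reported above can be put in more intuitive terms with a short Python sketch. It assumes the QV is Phred-scaled, as is conventional for consensus quality values, so that QV = -10 log10(P), where P is the estimated per-base error probability. The function names are hypothetical and this is not the code used to compute the reported QV.

```python
import math


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate):
    """Convert a per-base error probability to a Phred-scaled quality value."""
    return -10 * math.log10(error_rate)


reported_qv = 59.7  # value stated in the genome sequence report
error_rate = qv_to_error_rate(reported_qv)
bases_per_error = 1 / error_rate

print(f"QV {reported_qv} corresponds to about {error_rate:.2e} errors per base")
print(f"that is roughly one error every {bases_per_error:,.0f} bases")
```

Under this assumption, a QV close to 60 corresponds to roughly one consensus error per million bases.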

Software tool Version
Life website here. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project.
Further, the Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use.
The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material

Qinggang Xue
Zhejiang Wanli University, Ningbo, Zhejiang, China
This is a data note for the genome sequence of a Pacific oyster individual collected in the UK. Based on the reported assessment parameters, it was a high-quality genome assembly. However, there are a few points that should be further addressed or clarified. There are already 2 high quality chromosome-level genome sequence assemblies in the public sequence databases, and one of them was done in the UK. What were the rationales for the present assembly?

1. It should be made more direct and clearer that the note represented the third complete genome assembly of the same oyster species. This can help to avoid potential confusion in the research community, as a different scientific name was used in this work. It should also be addressed that renaming Crassostrea gigas as Magallana gigas is not yet a scientific consensus in the oyster research community.

2. Because of the existence of 2 previous assemblies, readers may expect some brief comparison between this newly completed one and the previous two.

3. The authors detailed the pipelines for assembly evaluation, but did not describe how manual assembly curation was done. The latter is an important piece of information.

Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Mollusk physiology and immunity, molecular and functional evolution of immune and stress response related proteins in mollusks
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Reviewer Expertise: genomics of marine invertebrates, especially molluscs and annelids.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Minor editorial imperfections:
○ There is an extra "v" between "BUSCO" and "completeness" under "Genome sequencing report".
○ There is extraneous text (authors' comments) in the capture of Figure 1.
○ The online version of Figure 4 at https://blobtoolkit.genomehubs.org/view/Magallana_gigas/dataset/GCA_963853765.1/cumulative is not really interactive (at least on my screen), so dubbing it as such in the figure capture is not accurate.

Reviewer Expertise: Host-pathogen interactions, Shellfish genomics, Shellfish aquaculture
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 1. Photographs of the Magallana gigas (xbMagGiga1) specimen used for genome sequencing. A shows the whole animal attached to the substrate where it was found. (I think we can get rid of B, I don't think this animal was used for sequencing) C shows the animal after being removed from the substrate. D shows the internal view of both valves with tissue still attached after dissection.

Figure 2. Genome assembly of Magallana gigas, xbMagGiga1.1: metrics. The BlobToolKit snail plot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 564,004,028 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (76,070,991 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (57,274,926 and 50,364,239 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the mollusca_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Magallana_gigas/dataset/GCA_963853765.1/snail.

Figure 3. Genome assembly of Magallana gigas, xbMagGiga1.1: BlobToolKit GC-coverage plot. Sequences are coloured by phylum. Circles are sized in proportion to sequence length. Histograms show the distribution of sequence length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Magallana_gigas/dataset/GCA_963853765.1/blob.

Figure 4. Genome assembly of Magallana gigas, xbMagGiga1.1: BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all sequences. Coloured lines show cumulative lengths of sequences assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Magallana_gigas/dataset/GCA_963853765.1/cumulative.

Figure 5. Genome assembly of Magallana gigas, xbMagGiga1.1: Hi-C contact map of the xbMagGiga1.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=BM-Ctut_Q3GGlAPdqRWtLQ.
For the three domain-level BUSCO lineages, the pipeline aligns the BUSCO genes to the UniProt Reference Proteomes database (Bateman et al., 2023) with DIAMOND (Buchfink et al., 2021) blastp. The genome is also split into chunks according to the density of the BUSCO genes from the taxonomically closest lineage, and each chunk is aligned to the UniProt Reference Proteomes database with DIAMOND blastx. Genome sequences that have no hit are then chunked with seqtk and aligned to the NT database with blastn (Altschul et al., 1990). All those outputs are combined with the blobtools suite into a blobdir for visualisation. All three pipelines were developed using the nf-core tooling (Ewels et al., 2020), use MultiQC (Ewels et al., 2016), and make extensive use of the Conda package manager, the Bioconda initiative (Grüning et al., 2018), the Biocontainers infrastructure (da Veiga Leprevost et al., 2017), and the Docker (Merkel, 2014) and Singularity (Kurtzer et al., 2017) containerisation solutions.
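The idea of cutting sequences into chunks before alignment, described above, can be sketched in plain Python. This is only an illustration of fixed-size chunking: the pipeline itself uses seqtk for this step, and the function name `chunk_sequence`, the chunk size and the overlap used here are hypothetical choices, not the pipeline's settings.

```python
def chunk_sequence(seq, chunk_size, overlap=0):
    """Split a sequence into fixed-size chunks with an optional overlap.

    Returns a list of (start, end, subsequence) tuples using 0-based,
    end-exclusive coordinates. The final chunk may be shorter than
    `chunk_size` when the sequence length is not an exact multiple.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(seq), step):
        end = min(start + chunk_size, len(seq))
        chunks.append((start, end, seq[start:end]))
        if end == len(seq):
            break  # the end of the sequence has been reached
    return chunks


# Hypothetical toy sequence and chunk size, for illustration only.
toy_chunks = chunk_sequence("ACGTACGTAC", chunk_size=4)
for start, end, sub in toy_chunks:
    print(start, end, sub)
```

Keeping the coordinates alongside each chunk means that any hit found for a chunk can be mapped back to its position on the original scaffold.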

2. Peñaloza C, Gutierrez AP, Eöry L, Wang S, et al.: A chromosome-level genome assembly for the Pacific oyster Crassostrea gigas. Gigascience. 2021; 10 (3).

Is the rationale for creating the dataset(s) clearly described? Partly
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

Is the rationale for creating the dataset(s) clearly described? Partly
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

Table 3
Wellcome Sanger Institute - Legal and Governance
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the 'Darwin Tree of Life Project Sampling Code of Practice', which can be found in full on the Darwin Tree of

Open Peer Review Current Peer Review Status: Version 1
This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.