Dataset supporting description of the new mussel species of genus Gigantidas (Bivalvia: Mytilidae) and metagenomic data of bacterial community in the host mussel gill tissue

This article contains supplementary data from the research paper entitled “A newly discovered Gigantidas bivalve mussel from the Onnuri Vent Field on the northern Central Indian Ridge” [1], which describes a new mussel species within the subfamily Bathymodiolinae named Gigantidas vrijenhoeki. The data comprise two parts: 1) shell images and molecular analyses of G. vrijenhoeki and 2) metagenomic community analyses of the gill-associated symbiotic bacteria of G. vrijenhoeki. The G. vrijenhoeki data were obtained from the type specimens described in Jang et al. 2020 [1]. The molecular analysis was conducted by calculating genetic distances at the intra- and interspecific levels within the genus Gigantidas, based on sequence data from two mitochondrial genes (COI and ND4). The metagenomic dataset of gill-associated symbionts was generated by Illumina MiSeq sequencing of the V3-V4 region of the 16S rRNA gene from 12 specimens of G. vrijenhoeki collected from the same vent site, the Onnuri Vent Field.


Specifications Table

Subject
Ecology, Evolution, Behavior and Systematics

Specific subject area
Morphology, molecular evolution, metagenomics, bacterial community analysis

Type of data
Table, image, raw DNA sequences

How data were acquired
Applied Biosystems 3730xl DNA Analyzer (Applied Biosystems Inc, South Korea) for sequencing five gene fragments of the mytilid mussel Gigantidas vrijenhoeki, and MEGA X software for calculating genetic distances at the intra- and interspecific levels. Illumina MiSeq platform with a 2 × 300 bp paired-end protocol and the microbiome taxonomic profiling pipeline in EzBioCloud (ChunLab, Inc., Seoul, Korea) for bacterial community analysis of the gill-associated symbionts.

Data format
Raw and analyzed

Parameters for data collection
The morphological images and genomic DNA of G. vrijenhoeki were obtained from samples preserved in 95% ethanol. Bacterial community analyses were conducted using gill tissue from samples stored at −80 °C.

Description of data collection
Mussel samples (Gigantidas vrijenhoeki) were collected with a video-guided hydraulic grab (Oktopu, Germany). Two mitochondrial markers (COI and ND4) were amplified from the genomic DNA of the mussel specimens, and the bacterial 16S rRNA gene was amplified using V3-V4 primers.

Data source location
Onnuri Vent Field, northern Central Indian Ridge, Indian Ocean (11°24.88′S, 66°25.42′E)

Value of the Data
• These data provide comprehensive information on Gigantidas vrijenhoeki, a new species of the genus Gigantidas first discovered on the Central Indian Ridge, and on its bacterial symbionts.
• These data could be used to study the biodiversity and genetic diversity of vent fauna, and the phylogenetic history of bathymodioline mussels and their gill-associated symbiotic bacteria.
• These data may help in understanding the evolutionary and ecological processes that shape the host mussel and symbiotic bacteria system under environmental influences.

Data Description
The data in this article were produced from the newly discovered hydrothermal vent mussel Gigantidas vrijenhoeki, collected at the Onnuri Vent Field on the northern Central Indian Ridge. Figure 1 shows shell images of three type specimens of G. vrijenhoeki reported in Jang et al., 2020 (paratypes #1, #7, and #10), highlighting shell variation with growth. We estimated the genetic distances of G. vrijenhoeki at the intraspecific and interspecific levels within the genus Gigantidas based on the mitochondrial COI and ND4 genes of 11 specimens. The mitochondrial DNA sequences of the other Gigantidas taxa were downloaded from NCBI GenBank. Table 1 lists the accession number of each sequence used in this article. Tables 2 and 3 present the genetic distances at the intraspecific and interspecific levels, respectively. Table 4 presents the bacterial community composition in the gill tissue of G. vrijenhoeki at the order and species levels, based on the V3-V4 region of the 16S rRNA gene. Raw sequence data were deposited in the NCBI Sequence Read Archive (BioProject PRJNA613556).
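Table 4 reports relative abundance as a percentage of 16S rRNA reads. As a minimal sketch, assuming each quality-filtered read has already been assigned a single taxonomic label, the percentages follow from counting labels and dividing by the total read count; species-level counts can likewise be summed to the order level through a species-to-order mapping. The function names and the placeholder labels (`sp1`, `order_A`, ...) are hypothetical and reproduce neither the article's data nor the EzBioCloud implementation.

```python
from collections import Counter

def relative_abundance(read_taxa):
    """Return {taxon: percent of reads}, ordered from most to least abundant."""
    counts = Counter(read_taxa)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads supplied")
    return {taxon: 100.0 * n / total for taxon, n in counts.most_common()}

def collapse_to_rank(species_counts, species_to_rank):
    """Sum species-level read counts into a higher rank such as order.

    Species missing from the mapping are pooled as 'unclassified'.
    """
    rolled = Counter()
    for species, n in species_counts.items():
        rolled[species_to_rank.get(species, "unclassified")] += n
    return rolled

# Placeholder labels and counts for illustration only (not the article's results).
reads = ["sp1"] * 90 + ["sp2"] * 8 + ["sp3"] * 2
print(relative_abundance(reads))
# {'sp1': 90.0, 'sp2': 8.0, 'sp3': 2.0}

orders = collapse_to_rank(Counter(reads), {"sp1": "order_A", "sp2": "order_A", "sp3": "order_B"})
print(relative_abundance(orders.elements()))
# {'order_A': 98.0, 'order_B': 2.0}
```

Whether Table 4 reports per-specimen percentages or values pooled across the twelve specimens is not stated here; the same helper applies to either, depending on which reads are passed in.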

Sample collection
All mussel specimens were collected from the Onnuri Vent Field (11°24.88′S, 66°25.42′E) in the Indian Ocean with a video-guided hydraulic grab (Oktopu, Germany) during the Korea Institute of Ocean Science and Technology (KIOST) research cruise in 2018 (dive number: GTV1809). Eleven type specimens of Gigantidas vrijenhoeki were immediately preserved in 95% ethanol at −20 °C and transported to a land-based laboratory. Twelve additional specimens of G. vrijenhoeki and one specimen of Bathymodiolus marisindicus were frozen on board at −80 °C in an ultra-low-temperature freezer for bacterial community analysis. They were then transported to a land-based laboratory on dry ice and stored at −80 °C.

Table 1. GenBank accession numbers of the sequences used to calculate genetic distance.
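For users who need the sampling site in decimal degrees (for example, to plot it in GIS software), the degrees-and-decimal-minutes coordinates of the Onnuri Vent Field convert as degrees + minutes/60, negated for the southern or western hemisphere. A minimal Python sketch; the helper name `ddm_to_decimal` is hypothetical:

```python
def ddm_to_decimal(degrees, minutes, hemisphere):
    """Convert degrees + decimal minutes to signed decimal degrees.

    Southern (S) and western (W) hemispheres are returned as negative values.
    """
    if not 0 <= minutes < 60:
        raise ValueError("minutes must be in [0, 60)")
    value = degrees + minutes / 60.0
    return -value if hemisphere.upper() in ("S", "W") else value

# Onnuri Vent Field, as given in the text: 11°24.88′S, 66°25.42′E
lat = ddm_to_decimal(11, 24.88, "S")
lon = ddm_to_decimal(66, 25.42, "E")
print(f"{lat:.5f}, {lon:.5f}")  # -11.41467, 66.42367
```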

DNA extraction
Genomic DNA of G. vrijenhoeki was extracted from the adductor muscle tissue of the eleven type specimens to estimate genetic distance. Genomic DNA was also extracted from the gill tissue of the twelve additional specimens for bacterial community analysis. Genomic DNA of B. marisindicus was extracted from the gill tissue of one specimen to compare bacterial community composition between the two species. DNA extraction was performed using the Qiagen DNeasy Tissue kit (Qiagen Inc., Hilden, Germany).

Table 4. Relative abundance (%) of 16S rRNA reads obtained from mussel gill tissue at the order and species levels.

Mitochondrial COI and ND4 genes were amplified for molecular analysis. The COI gene was amplified using the primers HCO2148 (5′-CCYCTAGGRTCATAAAAAGA-3′) and LCO1560 (5′-ATRCTDATTCGWATTGA-3′) [2]. The ND4 gene was amplified using the primers ArgBL (5′-CAAGACCCTTGATTTCGGCTCA-3′) and NAP2H (5′-TGGAGCTTCTACGTGRGCTTT-3′) [3]. PCR was performed in a 20 μl reaction containing 2 μl of 10× Taq polymerase buffer, 1 μl of 2.5 mM dNTP stock solution, 1 μl of each primer (10 μmol/L), 1
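The primer sequences above use IUPAC ambiguity codes (for example Y = C/T, R = A/G, D = A/G/T, W = A/T), so each degenerate primer is a mixture of several concrete oligonucleotides, and the reaction recipe follows the dilution relation C1V1 = C2V2. The sketch below, with hypothetical helper names, counts and expands primer variants and computes the final concentrations implied by the volumes stated for the 20 μl reaction. It takes the 2.5 mM dNTP stock concentration as written; the text does not say whether it is per nucleotide.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer):
    """Number of distinct oligos encoded by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())

def expand(primer):
    """Yield every concrete sequence represented by a degenerate primer."""
    for combo in product(*(IUPAC[base] for base in primer.upper())):
        yield "".join(combo)

def final_conc(stock_conc, volume_added, total_volume):
    """C1*V1 = C2*V2 solved for C2: concentration after dilution into the reaction."""
    return stock_conc * volume_added / total_volume

primers = {
    "HCO2148": "CCYCTAGGRTCATAAAAAGA",
    "LCO1560": "ATRCTDATTCGWATTGA",
    "ArgBL": "CAAGACCCTTGATTTCGGCTCA",
    "NAP2H": "TGGAGCTTCTACGTGRGCTTT",
}
for name, seq in primers.items():
    print(name, degeneracy(seq))
# HCO2148 4, LCO1560 12, ArgBL 1, NAP2H 2

# Volumes from the 20 µl recipe in the text.
print(final_conc(10, 2, 20))   # 10x buffer -> 1.0 (i.e. 1x)
print(final_conc(2.5, 1, 20))  # dNTP stock, mM -> 0.125
print(final_conc(10, 1, 20))   # each primer, µM -> 0.5
```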

Intra- and interspecific genetic distance
Genetic distance was estimated at the intraspecific level for G. vrijenhoeki and at the interspecific level among species within the genus Gigantidas. The intraspecific genetic distance was calculated from mitochondrial COI (532 bp) and ND4 (511 bp) sequences. The interspecific genetic distance was calculated from COI (401 bp) and ND4 (423 bp) sequences, using sequence data of Gigantidas species downloaded from NCBI (Table 1). Both estimates of pairwise genetic distance were based on the Kimura 2-parameter (K2P) model implemented in MEGA X.
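The K2P model treats transitional differences (purine to purine, pyrimidine to pyrimidine) separately from transversional ones. With P and Q the proportions of compared sites that differ by a transition and by a transversion, the distance is d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). The Python sketch below implements this formula and averages it over sequence pairs within one group (intraspecific) or between two groups (interspecific). It is an illustration, not MEGA X's code: it assumes aligned sequences of equal length and skips sites with gaps or ambiguous bases (pairwise deletion), whereas the gap-handling option used in the article is not stated. The function names are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID = PURINES | PYRIMIDINES

def k2p(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences.

    Sites where either sequence has a gap or ambiguous base are skipped.
    Raises ValueError when the distance is undefined (saturated pairs).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in VALID or b not in VALID:
            continue
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites    # proportion of transitional differences
    q = transversions / sites  # proportion of transversional differences
    # math.log raises ValueError if an argument is <= 0 (distance undefined)
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)

def mean_distance(group_a, group_b=None):
    """Mean pairwise K2P within one group, or between two groups."""
    if group_b is None:
        pairs = combinations(group_a, 2)
    else:
        pairs = ((x, y) for x in group_a for y in group_b)
    return mean(k2p(x, y) for x, y in pairs)

# One transition (A/G) over 10 comparable sites: P = 0.1, Q = 0
print(round(k2p("AAAAAAAAAA", "GAAAAAAAAA"), 5))  # 0.11157
```

Averaging pairwise distances is one common way to summarize intra- and interspecific divergence; minimum and maximum values per comparison can be taken from the same pairwise results.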

Metagenome sequencing
PCR was conducted with genomic DNA from the gill tissue of the mussel specimens (twelve G. vrijenhoeki and one B. marisindicus). The 16S rRNA gene of the symbiotic bacteria was amplified using the universal primers of the Illumina protocol targeting the V3-V4 region (www.illumina.com, 16S Metagenomic Sequencing Library Preparation, Part #15044223, Rev. B). The amplicons were sequenced on the Illumina MiSeq platform with the MiSeq Reagent Kit v3 (600 cycles) using a 2 × 300 bp paired-end protocol. The paired-end reads were deposited in the NCBI Sequence Read Archive (SRA) under BioProject PRJNA613556.
The raw sequences were analyzed with the microbiome taxonomic profiling pipeline in EzBioCloud (https://www.ezbiocloud.net, ChunLab, Inc., Seoul, Korea). Paired-end reads were quality-filtered, removing low-quality reads (Q < 25) [4], and merged using PANDAseq [5]. Primers were then trimmed with ChunLab's in-house program at a similarity cutoff of 0.8. Denoising was performed using DUDE-Seq with a 0.5% error-correction criterion [6], and non-redundant reads were extracted by UCLUST clustering [7]. After denoising and dereplication, taxonomic assignment was performed using USEARCH [7] against the EzBioCloud 16S database, with a 97% similarity cutoff for species-level identification; the cutoff values were taken from Yarza et al. [8]. Chimeric sequences were removed using the UCHIME algorithm. Sequence data were clustered using CD-HIT [9] and UCLUST [7].
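EzBioCloud's internal code is not public, so the following is only a generic illustration of two of the steps described above: discarding low-quality reads and dereplicating identical sequences into abundance-ranked unique reads. It assumes Phred+33 (Illumina 1.8+) quality encoding and interprets "Q < 25" as a mean read quality below 25; the pipeline's actual per-base or windowed rule may differ. Merging, primer trimming, denoising, taxonomic assignment, and chimera removal are not shown, and all function names are hypothetical.

```python
from collections import Counter

def mean_phred(quality, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 by default)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)

def read_fastq(lines):
    """Yield (header, sequence, quality) from well-formed 4-line FASTQ records."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it)
        next(it)  # '+' separator line
        qual = next(it)
        yield header, seq, qual

def quality_filter(records, min_mean_q=25):
    """Keep reads whose mean Phred score is >= min_mean_q (assumed rule)."""
    return [rec for rec in records if mean_phred(rec[2]) >= min_mean_q]

def dereplicate(sequences):
    """Collapse identical sequences into (sequence, count), most abundant first."""
    return Counter(sequences).most_common()

# Tiny hypothetical input: 'I' encodes Q40, '#' encodes Q2.
example = """@r1
ACGT
+
IIII
@r2
ACGT
+
IIII
@r3
ACGA
+
####""".splitlines()

kept = quality_filter(read_fastq(example))
print([h for h, _, _ in kept])             # ['@r1', '@r2']
print(dereplicate(s for _, s, _ in kept))  # [('ACGT', 2)]
```

Dereplication before taxonomic assignment means each unique sequence is classified once and its count carried forward, which is what makes the downstream relative abundances cheap to compute.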

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.