Whole genome sequence data of a marine bacterium, Marinobacter adhaerens PBVC038, associated with toxic harmful algal bloom

Marinobacter adhaerens (PBVC038) was isolated from a harmful algal bloom event caused by the toxic dinoflagellate Pyrodinium bahamense var. compressum (P. bahamense) in Sepanggar Bay, Sabah, Malaysia, in December 2012. Blooms of P. bahamense are frequently linked to paralytic shellfish poisoning, resulting in morbidity and mortality. Prior experimental evidence has implicated the role of symbiotic bacteria in bloom dynamics and the synthesis of biotoxins. The draft genome sequence data of a harmful algal bloom-associated bacterium, Marinobacter adhaerens PBVC038 is presented here. The genome is made up of 21 contigs with an estimated 4,246,508 bases in genome size and a GC content of 57.19%. The raw data files can be retrieved from the National Center for Biotechnology Information (NCBI) under the Bioproject number PRJNA320140. The assessment of bacterial communities associated with harmful algal bloom should be studied more extensively as more data is needed to ascertain the functions of these associated bacteria during a bloom event.


a b s t r a c t
Marinobacter adhaerens (PBVC038) was isolated from a harmful algal bloom event caused by the toxic dinoflagellate Pyrodinium bahamense var. compressum ( P. bahamense ) in Sepanggar Bay, Sabah, Malaysia, in December 2012. Blooms of P. bahamense are frequently linked to paralytic shellfish poisoning, resulting in morbidity and mortality. Prior experimental evidence has implicated the role of symbiotic bacteria in bloom dynamics and the synthesis of biotoxins. The draft genome sequence data of a harmful algal bloom-associated bacterium, Marinobacter adhaerens PBVC038 is presented here. The genome is made up of 21 contigs with an estimated 4 , 246 , 508 bases in genome size and a GC content of 57.19%. The raw data files can be retrieved from the National Center for Biotechnology Information (NCBI) under the Bioproject number PRJNA320140. The assessment of bacterial communities associated with harmful algal bloom should be studied more extensively as more data is needed to ascertain the functions of these associated bacteria during a bloom event.

Value of the Data
• The genome analysis showed the presence of Marinobacter adhaerens strain PBVC038 in a toxic algal bloom associated with the P. bahamense in Sepanggar Bay, Kota Kinabalu, Sabah. • The dataset aids in understanding the function of an associated bacterium, Marinobacter adhaerens , during a toxic bloom, by analyzing the genes associated with bloom dynamics and the biosynthesis of toxins. • Marine researchers can use the data to compare with the genomes of other harmful algal bloom-associated bacteria. • Researchers could use the genome information provided here to comprehend the geneticrelated mechanisms of a marine bacterium responsible for harmful algal bloom and toxin production.

Objective
The primary goal of generating this dataset was to study how the associated bacteria contribute to bloom dynamics and the production of biotoxins based on next-generation sequencing. Previous studies on the associated bacteria of the same harmful marine microalga, P. bahamense var. compressum , used the 16S rRNA gene as a DNA barcoding marker and 16S metagenome sequencing to study the associated bacterial community. The studies uncovered several intriguing bacterial species previously reported to be associated with toxic, harmful algal bloom and involved directly and indirectly with toxin production. Hence, this genome dataset was acquired to analyze the genes associated with bloom dynamics and toxin biosynthesis comprehensively.

Data Description
A harmful algal bloom of the toxic microalga, P. bahamense , occurred in the coastal waters of west Sabah, Malaysia, in December 2012. Following the detection of the toxic bloom in the coastal waters, the state Department of Fisheries issued a warning to the public to prevent them from consuming any shellfish or bivalves harvested from the affected areas. The shellfish samples collected from the affected regions have toxin levels as high as 4010 Mouse Units (MU), where 400 MU is considered the lowest amount harmful to humans [2] . We isolated a rod-shaped marine bacterium, Marinobacter adhaerens PBVC038 ( Fig. 1 ), from the harmful alga, P. bahamense , and confirmed its identity through 16S rRNA Sanger sequencing [3] . Bacteria from the genus Marinobacter are the largest genus in the family Marinobacteraceae, and are frequently reported to be associated with harmful algal bloom [4] . Next, we performed whole genome sequencing of the associated bacterium. The whole-genome sequencing generated approximately 593.3 Mb total bases, totaling about 1,203,928 raw pair-end reads. The draft genome has a total size of 4,246,508 bp after assembly, with 21 contigs, a GC% of 57.2, and an N50 score of 1,0 0 0,299 bp. The sequencing coverage was 100 ×. CheckM analysis confirmed that the genome has 100% completeness and 0% contamination. The graphical representation in Fig. 2 summarizes the functional annotation of the M. adhaerens PBVC038 genome, where 3891 protein-coding genes were identified ( Table 1 ). A core genome phylogenetic tree was constructed using PBVC038 and a group of M. adhaerens genomes retrieved from the NCBI databases ( Fig. 3 ). The phylogeny showed that M. adhaerens PBVC038 was grouped with the Brazilian M. adhaerens t76_800, Saudi Arabian M. adhaerens MES15 and American M. adhaerens ih7, with the bootstrap value of 85-100%. The strain PBVC038 had an average nucleotide iden-    tity (ANI) of more than 95% with its closely related M. adhaerens strains t76_800, MES15, and ih7 ( Table 2 ).

Study Area
The bacterium was isolated from Sepanggar Bay, Sabah, Malaysia, one of the affected regions impacted by the 2012 harmful algal bloom. During the sample period, high paralytic shellfish toxin or saxitoxin levels of up to 3300 Mouse Units were detected [5 , 6] .

Isolation of Bacterium M. adhaerens PBVC038
Seawater samples from the bloom areas were collected using a plankton net of 20 μm mesh size and stored in sterile sampling bottles. Single cells of P. bahamense were isolated from the newly acquired seawater samples under a light microscope using the drawn-out micropipettes [7] . The unialgal culture of P. bahamense was established under the following conditions: the temperature at 28 °C in a 12-hour shift of light-dark cycle and light intensity of 150 μEm -2 s −1 . A total of 1 ml of P. bahamense culture at mid-exponential growth phase was aliquot onto a sterile marine agar (Difco) and incubated for at least 72 h at 37 °C. Once the bacteria had grown, the individual bacterial colonies were sub-cultured onto a fresh, sterile marine agar plate. The bacterium PBVC038 was effectively recovered from the agar plate after an overnight 37 °C incubation period.

Scanning Electron Microscopy (SEM)
In order to prepare for SEM, the bacterium was cultivated in marine broth (Difco) and agitated at 200 rpm in an orbital shaker, with an incubation temperature of 25 °C. The bacterial cells were collected by centrifuging at 11,0 0 0 × g for 15 min after two days of incubation. The sample preparation for SEM was carried out according to Yik et al. [8] . The bacterial sample was fixed in a solution of 5% glutaraldehyde and 0.1 M phosphate buffer (pH 7.2), which was then left to sit at 4 °C for 24 h. The sample was washed with phosphate buffer before being gradually dehydrated in 35, 50, 75, and 95% ethanol solutions. Next, absolute ethanol was used twice to complete the dehydration process. Each ethanol solution wash step took five minutes to complete. The sample was mounted onto an SEM specimen stub and covered in gold after being air dried in a desiccator at room temperature for 16 h. The sample was viewed with a Hitachi S-3400 N scanning electron microscope.

DNA Extraction
For DNA extraction, the overnight (15-16 h) bacterial cell culture was pooled by centrifuging at 50 0 0 × g for 10 min. The DNA extraction method was carried out according to protocols described in the Qiagen DNeasy Blood and Tissue Kit manual (Qiagen Biotechnology). The quality of the extracted DNA was examined using the Qubit dsDNA HS Assay kit (Thermo Fisher Scientific).

Genome Sequencing, Assembly, and Annotation
A standard genomic 250 bp paired-end library was produced after DNA extraction using the Illumina Nextera XT DNA Library Preparation Kit. Libraries were quantified and normalized to 4 nM before pooling and sequencing the libraries using the Illumina MiSeq system. Quality filtering of the acquired sequencing reads was conducted using the Fastq Quality Filter in Fastx Toolkit [9] . The adapter sequence was removed using Scythe v0.994 [10] , whereas the trimming of the reads based on a Q trimming threshold of 25 was performed using Trimmomatic v0.35 [11] . Iterative De Brujin Graph De Novo Assembler IDBA-UD software was used to de novo assemble the trimmed reads [12] , and the completeness of the genome was analyzed using CheckM [13] . The functional annotations were performed using PROKKA [14] and the RAST server [15] .

Phylogenetic Tree Analysis
A total of 21 genomes of Marinobacter strains were subjected to pan-genome analysis using Roary v3.11.2 [16] . The input files used were the GFF3 (format generated from the annotated assembly (Prodigal) of Marinobacter adhaerens PBVC038) and GenBank files obtained from the NCBI website (20 strains). The threshold of 99% was used to determine core genes across all strains, whereas BLASTP was used to compare the sequences of different strains using a user-defined percentage of sequence identity of 95% [17] . The division of homologous groups comprising paralogs into groups of true orthologs was carried out using data on conserved gene neighbourhoods. Gene presence in the accessory genome was used to group the strains together, which was weighted by the total and shared genomes. A core-genome phylogenetic tree was constructed using the maximum likelihood method in MEGAX [18] with a bootstrap analysis of 10 0 0 replicates. Average nucleotide identities were calculated using ANI calculator ( http://enve-omics.ce.gatech.edu/ani/ ) [19] .

Ethics Statement
Since the work provided here did not involve employing human participants, animal trials, or data obtained from social media platforms, no ethical statements were required for this data.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability
Draft Genome Sequence of Marinobacter adhaerens, strain PBVC038 (Original data) (Sequence Read Archive (SRA) of National Center of Biotechnology Information).