Data on genome assembly and annotation of Marinobacter sp. strain CA1 isolated from indigenous diatom found in whiteleg shrimp pond in Malaysia

Marinobacter is a genus belonging to the class Gammaproteobacteria and the family Alteromonadaceae. Members of this genus are Gram-negative bacteria found in a wide range of marine and saline water environments. Here, we present the genome sequence of Marinobacter sp. strain CA1, which was isolated from an indigenous diatom found in a whiteleg shrimp, Penaeus vannamei (Boone, 1931), pond in Malaysia. Genome sequencing was done using the PacBio and Illumina sequencing platforms. De novo hybrid assembly based on PacBio long reads and high-quality Illumina short reads yielded a complete circular chromosome 4.7 million base pairs (Mbp) in size. The genome was submitted to GenBank (NCBI) and can be accessed under accession number NZ_CP071785 and BioProject accession number PRJNA710741. These data could be useful for other studies on microalgae-bacteria interactions and for comparative genomic analysis of other Marinobacter species.



Value of the Data
• The genome data of Marinobacter sp. are useful for describing its genomic variation and would contribute to the understanding of its potential as a quorum quencher.
• The genome data deposited in a public biological database may benefit researchers working on genomic analysis of bacteria from marine environments, specifically those associated with microalgae.
• The genome data of Marinobacter sp. can be included in comparative genomic analyses to understand the evolution of this bacterial species, its genetic adaptation through mutation or horizontal gene transfer, and possible biotechnological applications based on its genomic content.

Data Description
We present here the complete genome assembly of Marinobacter sp. strain CA1, which is associated with a diatom species isolated from a whiteleg shrimp (Penaeus vannamei) pond in Malaysia. Whole genome sequencing of Marinobacter sp. CA1 generated 4,648,785 paired-end reads and 45,563 reads from the Illumina and PacBio platforms, respectively. The genome of Marinobacter sp. strain CA1 was sequenced to 100× depth using the Illumina platform and 50× depth using the PacBio platform. De novo hybrid assembly of high-quality Illumina short reads and PacBio long reads yielded a complete circular chromosome 4,721,802 bp in length with a GC content of 60.99%. The assembly coverage was 47×, calculated from a total of 225,261,560 bases that mapped to the genome. A total of 12 ribosomal RNAs (rRNAs), 57 transfer RNAs (tRNAs) and 4297 genes were predicted and annotated using the Prokaryotic Genome Annotation Pipeline (PGAP) of the National Center for Biotechnology Information (NCBI). In a phylogenetic tree built from the 16S rRNA sequence of this isolate and 13 other homologous 16S rRNA sequences, this bacterium was found to be closely related to Marinobacter sp. Bu15_23, supported by 100% bootstrap values in both the Neighbour-Joining and Maximum-Likelihood trees (Fig. 1); the tree data are available in Newick format in the Supplementary data. Two betalactone clusters, one redox-cofactor cluster and one ectoine cluster were identified as biosynthetic gene clusters (BGCs) using antiSMASH. The betalactone cluster spanning nucleotides 826,411 to 853,236 shows 20% similarity to a known cluster that produces the secondary metabolite fengycin. The redox-cofactor cluster spanning nucleotides 3,864,498 to 3,886,676 shows 13% similarity to a known cluster that produces the secondary metabolite lankacidin C (Table 1).
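The coverage figure and the cluster coordinates quoted above can be checked with simple arithmetic. The following is a minimal sketch using only the numbers reported in this section; the function names are illustrative, and the cluster lengths assume that the reported coordinates are 1-based and inclusive, which the text does not state.

```python
# Figures quoted in the Data Description section.
GENOME_LENGTH_BP = 4_721_802   # complete circular chromosome
MAPPED_BASES = 225_261_560     # bases that mapped to the genome


def mean_coverage(mapped_bases: int, genome_length: int) -> float:
    """Average depth: total mapped bases divided by genome length."""
    return mapped_bases / genome_length


def span_length(start: int, end: int) -> int:
    """Length of a region, ASSUMING 1-based inclusive coordinates."""
    return end - start + 1


coverage = mean_coverage(MAPPED_BASES, GENOME_LENGTH_BP)
betalactone_bp = span_length(826_411, 853_236)
redox_cofactor_bp = span_length(3_864_498, 3_886_676)

print(f"mean coverage: {coverage:.1f}x")
print(f"betalactone cluster: {betalactone_bp} bp")
print(f"redox-cofactor cluster: {redox_cofactor_bp} bp")
```

The ratio comes out at roughly 47.7, which agrees with the reported 47× if that value was truncated rather than rounded.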

Sample preparation
The diatom sample was isolated from a shrimp farm located on the East Coast of Peninsular Malaysia. A 100 μL aliquot of the suspension was inoculated into 10 mL of Marine Broth (MB) (Difco™, Detroit, USA). The inoculum was incubated at 28 °C overnight under constant agitation.
Then, 100 μL of the overnight culture was plated on Marine Agar (MA) (Difco™, Detroit, USA) using the spread plate technique and incubated at 28 °C overnight, after which a single colony was inoculated into 10 mL of MB. The inoculum was incubated at 28 °C overnight under constant agitation. The pure culture was centrifuged at 4500 × g for 10 min and the supernatant was discarded. The cell pellets (in triplicate) were submerged in 10 mL of molecular grade ethanol (SYSTERM, Malaysia) and submitted to Apical Scientific Sdn. Bhd. (Seri Kembangan, Selangor, Malaysia) for sequencing.

Genome sequencing and assembly
Genome sequencing was done using the PacBio and Illumina sequencing platforms. Illumina paired-end data were pre-processed with SolexaQA++ v.3.1.7.2 [1] to remove sequence adaptors, reads shorter than 50 bp and reads with a quality score below 20. De novo hybrid assembly of the PacBio long reads and the high-quality Illumina short reads was done using Unicycler v.0.4.9.b [2].
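The two read-level thresholds used in pre-processing (minimum length 50 bp, minimum quality 20) can be illustrated with a short filter. This is a simplified sketch and not the SolexaQA++ algorithm: it assumes Phred+33 quality encoding, applies the quality threshold to the mean quality of each read, and omits adaptor removal. All function names and the example reads are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

MIN_LENGTH = 50    # minimum read length in bp, as stated in the text
MIN_QUALITY = 20   # quality threshold, as stated in the text
PHRED_OFFSET = 33  # ASSUMPTION: Phred+33 (Sanger / Illumina 1.8+) encoding

Read = Tuple[str, str, str]  # (header, sequence, quality string)


def parse_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    rows = [line.strip() for line in lines if line.strip()]
    for i in range(0, len(rows) - 3, 4):
        yield rows[i], rows[i + 1], rows[i + 3]


def mean_phred(quality: str) -> float:
    """Mean Phred score of a quality string."""
    return sum(ord(char) - PHRED_OFFSET for char in quality) / len(quality)


def passes_filter(sequence: str, quality: str) -> bool:
    """Keep reads that are long enough and whose MEAN quality is high enough."""
    return len(sequence) >= MIN_LENGTH and mean_phred(quality) >= MIN_QUALITY


# Hypothetical reads: one good, one too short, one of low quality.
example = [
    "@read_ok", "A" * 60, "+", "I" * 60,
    "@read_short", "A" * 30, "+", "I" * 30,
    "@read_lowq", "A" * 60, "+", "#" * 60,
]
kept = [header for header, seq, qual in parse_fastq(example)
        if passes_filter(seq, qual)]
print(kept)
```

For paired-end data, a real pipeline would also need to keep mates in sync when one read of a pair is discarded; that step is left out here for brevity.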

Gene prediction and annotation
The genome was submitted to GenBank (NCBI). Genome annotation was done using the NCBI Prokaryotic Genome Annotation Pipeline (PGAP) [3]. In this pipeline, tRNA sequences were predicted using tRNAscan-SE and rRNA sequences were predicted using BLASTN against the rRNA reference set. GeneMarkS+ was used to predict protein-coding genes, which were then annotated using BLASTP searches against protein databases. The protein sequences were named following rules similar to those adopted by the UniProtKB/Swiss-Prot Protein Knowledgebase [4].

Comparative genome analysis
For phylogenetic analysis, nucleotide sequences from other bacteria that are homologous to the 16S rRNA sequence of the isolate were searched using the BLASTN program against the NCBI 16S rRNA database. The sequences were then aligned using MUSCLE v3.8.3.1 [5], and the resulting alignment was used as input to construct Neighbour-Joining and Maximum-Likelihood phylogenetic trees using phylogeny.fr [6] and PhyML 3.0 [7], respectively, with 1000 bootstrap replicates. The search for secondary metabolite biosynthetic gene clusters was conducted using antiSMASH 5.0 [8].
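The bootstrap step can be pictured as resampling alignment columns with replacement, once per replicate. The sketch below shows only that resampling on a toy alignment; the sequences are invented for illustration, the function name is hypothetical, and the code does not reproduce how phylogeny.fr or PhyML implement bootstrapping internally.

```python
import random
from typing import Dict


def bootstrap_alignment(alignment: Dict[str, str],
                        rng: random.Random) -> Dict[str, str]:
    """Build one bootstrap replicate by sampling columns with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


# Toy alignment with made-up sequences (NOT real 16S rRNA data).
toy_alignment = {
    "CA1": "ACGTACGTAC",
    "Bu15_23": "ACGTACGTAC",
    "outgroup": "ACGAACTTAC",
}

rng = random.Random(42)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(1000)]
print(len(replicates), "replicates of length", len(replicates[0]["CA1"]))
```

In a full analysis, a tree would be inferred from each replicate, and the bootstrap value of a clade would be the percentage of replicate trees in which that clade appears.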

Ethics Statement
This work did not involve the use of human subjects or animal experiments.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which could be perceived to influence the work reported in this article.