Genome sequence data of Bacillus sp. CCB-MMP212 isolated from Malaysian mangrove: A potential strain in arsenic resistance with ArsI, C•As lyase

Bacillus sp. CCB-MMP212 is a Gram-positive bacterium isolated from mangrove sediment in Matang, Perak, Malaysia (4.85496°N, 100.73495°E). Genome sequencing was performed using the Oxford Nanopore and Illumina platforms. The assembled genome was annotated using the Rapid Annotation using Subsystem Technology (RAST) server (rast.nmpdr.org). The genome of Bacillus sp. CCB-MMP212 is 6,151,644 base pairs (bp) long, with a G+C content of 34.75%, and includes 6,311 coding sequences and 58 RNAs. The sequence has been deposited at GenBank under the accession number JALDQE000000000. Interestingly, an arsenic resistance (ars) operon consisting of the arsenic resistance operon repressor (arsR), ACR3 family arsenite efflux transporter (arsB), and arsenate reductase (arsC) genes was found in the genome. In addition, the arsenic-inducible gene (arsI), which encodes a dioxygenase with C•As lyase activity, was also found in the ars operon. The enzyme is crucial for the degradation of methylarsonous acid [MAs(III)] and trivalent roxarsone [Rox(III)]. This dataset reveals the genetic capacity of this strain for arsenic resistance. To the best of our knowledge, arsI encoding C•As lyase has rarely been reported within the genus Bacillus. Therefore, the dataset presented in this manuscript provides further insight into the arsenic resistance mechanisms of the genus Bacillus.


Value of the Data
• The whole-genome sequence of Bacillus sp. CCB-MMP212 could provide valuable information to researchers working on Bacillus strains with the potential for arsenic resistance.
• Bacillus sp. CCB-MMP212 could serve as a reference strain for arsI encoding C•As lyase in the genus Bacillus.
• The whole-genome sequence of Bacillus sp. CCB-MMP212 can contribute to the understanding of the molecular information and related characteristics of this strain.
• The data can be used by researchers working in the fields of Microbiology, Genomics, and Molecular Biology.

Data Description
Bacillus sp. CCB-MMP212 was isolated from mangrove sediment during a microbial diversity investigation of Matang Mangrove Forest, Perak, Malaysia. This study presents the complete whole-genome sequence of Bacillus sp. CCB-MMP212. Genome sequencing was performed using the Oxford Nanopore and Illumina platforms. The assembled genome was annotated using the RAST server (rast.nmpdr.org) [1]. The genome contains 6,151,644 base pairs (bp) with a G+C content of 34.75%, and includes 6,311 coding sequences and 58 RNAs. The assembly statistics and genomic features of Bacillus sp. CCB-MMP212 are summarised in Table 1. The whole-genome sequence of Bacillus sp. CCB-MMP212 was used to infer its evolutionary relationship to closely related Bacillus genomes using the Type Strain Genome Server (TYGS) (https://tygs.dsmz.de) [2]. Fig. 1 shows that Bacillus sp. CCB-MMP212 is closely related [...] categories were amino acids and derivatives (384), carbohydrates (281), and cofactors, vitamins, prosthetic groups, and pigments (158). Interestingly, an ars operon consisting of arsR, arsI, arsB, and arsC was present in the genome (Table 3). Yoshinaga and colleagues reported that trivalent organoarsenicals, such as MAs(III) and Rox(III), are degraded to As(III) by ArsI, which has C•As lyase activity [6]. The resulting As(III) might then be exported from the cell by the arsenite efflux permease ArsB. Thus, bacteria carrying a C•As lyase, including CCB-MMP212, might play an important role in the arsenic biogeochemical cycle through the degradation of environmental organoarsenicals.
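The genome size and G+C content reported above are the kind of summary statistics that can be recomputed directly from an assembly FASTA file. The following is a minimal sketch of that calculation, not the procedure used by the authors or by RAST; the function names are hypothetical, and the G+C percentage is assumed to be taken over unambiguous A/C/G/T bases only.

```python
from typing import Dict, Iterable, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {record_id: uppercase sequence}."""
    records: Dict[str, List[str]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            # Fall back to a positional name if the header is empty.
            current = header[0] if header else f"record_{len(records) + 1}"
            records[current] = []
        elif current is not None:
            records[current].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def genome_size_and_gc(records: Dict[str, str]) -> Tuple[int, float]:
    """Return (total length in bp, G+C percentage over A/C/G/T bases)."""
    total = sum(len(seq) for seq in records.values())
    gc = sum(seq.count("G") + seq.count("C") for seq in records.values())
    at = sum(seq.count("A") + seq.count("T") for seq in records.values())
    acgt = gc + at
    gc_percent = 100.0 * gc / acgt if acgt else 0.0
    return total, gc_percent


# Toy example with two short contigs (illustrative data, not the real assembly).
toy_fasta = [">contig1 example", "ATGC", "GGCC", ">contig2", "ATAT"]
toy_records = parse_fasta(toy_fasta)
toy_size, toy_gc = genome_size_and_gc(toy_records)
```

Applied to the deposited assembly, for example with `parse_fasta(open(path))` on a locally downloaded FASTA file, the same two functions would be expected to give values close to those in Table 1, although small differences can arise from how ambiguous bases are counted.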

Sample collection
Bacillus sp. CCB-MMP212 was isolated from sediment in Matang Mangrove Forest, Perak, Malaysia. The strain was deposited in the Centre for Chemical Biology-Microbial Biodiversity Library (CCB-MBL) in freeze-dried form and was also stored as a 40% glycerol stock at −80 °C.

DNA Extraction
DNA extraction was performed according to the method of Sokolov [7] with slight modifications. The bacterial suspension was spun down and the supernatant (ethanol) was removed by decantation. The pellet was resuspended in 500 μL of lysis buffer (50 mM NaCl, 50 mM Tris-HCl pH 8, 50 mM EDTA, 2% SDS) and incubated for 30 min at 60 °C. A volume of 3 μL of RNase A (10 mg/mL) was added to the lysate, which was then incubated for 10 min at room temperature. A volume of 50 μL (0.1× vol) of saturated KCl was added and the lysate was held at 4 °C for 5 min to remove the salt. The lysate was extracted once with an equal volume of chloroform to remove the remaining proteins. The aqueous layer containing the DNA was mixed with an equal volume of isopropanol and 20 μL of solid-phase reversible immobilization (SPRI) beads to promote the binding of DNA onto the carboxylated bead surface [8]. The mixture was incubated for 10 min at room temperature, then placed on a magnetic rack for 2 min, and the supernatant was discarded. The bound magnetic beads were washed twice with 75% ethanol. The beads were resuspended in 100 μL of TE buffer and incubated at 50 °C for 5 min to elute the DNA.

Nanopore and Illumina library preparation and genome sequencing
Approximately 400 ng of DNA, as measured by Qubit, was fragmented with the Nanopore rapid barcoding kit according to the manufacturer's instructions (Oxford Nanopore, UK). The sample was sequenced on a Nanopore Flongle flow cell. Guppy v4.4.1 was used to basecall the fast5 files in high-accuracy mode [9]. For Illumina sequencing, approximately 100 ng of DNA was fragmented to 350 bp using a Bioruptor, and a library was prepared with the NEB Ultra II library preparation kit for Illumina according to the manufacturer's instructions (NEB, Ipswich, MA). The sample was sequenced on a NovaSeq 6000 (Illumina, San Diego, CA), yielding approximately 1 Gb of paired-end data (2 × 150 bp).
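As a rough sanity check on sequencing depth, the Illumina yield can be related to the genome size reported in this article. The sketch below assumes that "approximately 1 Gb" means 1 × 10^9 bases and that all reads are exactly 150 bp; both are simplifying assumptions, and the function names are hypothetical.

```python
def expected_coverage(total_bases: float, genome_size_bp: int) -> float:
    """Average depth of coverage = sequenced bases / genome length."""
    return total_bases / genome_size_bp


def read_pairs(total_bases: float, read_length: int = 150) -> float:
    """Approximate number of read pairs for paired-end data (2 x read_length)."""
    return total_bases / (2 * read_length)


GENOME_SIZE_BP = 6_151_644  # assembled genome size reported in this article
ILLUMINA_BASES = 1e9        # assumption: "approximately 1 Gb" taken as 1e9 bases

coverage = expected_coverage(ILLUMINA_BASES, GENOME_SIZE_BP)
pairs = read_pairs(ILLUMINA_BASES)
print(f"Estimated Illumina coverage: {coverage:.1f}x from about {pairs:,.0f} read pairs")
```

Under these assumptions the short-read depth is on the order of 160-fold, which corresponds to a little over three million read pairs.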

Hybrid de novo assembly - Nanopore and Illumina
Raw Nanopore reads were quality- and length-filtered to retain reads with quality scores of 7 or higher that were longer than 2,000 bp. The filtered Nanopore reads were then used in combination with the Illumina reads for hybrid assembly with Unicycler (default settings) [10]. Contigs shorter than 500 bp were removed, and the filtered assembly was used for further analysis.
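The read and contig thresholds described above can be expressed as a short filter. The article does not name the filtering tool, so the sketch below is only one plausible reading: it assumes Phred+33 FASTQ input and a per-read mean quality computed by averaging error probabilities (a common convention for Nanopore reads), and all function names are hypothetical.

```python
import math
from typing import Dict, Iterable, Iterator, List, Tuple

FastqRecord = Tuple[str, str, str]  # (name, sequence, quality string)


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield (name, sequence, quality) from simple 4-line FASTQ records."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line and not buffer:
            continue  # skip blank lines between records
        buffer.append(line)
        if len(buffer) == 4:
            header, seq, _plus, qual = buffer
            yield header[1:], seq, qual
            buffer = []


def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred score obtained by averaging per-base error probabilities."""
    if not qual:
        return 0.0
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in qual]
    return -10 * math.log10(sum(probs) / len(probs))


def filter_reads(records: Iterable[FastqRecord],
                 min_quality: float = 7.0,
                 min_length: int = 2000) -> Iterator[FastqRecord]:
    """Keep reads with mean quality >= min_quality and length > min_length."""
    for name, seq, qual in records:
        if len(seq) > min_length and mean_quality(qual) >= min_quality:
            yield name, seq, qual


def filter_contigs(contigs: Dict[str, str], min_length: int = 500) -> Dict[str, str]:
    """Drop contigs shorter than min_length bp from an assembly."""
    return {name: seq for name, seq in contigs.items() if len(seq) >= min_length}


# Toy reads (illustrative only): '5' encodes Q20 and '$' encodes Q3 in Phred+33.
toy_fastq = [
    "@good", "A" * 2500, "+", "5" * 2500,
    "@short", "A" * 1000, "+", "5" * 1000,
    "@lowq", "A" * 2500, "+", "$" * 2500,
]
kept_names = [name for name, _seq, _qual in filter_reads(parse_fastq(toy_fastq))]
kept_contigs = filter_contigs({"tiny": "A" * 499, "ok": "A" * 500})
```

In this toy input only the read named `good` passes both thresholds, and only the 500 bp contig is retained. Treating "longer than 2,000 bp" as a strict inequality and "shorter than 500 bp" as excluding only contigs below 500 bp follows the wording of the text; a dedicated read-filtering tool may define mean quality or the boundaries slightly differently.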

Funding Sources
This work was supported by the Short-Term Grant (304/PCCB/6315540) from Universiti Sains Malaysia awarded to Nor Azura.

Declaration of Competing Interest
The authors declared that they have no conflicts of interest.