Genome sequence dataset of Bacillus altitudinis strain ST14 isolated from Tunggak River in Gebeng Industrial Park, Kuantan, Pahang

Bacillus spp. have been reported to biodegrade various hydrocarbon pollutants and are therefore potentially useful for cleaning up contaminated environments. Here we report the draft genome of Bacillus altitudinis strain ST14, isolated from the Tunggak River, Gebeng, Kuantan, an area in close proximity to industrial activities. Genome sequencing was conducted using Illumina NovaSeq 6000 technology. Structural genes in the genome were described, including rRNAs, tRNAs, and ncRNAs. The draft genome of Bacillus altitudinis strain ST14 has a length of 3,801,811 bp and contains 3,891 coding sequences (CDS). Functional gene annotation identified six enzymes involved in the degradation of aromatic compounds often found in hydrocarbon pollutants.


Specifications
Subject: Biology
Specific subject area: Microbiology, Genomics, Biotechnology
Type of data

Value of the Data
• The draft genome sequence of Bacillus altitudinis strain ST14 provides insight into bacteria isolated from the Tunggak River, an area in close proximity to industrial activities.
• The data obtained from the draft genome of Bacillus altitudinis strain ST14 can support further research on biodegradation.
• The genome sequence of Bacillus altitudinis strain ST14 helps researchers better understand the genetic features of the bacterium and gain insights into its biodegradation properties.
• The data can be used for comparative genomics, proteomics, and evolutionary studies with other Bacillus spp. involved in biodegradation.

Objective
This dataset was generated to investigate the presence of genes involved in hydrocarbon degradation in the bacterial isolate Bacillus altitudinis strain ST14. The data can help determine the ability of this bacterium to degrade hydrocarbons and may therefore serve as a reference for developing an effective strategy to eliminate hydrocarbons from the environment.

Data Description
The draft genome of Bacillus altitudinis strain ST14 was sequenced at a coverage of 100× and consisted of 17 contigs with a total length of 3,801,811 bp. The assembly had an N50 value of 818,136 bp and a GC content of 41.24% (Table 1). Fig. 1 shows the sequence quality control results assessed with FastQC.
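The N50 and GC content statistics reported for an assembly can be computed directly from its contig sequences. The following is a minimal sketch, not the authors' code: the function names `n50` and `gc_content` are hypothetical, and the toy contigs are illustrative values rather than data from strain ST14.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length of the contig at which the running sum of contig
    lengths, taken from longest to shortest, first reaches at least half
    of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contigs given")


def gc_content(sequences):
    """Return the percentage of G and C bases across all sequences."""
    gc = 0
    total = 0
    for seq in sequences:
        seq = seq.upper()  # treat soft-masked (lowercase) bases the same
        gc += seq.count("G") + seq.count("C")
        total += len(seq)
    if total == 0:
        raise ValueError("no bases given")
    return 100.0 * gc / total


if __name__ == "__main__":
    # Toy contigs for illustration only (hypothetical, not ST14 data).
    toy_contigs = ["GGCCGGCCAT", "ATATGC", "GCAT"]
    print("N50:", n50(len(c) for c in toy_contigs))
    print("GC%:", round(gc_content(toy_contigs), 2))
```

For a real assembly, the contig sequences would be read from the FASTA file produced by the assembler and passed to the same two functions.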
Gene annotation and prediction revealed the presence of six enzymes responsible for aromatic compound degradation in the draft genome, including three alcohol dehydrogenases, one

Experimental Design, Materials and Methods
The Tunggak River is located adjacent to Gebeng Industrial Park, Kuantan, an area in close proximity to industrial activities. Aromatic hydrocarbons are ubiquitous anthropogenic pollutants that are commonly associated with industrial activities and have become a global concern because of their harmful and toxic properties [1]. Prolonged exposure to these pollutants has adverse effects on the environment and on human beings, so they need to be eliminated [2]. The discovery of indigenous bacteria capable of degrading hydrocarbons offers a promising way to remove these pollutants from the environment [3-5]. Bacillus spp. have been reported to biodegrade various hydrocarbon pollutants and are therefore potentially useful for cleaning up contaminated environments [6,7]. The draft genome of Bacillus altitudinis strain ST14 is therefore reported here.
Bacillus altitudinis strain ST14 was isolated from water samples obtained from the Tunggak River, which has been reported to be contaminated with oil [8]. The isolate was grown at 30 °C for 20 h on minimal salt medium (MSM) agar supplemented with 1% engine oil as the sole carbon source. The resulting colony was streaked on nutrient agar to obtain a pure culture able to grow on hydrocarbons. Genomic DNA of Bacillus altitudinis strain ST14 was extracted using the NucleoSpin® Tissue DNA Extraction Kit (Macherey-Nagel, Düren, Germany) from an overnight culture grown in Luria-Bertani broth at 30 °C and 150 rpm. The NEB Ultra II DNA Library Prep Kit (NEB, Ipswich, MA) was used to construct the DNA library, and whole-genome sequencing was performed on an Illumina NovaSeq 6000 platform (San Diego, CA).
Quality control of the raw data was conducted using FastQC version 0.11.9. The reads were quality trimmed using fastp version 0.21 [9], and the trimmed reads were used for de novo assembly in SPAdes version 3.15.0 [10]. Taxonomic identification of the draft genome was conducted using the JSpecies web server based on the BLAST average nucleotide identity (ANIb) algorithm. Annotation of structural genes (i.e., RNAs and CDS) was carried out using NCBI PGAP [11]. Functional annotation of genes related to aromatic compound degradation was conducted using the eggNOG [12] and KEGG [13] databases.
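The read processing steps described above (FastQC, then fastp, then SPAdes) can be sketched as a sequence of command lines. This is a hedged illustration only: the article does not give the exact commands or parameters, so the paired-end layout, the file names, the output directory structure, and the helper `build_pipeline_commands` are all assumptions. The script only builds and prints the commands; it does not run them.

```python
import shlex


def build_pipeline_commands(r1, r2, outdir):
    """Build FastQC, fastp, and SPAdes command lines for one read pair.

    Assumptions (not stated in the article): reads are paired-end, each
    tool runs with default settings, and all paths are hypothetical.
    """
    trimmed_r1 = f"{outdir}/trimmed_R1.fastq.gz"
    trimmed_r2 = f"{outdir}/trimmed_R2.fastq.gz"
    return [
        # Raw read quality control; FastQC expects the output directory
        # to exist before it is run.
        ["fastqc", "-o", f"{outdir}/fastqc", r1, r2],
        # Quality trimming with fastp defaults:
        # -i/-I are the paired inputs, -o/-O the paired outputs.
        ["fastp", "-i", r1, "-I", r2, "-o", trimmed_r1, "-O", trimmed_r2],
        # De novo assembly of the trimmed reads with SPAdes.
        ["spades.py", "-1", trimmed_r1, "-2", trimmed_r2,
         "-o", f"{outdir}/spades"],
    ]


if __name__ == "__main__":
    # Hypothetical input file names for illustration.
    commands = build_pipeline_commands(
        "ST14_R1.fastq.gz", "ST14_R2.fastq.gz", "results"
    )
    for cmd in commands:
        print(shlex.join(cmd))
```

Keeping each command as a list of arguments means it could be passed to `subprocess.run` unchanged, without shell quoting issues, once the paths and parameters match the actual data.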

Ethics Statements
An ethics statement is not applicable to this study.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.