Data on the genome analysis of the probiotic strain Bacillus subtilis GM5

In the present study, we report data on the draft genome sequence of a lipopeptide-producing rhizospheric Bacillus subtilis GM5 isolate. The genome consists of 4,271,280 bp with a GC content of 43.3%. A total of 4518 genes, including 75 tRNA genes, 3 operons coding for rRNA genes and 56 pseudogenes, were annotated. Gene clusters responsible for the biosynthesis of secondary metabolites were validated. Six of the thirty-three clusters identified in the genome encode the synthesis of antimicrobial non-ribosomal peptides. The Whole Genome Shotgun project of B. subtilis GM5 has been deposited in the NCBI database under the accession number NZ_NKJH00000000 (https://www.ncbi.nlm.nih.gov/nuccore/NZ_NKJH00000000.1).


Value of the data
The data on the B. subtilis GM5 genome are a resource for understanding the strain's potential biotechnological applications. B. subtilis GM5 was concluded to be a promising strain for use as a probiotic.
Thirty-three potential clusters of secondary metabolite synthesis have been identified in the genome of the B. subtilis GM5 strain, including six non-ribosomal peptide synthetase (NRPS) gene clusters.
The data presented here can be used by other researchers working in the field of genome analysis.
The data presented expand the molecular information on the diversity of B. subtilis.

Data
Bacillus subtilis GM5, isolated from the rhizosphere of potatoes, possesses remarkable antimicrobial properties [1] and probiotic potential [2]. Previous studies have shown that GM5 is antagonistic to pathogenic and opportunistic enterobacteria through the production of cyclic lipo- and dipeptides [1,2]. The strain was resistant to 1-10% chicken bile and to a wide range of ambient pH. B. subtilis GM5 possessed proteolytic and phytate-hydrolyzing activity and proved to be safe for model animals. Scanning electron microscopy (SEM) of the isolated culture showed rod-shaped cells approximately 0.63-0.71 μm in width and 1.80-2.50 μm in length (Fig. 1).
Based on the homology of the 16S rRNA gene, strain GM5 shares 98% similarity with B. subtilis 168 over a 1010 bp sequence. A 16S rRNA-based tree shows the phylogenetic affiliation of B. subtilis GM5 with closely related species within the genus (Fig. 2).
The RAST server predicted 4479 coding sequences, of which 2101 (47%) were annotated as SEED subsystem features and 2378 (53%) were annotated as outside of the SEED subsystems (Fig. 3).

Table 1 Comparison of the genomic features of the B. subtilis GM5 strain with various Bacillus strains. The information regarding the reference genomes was obtained from PGAAP and the antiSMASH server [12].
Thus, the listed gene clusters found in the genome of B. subtilis GM5 are responsible for the synthesis of the non-ribosomal cyclic lipopeptides: fengycin, plipastatin, surfactin, bacillaene, bacilysin and bacillibactin dipeptide. An important feature of the B. subtilis GM5 genome is therefore that much of its genetic material is devoted to the biosynthesis of secondary metabolites. By means of these metabolites, B. subtilis GM5 is able to suppress pathogenic and conditionally pathogenic microflora capable of causing intestinal dysbiosis in chickens. The B. subtilis GM5 strain has great prospects as a potential probiotic.

Morphological analysis
The bacterial morphology was investigated using scanning electron microscopy according to the method described in Ref. [10].

Phylogeny analysis
The initial phylogeny of the isolate GM5 was studied using 16S rRNA analysis. Phylogenetic analysis of the strain GM5 was performed using MEGA 7.0.14 software. The phylogenetic tree was generated using the maximum likelihood (ML) algorithm with 1000 bootstrap iterations. The strain was deposited in the collection of the laboratory "Biosynthesis and Bioengineering of Enzymes" (Kazan Federal University, Russia).
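The bootstrap step mentioned above can be illustrated with a minimal Python sketch. It is not the MEGA implementation; it only shows the resampling idea, in which each pseudo-replicate alignment is built by drawing alignment columns with replacement. The function names and the seed are hypothetical, and tree inference on each replicate is left to the phylogenetics software.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one pseudo-replicate built by sampling columns with replacement."""
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # Draw as many column indices as the alignment has columns.
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(row[i] for i in picks) for row in alignment]


def bootstrap_replicates(alignment, n_replicates=1000, seed=0):
    """Generate n_replicates resampled alignments (seed is an illustrative choice)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]
```

A tree would then be inferred from every replicate, and the support value of a branch is the share of replicate trees in which that branch appears.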

Genomic DNA preparation
B. subtilis strain GM5 was inoculated in 20 mL of LB medium and grown overnight at 37°C with shaking at 200 rpm. A 10 mL aliquot was centrifuged at 5000 × g for 10 min at 4°C and genomic DNA was extracted using the NucleoSpin® Microbial DNA kit. The quality of the final DNA sample was evaluated by gel electrophoresis (1.5% agarose gel) and the DNA concentration was estimated using a NanoDrop 2000c Spectrophotometer (Thermo Scientific). In total, 500 ng/mL of genomic DNA was obtained and sent for sequencing.

Genome sequencing and assembly
Whole-genome sequencing of strain GM5 and analysis of the genes responsible for the synthesis of antimicrobial peptides were conducted. DNA sequencing was performed on the Illumina MiSeq platform in paired-end mode. The quality of the sequencing reads was checked using FastQC v0.11.3. De novo assembly and analysis of contigs were carried out using the SPAdes v3.8.1 assembler. Assembly statistics were calculated with QUAST v2.3.
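As an illustration of the kind of assembly statistics reported here (total length, GC content, N50), the following Python sketch computes them from a contigs FASTA file. It is a simplified stand-in written for this text, not the QUAST implementation, and its function names are hypothetical. GC content is taken here as the number of G and C bases divided by the total assembly length, which may differ slightly from how a given tool treats ambiguous bases.

```python
def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file object."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def assembly_stats(sequences):
    """Return contig count, total length, GC percentage and N50."""
    sequences = list(sequences)
    lengths = sorted((len(seq) for seq in sequences), reverse=True)
    total = sum(lengths)
    gc = sum(seq.count("G") + seq.count("C") for seq in sequences)
    # N50: length of the contig at which the running sum of lengths,
    # taken from longest to shortest, first reaches half the assembly size.
    running, n50 = 0, 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {
        "contigs": len(lengths),
        "total_bp": total,
        "gc_percent": 100.0 * gc / total if total else 0.0,
        "n50": n50,
    }
```

With a local assembly file (the name `contigs.fasta` is an assumption), the function can be called as `assembly_stats(seq for _, seq in read_fasta(open("contigs.fasta")))`.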

Genome annotation
The genome was annotated using the NCBI PGAAP pipeline and the RAST web server [11]. The antiSMASH program (antibiotics & Secondary Metabolite Analysis Shell) [12] was used to analyze the clusters of antimicrobial metabolites of the GM5 strain and of Bacillus subtilis TO-A JPC, Bacillus coagulans S-lac and Bacillus toyonensis BCT-7112. The genome sequence of the strain (FASTA file) was uploaded to the public web version of antiSMASH. The results of the analysis are presented on an interactive HTML page with SVG graphics, and the various parts of the analysis are displayed in different panels for each gene cluster.
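Besides the interactive HTML page, antiSMASH results can be downloaded as GenBank files, which makes it possible to tally cluster types programmatically. The sketch below is a minimal example under stated assumptions: it assumes that each predicted cluster is annotated as a feature carrying one or more `/product` qualifiers, with the feature key `region` as in recent antiSMASH releases (older releases used `cluster`, which can be passed as `feature_key`). The function name is hypothetical and the parser is deliberately simple.

```python
import re
from collections import Counter

# A feature key starts after exactly five spaces in a GenBank feature table.
FEATURE_KEY = re.compile(r"^ {5}(\S+)\s+\S")
# Only single-line /product values are handled in this sketch.
PRODUCT = re.compile(r'^\s+/product="([^"]+)"')


def count_region_products(genbank_text, feature_key="region"):
    """Count /product values attached to features with the given key."""
    counts = Counter()
    in_features = False
    current_key = None
    for line in genbank_text.splitlines():
        if line.startswith("FEATURES"):
            in_features, current_key = True, None
            continue
        if line.startswith("ORIGIN") or line.startswith("//"):
            in_features = False
            continue
        if not in_features:
            continue
        key_match = FEATURE_KEY.match(line)
        if key_match:
            current_key = key_match.group(1)
            continue
        product_match = PRODUCT.match(line)
        if product_match and current_key == feature_key:
            counts[product_match.group(1)] += 1
    return counts
```

A hybrid cluster carrying several `/product` qualifiers is counted once per product type, so the sum of the counts can exceed the number of clusters.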