Correcting names of bacteria deposited in National Microbial Repositories: analysed sequence data needed for the taxonomic re-categorization of misclassified bacteria, with one example, the genus Lysinibacillus

A report on 16S rRNA gene sequence re-analysis and digitalization is presented using Lysinibacillus species (one example) deposited in National Microbial Repositories in India. Lysinibacillus 16S rRNA gene sequences were digitalized to provide quick response (QR) codes, Chaos Game Representation (CGR) and Frequency Chaos Game Representation (FCGR). GC percentage, phylogenetic analysis and principal component analysis (PCA) were used to differentiate and reclassify the strains under investigation. This paper gives seven reasons supporting our statement that the Lysinibacillus species deposited in National Microbial Repositories are misclassified. On the basis of these seven reasons, bacteria deposited in National Microbial Repositories, such as Lysinibacillus and many others, need to be reanalysed to establish their exact identity. Levels of identity with type strains of related species show differences of 2 to 8%, suggesting that reclassification is needed to correctly assign species names to the analyzed Lysinibacillus strains available in National Microbial Repositories.

Keywords: 16S rRNA; Bacteria; Culture collection; DDH; Digitalization

Data
Data analysis was started in early 2016. Lysinibacillus 16S rRNA gene sequence accession numbers were obtained from the web catalogues of the respective Microbial Repositories. The 16S rRNA gene sequences of Lysinibacillus species were downloaded from the NCBI website (https://www.ncbi.nlm.nih.gov/nuccore) between January and May 2016.
The thoroughly investigated dataset of this article provides information on misclassified and misplaced bacteria in microbial culture collections/repositories in India. Figs. 1-6 and Tables 1-6 describe the datasets on the misclassified bacteria. Table 7 tabulates the output of EzBioCloud's Identify service (http://www.ezbiocloud.net/identify) for the sequence data, which supports our findings.
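The sequences above were downloaded manually through the NCBI website. As a minimal sketch, assuming scripted access is acceptable, the same retrieval can be done with NCBI's public E-utilities efetch endpoint and the Python standard library. The helper names (`efetch_url`, `parse_fasta`, `fetch_fasta`) are hypothetical; NCBI's usage guidelines (request rate limits, optional tool/email parameters) still apply.

```python
import urllib.parse
import urllib.request

# Public NCBI E-utilities efetch endpoint; helper names below are hypothetical.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accessions):
    """Build an efetch URL that returns nuccore records as plain-text FASTA."""
    params = {
        "db": "nuccore",
        "id": ",".join(accessions),
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urllib.parse.urlencode(params)


def parse_fasta(text):
    """Map each record's first header token (e.g. a versioned accession) to its sequence."""
    records = {}
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            tokens = line[1:].split()
            header, chunks = (tokens[0] if tokens else ""), []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


def fetch_fasta(accessions, timeout=30):
    """Download and parse the records (requires network access)."""
    with urllib.request.urlopen(efetch_url(accessions), timeout=timeout) as response:
        return parse_fasta(response.read().decode("utf-8"))
```

For example, `fetch_fasta(["JQ964026", "JQ964029", "JX081387", "GU815938"])` would return a dictionary keyed by the accessions as they appear in the FASTA headers (typically with a version suffix such as `.1`).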

Experimental design, materials and methods
Twenty-four Lysinibacillus strains deposited in renowned microbial culture collections in India were used as a model case for this study (Table 1).

Background
At present, 16S rRNA genes are key to the taxonomic categorization of Bacteria and Archaea. This is due to the extensive 16S rRNA gene sequence information available in public repositories [1] and well-curated databases [2]. Nevertheless, identifying unknown or newly sequenced strains involves comparison with these databases, and differentiating novel strains by their 16S rRNA gene sequence is often subjective and/or ambiguous. For instance, some 16S rRNA gene sequences are too short, which limits the information that can be extracted for comparison and identification. Thus, accurate identification or classification of strains needs a simple and quick pipeline in addition to more advanced polyphasic approaches (including phenotypic and genomic techniques) for the definitive classification of species. The aim in microbial strain identification and differentiation is to have an available pipeline for unambiguous classification. This paper describes new, easy-to-perform types of sequence-based analysis for strain differentiation.

Results
QR codes prepared from the 16S rDNA sequences of Lysinibacillus species were unique. Any user can scan a QR code with a smartphone to retrieve the sequence (Fig. 1).
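The text does not state which software produced the QR codes. As a hedged sketch, assuming the third-party `qrcode` package (installed with `pip install qrcode[pil]`) is used, a sequence can be normalised, checked against the published version-40 capacities for alphanumeric mode (all IUPAC nucleotide letters are alphanumeric-mode characters), and rendered with the strongest error-correction level that still fits. The helper names are hypothetical.

```python
import re

# Published QR capacities (characters) at version 40 in alphanumeric mode,
# per error-correction level.
ALNUM_CAPACITY_V40 = {"H": 1852, "Q": 2420, "M": 3391, "L": 4296}
IUPAC_DNA = re.compile(r"[ACGTURYSWKMBDHVN]+")


def qr_payload(sequence):
    """Drop whitespace, upper-case, and reject non-IUPAC characters."""
    seq = "".join(sequence.split()).upper()
    if not IUPAC_DNA.fullmatch(seq):
        raise ValueError("sequence is empty or contains non-IUPAC characters")
    return seq


def strongest_ec_level(payload):
    """Highest error-correction level whose version-40 capacity fits the payload."""
    for level in ("H", "Q", "M", "L"):
        if len(payload) <= ALNUM_CAPACITY_V40[level]:
            return level
    raise ValueError("sequence too long for a single QR code")


def save_qr_png(sequence, path):
    """Render the code with the third-party `qrcode` package (imported lazily)."""
    import qrcode

    payload = qr_payload(sequence)
    level = strongest_ec_level(payload)
    qr = qrcode.QRCode(
        error_correction=getattr(qrcode.constants, "ERROR_CORRECT_" + level)
    )
    qr.add_data(payload)  # an all-uppercase payload is encoded in alphanumeric mode
    qr.make(fit=True)     # pick the smallest QR version that holds the data
    qr.make_image().save(path)
```

A typical 16S rRNA gene sequence of about 1,500 nt fits at level H, so the code stays readable even if part of it is damaged.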
CGR and FCGR were used for visual interpretation of the occurrence of nucleotides in 16S rRNA genes. Each CGR image has four corners: the upper two, from left to right, are C and T/U, and the lower two, from left to right, are A and G. Each CGR square is divided into four sub-squares, one for each nucleotide (C, G, A and T/U). The number of dots in a sub-square is directly proportional to the number of occurrences of that nucleotide, and the distribution of dots within each sub-square reflects the occurrence of bases in the analyzed gene, i.e. their order, number and percentage (Fig. 2).
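The corner layout described above can be turned into points with the standard chaos game rule: start at the centre of the unit square and, for each base, move halfway towards that base's corner. The sketch below assumes y increases upwards, maps U to T and skips ambiguity codes such as N; the function names are hypothetical. Because every point lands in the quadrant of the base that produced it, counting points per quadrant recovers the nucleotide counts, which is why dot density tracks base composition.

```python
from collections import Counter

# Corner layout as described in the text: C top-left, T/U top-right,
# A bottom-left, G bottom-right (unit square, y increasing upwards).
CORNERS = {"C": (0.0, 1.0), "T": (1.0, 1.0), "A": (0.0, 0.0), "G": (1.0, 0.0)}


def cgr_points(sequence):
    """Return one CGR point per unambiguous base, starting from the centre."""
    x, y = 0.5, 0.5
    points = []
    for base in sequence.upper().replace("U", "T"):
        corner = CORNERS.get(base)
        if corner is None:
            continue  # skip ambiguity codes such as N
        x, y = (x + corner[0]) / 2, (y + corner[1]) / 2  # halfway to the corner
        points.append((x, y))
    return points


def quadrant(point):
    """Name the base whose sub-square contains the point."""
    x, y = point
    if y > 0.5:
        return "C" if x < 0.5 else "T"
    return "A" if x < 0.5 else "G"


def quadrant_counts(sequence):
    """Count dots per sub-square; equals the per-base counts of the sequence."""
    return Counter(quadrant(p) for p in cgr_points(sequence))
```

The points can be drawn as a scatter plot (for example with matplotlib) to obtain an image comparable to Fig. 2.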
Unlike CGR, FCGR presents the data as a frequency matrix rather than a point cloud. The distribution of nucleotides in these matrices differs among the studied strains. The FCGR scale ranges from poorly represented dinucleotides (white or light colored squares) to frequently observed dinucleotides (darkest squares) (Fig. 3).
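One way to build such a matrix, sketched under the assumption that FCGR here means counting k-mers (k = 2 for dinucleotides) on a 2^k by 2^k grid that follows the same corner layout as CGR: the most recent base of each k-mer selects the top-level quadrant and earlier bases select successively smaller sub-squares. Row 0 is the top of the image. The function name and the k parameter are illustrative, not taken from the original study.

```python
# Bit meaning per base, following the CGR corner layout
# (C top-left, T top-right, A bottom-left, G bottom-right).
COLUMN_BIT = {"C": 0, "A": 0, "T": 1, "G": 1}  # 1 = right half
ROW_BIT = {"C": 0, "T": 0, "A": 1, "G": 1}     # 1 = bottom half (row 0 is the top)


def fcgr(sequence, k=2):
    """Return a 2**k x 2**k list-of-lists matrix of k-mer counts in CGR layout."""
    size = 2 ** k
    matrix = [[0] * size for _ in range(size)]
    seq = sequence.upper().replace("U", "T")
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(base not in COLUMN_BIT for base in kmer):
            continue  # skip k-mers containing ambiguity codes
        row = col = 0
        # The last (most recent) base chooses the coarsest quadrant.
        for j, base in enumerate(reversed(kmer)):
            weight = 2 ** (k - 1 - j)
            col += COLUMN_BIT[base] * weight
            row += ROW_BIT[base] * weight
        matrix[row][col] += 1
    return matrix
```

Displaying the matrix with a light-to-dark colour map reproduces the FCGR convention described above, where darker squares mark more frequent dinucleotides.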
BLAST analysis of the JQ964026, JQ964029, JX081387 and GU815938 sequences showed 93%, 92%, 90% and 90% identity, respectively, with existing species and type strains. This was confirmed by phylogenetic analysis, principal component analysis and GGDC-DDH results. The phylogenetic tree was constructed with Lysinibacillus and phylogenetically related species, with bootstrap values based on 1000 replicates (Fig. 5).
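The identity figures above come from BLAST. As a rough illustration of what such a percentage means, the sketch below computes identity from two already-aligned sequences as identical columns divided by alignment columns (gaps count as mismatches; columns gapped in both sequences are ignored) and flags values under the 97% cutoff used in this paper. It does not perform the alignment itself, and the function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Identity (%) of two equal-length aligned sequences, gaps written as '-'."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # a column gapped in both sequences carries no information
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("alignment has no informative columns")
    return 100.0 * matches / columns


def below_species_threshold(identity, threshold=97.0):
    """True when identity falls under the species-level cutoff."""
    return identity < threshold
```

With this rule, each of the reported values (90-93%) falls below the cutoff, which is what flags those strains for reclassification.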
The 16S rRNA gene sequences JQ964026, JQ964029, JX081387 and GU815938 showed identities lower than 97% (90-93% with existing species and type strains) (Table 3), suggesting that they could belong to different species. Table 3 shows that the identities of these Lysinibacillus strains fall below the level expected for members of the same species. Principal component analysis of the 16S rRNA gene sequences (Fig. 6) revealed distinct groups that could correspond to novel species or taxa within the genus Lysinibacillus.
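The text does not say which sequence features entered the PCA. The sketch below assumes, purely for illustration, that each sequence is summarised by its relative k-mer frequencies (k = 2 by default) and that PCA is computed by singular value decomposition of the mean-centred profile matrix with NumPy. Strains falling into separate clusters in the first two components would correspond to the groups described above. Function names and the feature choice are assumptions.

```python
from itertools import product

import numpy as np


def kmer_profile(sequence, k=2):
    """Relative frequencies of all 4**k k-mers (ambiguous k-mers skipped)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = np.zeros(len(kmers))
    seq = sequence.upper().replace("U", "T")
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = counts.sum()
    return counts / total if total else counts


def pca(profiles, n_components=2):
    """Return (scores, explained variance ratios) via SVD of centred data."""
    X = np.asarray(profiles, dtype=float)
    Xc = X - X.mean(axis=0)  # centre each feature
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # projections on the PCs
    variance = S ** 2
    total = variance.sum()
    ratios = variance / total if total > 0 else np.zeros_like(variance)
    return scores, ratios[:n_components]
```

Usage: `scores, ratios = pca([kmer_profile(s) for s in sequences])`, then plot `scores[:, 0]` against `scores[:, 1]` and label each point with its accession.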
Most of these strains were isolated from environmental samples such as boron-containing soil, forest humus collected from Gyeryong Mountain in Korea, the Environmental Treatment Plant at Naroda G.I.D.C., Ahmedabad, Gujarat (India) and textile mill effluent-contaminated soil, followed by acclimatization in the presence of different chemicals such as boron, sodium chloride, xylan and dyes [15][16][17][18]. This suggests that different adaptations could give rise to distinct strains with distinctive 16S rRNA gene sequences. GGDC-DDH analysis against type strains indicated G+C differences ranging from 15.44 to 20.86 for all species (Tables 4 to 6). These analyses suggest that the Lysinibacillus strains deposited in Indian Microbial Repositories could represent distinct species. Thus, information on the accurate classification of this genus is lacking, particularly for the group of strains used here as a model case to describe the current identification problem.

(g) Doubtful contigs or single long and unassembled sequence

Based on the seven reasons above, bacteria such as Lysinibacillus deposited in National Microbial Repositories need either to have their 16S rRNA genes re-sequenced and reanalysed against EzBioCloud's database to establish their exact identity, or to be identified using other appropriate, validated techniques (Table 7).

Table 7 Output of sequence data on EzBioCloud's Identify service (http://www.ezbiocloud.net/identify) database supporting our finding.
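GC percentage is listed among the differentiation tools, and the G+C differences above were reported by the GGDC service. The sketch below is only a plain count on whatever sequences are supplied, not a reimplementation of GGDC: it takes G+C as a percentage of unambiguous bases (A, C, G, T, with U treated as T) and reports the absolute difference between two sequences. Function names are hypothetical.

```python
def gc_percent(sequence):
    """G+C content (%) over unambiguous bases; ambiguity codes are ignored."""
    seq = sequence.upper().replace("U", "T")
    unambiguous = sum(seq.count(base) for base in "ACGT")
    if unambiguous == 0:
        raise ValueError("no unambiguous bases in sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / unambiguous


def gc_difference(seq_a, seq_b):
    """Absolute difference in G+C content (percentage points)."""
    return abs(gc_percent(seq_a) - gc_percent(seq_b))
```

Excluding ambiguity codes from the denominator keeps poorly resolved positions (for example runs of N in doubtful contigs) from diluting the estimate.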

Discussion
This study provides a pipeline to structure 16S rRNA gene sequence information by constructing digitalized datasets on Lysinibacillus strains currently held in several culture collections in India (GSBTM Gujarat, NCMR-NCCS Pune and NCIM-NCL Pune) and in many other national culture collections worldwide. This information helps to identify, compare, evaluate and interpret strain and species differentiation for novel isolates from environmental samples, and supports making it a compulsory rule for collections to verify the correct identity of the bacteria they hold. Differentiating bacteria obtained from an environment is relatively complicated when those bacteria are closely related phylogenetically. The problem is compounded when bacteria are compared and classified against poorly curated sequence data and scarcely analysed strains that do not fulfil polyphasic recommendations. An easy differentiation pipeline is therefore a very useful tool for many applications, including the species classification of new isolates from natural and artificial environments. The type of digitalized data produced in this study can be generated for any prokaryotic or eukaryotic sequence data, and could be extended to genomes, other genes or sets of genes. Overall, the listed data and protocol will be useful for research and industry. The proposed pipeline simplifies the identification and differentiation of unclassified strains and helps to reveal the need to reclassify some previously isolated microorganisms, including microbes detected from 16S rRNA gene sequence information in microbial community surveys. The specificity and applicability of the approach can be increased as needed by using other genes or genome sequence information. Thus, this protocol allows phenotypic and genotypic characteristics to be incorporated into the taxonomic categorization of species within the current pipeline.

Conflicts of interests
The author declares that there are no conflicts of interest.