Digital data for Quick Response (QR) codes of thermophiles to identify and compare the bacterial species isolated from Unkeshwar hot springs (India)

16S rRNA gene sequences of 21 thermophilic bacteria, isolated from the Unkeshwar hot springs (19°85′N and 78°25′E), Dist. Nanded (India) and identified morphologically and biochemically, have been deposited in the NCBI repository. The 16S rRNA gene sequences were used to generate QR codes for the sequences (FASTA format and full GenBank information). Diversity among the isolates was compared with known isolates visually, using Chaos Game Representation (CGR) and Frequency Chaos Game Representation (FCGR), and evaluated using principal component analysis (PCA). Considerable biodiversity was observed among the identified bacteria isolated from the Unkeshwar hot springs. The hyperlinked QR codes, CGR, FCGR and PCA of all the isolates are available to users on the portal https://sites.google.com/site/bhagwanrekadwad/.


Value of the data
The generated digital information provides a baseline for researchers, reducing the time and cost of identifying and comparing bacterial diversity in hot springs.
Digitization of DNA sequence data is a standard, fast and reliable tool for identifying microorganisms to species level from short DNA sequences.

Experimental design, materials and methods
Sanger's dideoxy method was adopted for DNA sequencing. 16S rRNA gene sequence analysis was carried out to confirm the identity of the bacteria established by morphological and biochemical tests. The bacterial cultures were enriched in nutrient agar medium and the DNA was extracted using a phenol-chloroform method, modified as follows. About 2 mL of cell pellet from the enrichment culture of each isolate was suspended in extraction buffer containing 100 mM Tris-HCl (pH 8.0), 100 mM Na2EDTA (pH 8.0) and Proteinase K (Nitrogen, USA) at a final concentration of 100 mg/mL. The resulting mixture was incubated at 55°C for 2 h with continuous shaking. To this, 0.5 M NaCl was added and the mixture was incubated at 72°C for 30 min. Subsequently, DNA was extracted with phenol:chloroform:isoamyl alcohol (1:1:1), washed twice with 70% ethanol and dissolved in Tris-EDTA buffer. The DNA was analyzed by electrophoresis in a 0.8% agarose gel stained with ethidium bromide and visualized under a UV transilluminator.
The 16S rDNA of the enriched strains was amplified with two different pairs of eubacteria-specific primers: forward primer 530F (5′-GTGCCAGCAGCCGCGG-3′) with reverse primer 1392R (5′-ACGGGCGGTGTGTAC-3′), and forward primer Bac 8F (5′-AGAGTTTGATCCTGGCTCAG-3′) with reverse primer 1492R (5′-GGTTACCTTGTTACGACTT-3′). The PCR conditions were an initial denaturation at 94°C for 2 min, followed by 35 cycles of denaturation at 95°C for 1 min, annealing at 55°C for 1 min and extension at 72°C for 1 min, and a final extension at 72°C for 10 min. The PCR products were electrophoresed in a 1% (w/v) agarose gel containing ethidium bromide (1 mg mL⁻¹) to resolve the DNA fragments.
The resulting products were purified and directly sequenced on an Applied Biosystems Model 3730xl (96 capillaries) DNA sequencer (Applied Biosystems, Inc., Foster City, Calif., USA).
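The thermal cycling program described above can be written down as data, which makes the total hold time easy to check. The following is a minimal sketch: the temperatures, durations and cycle count are taken from the text, while the names `Step`, `program_steps` and `total_hold_minutes` are illustrative, and ramp times between steps (not reported in the text) are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: int   # hold temperature in degrees Celsius
    minutes: int  # hold time in minutes


# Values as stated in the text; ramp times are not given and are ignored.
INITIAL = Step("initial denaturation", 94, 2)
CYCLE = (
    Step("denaturation", 95, 1),
    Step("annealing", 55, 1),
    Step("extension", 72, 1),
)
N_CYCLES = 35
FINAL = Step("final extension", 72, 10)


def program_steps():
    """Yield every hold step of the program in run order."""
    yield INITIAL
    for _ in range(N_CYCLES):
        yield from CYCLE
    yield FINAL


def total_hold_minutes() -> int:
    """Sum of all hold times, excluding ramping between temperatures."""
    return sum(step.minutes for step in program_steps())


if __name__ == "__main__":
    print(f"{total_hold_minutes()} min of hold time")
```

Under these assumptions the program amounts to 2 + 35 × 3 + 10 = 117 min of hold time per run.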
The bacterial isolates were identified from their sequences through a BLAST search. Nucleotide sequences were aligned using the software MEGA 6. The phylogenetic tree was constructed by the neighbor-joining method using a distance matrix derived from the alignment. Tree files were generated by PHYLIP and viewed with the TREEVIEW program. Bootstrap analysis was also carried out to assess the inferred evolutionary history of the bacteria [1][2][3][4].
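To illustrate what a distance matrix derived from an alignment looks like, the sketch below computes pairwise p-distances (the proportion of differing sites) for a toy alignment. This is a deliberately simpler stand-in: the distances in this work were computed in MEGA6 with the Maximum Composite Likelihood method, which is not reproduced here. The function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') or an unknown base ('N')
    are skipped for that pair.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def distance_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise p-distances keyed by (name_i, name_j) in insertion order."""
    return {
        (i, j): p_distance(alignment[i], alignment[j])
        for i, j in combinations(alignment, 2)
    }


# Toy alignment; these sequences are invented for illustration only.
TOY = {
    "isolate_A": "ACGTACGTAC",
    "isolate_B": "ACGTACGAAC",
    "isolate_C": "ACG-ACGTTC",
}

if __name__ == "__main__":
    for pair, d in distance_matrix(TOY).items():
        print(pair, round(d, 3))
```

A neighbor-joining implementation would take such a matrix as its input; here the gap in `isolate_C` is skipped pair by pair rather than removed from the whole alignment.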

Data
The DNA QR codes of the identified bacterial species were generated using DNA BarID, downloaded from the NEERI-CSIR (Nagpur) website. Each bacterial species (Table 1) has a unique QR code (Table 2) that does not resemble that of any other species or strain in any database. Any smartphone user can scan these QR codes and read more information on the bacterial species. This information is useful for identifying and comparing QR-coded isolates or sequences obtained from hot spring and other extreme environments.
Table 2. QR codes generated for FASTA-format sequences and full GenBank information using DNA BarID software.
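As a rough feasibility check on storing a FASTA record in a single QR code, the sketch below formats a record and compares its size with the byte-mode capacity of the largest QR symbol (version 40). How DNA BarID actually encodes its payload is not described here, so this is an assumption-laden illustration: the function names, the header text and the placeholder sequence are hypothetical, and only the accession JN392966 comes from this article.

```python
# Byte-mode capacity of a version 40 QR symbol for each error-correction level.
QR_V40_BYTE_CAPACITY = {"L": 2953, "M": 2331, "Q": 1663, "H": 1273}


def fasta_record(accession: str, description: str, sequence: str, width: int = 70) -> str:
    """Format one FASTA record with sequence lines of at most `width` characters."""
    seq = sequence.upper()
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{accession} {description}\n" + "\n".join(lines) + "\n"


def fits_in_one_qr(payload: str, level: str = "L") -> bool:
    """True if the UTF-8 payload fits in one version 40 symbol at this level."""
    return len(payload.encode("utf-8")) <= QR_V40_BYTE_CAPACITY[level]


# Placeholder 1,400-nt sequence; NOT the real sequence of this accession.
PLACEHOLDER_SEQ = "ACGT" * 350
RECORD = fasta_record("JN392966", "placeholder 16S rRNA gene", PLACEHOLDER_SEQ)

if __name__ == "__main__":
    for level in "LMQH":
        print(level, fits_in_one_qr(RECORD, level))
```

Under these assumptions a partial 16S rRNA record of this size fits at the lowest error-correction level but not at the highest, which shows the trade-off between payload size and scan robustness.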
The generated data were compared using other visual techniques such as CGR and FCGR. A phylogenetic tree constructed in MEGA6 and PCA were used for comparative analysis (Figs. 1-3).
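For readers unfamiliar with CGR and FCGR, the following minimal sketch shows how both are derived from a nucleotide sequence. In a CGR each base moves the current point halfway towards that base's corner of the unit square; an FCGR counts how often each k-mer occurs and places the count in the grid cell where the CGR points of that k-mer fall. The corner assignment below (A bottom-left, C top-left, G top-right, T bottom-right) is a common convention and an assumption here, as are the function names; the tools used in this work may differ.

```python
# Corner of the unit square assigned to each base, as (x, y).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def cgr_points(seq: str) -> list[tuple[float, float]]:
    """CGR coordinates: start at the centre, move halfway to each base's corner."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous bases such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def fcgr(seq: str, k: int) -> list[list[int]]:
    """k-mer counts on a 2^k x 2^k grid, indexed as grid[row][col] with row = y cell."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(base not in CORNERS for base in kmer):
            continue
        col = row = 0
        # The most recent base selects the coarsest quadrant, so the k-mer is
        # read backwards and its last base becomes the most significant bit.
        for base in reversed(kmer):
            cx, cy = CORNERS[base]
            col = col * 2 + cx
            row = row * 2 + cy
        grid[row][col] += 1
    return grid


if __name__ == "__main__":
    print(cgr_points("AG"))
    print(fcgr("ACGTACGT", 2))
```

Computing the FCGR cell from the k-mer itself, rather than by binning floating-point CGR coordinates, avoids rounding problems on long sequences.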

Digitization and microbial diversity informatics
QR codes for the 16S rRNA gene sequences in FASTA format and for the full GenBank information were generated using the DNA BarID software developed by Purohit et al. [5]. The diversity of microorganisms isolated from various hot springs, including Unkeshwar, District Nanded, India (19°85′N and 78°25′E), was observed and compared using a phylogenetic tree and PCA (Figs. 4 and 5).

QR code hyperlinks
The QR codes were hyperlinked using Microsoft word-processing software. The QR codes of the 21 identified bacteria are available to any user on the portal https://sites.google.com/site/bhagwanrekadwad/.
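One plausible hyperlink target for each QR code is the NCBI nucleotide record of the isolate. The sketch below expands the accession ranges reported in this article into individual accessions and builds one link per record. That the hyperlinks point to NCBI nuccore pages is an assumption made for illustration, and the names `expand`, `ACCESSIONS` and `LINKS` are hypothetical.

```python
import re

# Accession ranges as reported in this article.
ACCESSION_SPECS = [
    "JN392966-JN392971",
    "KC120909-KC120919",
    "KM998072-KM998074",
    "KP053645",
]

# Assumed link target: the NCBI nucleotide record page for each accession.
NUCCORE_URL = "https://www.ncbi.nlm.nih.gov/nuccore/{accession}"

_ACCESSION = re.compile(r"([A-Z]+)(\d+)")


def expand(spec: str) -> list[str]:
    """Expand 'AB000001-AB000003' (or a single accession) into a list."""
    ends = [_ACCESSION.fullmatch(part) for part in spec.split("-")]
    if len(ends) > 2 or any(m is None for m in ends):
        raise ValueError(f"unrecognized accession spec: {spec!r}")
    first, last = ends[0], ends[-1]
    if first.group(1) != last.group(1):
        raise ValueError(f"range spans different prefixes: {spec!r}")
    prefix, width = first.group(1), len(first.group(2))
    start, stop = int(first.group(2)), int(last.group(2))
    return [f"{prefix}{n:0{width}d}" for n in range(start, stop + 1)]


ACCESSIONS = [acc for spec in ACCESSION_SPECS for acc in expand(spec)]
LINKS = {acc: NUCCORE_URL.format(accession=acc) for acc in ACCESSIONS}

if __name__ == "__main__":
    print(len(ACCESSIONS), "accessions")
    for acc, url in LINKS.items():
        print(acc, url)
```

Expanding the ranges gives 6 + 11 + 3 + 1 = 21 accessions, matching the 21 isolates.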

Bacterial sequences
The FASTA-format sequences and full GenBank information of the 16S rRNA sequences of the 21 bacteria isolated and identified by us were taken for digitization. The 16S rRNA sequences of the identified strains were submitted to the NCBI repository under accession numbers JN392966-JN392971, KC120909-KC120919, KM998072-KM998074 and KP053645. The QR codes, CGR, FCGR and PCA generated from the 16S rRNA sequences were made available to any user on the website https://sites.google.com/site/bhagwanrekadwad/.
[...] with other species isolated from hot springs). The evolutionary history was inferred using the Neighbor-Joining method [6]. The bootstrap consensus tree inferred from 1000 replicates is taken to represent the evolutionary history of the taxa analyzed [7]. Branches corresponding to partitions reproduced in less than 50% of bootstrap replicates are collapsed. The evolutionary distances were computed using the Maximum Composite Likelihood method [8] and are in units of the number of base substitutions per site. The analysis involved 65 nucleotide sequences. All positions containing gaps and missing data were eliminated. There were a total of 591 positions in the final dataset. Evolutionary analyses were conducted in MEGA6 [9].
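The elimination of all positions containing gaps and missing data (often called complete deletion) can be sketched as follows. This is an illustration on a toy alignment and not the MEGA6 implementation: the function name, the choice of '-' and '?' as gap and missing-data symbols, and the toy sequences are assumptions.

```python
def complete_deletion(alignment: dict[str, str], missing: str = "-?") -> dict[str, str]:
    """Drop every alignment column in which any sequence has a gap or missing symbol."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    n_sites = lengths.pop()
    keep = [
        i for i in range(n_sites)
        if all(seq[i] not in missing for seq in alignment.values())
    ]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Toy alignment invented for illustration only.
TOY = {
    "taxon_1": "ACG-TAC",
    "taxon_2": "AC-TTAC",
    "taxon_3": "ACGTT?C",
}

if __name__ == "__main__":
    trimmed = complete_deletion(TOY)
    print(trimmed)
    print(len(next(iter(trimmed.values()))), "positions retained")
```

In the toy case 4 of 7 columns survive; in the analysis reported above the same kind of filtering over 65 sequences left 591 positions.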