Metagenome dataset of lateritic soil microbiota from Sadaipur, Birbhum, West Bengal, India

The data represent the bacterial community profile obtained through metagenomic sequencing of a soil sample collected from the ‘Rarh’ region of West Bengal, which is characterized by lateritic badlands dating back to the late Pleistocene. Taxonomic binning and operational taxonomic unit (OTU) prediction from the Illumina sequencing data indicated an abundance of Proteobacteria (61%), followed by Bacteroidetes (35%). The two most abundant genera identified were Sphingobacterium and Acinetobacter, respectively. Chemical properties of the soil, such as pH, organic carbon content, and available nitrogen, phosphorus, and potassium, were also analyzed so that future researchers can correlate the abundance of microbial taxa with the prevailing conditions. These findings can be used to formulate microbiome engineering strategies through bioaugmentation for a sustainable agricultural system.



Value of the Data
• The presented dataset is the first report on the status of the microbiome in the lateritic soil of the 'Rarh' region of West Bengal. The soil is low in mineral nutrients and organic carbon and has a very low water-holding capacity, which makes agricultural practice difficult. Agricultural and environmental biologists will benefit from the present data.
• These data will help researchers mitigate the problem of poor fertility by exploring different microbial consortia and organic additives that can be incorporated into traditional agricultural methods.
• From this dataset, it may be possible to identify potentially beneficial soil bacteria carrying novel genes that code for enzymes with nutrient-enhancing ability. Experiments can be designed on the basis of these findings to use such microbes to improve soil productivity for sustainable agriculture.

Data Description
Loose, friable, nutrient-depleted lateritic soil [1] was collected and analyzed for chemical properties: pH, organic carbon content, and available nitrogen, phosphorus, and potassium (Table 1). The same soil was sequenced on the Illumina MiSeq platform. A total of 160,609 reads were obtained, of which 197 did not pass the quality filtering step; the remaining 160,412 reads were used for analysis (Fig. 1a). Only 2% of the reads represented archaeal members, and the remaining 98% were bacterial. Proteobacteria were predominant (Fig. 1b). At the genus level, Sphingobacterium (35%), Acinetobacter (31%), and Pseudomonas (7%) were the top three members in the total distribution of genera in the sample (Fig. 1c). Among the Proteobacterial members, Gammaproteobacteria (59%) was the most abundant class, followed by Alphaproteobacteria and Betaproteobacteria, with a predominance of Sphingobacteriaceae, Moraxellaceae, Enterobacteriaceae, and Pseudomonadales (Fig. 1d). In the Bacillus clade, Bacillus cereus (17%) and Bacillus thuringiensis (6%) were identified as the most abundant members (Fig. 1e).
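The read accounting reported above can be checked with simple arithmetic. The following is a minimal sketch; the variable names are illustrative and are not taken from the original analysis pipeline.

```python
# Read counts as reported in the Data Description.
total_reads = 160_609   # raw reads obtained from the MiSeq run
failed_qc = 197         # reads that did not pass quality filtering

# Reads retained for downstream analysis.
passed_qc = total_reads - failed_qc

# Share of reads retained, as a percentage of the raw reads.
pass_rate = 100 * passed_qc / total_reads

print(f"Reads retained for analysis: {passed_qc}")
print(f"Share of reads passing quality filtering: {pass_rate:.2f}%")
```

The retained count agrees with the 160,412 reads stated in the text, which shows that quality filtering removed only a very small fraction of the data.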

Soil collection and chemical analysis
The soil sample was collected following the protocol recommended by TNAU-2013 [2]. Chemical analysis was performed to estimate the pH, organic carbon content, and available nitrogen, potassium, and phosphorus using the standard methods described in [3].

Next generation sequencing and metagenomic analysis
Metagenomic DNA was extracted from the sieved soil using the PowerSoil™ DNA isolation kit (MOBIO), after which the sample was sequenced following the manufacturer's protocol. DNA quality was analyzed by NanoDrop and then evaluated on an agarose gel, and the DNA was quantified using Qubit. Library preparation was carried out using the standard Illumina 16S rRNA gene library protocol targeting the V3-V4 regions. The enriched library was further quantified and validated using qPCR and an Agilent Bioanalyzer (DNA 1000 chip).

Primer details for the V3-V4 amplicon library:
16S Amplicon PCR Forward Primer = 5′-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG
16S Amplicon PCR Reverse Primer = 5′-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC
Forward overhang (adaptor): 5′-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-(locus-specific sequence)
Reverse overhang (adaptor): 5′-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG-(locus-specific sequence)
Locus-specific sequences: 341F = CCTACGGGNGGCWGCAG and 805R = GACTACHVGGGTATCTAATCC

The library was sequenced on an Illumina MiSeq using reagent kit V3 to generate 2 × 300 bp reads. The raw sequence files were quality checked with FastQC, and the sequences that passed quality screening were used for assembly on the SILVAngs (1.3) platform. This step involved homopolymer removal [4] along with the discarding of artifacts and contaminants [5]. QIIME was used to cluster the operational taxonomic units (OTUs), and Krona charts were generated to analyze the microbial abundances. The pipeline followed the methods described in [6] and [7].
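The primer details listed above follow a simple structure: each full-length amplicon PCR primer is an adaptor overhang followed by a locus-specific sequence (341F or 805R), and the locus-specific parts contain IUPAC degenerate bases. The sketch below, written in Python purely for illustration, checks that structure and counts how many concrete sequences each degenerate primer stands for. The sequences are copied from the text; the function and constant names are hypothetical and are not part of the authors' pipeline.

```python
from itertools import product

# Subset of the IUPAC nucleotide code table covering the bases used here.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "AT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Sequences as given in the primer details (5' to 3').
FWD_OVERHANG = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
REV_OVERHANG = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG"
PRIMER_341F = "CCTACGGGNGGCWGCAG"
PRIMER_805R = "GACTACHVGGGTATCTAATCC"
FULL_FWD = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG"
FULL_REV = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC"


def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count


def expand(primer: str) -> list[str]:
    """List every concrete sequence encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]


# Each full primer should equal overhang + locus-specific sequence.
fwd_ok = FWD_OVERHANG + PRIMER_341F == FULL_FWD
rev_ok = REV_OVERHANG + PRIMER_805R == FULL_REV
print("Forward primer = overhang + 341F:", fwd_ok)
print("Reverse primer = overhang + 805R:", rev_ok)

print("341F degeneracy:", degeneracy(PRIMER_341F))
print("805R degeneracy:", degeneracy(PRIMER_805R))
```

Because 341F contains one N and one W, and 805R contains one H and one V, each primer name stands for a small pool of oligonucleotides rather than a single sequence; `expand` lists the members of that pool if they are needed, for example when trimming primers from reads.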

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.