Whole genome sequencing data of native isolates of Bacillus and Trichoderma having potential biocontrol and plant growth promotion activities in rice

Abstract
Six native isolates of Trichoderma and Bacillus with potential biocontrol and plant growth-promoting activities in rice were isolated from different rice-growing regions of India. These isolates were screened for their efficiency under both in vitro and in vivo conditions for three years. The identity of the isolates was confirmed by both morphological and molecular characterization. The isolates comprise three Bacillus spp., viz. Bacillus velezensis strain BIK2, Bacillus cabrialesii strain BIK3 and Bacillus paralicheniformis strain BIK4, and three Trichoderma asperellum strains, viz. TAIK1, TAIK4 and TAIK5. All are native to Telangana state in Southern India, except strain TAIK4, which is from Rewa district in the state of Madhya Pradesh in Central India. These promising isolates were subjected to whole genome sequencing on the Illumina platform, and the resulting data are presented here. The data generated for Trichoderma asperellum (TAIK1), Trichoderma asperellum (TAIK4), Trichoderma asperellum (TAIK5), Bacillus velezensis (BIK2), Bacillus cabrialesii (BIK3) and Bacillus paralicheniformis (BIK4) had an average coverage of 109X, 150X and 116X, and 1447X, 905X and 585X, respectively. Further annotation of these data, correlated with the laboratory and field performance of these microbes, would enable their use in metagenomics studies that compare their performance under natural conditions with different microbiota and popular rice varieties. The availability of these genomic data should also make bioformulation of these strains more appropriate.

Value of the Data
• The whole genome sequence data of six native biocontrol isolates, viz. three Bacillus and three Trichoderma isolates, serve as an important resource for understanding these bioagents, which suppress plant pathogens such as Rhizoctonia solani and Xanthomonas oryzae pv. oryzae in rice and, in addition, induce plant growth promotion in rice.
• The data are useful for annotating the genes involved in the pathways of enzymes, effector proteins and metabolites/alkaloids that take part in bioagent-host plant-pathogen interactions, from the perspective of these antagonistic bioagents.
• The data provide valuable information on these native bioagents and enable all stakeholders, including the biopesticide industry, to use them efficiently as biocontrol agents and as biofertilizers in sustainable, eco-friendly rice cultivation. The submitted genomic data of these potential bioagents will help in breeding cultivars that respond well to the bioagents when applied. For instance, application of TAIK1 on the 30th day after transplantation released growth-promoting substances and also suppressed the infection induced by R. solani and S. oryzae. It has also been reported that bioagent application needs to be standardised for different varieties [1].

Data Description
Biological control is the process of using friendly bioagents or their products to suppress pathogens, leading to the sustainable integrated management of plant diseases [2]. Species belonging to the genera Trichoderma, Bacillus and Pseudomonas are commonly found in the plant rhizosphere, where they promote plant growth and induce resistance/tolerance against biotic and abiotic stresses. Members of the genus Bacillus, a common gram-positive soil saprophytic bacterium, and of Trichoderma, a saprophytic fungus of rhizosphere soil, are used for their plant growth promotion and biocontrol qualities, which make them a better alternative to chemical pesticides in long-term use [3].
In this manuscript, we report the whole genome sequencing (WGS) data of three Bacillus isolates (BIK2, BIK3 and BIK4) and three Trichoderma isolates (TAIK1, TAIK4 and TAIK5) collected from different states of India using the standard dilution method [4]. The geographic data of the sampling sites and the origin of the isolates are shown in Fig. 1. Detailed statistics for the three Bacillus isolates (BIK2, BIK3 and BIK4) and the three Trichoderma isolates (TAIK1, TAIK4 and TAIK5) are presented in Tables 2 and 3.

Culture and DNA extraction
Bacillus and Trichoderma isolates were obtained from the rice rhizosphere of different regions of India using the standard serial dilution method (Fig. 1). Trichoderma specific medium (TSM) and peptone yeast extract medium (PYEM) were used as selective media for the isolation and purification of fungal and bacterial antagonists, respectively [4]. Key morphological and microscopic characters were used for the identification of Trichoderma isolates [5] and Bacillus isolates [6] (Fig. 2; Table 1). For whole genome sequencing, genomic DNA from the three Bacillus and three Trichoderma strains was isolated using the NucleoSpin® microbial DNA kit (Macherey-Nagel, Germany) as per the manufacturer's protocol. The DNA libraries for whole genome sequencing were prepared using standard protocols and sequenced on the HiSeq 2500 platform (Agri Genome Labs Private Limited, Kochi, India).

Whole genome sequencing
Whole genome sequencing (WGS) of the three Bacillus isolates yielded 20,274,842; 12,674,497 and 17,571,991 raw reads for BIK2, BIK3 and BIK4, respectively. The quality of the raw sequence reads was assessed using FastQC, and the reads were then pre-processed using AdapterRemoval v2 (version 2.3.1) [7] (Fig. 3), generating 20,260,548; 12,667,151 and 17,551,922 clean reads for BIK2, BIK3 and BIK4, respectively, with an average read length of 150 bp, representing 1447X, 905X and 585X coverage. The cleaned reads were assembled de novo using the Unicycler assembler ver. 0.4.8 [8], and the protein-encoding genes (CDSs) in the assembled contigs were predicted using Prodigal ver. 2.6.3 [11]. Completeness of the genome assembly was assessed with BUSCO ver. 4.0.6 [9], and the quality of the genome assembly was assessed with QUAST ver. 4.6 [10].
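The read counts above can be turned into two quick sanity checks: the fraction of reads retained after trimming, and the expected fold coverage from the usual relation C = N × L / G (read count × read length / genome size). The following is a minimal sketch, not the pipeline used for this dataset. The function names are illustrative, the genome size passed to `fold_coverage` is a hypothetical placeholder (the assembled genome sizes are not given in this section), and whether the reported counts are single reads or read pairs is an assumption exposed through the `paired_end` flag.

```python
# Raw and clean read counts reported in the text for the Bacillus isolates.
READ_COUNTS = {
    "BIK2": (20_274_842, 20_260_548),
    "BIK3": (12_674_497, 12_667_151),
    "BIK4": (17_571_991, 17_551_922),
}


def retained_fraction(raw_reads: int, clean_reads: int) -> float:
    """Fraction of raw reads that survive adapter/quality trimming."""
    if raw_reads <= 0:
        raise ValueError("raw_reads must be positive")
    return clean_reads / raw_reads


def fold_coverage(read_count: int, read_length_bp: int,
                  genome_size_bp: int, paired_end: bool = False) -> float:
    """Expected depth C = N * L / G.

    If paired_end is True, read_count is treated as a number of read
    pairs, so the number of sequenced reads is doubled.
    """
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    reads = read_count * (2 if paired_end else 1)
    return reads * read_length_bp / genome_size_bp


# Hypothetical genome size, for illustration only (not taken from the article).
HYPOTHETICAL_GENOME_SIZE_BP = 4_000_000

for strain, (raw, clean) in READ_COUNTS.items():
    kept = 100 * retained_fraction(raw, clean)
    depth = fold_coverage(clean, 150, HYPOTHETICAL_GENOME_SIZE_BP, paired_end=True)
    print(f"{strain}: {kept:.2f}% reads retained, ~{depth:.0f}X at the assumed genome size")
```

To compare against the reported coverage, replace the placeholder with the assembly size from the detailed statistics tables.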
For the Trichoderma strains TAIK1, TAIK4 and TAIK5, a total of 15,230,394; 16,467,915 and 20,615,262 raw reads were generated, respectively. The quality of these raw sequence reads was assessed using FastQC, and the reads were then pre-processed using AdapterRemoval v2 (version 2.3.1) [7] (Fig. 4), resulting in 11,502,933; 14,374,041 and 18,498,253 clean reads, respectively, with an average read length of 150 bp, representing 109X, 150X and 116X coverage. De novo assembly was performed using the Velvet assembler version 1.2.10 (https://angus.readthedocs.io/en/2016/week3/LN_assembly.html), and CDSs in the assembled contigs were predicted using Augustus version 3.4.0 (http://bioinf.uni-greifswald.de/augustus/). Completeness of the genome assembly was assessed with BUSCO ver. 4.0.6 [9], and the quality of the genome assembly was assessed with QUAST ver. 4.6 [10]. Protein-encoding genes were predicted using Prodigal ver. 2.6.3 [11]. Organism annotation was determined from the predicted genes, which were compared against the UniProt database using BlastX version 2.6.0 (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/) with the E-value cut-off set to 10⁻³; the results were subsequently filtered for the best hits based on query coverage, identity and similarity score.
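The best-hit filtering described above can be sketched as follows. This is an illustrative outline, not the authors' script. It assumes the BlastX results were exported as tab-separated text with six columns (query ID, subject ID, percent identity, query coverage, E-value, bit score). Only the 10⁻³ E-value cut-off comes from the text; the identity and coverage thresholds, the use of bit score to rank hits, and all names in the code are assumptions.

```python
from typing import Dict, Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    query_id: str
    subject_id: str
    percent_identity: float
    query_coverage: float
    evalue: float
    bit_score: float


def parse_tabular(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse tab-separated hits in the assumed six-column layout."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        q, s, pident, qcov, evalue, bits = line.split("\t")
        yield Hit(q, s, float(pident), float(qcov), float(evalue), float(bits))


def best_hits(hits: Iterable[Hit],
              max_evalue: float = 1e-3,       # cut-off stated in the text
              min_identity: float = 0.0,      # hypothetical threshold
              min_query_coverage: float = 0.0  # hypothetical threshold
              ) -> Dict[str, Hit]:
    """Keep, per query, the passing hit with the highest bit score."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if (hit.evalue > max_evalue
                or hit.percent_identity < min_identity
                or hit.query_coverage < min_query_coverage):
            continue
        current = best.get(hit.query_id)
        if current is None or hit.bit_score > current.bit_score:
            best[hit.query_id] = hit
    return best


# Made-up rows for illustration; identifiers are not from the dataset.
example_rows = [
    "gene1\tPROT_A\t90.0\t95\t1e-50\t200",
    "gene1\tPROT_B\t99.0\t99\t1e-80\t350",
    "gene2\tPROT_C\t40.0\t30\t0.5\t25",
]
for query, hit in best_hits(parse_tabular(example_rows)).items():
    print(query, hit.subject_id, hit.evalue)
```

Ranking by bit score is one common choice; a ranking on identity or query coverage would only require changing the comparison in `best_hits`.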

Funding Information
This work was supported by ICAR-Indian Institute of Rice Research, Hyderabad, India.

Ethical Statement
Not applicable.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which have or could be perceived to have influenced the work reported in this article.

Supplementary materials
Supplementary material associated with this article can be found in the online version at doi:10.1016/j.dib.2022.107923.