Whole genome sequence data of Streptomyces californicus TBG-201, a chitinolytic actinomycete isolated from the Vandanam sacred groves of Alleppey District, Kerala, India

This study presents the complete genome sequence of Streptomyces californicus TBG-201 isolated from the soil samples of Vandanam sacred groves in Alleppey District, Kerala, India. The organism has potent chitinolytic activity. The genome of S. californicus TBG-201 was sequenced using the Illumina HiSeq-2500 platform with a 2 × 150 bp paired-end protocol and assembled using Velvet version 1.2.10.0. The assembled genome has a total length of 7.99 Mb and a G+C content of 72.60%, and it contains 6683 protein-coding genes, 116 pseudogenes, 31 rRNAs, and 66 tRNAs. antiSMASH analysis revealed abundant biosynthetic gene clusters, while the dbCAN meta server was used to detect carbohydrate-active enzyme coding genes. The NCBI Prokaryotic Genome Annotation Pipeline was used for genome annotation. The presence of numerous genes coding for chitin degradation is consistent with the chitinolytic ability of this strain. The genome data have been deposited in NCBI with the accession number JAJDST000000000.



Value of the Data
• The isolate S. californicus TBG-201 is a potent chitinase producer, which makes it a significant candidate for biotechnological applications. The genome contains genes coding for chitin degradation. The presence of the GH19 chitinase gene shows that it can produce family-19 chitinases, which are very similar to plant chitinase-C. Family-19 chitinases have received much attention recently because of their potential use in the biocontrol of plant pests and pathogens such as insects and fungi.
• Thirty-five biosynthetic gene clusters were identified from the genome using antiSMASH, which suggests the potential of the organism to produce a wide range of secondary metabolites. Various carbohydrate-active enzymes were identified in the genome by CAZy analysis, which provides an understanding of the organism's carbohydrate metabolism and potential biotechnological applications. The genome data can be used for specific genomic and functional analyses.
• Whole genome sequence data of S. californicus TBG-201 can benefit researchers and scientists working in functional genomics and enzyme research. The data also provide insights into the potential applications of S. californicus TBG-201.
• The genome sequence data of S. californicus TBG-201 can be primarily used for research on various biotechnological applications. The presence of several gene clusters, genes for chitin degradation, and other carbohydrate-active enzymes in the genome indicates the organism's ability to produce numerous secondary metabolites and to degrade chitin and other complex carbohydrates, which may be studied experimentally.

Objective
S. californicus TBG-201 was isolated in our laboratory from the soil samples of Vandanam sacred groves of Alleppey District in Kerala and was found to be a potent chitinase producer. The organism's whole genome was sequenced to better understand the genetic basis of the isolate's chitinolytic activity. The genome assembly was annotated using NCBI PGAP to identify the protein-coding genes, rRNAs, tRNAs, and pseudogenes. The biosynthetic gene clusters were identified using antiSMASH, which suggested the potential of the organism to produce a broad spectrum of secondary metabolites. The genes for carbohydrate-active enzymes were identified using CAZy analysis. Overall, the generation of this dataset was motivated by the need to understand the genetic basis of the chitinolytic activity of S. californicus TBG-201, which has potential biotechnological applications.

Data Description
Whole genome sequence data of the chitinolytic actinomycete S. californicus TBG-201 are reported here. Pre-processing of the data after quality control gave 3,976,878 reads, with 555.71 MB of base pairs for R1 and 503.25 MB of base pairs for R2. The de novo assembly resulted in 50 scaffolds, 129 contigs, and an N50 value of 154,990. Velvet assembly was done using a k-mer value of 79, resulting in a genome of 7,994,281 base pairs with a genome coverage of 99.5x. The BUSCO score was C: 95.3% (S: 93.9%, D: 1.4%, F: 0.7%, M: 4.0%, N: 148). The sequence was deposited in GenBank under the accession number JAJDST000000000. The functional annotations and gene predictions from the NCBI Prokaryotic Genome Annotation Pipeline are available at GenBank. The general features of the genome assembly are given in Table 1. The genes coding for proteins associated with chitin degradation in the S. californicus TBG-201 genome, as obtained from NCBI PGAP annotation, are shown in Table 2.
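The N50 value reported above is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal sketch of that calculation follows; the function name `n50` and the toy lengths are illustrative assumptions, not the scaffold lengths of this assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy lengths for illustration only (hypothetical, not from this genome).
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_lengths))  # prints 8
```

In practice the lengths would be read from the assembly FASTA file; the toy list only shows the arithmetic.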
The annotation of the constitutive modules of CAZymes from the gene sequence is primarily used to assess and identify an organism's capacity to produce complex carbohydrate-degrading enzymes. The meta server dbCAN combines three tools for CAZome annotation: (i) HMMER search against the dbCAN HMM (hidden Markov model) database; (ii) DIAMOND search against the CAZy pre-annotated CAZyme sequence database; and (iii) Hotpep search against the conserved CAZyme short peptide database. The outputs of the three methods were combined to get the best possible results from automated CAZyme annotation. Only CAZymes detected by at least two of the methods were selected; these are given in Table 3.
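The "detected by at least two methods" rule can be sketched as a simple vote count over the three tool outputs. The function name `consensus_cazymes` and the gene identifiers below are hypothetical placeholders; this is not dbCAN's own code or output format, only an illustration of the selection logic.

```python
from collections import Counter


def consensus_cazymes(hmmer_hits, diamond_hits, hotpep_hits, min_tools=2):
    """Return gene IDs reported by at least `min_tools` of the three tools."""
    votes = Counter()
    for hits in (hmmer_hits, diamond_hits, hotpep_hits):
        # set() ensures each tool contributes at most one vote per gene.
        votes.update(set(hits))
    return sorted(gene for gene, n in votes.items() if n >= min_tools)


# Hypothetical gene IDs, for illustration only.
hmmer = {"gene_0001", "gene_0002", "gene_0003"}
diamond = {"gene_0002", "gene_0003", "gene_0004"}
hotpep = {"gene_0003", "gene_0005"}

print(consensus_cazymes(hmmer, diamond, hotpep))
# prints ['gene_0002', 'gene_0003']
```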
Thirty-five biosynthetic gene clusters, including those for antibiotics, melanin, antifungal compounds, siderophore, geosmin, carotenoid, osmolyte, and terpenes, were identified using the antiSMASH tool (Table 4). Many of them code for secondary metabolites that have less than 20% similarity to known compounds. This suggests that the strain may produce novel metabolites and offers the possibility of discovering new bioactive compounds.

The neighbor-joining tree based on 16S rDNA gene sequences shows that the strain TBG-201 is highly similar to S. californicus strain FDAARGOS_1210 (Fig. 1). To confirm the taxonomic identity of strain TBG-201, digital DNA-DNA hybridization (dDDH) was done. The dDDH d4 value is 88.4% for both S. puniceus strain DSM 40083 and S. floridae NRRL 2423. S. puniceus [1] and S. floridae [2] are synonyms for S. californicus [3]. The strain TBG-201 (JAJDST000000000) therefore belongs to the known species S. californicus (Figs. 2 and 3). The average nucleotide identity (ANI) value of S. californicus TBG-201 was found to be 98.65% with S. californicus strain FDAARGOS_1210 and 97.69% with Streptomyces sp. CB04723, the closest phylogenetic neighbors. These values are higher than the generally accepted species threshold level of 96%, which further supports the assignment of strain TBG-201 to S. californicus.

Fig. 1. Neighbor-joining tree based on 16S rDNA gene sequences [4]. The bootstrap consensus tree inferred from 1000 replicates represents the evolutionary history. Next to each branch is the percentage of replicate trees in which the related taxa grouped together in the bootstrap test (1000 replicates) [5]. The evolutionary distances were calculated using the Jukes-Cantor method [6] and are in the units of the number of base substitutions per site. The analysis involved 17 nucleotide sequences. Codon positions included were 1st + 2nd + 3rd and noncoding. For each sequence pair, the ambiguous positions were eliminated. There were a total of 1474 positions in the final dataset.
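For readers unfamiliar with the Jukes-Cantor correction, the distance between two aligned sequences is d = -(3/4) ln(1 - (4/3) p), where p is the proportion of differing sites. The sketch below computes it with ambiguous positions removed per sequence pair, as described for the tree. It is a stand-alone illustration, not MEGA's implementation; the function name and the example sequences are assumptions.

```python
import math


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor distance (substitutions per site) for two aligned sequences.

    Sites where either sequence has a gap or ambiguous base are skipped,
    mirroring pairwise elimination of ambiguous positions.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in valid or b not in valid:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no unambiguous sites to compare")
    p = differences / compared
    if p >= 0.75:
        raise ValueError("distance undefined: sequences are saturated")
    return -0.75 * math.log(1 - 4 * p / 3)


# Hypothetical 10-base alignment with one mismatch (p = 0.1).
print(round(jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAT"), 4))
```

The corrected distance is slightly larger than the raw proportion p because the model accounts for multiple substitutions at the same site.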

Culture maintenance
S. californicus TBG-201 was grown and maintained on ISP2 agar medium (yeast extract-malt extract agar) at 28 ± 2 °C. Stock cultures were maintained at -80 °C in 50% glycerol.

Genomic DNA extraction
S. californicus TBG-201, grown in YEME medium with 34% sucrose and 0.5% glycine, was used to isolate high molecular weight genomic DNA for whole genome sequencing. The organism was incubated at 28 ± 2 °C at 180 rpm for five days, and the genomic DNA was extracted using the CTAB method [8].

Phylogenetic and comparative genomic analysis
The gene sequence encoding the 16S rDNA of S. californicus TBG-201 was retrieved from GenBank. The NCBI BLAST tool (https://blast.ncbi.nlm.nih.gov/Blast.cgi) was used to retrieve closely related sequences from GenBank, and similar sequences were then aligned using ClustalW. MEGA6 was used to construct the evolutionary tree [16]. The Type Strain Genome Server (TYGS) (http://tygs.dsmz.de) was used for whole genome-based taxonomy analysis [17]. The average nucleotide identity (ANI) value was calculated using CJ Bioscience's online Average Nucleotide Identity calculator, which uses the OrthoANIu algorithm (https://www.ezbiocloud.net/tools/ani) [18].
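The ANI and dDDH values produced by these tools are interpreted against species-level cutoffs. The sketch below applies the 96% ANI threshold stated in this article to the reported values. The 70% dDDH cutoff is an assumption added here (it is the commonly used species boundary and is not stated in the article), and the function and constant names are illustrative.

```python
ANI_SPECIES_CUTOFF = 96.0   # percent; threshold stated in this article
DDDH_SPECIES_CUTOFF = 70.0  # percent; assumed, commonly used dDDH cutoff


def supports_same_species(value_percent, cutoff_percent):
    """Return True when a similarity value meets or exceeds the cutoff."""
    return value_percent >= cutoff_percent


# Values reported in this article for strain TBG-201.
reported = [
    ("ANI", "S. californicus FDAARGOS_1210", 98.65, ANI_SPECIES_CUTOFF),
    ("ANI", "Streptomyces sp. CB04723", 97.69, ANI_SPECIES_CUTOFF),
    ("dDDH d4", "S. puniceus DSM 40083", 88.4, DDDH_SPECIES_CUTOFF),
    ("dDDH d4", "S. floridae NRRL 2423", 88.4, DDDH_SPECIES_CUTOFF),
]

for metric, reference, value, cutoff in reported:
    verdict = "above" if supports_same_species(value, cutoff) else "below"
    print(f"{metric} vs {reference}: {value}% is {verdict} the {cutoff}% cutoff")
```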

Ethics Statements
Not applicable.

Declaration of Competing Interest
The authors of this paper state that they do not have any financial or personal interest that could have influenced their work or created a conflict of interest.