Molecular characterization, phylogenetic and in silico sequence analysis data of trehalose biosynthesis genes; otsA and otsB from the deep sea halophilic actinobacteria, Streptomyces qinglanensis NIOT-DSA03

Trehalose, a non-reducing disaccharide (α-D-glucopyranosyl-(1→1)-α-D-glucopyranoside), is a natural compound that serves as a protective substance in halophilic bacterial cells. The trehalose biosynthesis genes (otsA and otsB) were PCR amplified from the genomic DNA of the deep-sea actinobacterium Streptomyces qinglanensis NIOT-DSA03. The amplified genes were cloned and their nucleotide sequences were determined. In silico sequence and phylogenetic analyses of the otsA and otsB nucleotide and amino acid sequences of S. qinglanensis were also performed. The experimental data described in this study will be helpful in developing a recombinant expression system to produce trehalose for biotechnological applications.


Value of the Data
• The dataset describes the importance of the major osmolyte trehalose in protecting proteins and cellular membranes in prokaryotes from inactivation or denaturation under environmental stress.
• The dataset provides information about the osmolyte in Streptomyces qinglanensis NIOT-DSA03 and its application in the derma-pharmacy industries.
• The data provide detailed information on the gene sequences and the corresponding protein structures, and may be used for further heterologous gene expression studies.

Data Description
The otsA and otsB genes encode α,α-trehalose-phosphate synthase and trehalose-6-phosphate phosphohydrolase, respectively. Together these proteins constitute the trehalose biosynthetic pathway. The trehalose biosynthesis genes otsA and otsB were PCR amplified, yielding products of 1278 bp and 879 bp, respectively (Fig. 1). The otsA and otsB genes encode proteins of 425 and 292 amino acids with calculated molecular masses of 46,625 and 30,228 Da, respectively (Fig. 2 a & b). After PCR amplification, the products were purified from the agarose gel and cloned into the pDrive cloning vector. White colonies were selected and screened for the presence of the insert by PCR amplification with gene-specific primers, which gave the specific product. The recombinant transformants carrying the otsA and otsB genes were also confirmed by double digestion with the Sac I and Bam HI restriction enzymes, which released the full gene along with the flanking region of the vector. The otsA and otsB sequences generated in this study have been deposited in the GenBank database under the accession numbers MN017301 and MN023141.
The otsA and otsB sequences from S. qinglanensis NIOT-DSA03 were compared with reported sequences of other actinobacteria. The phylogenetic trees of otsA at the nucleotide and amino acid levels revealed the phylogenetic similarity of the otsA gene from S. qinglanensis NIOT-DSA03 to that of other organisms. Several bacterial species fell into different clusters in the nucleotide and amino acid trees for otsA, indicating divergence among the organisms and reflecting the degree of divergence in the sequences. S. qinglanensis, S. venezuelae and S. lincolnensis were grouped in the same cluster in both phylogenetic trees (Fig. 3 a). The phylogenetic trees of the otsB nucleotide and amino acid sequences also grouped S. qinglanensis, S. alfalfae and S. lincolnensis in a single cluster. A diverged mode of clustering was observed in the phylogenetic tree analysis (Fig. 3 b).
On phylogenetic analysis, the otsA and otsB genes of S. qinglanensis were found to be highly conserved among the bacterial species. The otsB gene showed higher similarity among bacterial species than the otsA gene. S. qinglanensis and S. lincolnensis clustered together for both the otsA and otsB genes. The genes involved in the biosynthesis of trehalose in S. qinglanensis and S. venezuelae are well conserved relative to those of other bacteria at both the nucleotide and amino acid levels [2].
Prediction of secondary structure was performed with the PROTEAN program (DNASTAR, Inc., Madison). The otsA and otsB proteins were predicted to have an alpha-helical secondary structure composed predominantly of hydrophilic residues. The prediction analysis also revealed the presence of many acidic amino acids, regions with high antigenicity and very high backbone chain flexibility. For the otsA protein, the predicted charge at pH 7.0 was +9.54 with an isoelectric point of 5.82. Common amino acids include 14% glutamic acid, 10% leucine, 11% each of alanine, threonine and valine, and 17% each of phenylalanine, glycine, proline and glutamine (Fig. 4 a). For the otsB protein, the predicted charge at pH 7.0 was +20.14 with an isoelectric point of 5.64. The amino acid composition includes 52% glycine, 41% glutamic acid, 37% leucine, 39% threonine and 29% lysine (Fig. 4 b). These prediction results also showed considerable similarity to the reported trehalose biosynthesis enzymes from actinobacteria.
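As an illustration of how an amino acid composition can be tabulated from a deduced protein sequence, the following minimal Python sketch counts each residue and expresses it as a percentage of the sequence length, so that the values over all residues sum to 100. The function name `composition_percent` and the toy peptide are hypothetical and are not taken from the PROTEAN analysis or from the OtsA/OtsB sequences.

```python
from collections import Counter


def composition_percent(protein: str) -> dict[str, float]:
    """Return the percentage of each residue in a one-letter protein sequence."""
    seq = protein.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    # Each percentage is (count of residue / total length) * 100.
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


# Hypothetical toy peptide used only for illustration (NOT OtsA or OtsB).
toy = "MSLEEGAAGL"
comp = composition_percent(toy)

# Print residues from most to least frequent.
for aa, pct in sorted(comp.items(), key=lambda kv: -kv[1]):
    print(f"{aa}: {pct:.1f}%")
```

Applied to the full deduced sequences (GenBank MN017301 and MN023141), the same function would give a composition table that can be compared with the output of dedicated tools.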
Three-dimensional structure prediction of the trehalose synthase enzyme suggests that the tertiary structure is highly compatible with the secondary structure prediction. The structure was validated using a Ramachandran plot, which showed that none of the residues were present in the disallowed region. This suggests that the modelled structure shares a high level of similarity with previously reported structures. Homology analysis of the trehalose synthase enzyme against the Protein Data Bank (PDB) revealed a maximum of 100% and a minimum of 21% identity with the PDB templates (Fig. 5 a & b).
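The gene and protein sizes reported in this section can be cross-checked with simple arithmetic. The sketch below assumes that each reported gene length is a complete open reading frame that includes one stop codon; that assumption, and the function name `protein_length`, are illustrative and are not stated in the data. It also prints the average residue mass implied by the reported molecular masses.

```python
def protein_length(orf_bp: int) -> int:
    """Number of amino acids encoded by an ORF of orf_bp bases.

    Assumes the ORF length includes exactly one stop codon.
    """
    if orf_bp % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    return orf_bp // 3 - 1  # total codons minus the stop codon


# Values reported in the text: (gene length in bp, molecular mass in Da).
reported = {"otsA": (1278, 46625), "otsB": (879, 30228)}

for gene, (bp, mass_da) in reported.items():
    n_aa = protein_length(bp)
    print(f"{gene}: {bp} bp -> {n_aa} aa, "
          f"average residue mass {mass_da / n_aa:.1f} Da")
```

Under this assumption, 1278 bp and 879 bp correspond to 425 and 292 amino acids, which agrees with the protein lengths given above.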

Bacterial strain, growth conditions, DNA isolation and plasmids
S. qinglanensis NIOT-DSA03 was isolated from a deep-sea sediment sample obtained during the Barren Island cruise, Andaman and Nicobar (A & N) Islands, of the ocean research vessel Sagar Manjusha. Sediment samples were collected from the seafloor using box cores at a depth of 1,840 m (12°12.90′N, 093°48.92′E). The isolate was grown aerobically in ISP 1 medium and the genomic DNA was isolated following the modified procedure of Kutchma et al. [3]. The 16S rDNA was amplified by polymerase chain reaction using the universal eubacterial primers 16S F (5′-ACTCAAGGAATTGACGG-3′) and 16S R (5′-TACGGCTACCTGTTACGACTT-3′). The 16S rDNA amplicon was cloned into a T/A cloning vector using the InsTAclone PCR Cloning Kit (MBI Fermentas, USA) according to the manufacturer's instructions. DNA sequencing was carried out by the dye termination method on an ABI PRISM 377 genetic analyzer (Applied Biosystems, USA). The 16S rDNA sequences obtained were used in a homology search against the sequences available in GenBank using BLAST (NCBI). The Escherichia coli JM109 strain was used as the host for transformation, and the plasmid pDrive was used as the cloning vector.
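For readers who wish to inspect the two 16S rDNA primers, the following Python sketch computes their length, GC content and a rough melting temperature. The melting temperature uses the Wallace rule of thumb, Tm = 2(A+T) + 4(G+C), which is only an approximation for short oligonucleotides and was not necessarily used in this study; the function names are hypothetical.

```python
def gc_percent(primer: str) -> float:
    """Percentage of G and C bases in a primer."""
    seq = primer.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2(A+T) + 4(G+C)."""
    seq = primer.upper()
    at = sum(base in "AT" for base in seq)
    gc = sum(base in "GC" for base in seq)
    return 2 * at + 4 * gc


# Primer sequences as given in the text (5' to 3').
primers = {
    "16S F": "ACTCAAGGAATTGACGG",
    "16S R": "TACGGCTACCTGTTACGACTT",
}

for name, seq in primers.items():
    print(f"{name}: {len(seq)} nt, GC {gc_percent(seq):.1f}%, "
          f"Tm ~{wallace_tm(seq)} deg C (Wallace rule)")
```

More accurate estimates would require nearest-neighbour thermodynamic models and the actual salt and primer concentrations of the reaction.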

Characterization of recombinant plasmids
The recombinant plasmids carrying the pDrive-otsA and pDrive-otsB constructs were double digested with the Sac I and Bam HI enzymes. The restriction digestion was carried out with a final volume of  tion enzyme and the remaining Milli Q autoclaved water. Reaction mixtures were incubated overnight at 37 °C in a water bath, and the digested products were analyzed on a 1.5% agarose gel alongside a 100 bp DNA ladder and documented in a gel documentation system (UVP BioSpectrum Imaging System, USA). The restriction-digested trehalose biosynthesis genes were gel eluted, purified and sequenced on an ABI PRISM 377 genetic analyzer (Applied Biosystems Inc., USA).
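The expected outcome of a Sac I and Bam HI double digest can be sketched in silico. The Python example below locates the Sac I (GAGCT^C) and Bam HI (G^GATCC) recognition sites on the top strand of a circular sequence and returns the resulting fragment lengths. The plasmid used here is a hypothetical 30-base toy sequence, not pDrive or the actual constructs, and the simple search does not detect a site that spans the origin of the circular sequence.

```python
# Recognition site and cut offset on the top strand (bases before the cut).
SITES = {"SacI": ("GAGCTC", 5), "BamHI": ("GGATCC", 1)}


def cut_positions(seq: str) -> list[int]:
    """Sorted top-strand cut positions for all SacI and BamHI sites."""
    seq = seq.upper()
    cuts = []
    for site, offset in SITES.values():
        start = seq.find(site)
        while start != -1:
            cuts.append(start + offset)
            start = seq.find(site, start + 1)
    return sorted(cuts)


def circular_fragments(seq: str) -> list[int]:
    """Fragment lengths after cutting a circular molecule at every site."""
    cuts = cut_positions(seq)
    n = len(seq)
    if not cuts:
        return [n]  # uncut circle
    # Distance from each cut to the next one, wrapping around the circle.
    return [(cuts[(i + 1) % len(cuts)] - c) % n or n
            for i, c in enumerate(cuts)]


# Hypothetical toy plasmid with one SacI and one BamHI site.
toy_plasmid = "AAAA" + "GAGCTC" + "TTTTTTTTTT" + "GGATCC" + "CCCC"
print(cut_positions(toy_plasmid))
print(circular_fragments(toy_plasmid))
```

For a real construct, the same approach applied to the full vector-plus-insert sequence would predict the band sizes to compare against the 100 bp DNA ladder.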

In silico sequence analysis of trehalose biosynthesis genes
The nucleotide sequences obtained were compared with database sequences using the NCBI BLAST program ( http://www.ncbi.nlm.nih.gov ) and were aligned and clustered using CLUSTALW [5]. To measure the percentage identities between nucleotide and amino acid sequences, the alignments were imported into the GeneDoc program ( https://genedoc.software.informer.com/2.7/ ) and the BioEdit 7.05 program ( www.mbio.ncsu.edu/BioEdit/ ). The molecular masses and theoretical pI values of the polypeptides were predicted using the ProtParam tool ( http://www.expasy.org/tools/protparam.html ). Prediction of secondary structure was performed with the PROTEAN program (DNASTAR, Inc., Madison). The three-dimensional structure was predicted through a homology modelling approach using the MODELLER program (Discovery Studio Modeling Environment 4.0, San Diego: Accelrys Software Inc., 2013).
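The percentage identity between two aligned sequences can be illustrated with a short Python sketch. It assumes the sequences have already been aligned (for example with CLUSTALW) and counts identical positions over the columns in which neither sequence has a gap. This denominator is one common convention and may differ from the one used by GeneDoc or BioEdit; the function name and the toy alignment are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns that contain no gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue.
    pairs = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
             if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


# Hypothetical toy alignment (not taken from the otsA/otsB data).
seq1 = "ATG-CGT"
seq2 = "ATGACGA"
print(f"{percent_identity(seq1, seq2):.1f}% identity")
```

The same function works for nucleotide or amino acid alignments, since it only compares characters column by column.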

Declaration of Competing Interest
The authors declare no competing interests.