Microbial Carboxylesterases: An Insight into Thermal Adaptation Using In Silico Approach

Introduction
Carboxyl ester hydrolases are a diverse group of enzymes that catalyse the hydrolysis of a broad range of substrates. Carboxylesterases (CEs) are lipolytic enzymes of the α/β hydrolase fold superfamily, which also includes functionally different enzymes such as proteases, lipases, esterases, peroxidases, dehalogenases and epoxide hydrolases. CEs are also known as true esterases and are nonspecific in their action because of their affinity towards wide and overlapping ranges of substrates [1][2][3]. Their broad substrate tolerance has led to the assumption that they evolved to enable access to carbon sources and that they play a role in catalytic pathways [4]. Plants, animals and microorganisms accomplish ester bond hydrolysis with the help of these enzymes [5]. On the basis of the source from which they are obtained, they are classified into carboxylesterase, arylesterase, cholesterol esterase, cocaine esterase, feruloyl esterase, acetylcholine esterase and tenin esterase [6]. Microbial CEs were initially classified into eight families on the basis of sequence homology, and were later further classified into 21 families. These enzymes act on small, water-soluble esters [7,8]. Esterase-catalysed reactions are useful in many industries, including the pharmaceutical, dairy, beverage and perfume industries. Esterases do not require cofactors, which makes them attractive industrial biocatalysts [5], and their high regio- and stereoselectivity makes them efficient biocatalysts for the synthesis of optically active fine chemicals [4]. Many clinically important drugs, such as cocaine, aspirin and heroin, are hydrolysed by CEs. Bacterial esterases, especially CEs, play an important role in the efficient hydrolysis of neurotoxic organophosphates (OPs). This detoxification reaction is beneficial in environmental monitoring [9,10].
Esterases have been obtained from all environmental conditions, from normal to extreme. However, fewer characterized and sequenced esterases have been obtained from hyperthermophiles, and most of these come from metagenomic samples. Since these enzymes have a wide range of applications, careful investigation of their physicochemical properties and amino acid composition is required. This can be done experimentally or using an in silico approach. Although a number of studies have been performed to date to differentiate mesophiles and thermophiles, these enzymes still lack comparative analysis at the sequence level. The present study was carried out to compare ten carboxylesterases at the primary sequence level using an in silico approach and to identify the amino acids that contribute to the thermal adaptation of carboxylesterases.

Abstract
Carboxylesterases (CEs) are a group of esterases that catalyze the hydrolysis of carboxylic esters to form an alcohol and a carboxylic acid in the presence of water. An in silico analysis of ten carboxylesterase (E.C. 3.1.1.1) sequences from mesophilic and thermophilic organisms was carried out to compare various physicochemical parameters and amino acid composition. Ten sequences, five from each group, were retrieved from NCBI and aligned using the multiple sequence alignment tool MUSCLE. The phylogenetic relationship between the two groups was inferred using the maximum parsimony method of MEGA-6. The sequences were further analyzed with the online ProtParam tool at ExPASy for some important physicochemical properties. Multiple sequence alignment (MSA) showed the presence of the conserved catalytic triad S157, D254, H284. The maximum parsimony method of MEGA-6 separated the mesophilic and thermophilic esterases into their respective subgroups. The two groups showed significant variation in their physical and chemical properties. Theoretical pI and negatively charged residues were found to be significant in the present study. The amino acids Gln (Q) and Val (V) showed statistically significant variation with 1.7 and 1. [...]

[...] 22038182), Thermoproteus uzoniensis 768-20 (gi-327310723), Paenibacillus sp. OSY-SE (gi-518254251). Details of the organisms and their protein sequences vis-à-vis their stability, specificity and reactivity were verified manually against BRENDA (http://www.brendaenzymes.org/) and ESTHER (annotated and published information on esterase sequences). BlastP was performed to look for other homologues of these sequences in sequence databases. MUSCLE at the European Bioinformatics Institute (EBI) was used to generate a multiple sequence alignment (MSA) of all the sequences to identify the conserved catalytic triad and other important amino acids. The maximum parsimony method of MEGA-6 was used to reveal the relationship between the two groups of carboxylesterase sequences.
Physicochemical analysis was done using the online tool ProtParam on the ExPASy server (the proteomics server of the Swiss Institute of Bioinformatics). Sequences were supplied in FASTA format to calculate some important physicochemical properties and the composition of the twenty amino acids. The number of amino acids, molecular weight (kDa) and pI values were deduced using the Compute pI/Mw tool, and the atomic composition, instability index, aliphatic index and grand average of hydropathicity (GRAVY) of the bacterial carboxylesterase protein sequences were derived.
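Two of the indices named above have simple closed forms, so they can be reproduced offline as a cross-check on the server output. The following is a minimal sketch, not the ProtParam implementation: it assumes the standard Kyte-Doolittle hydropathy scale for GRAVY and the usual aliphatic index formula, X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), with X in mole percent. The function names and the toy sequence are hypothetical and are not taken from the study.

```python
# Kyte-Doolittle hydropathy values, the scale on which GRAVY is defined.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy value per residue."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    seq = seq.upper()
    n = len(seq)

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def charged_residue_counts(seq):
    """Return (negative, positive) counts: (Asp + Glu, Arg + Lys)."""
    seq = seq.upper()
    negative = seq.count("D") + seq.count("E")
    positive = seq.count("R") + seq.count("K")
    return negative, positive


if __name__ == "__main__":
    # Hypothetical toy fragment, NOT one of the ten sequences in the study.
    toy = "MKVLAAGDSAGGNLAEVR"
    print("GRAVY:", round(gravy(toy), 3))
    print("Aliphatic index:", round(aliphatic_index(toy), 2))
    print("(Asp+Glu, Arg+Lys):", charged_residue_counts(toy))
```

The sketch raises a KeyError on non-standard residue codes such as X or B, so sequences containing ambiguous positions would need to be cleaned first.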

Statistical analysis
An analysis of variance (ANOVA) was conducted on the various physicochemical parameters for each study with the statistical package 'Assistat version-7.7 beta 2015'. The F-test was used to determine statistical significance. When significant effects were detected, a Tukey test was applied for all pairwise comparisons of mean responses.
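The F-test behind a one-way ANOVA can be written out in a few lines, which makes explicit what is being compared for each parameter. This is an illustrative sketch only and does not reproduce the Assistat workflow; the function name and the example values are hypothetical placeholders, not data from the study.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of observations per group.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


if __name__ == "__main__":
    # Placeholder values for one parameter, five sequences per group.
    # These numbers are invented for illustration only.
    mesophilic = [5.1, 5.4, 5.0, 5.6, 5.3]
    thermophilic = [6.2, 6.8, 7.1, 6.5, 7.4]
    f_value, df1, df2 = one_way_anova_f([mesophilic, thermophilic])
    print(f"F({df1}, {df2}) = {f_value:.2f}")
```

With five sequences per group the degrees of freedom are (1, 8), for which the critical F value at p = 0.05 is approximately 5.32. With only two groups the F-test is equivalent to a two-sample t-test, and a post hoc pairwise test adds no further comparisons.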

Results
A comparison of ten carboxylesterase sequences from the mesophilic and thermophilic groups showed significant differences in physicochemical parameters and amino acid residues. Multiple sequence alignment (MSA) showed the presence of the conserved catalytic triad of S157, D254 and H284 (Figure 1). Ser (S), an important residue responsible for catalytic activity, was found to be fixed in the pentapeptide GDSAG motif. The evolutionary relationship inferred using the maximum parsimony method of MEGA-6 (Figure 2) showed the relationship between the mesophilic and thermophilic organisms and separated them into their respective groups. The results of the physicochemical parameter and amino acid analyses are presented in Tables 1a and 1b. Theoretical pI and negatively charged residues were found to be statistically significant (p<0.05). Negatively charged residues (Asp+Glu) contributed equally towards the stability of the two groups and were found to be equal in number. The amino acids Gln (Q) and Val (V) exhibited statistically significant variations of 1.7 and 1.5 fold, respectively. The residues Asn (N), Gln (Q), Cys (C), Lys (K), Thr (T) and Trp (W) were found to be 1.4, 1.7, 1.4, 1.2, 1.3 and 1.4 fold higher in mesophiles than in thermophiles, whereas Ala (A), Arg (R), Ile (I), Glu (E), Tyr (Y) and Val (V) were found to be 1.1, 1.5, 1.3, 1.2, 1.2 and 1.5 fold higher in thermophiles than in mesophiles. The amino acid Cys (C), an important parameter in the calculation of the extinction coefficient, was found to be 1.4 fold higher in mesophiles, which contributed to their stability [11].
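Fold values of this kind can be derived from per-sequence amino acid percentages. The text does not state exactly how the fold values were computed, so the sketch below makes an explicit assumption: each fold value is the ratio of the higher group-mean percentage to the lower one. All function names and the toy sequences are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_percent(seq):
    """Percentage of each of the twenty amino acids in one sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    return {aa: 100.0 * counts[aa] / len(seq) for aa in AMINO_ACIDS}


def mean_composition(seqs):
    """Mean percentage of each amino acid across a group of sequences."""
    comps = [composition_percent(s) for s in seqs]
    return {aa: sum(c[aa] for c in comps) / len(comps) for aa in AMINO_ACIDS}


def fold_difference(mesophilic, thermophilic):
    """Map each amino acid to (group with the higher mean, fold value).

    The fold value is higher mean / lower mean (an assumption, see text).
    Residues absent from either group map to None, since no ratio exists.
    """
    meso = mean_composition(mesophilic)
    thermo = mean_composition(thermophilic)
    result = {}
    for aa in AMINO_ACIDS:
        m, t = meso[aa], thermo[aa]
        if m == 0 or t == 0:
            result[aa] = None
        elif m >= t:
            result[aa] = ("mesophile", m / t)
        else:
            result[aa] = ("thermophile", t / m)
    return result


if __name__ == "__main__":
    # Invented toy fragments, NOT the sequences analysed in the study.
    folds = fold_difference(["MQQNKTWAV", "MQNCKTAVL"],
                            ["MRVEAIVYA", "MREVVAIYL"])
    for aa, value in folds.items():
        if value is not None:
            print(aa, value[0], round(value[1], 2))
```

Averaging percentages rather than raw counts keeps sequences of different lengths from dominating the group mean.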

Discussion
In the present study, the multiple sequence alignment tool MUSCLE, which offers improved accuracy and speed, showed the presence of the conserved catalytic triad of S (157), D (254) and H (284) [12,13]. Two glycine (Gly) residues, G85 and G86, remain conserved in the sequence motif HGGGF, which is important in the transition between the steps of hydrolysis. The tree constructed by the maximum parsimony method of MEGA-6 showed that all ten sequences are closely related irrespective of their mesophilic or thermophilic origin [14]. Remarkable variations in amino acid composition and physicochemical properties between mesophilic and thermophilic proteins form the primary basis of the adaptation of microorganisms to different conditions [15][16][17][18][19].
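The conserved motifs mentioned here can also be located directly in an unaligned sequence with a pattern search. The sketch below is illustrative only: it assumes the general serine hydrolase pentapeptide pattern G-x-S-x-G (of which GDSAG is one instance) and the literal HGGGF motif, and the function name and toy sequence are hypothetical. Positions are 1-based within each raw sequence, so they will not match alignment column numbers such as S157.

```python
import re

# Lookahead patterns so that overlapping occurrences are also reported.
SERINE_PENTAPEPTIDE = re.compile(r"(?=(G.S.G))")  # e.g. GDSAG
HGGGF_MOTIF = re.compile(r"(?=(HGGGF))")


def find_motifs(seq, pattern):
    """Return a list of (1-based start position, matched motif)."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq.upper())]


def candidate_catalytic_serines(seq):
    """1-based positions of the Ser at the centre of each G-x-S-x-G hit."""
    return [start + 2 for start, _ in find_motifs(seq, SERINE_PENTAPEPTIDE)]


if __name__ == "__main__":
    # Invented toy fragment, NOT one of the ten sequences in the study.
    toy = "AAHGGGFAAGDSAGAA"
    print(find_motifs(toy, HGGGF_MOTIF))
    print(find_motifs(toy, SERINE_PENTAPEPTIDE))
    print(candidate_catalytic_serines(toy))
```

A pattern hit only flags a candidate; confirming the catalytic triad still depends on the alignment and on structural or experimental evidence.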
In the present study, thermophilic carboxylesterase sequences were shorter than their mesophilic counterparts (average 310.4 vs 324.6 residues), which may help them withstand the stress exerted by high temperature [19]. The theoretical pI was statistically significant and lay between 5.0 and 7.8, which is slightly acidic and is associated with better solubility and the thermophilic behaviour of esterases [20]. Negatively charged residues were found to be equally significant in both groups in the present study. The total count of negatively charged acidic residues, i.e., Asp (D) and Glu (E), is higher in thermophilic proteins, which is related to the acidic nature of thermophilic esterases [19,21,22] and supports the view that charged residues contribute to ionic interactions by forming salt bridges, which help a protein maintain its integrity at high temperature [16,23,24].
In the present study, thermophilic esterases showed a substantial increase in the nonpolar amino acids Ala (A) and Val (V) [17], which are responsible for the structural and functional regulation of esterases [19]. The statistically significant amino acid Gln (Q), which has a polar uncharged side chain, was 1.7 fold higher in mesophilic esterases. Its lower abundance in thermophilic esterases is consistent with its tendency to undergo oxidation and deamidation at elevated temperatures [18]. Valine (V), a hydrophobic, nonpolar and neutral amino acid, was significantly (1.4 fold) higher in thermophilic esterases and has been reported to have a strong stabilizing effect within hydrophobic regions [25]. Substitution of valine (V) has been found to disrupt packing by producing a small cavity and reducing hydrophobicity, which leads to destabilization and a disproportionately large influence on enzyme function [26].
On the other hand, the charged amino acids Arg (R) and Glu (E) were 1.5 and 1.2 fold higher in thermophilic esterases. The guanidinium group of Arg (R) participates in a large number of electrostatic interactions, contributing to protein thermostability by assisting salt bridge and hydrogen bond formation around protein surfaces [27]. Arg (R) and Glu (E) were also found to reduce the free energy during folding and increase internal hydrophobicity. Glutamic acid (E) is preferentially located on the protein surface, where it forms ion pairs that contribute to the thermophilicity of esterases [28]. In contrast to Arg (R) and Glu (E), the charged amino acid Lys (K) was 1.2 fold higher in mesophiles. Being chemically similar to Arg (R), Lys (K) plays a significant role in protein stability and in stabilising the native structure of the protein. The side chain of Lys (K) can adopt a large number of conformations in the folded state, which stabilizes the native state of the protein [29][30][31]. Tyr (Y), an aromatic residue whose variation was not statistically significant, was 1.2 fold higher in thermophilic esterases; it provides stability and is involved in cation-pi interactions with Lys (K) in thermophilic proteins. Esterase sequences have a high content of Tyr (Y) [31][32][33].
Proteins with a high content of the polar amino acid residues Gln (Q), Asn (N), Thr (T), Arg (R) and Glu (E) have an affinity for hydrophilic substrates. Polar and especially charged amino acids are hydrophilic and are commonly present at the surface of water-soluble proteins, where they contribute to protein solubility in water and form binding sites for charged molecules. Cysteine (C), which was 1.4 fold higher in mesophiles, plays a dual role: it can increase thermostability by forming disulphide bridges, but it can also decrease thermostability when present in free form, because the large sulphur atom in its side chain prevents it from fitting into the compact structure of the protein. Disulphide bridges between cysteine residues prevent thermal denaturation of the protein at high temperature and also help in oligomerisation [17,18,28,34,35]. Although the present analysis showed cysteine and methionine to be statistically insignificant, they are considered to contribute to the mesophilicity of esterases. Trp (W), a bulky hydrophobic residue, was found to be 1.4 fold higher in mesophilic esterases. Trp (W), an amphipathic residue, occurs in similar proportions in both mesophilic and thermophilic proteins. It is involved in both short- and long-range interactions, which tend to stabilize the tertiary structure of the protein [23,36].

Conclusion
Statistical analysis and fold values differentiate mesophilic and thermophilic esterase sequences. The two groups show significant differences in their physicochemical parameters and amino acid residues. Theoretical pI and negatively charged residues were found to be significant. The higher fold abundance of particular amino acid residues in one group or the other clearly demarcates thermophiles from mesophiles. Thus, the in silico study differentiates the two groups and helps us to understand protein thermostability at the primary sequence level. This knowledge is useful for engineering these enzymes for industrial and medicinal use.