Database Analysis of Acidic Proteins from Halophilic Species and Their Corresponding Basic Proteins from Non-halophilic Species

Original Research Article Nakashima et al.; BBJ, 14(2): 1-12, 2016; Article no.BBJ.25207

Aims: To reveal which amino acid residues determine whether a protein is acidic or basic between orthologous pairs, acidic proteins from halophilic species were compared with the corresponding basic proteins from non-halophilic species. Acidic versus acidic protein pairs and basic versus basic protein pairs were analyzed in the same way.

Place and Duration of Study: Department of Clinical Laboratory Science, Graduate Course of Medical Science and Technology, School of Health Sciences, Kanazawa University, Japan.

Methodology: Halobacterium sp. NRC-1 was used as the halophilic species, and the Gram-positive bacterium Bacillus subtilis and the radiation-resistant bacterium Deinococcus radiodurans were used as non-halophilic species. The three species were selected because their proteins are closely related to each other. The amino acid compositions were compared and the amino acid substitutions were counted for the orthologous protein pairs between Halobacterium and B. subtilis. A similar comparison was done for the proteins of Halobacterium and D. radiodurans.

Results: The Asp and Glu residues determine whether a protein of Halobacterium sp. NRC-1 is acidic or basic. The amino acid substitutions that increase the Asp residues in the acidic proteins of Halobacterium, relative to the corresponding proteins of non-halophilic species, were almost identical whether the corresponding proteins were acidic or basic. This result suggested that the change of protein charge from basic proteins to acidic ones proceeded in the same way as the change from acidic proteins to acidic ones. The proteins of Halobacterium showed a tendency to have residues with smaller side chains than the proteins of B. subtilis / D. radiodurans.


INTRODUCTION
There are many ways to classify proteins according to their structures or functions. Proteins can be classified into groups/families based on amino acid sequence similarity [1] or three-dimensional structural similarity [2,3]; into globular, membrane or fibrous proteins based on their shape and solubility; into all alpha, all beta, alpha/beta or alpha + beta proteins based on the type of secondary structures present [4]; and into intracellular or extracellular proteins based on their localization. Proteins can also be classified as acidic, neutral or basic according to their isoelectric point (pI).
Some of these classifications have been reported to be related to amino acid composition, e.g. membrane or globular proteins [5-7], folding types [8-14], and intracellular or extracellular proteins [15,16]. Previously, it was noticed that orthologous proteins among prokaryotes have similar pI values. The pI of a protein is determined by the balance of acidic and basic amino acid residues. Therefore, it seems that this balance is conserved and important for the function of a protein.
Proteins from halophilic species are rich in the acidic residues Asp and Glu [17,18]; therefore, these proteins are biased toward acidic pI values. The abundance of acidic residues on the surface of proteins from halophilic species is a key determinant of adaptation to high-salt conditions [19,20]. Some of the acidic proteins from halophilic species must therefore correspond to basic orthologs in non-halophilic species, and comparing such pairs would reveal the factors that distinguish acidic from basic proteins. The genomic sequence of the halophilic Halobacterium sp. NRC-1 (Halobacterium) [21] has indicated that its proteins are closely related to the proteins of the Gram-positive bacterium Bacillus subtilis [22] or to the proteins of the radiation-resistant bacterium Deinococcus radiodurans [23].
In this study, orthologous protein pairs between Halobacterium and B. subtilis were compared for acidic vs. basic pairs, together with acidic vs. acidic pairs and basic vs. basic pairs. A similar comparison was done for the proteins of Halobacterium and D. radiodurans. The amino acid compositions were compared and the amino acid substitutions were counted using the pairwise alignments.

Orthologous Proteins
The pI value of a protein was estimated by a program developed in-house. The validity of this program was checked by comparing calculated pI values with experimentally determined ones. Acidic proteins were defined as those with pI < 6 and basic proteins as those with pI > 8. Orthologous protein pairs were identified as mutual best-hit pairs in homology searches between two organisms using the BLASTP program [24]. The orthologous protein pairs between Halobacterium acidic proteins and B. subtilis basic proteins were selected by the following procedure. First, sequence alignments with greater than 30% sequence identity over more than 100 residues between a Halobacterium protein with pI < 6 and a B. subtilis protein with pI > 8 were selected. Then, the sequence similarity among the selected Halobacterium sequences was examined. If some sequences shared similarity greater than 30%, only one sequence alignment was retained and the other alignments were excluded to avoid sequence bias. The sequence similarity among the selected B. subtilis sequences was examined in the same way. The orthologous protein pairs between Halobacterium and B. subtilis for acidic vs. acidic and basic vs. basic pairs were selected by a similar procedure. The orthologous protein pairs between Halobacterium and D. radiodurans were selected similarly.
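The selection of mutual best-hit pairs described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Hit` record, the function names and the toy identifiers are hypothetical, BLASTP output is assumed to have been parsed already into per-query hit lists, and the best hit is assumed to be the one with the highest bit score. The removal of similar sequences within each species is left out.

```python
from typing import Dict, List, NamedTuple, Tuple


class Hit(NamedTuple):
    """One parsed BLASTP hit (hypothetical record layout)."""
    subject: str      # identifier of the matched protein
    bitscore: float   # used here to rank hits (an assumption)
    identity: float   # percent identity of the alignment
    length: int       # alignment length in residues


def best_hits(table: Dict[str, List[Hit]]) -> Dict[str, Hit]:
    """Keep the top-scoring hit for every query that has at least one hit."""
    return {q: max(hits, key=lambda h: h.bitscore)
            for q, hits in table.items() if hits}


def mutual_best_pairs(a_vs_b: Dict[str, List[Hit]],
                      b_vs_a: Dict[str, List[Hit]],
                      min_identity: float = 30.0,
                      min_length: int = 100) -> List[Tuple[str, str]]:
    """Return (a, b) pairs that are each other's best hit and whose
    alignment exceeds the identity and length thresholds."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    pairs = []
    for a, hit in best_ab.items():
        back = best_ba.get(hit.subject)
        if back is None or back.subject != a:
            continue  # not reciprocal
        if hit.identity > min_identity and hit.length > min_length:
            pairs.append((a, hit.subject))
    return sorted(pairs)


def acidic_vs_basic(pairs: List[Tuple[str, str]],
                    pi_a: Dict[str, float],
                    pi_b: Dict[str, float],
                    acidic_below: float = 6.0,
                    basic_above: float = 8.0) -> List[Tuple[str, str]]:
    """Keep pairs whose first member is acidic and second member is basic."""
    return [(a, b) for a, b in pairs
            if pi_a[a] < acidic_below and pi_b[b] > basic_above]
```

The acidic vs. acidic and basic vs. basic sets would follow from the same filter with the pI conditions changed accordingly.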

Isoelectric Point Distribution of Proteins
The distribution of pI values of 2075 proteins from Halobacterium and 4105 proteins from B. subtilis is plotted in Fig. 1. The pI profile of Halobacterium was consistent with the previous reports [25,26]. In Halobacterium, more than 84% of proteins were considered acidic. Neutral and basic proteins were about 4% and 6%, respectively. Other halophilic species such as Halorhabdus utahensis, Halorubrum lacusprofundi and Natronomonas pharaonis showed a similar pI distribution. Halophilic species generally have a high genomic G+C content of more than 60%; however, the G+C content of Haloquadratum walsbyi is 47.9%. The proteins of H. walsbyi nevertheless showed a pI distribution similar to that of Halobacterium.
The proteins of B. subtilis were classified as 60% acidic, 8% neutral and 30% basic based on the pI distribution. The pI profile was consistent with the previous report [25]. The pI distribution was also examined for the following bacteria: radiation-resistant Deinococcus radiodurans, alkaliphilic Bacillus halodurans, acidophilic Acidobacterium capsulatum, thermophilic Thermus thermophilus, and moderately halophilic Vibrio parahaemolyticus. All of these species showed a pI distribution similar to that of B. subtilis. For clarity, only the pI profiles of Halobacterium and B. subtilis are shown in Fig. 1.
It is known that the solubility of a protein is lowest at a pH equal to its pI. Generally, the intracellular pH of a bacterium is around 7. Therefore, a small proportion of neutral proteins seems appropriate for avoiding precipitation. The acidophilic Sulfolobus acidocaldarius grows optimally in an acidic environment at pH 2-3 but maintains its intracellular pH at about 6.5 [27]. The pI distribution of the proteins of the thermoacidophilic Picrophilus torridus was also examined. P. torridus grows optimally at pH 0.7 and its intracellular pH is 4.6 [28]. The proportion of basic proteins in P. torridus was 37%, which was a little higher than in the other species. Halobacterium has a neutral intracellular pH of 7.2 [29]. According to the pI profile of Halobacterium (Fig. 1), the number of orthologous protein pairs between Halobacterium and B. subtilis was expected to be small for acidic vs. basic pairs as well as for basic vs. basic pairs.
[Fig. 1. pI distribution of proteins from Halobacterium and B. subtilis; axes: pH and Frequency (%).]

Amino Acid Composition
The selected orthologous protein pairs between Halobacterium and B. subtilis numbered 100 for acidic vs. acidic pairs, 53 for acidic vs. basic pairs, and 21 for basic vs. basic pairs. As mentioned above, the numbers of orthologous acidic vs. basic pairs and basic vs. basic pairs were not large. Five representative orthologous protein pairs are listed in Table 1. Ribosomal proteins were abundant among the basic proteins of Halobacterium. Some basic proteins of B. subtilis were assigned as acidic proteins in Halobacterium (see Table 1).
The amino acid composition of orthologous proteins between Halobacterium and B. subtilis is indicated in Table 2. The content of the Asp residue was 9.98% in the acidic proteins of Halobacterium. This high content of Asp is consistent with the previous reports [17,18]. However, the content of Asp was 4.14% in the basic proteins, so the deviation between acidic and basic proteins was 5.84%. The corresponding deviations of the Glu, Arg, and Lys residues were 4.22%, 1.13% and 0.35%, respectively. This result indicated that whether a protein of Halobacterium is acidic or basic is determined almost entirely by the acidic residues Asp and Glu, and that the effect of the basic residues Lys and Arg is very small. In the proteins of B. subtilis, the Glu residue showed the largest deviation between acidic and basic proteins, followed by the Asp residue.
To clearly show the differences in amino acid composition between Halobacterium and B. subtilis, the ratios of Halobacterium to B. subtilis were calculated. Residues with ratios >1.30 were considered favorable, and those with ratios <0.77 unfavorable, in Halobacterium relative to B. subtilis. The Asp residue showed the largest ratio, 1.64 (9.98/6.07), and the Lys residue the lowest ratio, 0.28 (1.82/6.49), in the acidic vs. acidic protein pairs. The Asp and Ala residues were [...] Halobacterium and the [...] were less frequently used commonly in the three protein [...] -rich, and the codons of Lys, Ile, Asn and Met are A+T-rich.
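The ratio calculation can be sketched as follows. This is an illustrative Python fragment, not the authors' program: the function names are hypothetical, the label "neutral" for ratios between the two cut-offs is an assumption, and only the thresholds (1.30 and 0.77) and the two worked ratios come from the text.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequences):
    """Percent composition of the 20 standard residues over a set of sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard residues found")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


def classify_ratio(halo_percent, other_percent, high=1.30, low=0.77):
    """Ratio of the Halobacterium content to the reference content,
    labeled with the cut-offs quoted in the text."""
    if other_percent <= 0:
        raise ValueError("reference content must be positive")
    ratio = halo_percent / other_percent
    if ratio > high:
        label = "favorable"
    elif ratio < low:
        label = "unfavorable"
    else:
        label = "neutral"  # label chosen for this sketch
    return ratio, label


if __name__ == "__main__":
    # Asp and Lys contents quoted for the acidic vs. acidic pairs.
    for name, halo, bsub in [("Asp", 9.98, 6.07), ("Lys", 1.82, 6.49)]:
        ratio, label = classify_ratio(halo, bsub)
        print(f"{name}: {ratio:.2f} ({label})")
```

With the quoted values, the sketch gives 1.64 for Asp and 0.28 for Lys, matching the ratios in the text. The two cut-offs are nearly reciprocal (1/1.30 is about 0.77), so over-representation and under-representation are treated symmetrically.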
The genomic G+C content of Halobacterium is 65.9% and that of B. subtilis is 43.5%. The amino acid bias is consistent with the genomic G+C content. The codons of the Asp residue are neutral in G+C content; therefore, the richness of Asp residues in Halobacterium cannot be explained by G+C content.
The pI value of a protein is determined by the balance of positively and negatively charged residues. The side chains of the Lys, Arg and His residues can carry a positive charge, and those of the Asp, Glu, Cys and Tyr residues can carry a negative charge. At pH 7, the Lys, Arg, Asp and Glu residues are in their fully charged form, and the Cys and Tyr residues are in their uncharged form. The His residue is partly positive because its pK is about 6.5. The content of His is usually low, so the effect of the His residue was neglected. The value (Lys + Arg) - (Asp + Glu) was simply calculated from the residue contents; it was negative for acidic proteins and positive for basic proteins (see Table 2). This calculated value was roughly correlated with the pI value.
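The relation between charged residues and pI can be illustrated with a short Python sketch. It is not the in-house program used in this study: apart from the His pK of about 6.5 mentioned above, all pK values below are illustrative textbook-style assumptions, and the function names are hypothetical. The net charge is computed with the Henderson-Hasselbalch relation and the pI is located by bisection.

```python
# Assumed pK values (illustrative only, not taken from this study).
PK_POSITIVE = {"K": 10.5, "R": 12.5, "H": 6.5}
PK_NEGATIVE = {"D": 3.9, "E": 4.3, "C": 8.3, "Y": 10.1}
PK_N_TERMINUS = 8.0
PK_C_TERMINUS = 3.1


def net_charge(seq, ph):
    """Net charge of a sequence at a given pH (Henderson-Hasselbalch)."""
    charge = 1.0 / (1.0 + 10 ** (ph - PK_N_TERMINUS))    # N-terminal amine
    charge -= 1.0 / (1.0 + 10 ** (PK_C_TERMINUS - ph))   # C-terminal carboxyl
    for aa in seq:
        if aa in PK_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PK_POSITIVE[aa]))
        elif aa in PK_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PK_NEGATIVE[aa] - ph))
    return charge


def estimate_pi(seq, lo=0.0, hi=14.0, tol=1e-4):
    """pH at which the net charge crosses zero, found by bisection.
    The net charge decreases monotonically with pH."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def charge_index(seq):
    """(Lys + Arg) - (Asp + Glu), expressed as percent of residues."""
    if not seq:
        raise ValueError("empty sequence")
    positive = seq.count("K") + seq.count("R")
    negative = seq.count("D") + seq.count("E")
    return 100.0 * (positive - negative) / len(seq)
```

Under these assumptions, a sequence with a negative `charge_index` has an estimated pI on the acidic side and one with a positive index on the basic side, which mirrors the rough correlation described above.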
Five representative orthologous protein pairs between Halobacterium and D. radiodurans are listed in Supplementary Table S1. The amino acid composition of the orthologous proteins is indicated in Supplementary Table S2. The numbers of orthologous acidic vs. basic pairs and basic vs. basic pairs were 59 and 15, respectively. In this case too, the numbers of those pairs were not large. The genomic G+C content of D. radiodurans is 66.6% and that of B. subtilis is 43.5%. The contents of the Lys and Arg residues depend on the G+C content; therefore, they differed between the two species. However, it was interesting that the sum of the positively charged residues, Lys + Arg, was almost identical. For example, the sum of Lys + Arg in the basic proteins was 9.14% in D. radiodurans and 9.12% in B. subtilis.

Amino Acid Substitutions
Sequence alignments were used to analyze amino acid substitutions. An example of sequence alignment of 50S ribosomal protein L11 among Halobacterium, B. subtilis, and D. radiodurans is shown in Fig. 2.

Halo-1   VAGGQADPGPPLGPELGPTPVDVQAVVQEINDQTEAFDGTEVPVTIEYED
B.sub-1  IPAGKANPAPPVGPALGQAGVNVMGFCKEFNARTADQAGLIIPVEISVYE
D.rad-1  LPAGKATPAPPVGPALGQYGANIMEFTKAFNAQTADKGDAIIPVEITIYA

Halo-2   DGSFSIEVGVPPTAALVKDEAGFDTGSGEPQENFVADLSIEQLKTIAEQK
B.sub-2  DRSFTFITKTPPAAVLLKKAAGIESGSGEPNRNKVATVKRDKVREIAETK
D.rad-2  DRSFTFITKTPPMSYLIRKAAGIGKGSSTPNKAKVGKLNWDQVLEIAKTK

Halo-3   [...]

Because the Asp residues are the main determinant of whether a protein of Halobacterium is acidic or basic, the possibility that the substitution pattern of Asp residues differs between acidic and basic proteins was examined. The replacements of Asp residues of Halobacterium by other residues in B. subtilis were counted. The Asp residues in the acidic proteins of Halobacterium were replaced, in order, by Glu > Lys > Asn > Ser > Gly in the basic proteins of B. subtilis. The order of the substituted residues was the same for the acidic proteins of B. subtilis. A similar analysis was done using the sequence alignments between Halobacterium and D. radiodurans. The Asp residues in the acidic proteins of Halobacterium were replaced by Glu > Gly > Ala > Arg > Gln in both the basic and the acidic proteins of D. radiodurans. These results indicated that the substitution patterns from Asp residues of Halobacterium to other residues of B. subtilis / D. radiodurans were almost identical, and that the substitution patterns of Asp residues did not differ between acidic and basic proteins.
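Counting the residues that replace Asp in a pairwise alignment can be sketched in Python as below. The function name is hypothetical and the sketch is not the authors' code. The two strings are the first block of the L11 alignment example, with the first row taken to be the Halobacterium sequence and the second the B. subtilis sequence; treating `-` as the gap character is an assumption.

```python
from collections import Counter


def count_substitutions(aligned_a, aligned_b, residue="D", gap="-"):
    """Count which residues in aligned_b stand opposite `residue` in aligned_a.
    Conserved positions and positions aligned to a gap are skipped."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    counts = Counter()
    for a, b in zip(aligned_a, aligned_b):
        if a == residue and b != residue and b != gap:
            counts[b] += 1
    return counts


# First 50-residue block of the ribosomal protein L11 alignment.
HALO_1 = "VAGGQADPGPPLGPELGPTPVDVQAVVQEINDQTEAFDGTEVPVTIEYED"
BSUB_1 = "IPAGKANPAPPVGPALGQAGVNVMGFCKEFNARTADQAGLIIPVEISVYE"

if __name__ == "__main__":
    for res, n in count_substitutions(HALO_1, BSUB_1).most_common():
        print(f"D -> {res}: {n}")
```

For this single fragment the tally is Asn 2, Ala 2 and Glu 1, which is far too small a sample to compare with the orders reported above; the reported orders come from summing such counts over all selected alignments.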

DISCUSSION
The difference between acidic proteins and their corresponding basic proteins was examined. The pI value was used to determine whether a protein is acidic or basic. Generally, pI values are conserved among orthologous proteins of prokaryotes. To examine a large number of acidic vs. basic orthologous protein pairs, acidic proteins from a halophilic species and basic proteins from non-halophilic species were employed. Therefore, the differences between halophilic and non-halophilic proteins might be reflected in the results.
Halophiles can be classified as slightly, moderately or extremely halophilic organisms depending on the salt concentration at which they grow optimally. Halophilic organisms have to adjust their osmotic pressure to the salt concentration of the environment they inhabit. There are three ways of adjustment: accumulation of KCl in the cell [26,31], accumulation of organic osmotic solutes [26], and accumulation of acidic proteins with large negative charges. Most extremely halophilic organisms accumulate KCl and have pI distribution patterns like that of Halobacterium. Moderately halophilic organisms such as Vibrio parahaemolyticus usually accumulate organic osmotic solutes and show pI distribution patterns similar to that of Bacillus subtilis. Albumin is the smallest and most abundant of the human plasma proteins, and plays an important role in osmotic regulation. Albumin has a negative charge of 18 with pI 4.7, and produces a greater osmotic effect than expected from its concentration in plasma [32]. Acidic proteins of Halobacterium resemble albumin in terms of pI and negative charge; therefore, it is assumed that they have the potential to adjust osmotic pressure. A combination of these mechanisms of osmotic adjustment is possible.
It is known that the hydrophobic Leu, Ile and Val residues are mostly found in the interior regions of globular proteins, and the hydrophilic Glu, Asp, Lys and Arg residues are mostly found in the surface regions. The proteins of Halobacterium showed a preference for both hydrophobic and hydrophilic residues with smaller side chain volume. The small side chain volume of hydrophobic residues in the interior of proteins may lead to a compact shape, and compact proteins might be stable in a high-salt medium. This assumption needs to be validated. The meaning of the small side chain volume of hydrophilic residues in the surface regions is not clear. The halophilic malate dehydrogenase tetramer is wider than the similar dogfish lactate dehydrogenase. This is because the large excess of acidic residues on the surface of the halophilic enzyme yields negative-charge repulsion at the interdimer surface [19].
The difference in side chain size within a sequence may be reflected in the molecular weight. The molecular weights of the two sequences in each alignment were therefore compared after adjusting for length. The molecular weights of the proteins of Halobacterium were slightly lower than those of B. subtilis in the acidic vs. basic pair alignments, the acidic vs. acidic pairs, and the basic vs. basic pairs. Similarly, the molecular weights of the proteins of Halobacterium were slightly lower than those of D. radiodurans. When the whole sequences were compared, the molecular weights of the proteins of Halobacterium were larger than those of the corresponding proteins of B. subtilis. This is because the proteins of Halobacterium were longer than the corresponding proteins of B. subtilis. The proteins of Halobacterium are more compact than the corresponding proteins of B. subtilis or D. radiodurans when compared at the same length.
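One way to compare molecular weights at equal length is to use the mean residue mass over the aligned region. The Python sketch below takes that approach as an assumption, since the exact length adjustment used in this study is not described; the function names are hypothetical, and the masses are standard average residue masses in Da.

```python
# Standard average residue masses (Da), i.e. amino acid minus one water.
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.20,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.015  # one water per chain for the free termini


def molecular_weight(seq):
    """Approximate molecular weight of an ungapped sequence."""
    return sum(RESIDUE_MASS[c] for c in seq) + WATER


def mean_residue_mass(aligned_seq, gap="-"):
    """Average mass per residue, ignoring gap characters, so that
    sequences of different length can be compared directly."""
    residues = [c for c in aligned_seq if c != gap]
    if not residues:
        raise ValueError("sequence contains no residues")
    return sum(RESIDUE_MASS[c] for c in residues) / len(residues)


def compare_pair(aligned_a, aligned_b):
    """Difference in mean residue mass (a minus b) for one aligned pair;
    a negative value means the first sequence uses lighter residues."""
    return mean_residue_mass(aligned_a) - mean_residue_mass(aligned_b)
```

Averaging `compare_pair` over all alignments of one pair type would give a length-independent measure of whether one species tends to use residues with smaller side chains.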
In this study, orthologs between Halobacterium and B. subtilis, together with those between Halobacterium and D. radiodurans, were compared. The three organisms belong to different taxonomic groups and are remotely located on the phylogenetic tree [33,34]. However, they share considerable sequence similarity [21], even though their genomic G+C contents differ considerably: 65.9% in Halobacterium, 43.5% in B. subtilis and 66.6% in D. radiodurans. Amino acid composition is affected by G+C content [7,35]. It is reported that the sequence similarity of orthologous proteins among Halobacterium, B. subtilis and D. radiodurans is due to lateral gene transfer [21,25]. The proteins of Halobacterium have changed to adapt to high-salt conditions. The simplest way of adaptation is a change of protein charge from a basic protein to an acidic protein.
The Asp residues are the main determinant of whether a protein is acidic or basic. Therefore, amino acid substitutions that increase the Asp residues might be an important process in the adaptation. If the transferred proteins are already acidic, it seems that there is no need to adapt. However, those proteins showed the same substitution patterns of the Asp residues as the basic proteins did. This result suggested that adaptation from basic proteins to acidic proteins does not proceed in a special way.
Ribosomal proteins were abundant among the basic proteins of Halobacterium. Some of the ribosomal proteins of B. subtilis were basic proteins, and the corresponding ones were changed to acidic proteins in Halobacterium. The atomic structure of the large ribosomal subunit from the halophilic Haloarcula marismortui has been determined [36]. According to the structure, ribosomal proteins L2 and L14 have substantial interactions with 23S rRNA, while ribosomal proteins L1, L5, L11 and L18 have weak interactions. Halobacterium ribosomal proteins L2 and L14 were assigned as basic proteins and L1, L5, L11 and L18 as acidic proteins. This suggests the possibility that the ribosomal proteins that interact strongly with 23S rRNA remained basic, whereas those with weak interactions changed to acidic proteins. RNA contains negatively charged phosphates, which may interact with positively charged Lys or Arg residues. If the interactions are essential, the Lys or Arg residues would be conserved; if not, the Lys or Arg residues could be substituted. This scenario of the change from basic proteins to acidic proteins is based on the atomic structural data.

CONCLUSION
Most of the proteins of Halobacterium showed a high content of Asp residues and were considered acidic proteins; however, the content of Asp was not high in the basic proteins. The Asp and Glu residues determine whether a protein of Halobacterium is acidic or basic. The substitution patterns that increase the Asp residues in the acidic proteins of Halobacterium are independent of whether the corresponding proteins are acidic or basic. The proteins of Halobacterium showed a tendency to have residues with smaller side chains than the proteins of B. subtilis / D. radiodurans.