The Hydrophobic Temperature Dependence of Amino Acids Directly Calculated from Protein Structures

Erik van Dijk; Arlo Hoogeveen; Sanne Abeln

doi:10.1371/journal.pcbi.1004277

Abstract

The hydrophobic effect is the main driving force in protein folding. One can estimate the relative strength of this hydrophobic effect for each amino acid by mining a large set of experimentally determined protein structures. However, the hydrophobic force is known to be strongly temperature dependent. This temperature dependence is thought to explain the denaturation of proteins at low temperatures. Here we investigate if it is possible to extract this temperature dependence directly from a large set of protein structures determined at different temperatures. Using NMR structures filtered for sequence identity, we were able to extract hydrophobicity propensities for all amino acids at five different temperature ranges (spanning 265-340 K). These propensities show that the hydrophobicity becomes weaker at lower temperatures, in line with current theory. Alternatively, one can conclude that the temperature dependence of the hydrophobic effect has a measurable influence on protein structures. Moreover, this work provides a method for probing the individual temperature dependence of the different amino acid types, which is difficult to obtain by direct experiment.

Author Summary

In general, proteins become functional once they fold into a specific globular structure. On folding, hydrophobic amino acids get buried inside the protein such that they are shielded from the water; this hydrophobic effect makes a protein fold stable. However, the strength of the hydrophobicity is known to be strongly temperature dependent, leading for example to lower stability at lower temperatures (cold denaturation). Nevertheless, it is difficult to quantify the temperature dependence for hydrophobic amino acids. Here we are able to estimate the strength of the hydrophobic effect by analysing the positions of a large number of amino acids from protein structures experimentally determined at different temperatures. For each amino acid type, we use the ratio between the number of residues at the inside and at the surface of the folded structures as a measure for its hydrophobicity. This approach shows that the hydrophobic effect becomes weaker at lower temperatures, as expected from theoretical predictions. Understanding the temperature dependence for amino acids can help to make proteins (or enzymes) stable at a specific temperature range. For example, the design of enzymes that are stable and functional at low temperatures may benefit from this work.

Citation: van Dijk E, Hoogeveen A, Abeln S (2015) The Hydrophobic Temperature Dependence of Amino Acids Directly Calculated from Protein Structures. PLoS Comput Biol 11(5): e1004277. https://doi.org/10.1371/journal.pcbi.1004277

Editor: Helmut Grubmüller, Max Planck Institute for Biophysical Chemistry, GERMANY

Received: October 3, 2014; Accepted: April 12, 2015; Published: May 22, 2015

Copyright: © 2015 van Dijk et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: Data are available at http://www.few.vu.nl/~abeln/hydrophobicT/

Funding: SA has been supported by a Veni grant on the project ‘Understanding toxic protein oligomers through ensemble characteristics’ from the Netherlands Organisation for Scientific Research (NWO). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

When a protein folds, hydrophobic amino acids get buried inside the protein to form a hydrophobic core. Inside this core the hydrophobic side chains are shielded from the water. The tendency of hydrophobic groups to cluster together when they are put into water—or the hydrophobic effect—is the most important driving force in protein folding. Note that there are several factors that contribute to the overall stability of a folded protein: for example the formation of hydrogen bonds between backbone atoms (secondary structure) and side chains; the formation of salt bridges between charged amino acids and the burial of hydrophobic side chains upon folding. It is thought that this hydrophobic force gives the single largest contribution to the stability of most protein folds [1]. Moreover, the positioning of hydrophobic clusters in the sequence may affect the folding pathway and dynamics e.g. [2, 3]. Note that these stabilizing forces are partially compensated by the decrease in chain entropy upon folding.

Hydrophobicity is a result of the collective behaviour of the water molecules and ‘oily’ groups. In essence the water-hydrophobe interface is unfavourable compared to water-water or hydrophobic-hydrophobic interactions. The free energy difference upon burial of hydrophobic groups is partially entropic and partially enthalpic, causing a distinct temperature dependence [4, 5]. Even though the exact molecular cause for these enthalpic and entropic contributions is the focus of active research [6, 7] and can change depending on the type of protein [7], the resultant temperature dependence can be measured experimentally for several different non-polar substances [8, 9]. From such measurements, models and theory we know that the hydrophobic force peaks between 30–80°C and becomes weaker at both lower and higher temperatures, see Fig 1A.

Fig 1. Length scale dependence of hydrophobic effect from calculations by Huang and Chandler [10] (A).

The cost of making a cavity in the water with a radius of the given size against temperature is plotted. The position of the maximum depends on the size (radius) of the solute. Small solutes with a radius of 4 Å have a peak at around 70°C, whereas larger particles with a radius of 10 Å have a peak around 40°C. An example protein structure: PDB-ID: 2K5I (B). We estimate free energies of transfer from the hydrophobic core to the surface of the protein by comparing the number of hydrophobic amino acids on the surface (small yellow spheres), to the number of buried hydrophobics (large yellow spheres), to the number of polar amino acids on the surface (small blue spheres) and to the number of buried polar amino acids (large blue spheres).

https://doi.org/10.1371/journal.pcbi.1004277.g001

Since hydrophobicity is such a large contributor to protein stability, the temperature dependence of the hydrophobic effect has important consequences. Firstly, some proteins do not only unfold at high temperatures, as can be explained through the entropy of the chain, but also at low temperatures (cold denaturation) [11]. This effect is thought to be a consequence of hydrophobicity becoming weaker at low temperatures [12]. Secondly, alternate states of intrinsically disordered proteins may become more favourable at different temperatures due to this effect [13]. Thirdly, protein-protein and protein-substrate interactions—if dominated by hydrophobic interactions—may also be sensitive to temperature changes.

It is essential to quantify the temperature dependence if one wants to model and predict the stability of folded proteins and protein interactions over a large range of temperatures. For industrial purposes, proteins or enzymes that can be active over a wide temperature range are of crucial importance. To achieve this, proteins from species that live at extreme temperatures, thermophiles and psychrophiles, have been used and adapted extensively for biocatalysis [14, 15]. Understanding and quantifying the hydrophobic temperature dependence for specific amino acids is essential if one wants to predict thermostability of proteins.

Earlier, Folch et al. [16] showed that temperature dependent pairwise potentials for amino acids can help to predict the melting temperature of homologous pairs of proteins. More recently, this study was extended to also predict stability at low temperatures [17]. In this work we focus on the temperature dependence of the effective interactions between hydrophobic amino acids and water.

Even though this temperature dependence has important consequences, it is often not considered due to practical concerns. The temperature dependence is typically not included in interaction potentials for protein structure prediction or coarse grained simulations; such potentials do not model the water molecules explicitly or in enough detail to capture this effect. It is difficult to measure the temperature dependence for specific amino acids by experiments, under physically relevant conditions. In other words, it is difficult to measure the difference in free energy between the folded and unfolded chain for separate amino acids. In this work we show that it is possible to obtain this temperature dependence for specific amino acids by mining a large set of protein structures resolved by Nuclear Magnetic Resonance (NMR).

Physically or chemically relevant quantities can be obtained by averaging over a large set of structures. For example, specific bond lengths, the most favourable dihedral angles or approximate hydrophobicities for different amino acid types can be obtained by taking an ensemble average over a set of protein structures. More specifically, hydrophobicity scales for the different amino acid types may be obtained using physicochemical properties [18], or by calculating how often we find each residue type exposed to the solvent at the surface of a protein [18–21]. Different approaches give slightly different results—and a somewhat different ranking between the residues—but do agree overall. Hydrophobicity scales are useful for a wide range of problems involving structure prediction: from predicting the severity of a mutation to disorder prediction and full structure prediction e.g. [22–27].

Estimates for pairwise free energies between amino acid types have been obtained by mining protein structures. A pairwise interaction potential may be calculated by counting the number of contacts made between different types of amino acids [16, 28, 29]. More recently, this method has been further developed to allow the extraction of interactions between the solvent and the different types of residues, as well as the pairwise interactions [30]. Knowledge-based amino acid pair-potentials are used in structure prediction [31], coarse-grained protein simulations [32–35] and protein-protein docking methods [36]. Recently, a knowledge based amino acid pair potential with a temperature dependence has also been used to predict the thermostability of proteins [17].

In this work, we estimate the hydrophobic effect as the free energy cost for transferring a hydrophobic amino acid from the core of the protein to the water exposed surface, see Fig 1B. We use three distinct approaches to estimate these transfer free energies. Firstly, we use a previously validated approach to derive a statistical pair potential between amino acids to extract free energy estimates for the hydrophobic interaction. This contact based method has been shown to yield hydrophobicity estimates that give physically realistic results upon simulation. Secondly, we use a more direct approach that calculates propensities for surface accessibility for each of the amino acids; this method is similar to other approaches that derive knowledge based hydrophobicity scales [18–20]. Thirdly, we use an area based approach that considers the amount of exposed surface area per amino acid. The three approaches give similar results, and show significant temperature dependence for hydrophobic amino acids in line with expectations from theory and measurements on small hydrophobic particles.

Results/Discussion

In order to extract the hydrophobic temperature dependence from experimentally determined protein structures, it is important to choose the set of structures carefully. Firstly, we explored the contents of the Protein DataBank (PDB), [37], containing over 96k structures. Fig 2 shows the temperature distribution of available protein structures determined by X-ray crystallography and nuclear magnetic resonance (NMR). For this study we only use structures determined by NMR, as these experiments can be performed on soluble proteins at the temperature range of interest for the hydrophobic effect. This makes it possible to probe temperature dependent effects in proteins; for example a temperature induced transition [38] and cold denaturation [11] have been observed using this technique.

Fig 2. Distribution of temperatures at which experimental protein structures were resolved.

All acquisition temperatures of structures as of April 2014 available in the PDB are shown. The 80,662 X-ray diffraction structures are centred around 100 K, while the 10,969 NMR structures show a peak at room temperature (300 K). Note that the small peak of NMR data just above absolute zero may be temperatures entered in celsius instead of kelvin; this data is not used in this study. Temperature bins, as given in Table 1, are indicated in different shades of grey.

https://doi.org/10.1371/journal.pcbi.1004277.g002

In order to obtain estimates for the solvation free energies of different types of amino acids at different temperatures, we divided the data into five temperature bins, see Fig 2 and Table 1. The bins were chosen symmetrically around the peak at room temperature (300 K), to balance the number of structures in each bin.

Table 1. Selected protein structures.

https://doi.org/10.1371/journal.pcbi.1004277.t001

We set out to explore if we can observe the temperature dependence of the hydrophobic effect by analysing this filtered set of protein structures. Protein structures determined by NMR at different temperatures were used to obtain free energy estimates for the transfer of amino acids from the core of the protein to the surface. Under the assumption of random mixing, the transfer free energies can be estimated through statistical methods [16, 28–30]. We investigate three methods: 1) a contact based calculation, which has been shown to give a reasonable attraction [30]; 2) a direct calculation of propensities to surface exposure; and 3) an area based calculation that incorporates the accessible surface area in a continuous measure of hydrophobicity (see Methods for details).

Firstly, we investigate whether the raw free energy estimates are dependent on the temperature. To further increase the statistical accuracy, amino acids are divided into five classes: hydrophobic, charged, polar, aromatic and other, see Table 2. Fig 3 shows a surprisingly clear temperature dependence for the different hydrophobic amino acids: at lower temperatures the hydrophobic effect becomes weaker. This is in line with expectations from experiments and theory [4, 5]. The results for the area based potential are very similar to the results of the contact based potential (see S7–S14 Figs).

Table 2. Amino acid class definition.

https://doi.org/10.1371/journal.pcbi.1004277.t002

Fig 3. Raw free energies of transfer for classes of amino acids.

Contact based (A) and surface based (B) free energies are shown for different classes of amino acids. Points show the free energy estimates for each temperature bin, lines are fitted with a parabola, consistent with the potentials found in [10]. Arrows indicate the bins used to test the significance of the temperature dependence.

https://doi.org/10.1371/journal.pcbi.1004277.g003

To test if this temperature dependence is indeed significant, we resampled the protein structures using random temperature labels. From this procedure p-values were calculated to determine the significance of the free energy difference. Table 3 shows the difference in transfer energy (ΔΔG) and p-values between the lowest temperature bin (265–290 K) and room temperature (297–299 K). Clearly, the temperature trend for the hydrophobic residues is significantly stronger than one would expect from random fluctuations. The standard error of the mean is estimated from the spread in the potentials obtained by splitting the data set into five parts and recalculating the potentials for each part.

Table 3. Significance of hydrophobic temperature dependence pooled.

https://doi.org/10.1371/journal.pcbi.1004277.t003

Fig 3 also shows that the surface based potentials give larger absolute differences in free energies than the contact based potentials. This can most likely be explained by the strict cutoff (7% accessible surface area) in the surface based potential compared to the more gradual calculation of the contact based potential; charged and polar amino acids are rarely entirely buried and therefore give a very strong signal for the surface based measure. The relative hydrophobicity, however, is consistent between the three methods, showing that our results are qualitatively independent of the method used to derive the potential.

The results in Fig 3 show a slight temperature dependence for charged (and polar) amino acids. For the surface based potential, however, this effect is not significant (Table 3).

Our transfer free energy estimates are calculated under the assumption of a random mixing model; this provides us with relative transfer free energies for each type of amino acids. This means it is not trivial to compare the free energy differences between different temperature bins. The temperature dependence of the hydrophobic residues could cause the shift of the polar and charged amino acids. In order to enable comparison at different temperatures, we set a reference state for the free energy estimates. The reference state is an important part of the potential, and can determine the accuracy of a potential in structure validation [39].

Because we are particularly interested in comparing the transfer free energies between different temperatures, it is desirable that our reference does not have any temperature dependent interaction with the solvent. Betancourt and Thirumalai [29] and Buchete et al. [27] use Threonine, a small water-like polar amino acid, as a reference in the calculation for their amino acid pair-potential. In our case, as the number of structures available is limited, choosing a single amino acid as reference would propagate noise through the results. Instead, we pool all the charged and hydrophilic amino acids for each temperature bin, and use those as a reference potential (see Table 2). Even though it is known that polar and charged residues can have a temperature dependent interaction with the solvent and that this interaction can have consequences for protein structure and stability (see for example Refs. [40, 41]), comparing raw estimates (Fig 3) with reference corrected estimates (S1 and S4 Figs) shows that this correction does not change the relative trends; see Methods for further details.

Fig 4 shows estimates for the corrected transfer free energies for all hydrophobic and aromatic amino acids individually, with the polar and charged amino acids as a reference. Results for all amino acids, with and without reference correction, are shown in S2, S3, S5 and S6 Figs. The hydrophobicity becomes weaker at lower temperatures, showing that the results from the ‘raw’ estimates hold up. Again, the significance of the temperature dependence of each hydrophobic amino acid type is examined. For almost all hydrophobic amino acids the free energy estimates have a significant temperature dependence (Table 4). Note that the correction to a reference of polar and charged amino acids was also performed in the resampling procedure to obtain statistical significance.

Fig 4. Reference corrected free energies of transfer for hydrophobic amino acids.

Contact based (A) and surface based (B) free energies are shown for hydrophobic and aromatic amino acids. The free energies are corrected by setting a reference of the polar and charged amino acids. Points show the free energy estimates for each temperature bin and lines are fitted with a parabola. Arrows indicate the bins used to test the significance of the temperature dependence.

https://doi.org/10.1371/journal.pcbi.1004277.g004

Table 4. Significance of hydrophobic temperature dependence.

https://doi.org/10.1371/journal.pcbi.1004277.t004

Fig 4 also shows that the estimated transfer free energies show a very similar trend with respect to temperature to those that have been measured for hydrophobic particles [4] or obtained by calculation according to LCW-theory [10, 42]. For clarity, we fitted parabolas through the estimated transfer free energies, which is a reasonable approximation for trends calculated from theory and observed in experiment (see S15 Fig). It can be observed that the free energies for the hydrophobic amino acids show a maximum at around 310–350 K for both the surface and contact based free energy estimates; this is slightly lower than what is expected from theory (see for comparison Fig 1A).

Due to the lack of data at higher temperatures (T > 320K), it is difficult to estimate a precise maximum for the transfer free energies. Nevertheless, an interesting trend may be observed from Fig 4. Larger amino acids, for example Tryptophan, have a maximum at lower temperatures compared to smaller amino acids such as Alanine. Again, this trend is consistent with theory and experiments [10], where the transfer free energy of larger particles shows a maximum at lower temperatures.

Overall, we can conclude that the temperature dependence of the hydrophobic effect has a measurable influence on protein structures determined by NMR. The effect we find appears to be of the right order of magnitude in comparison with theory for the hydrophobic effect and the known cold denaturation behaviour of proteins (see S2 Text). The results show that structures determined at lower temperature have more exposed hydrophobic surface area. This suggests that at these temperatures the structures already become more open, as has been observed for some specific proteins (e.g. [43]). It would be very interesting to investigate if these low temperature structures are more flexible and dynamic than the same structures obtained at room temperature.

Conclusion

In this work we set out to investigate whether the hydrophobic temperature dependence could be obtained by mining a large set of protein structures resolved by NMR. We used a contact based, an area based and a surface based approach to obtain free energy estimates for the transfer of an amino acid out of the hydrophobic protein core onto the water exposed surface. We find a surprisingly clear trend for the free energy estimates with respect to the temperature: the hydrophobic effect becomes weaker at lower temperatures, as is expected based on theory, simulations and experiments. Alternatively, one can conclude that the temperature dependence of the hydrophobic effect has indeed a measurable influence on protein structures. Despite the sparseness of the data, and the inconsistencies in reporting of experimental temperatures, we find that the observed trend holds and is significant regardless of the precise method used to estimate the transfer free energies, the specific groupings of amino acids or the chosen reference.

Methods

Data collection

The temperature (in kelvin) at which the experiment is performed can be found in the mandatory ‘acquisition data’ section of PDB files. Several filters were applied. Some structures were filtered out because no temperature was entered or because they were given several temperatures from multiple data collection sessions. In order to get representative statistics for amino acid composition, it is important to remove any bias in the PDB for large sequence families. To take out this redundancy we used PDB filter-select 25% [44–46]. Table 1 shows the number of remaining structures in each bin after these filtering steps. A few further PDB files had to be removed due to their incompatibility with DSSP. After these steps, each PDB-file was split into multiple models, and the accessible surface area was determined using DSSP for each model. For each residue in the protein chain, the average accessible surface area over all models was used. The final counts for each PDB-structure are shown in S1 Data. The format is explained in S1 Text.
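The averaging step over NMR models can be illustrated with a minimal sketch. It assumes that the per-model accessible surface areas have already been computed (for example with DSSP) and loaded into plain dictionaries; the function name `mean_accessibility`, the residue keys and the numbers are hypothetical and only serve as an illustration.

```python
from collections import defaultdict


def mean_accessibility(models):
    """Average the accessible surface area per residue over all models.

    `models` holds one dict per NMR model, mapping a residue key
    (chain, residue number) to its accessible surface area in square
    angstrom. This input layout is an assumption made for illustration.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for model in models:
        for residue, area in model.items():
            totals[residue] += area
            counts[residue] += 1
    return {residue: totals[residue] / counts[residue] for residue in totals}


# Two hypothetical models of the same two-residue chain.
models = [
    {("A", 1): 120.0, ("A", 2): 10.0},
    {("A", 1): 100.0, ("A", 2): 14.0},
]
averaged = mean_accessibility(models)
```

Each residue then carries a single averaged area, which is the quantity used in the potentials described below.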

Calculation of contact based potential

To obtain estimates for the free energies of transferring specific amino acid types from the outside of the protein to the hydrophobic core, we used two approaches. The first approach is based on contacts between amino acids, and between amino acids and the solvent, as in the work of Abeln and Frenkel [30]. This potential has been shown to give an appropriate distinction between the protein core and surface by simulation. The second approach uses the presence or absence of amino acids on the surface of the protein, providing a more direct way to obtain the hydrophobicity of each amino acid.

In the contact based approach, we calculate knowledge-based pair-potentials over the set of structures described above. The free energy estimates ϵ_i,j between amino acid types i and j can be calculated as: $\epsilon_{i,j} = -kT \log \left( c_{i,j} / \omega_{i,j} \right)$ (1) where c_i,j is the number of contacts between amino acids of type i and j, and where ω_i,j is the expected number of contacts. Note that here we are specifically interested in the case where one of the interaction partners is the solvent, i.e. ϵ_i,solvent.

We can calculate the expected number of contacts, ω_i,j, by considering the distribution of the amino acid types i and j in the set of protein structures: $\omega_{i,j} = \frac{n_{i} q_{i} \, n_{j} q_{j}}{\sum_{k} n_{k} q_{k}}$ (2) here n_i q_i is the total number of contacts for type i, where n_i is the number of amino acids of type i and q_i is the coordination number, which we set to 4 for all amino acids to remain consistent with Abeln and Frenkel [30]. Note that the sum in the denominator loops over all the amino acids and water (k). In practice the total number of contacts for an amino acid type, n_i q_i, can be calculated directly from the data.

The number of water contacts is estimated through the size of the surface accessible area for a residue as calculated by DSSP [47]. Note that for the water contact points, we do not consider real water molecules, but a surface area similar to the size of an amino acid. We estimate the number of contacts as the product between q = 4 and the fraction of exposed surface area α_r for residue r. Hence, based on the assumption that a residue can interact with four other residues, water contact points can be created. The fraction of exposed surface area, α_r, is given by: $\alpha_{r} = S_{r} / \max {S_{a (r)}}$ (3) where S_r is the solvent accessible area, calculated with the DSSP program, and a(r) is the amino acid type of residue r; $\max {S_{a (r)}}$ is the maximum accessible area in an unfolded chain for that amino acid type.
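The solvent term of the contact based potential can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: it assumes the free energy is minus the logarithm of observed over expected contacts (in units of kT), that the expected number of contacts between two types is the product of their contact points divided by the total number of contact points, and that every water contact contributes one water contact point. The function name and the toy residues are hypothetical.

```python
import math
from collections import defaultdict

Q = 4  # coordination number, set to 4 for all amino acids in the text


def solvent_contact_energies(residues, q=Q):
    """Return the solvent contact energy per amino acid type, in units of kT.

    `residues` is an iterable of (amino_acid_type, alpha) pairs, where
    alpha is the exposed fraction of the residue surface (0 to 1).
    """
    n = defaultdict(int)            # n_i: number of residues of type i
    c_solvent = defaultdict(float)  # c_{i,solvent}: observed water contacts
    for aa, alpha in residues:
        n[aa] += 1
        c_solvent[aa] += q * alpha
    # Assumption: each water contact adds one contact point for "water".
    water_points = sum(c_solvent.values())
    total_points = q * sum(n.values()) + water_points
    energies = {}
    for aa in n:
        # Expected contacts under random mixing (assumed form).
        expected = n[aa] * q * water_points / total_points
        if c_solvent[aa] > 0:  # fully buried types are skipped (log of zero)
            energies[aa] = -math.log(c_solvent[aa] / expected)
    return energies


# Hypothetical toy data: two mostly buried leucines, two exposed lysines.
toy = [("LEU", 0.05), ("LEU", 0.10), ("LYS", 0.60), ("LYS", 0.70)]
eps = solvent_contact_energies(toy)
```

In this toy set the leucines obtain a positive (unfavourable) solvent energy and the lysines a negative one, which is the qualitative distinction between core and surface that the potential is meant to capture.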

Calculation of surface based potential

An alternative measure for hydrophobicity can be obtained by calculating the propensity for an amino acid to be on the surface. Classic amino acid propensities, which are for example used to describe the affinity for a certain secondary structure type, can be calculated through a simple ratio of fractions, e.g. Chapter 12 of Ref. [48]. Here we use the structural classes buried and non-buried. To decide whether a residue (r) is buried, we use a cutoff: α_r < 7% [49]. We can calculate the propensity (P) for amino acids to be buried as: $P_{a,b} = f_{a,b} / f_{b}$ (4) where P_a,b stands for the propensity for an amino acid type, a, to be buried as indicated by the subscript b; f_a,b is the fraction of amino acids of type a that are buried and f_b is the fraction of all amino acids that are buried. Translating this into counts yields: $f_{a,b} = \frac{N_{a,b}}{N_{a,b} + N_{a,nb}}$ (5) where N_a,b is the total number of amino acids of type a that are buried, and N_a,nb is the total number of amino acids of type a that are non-buried. Similarly, $f_{b} = \frac{N_{b}}{N_{b} + N_{nb}}$ (6) where N_b is the total number of buried amino acids, and N_nb is the total number of amino acids that are not buried.

When propensities are used to estimate transfer free energies, through ΔF_a,b = −kT log(P_a,b), they have the disadvantage that: $\Delta F_{a,b} \neq -\Delta F_{a,nb}$ (7) This can be seen by substituting the formula for P_a,nb in the formula for the free energy, ΔF.

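Here we define our propensities in an alternative way to overcome this problem, similar to Shaytan et al. [21]. If we define our alternative propensities, P*, analogous to a partition coefficient, we obtain: $P^{*}_{a,b} = \frac{N_{a,b} / N_{a,nb}}{N_{b} / N_{nb}}$ (8) which does have the desired property that the transfer free energy changes sign when the direction of transfer is reversed, the property that the classic propensity lacks (Eq 7).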
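The difference between the two propensity definitions can be checked numerically. The sketch below assumes that the classic propensity is the buried fraction of type a divided by the overall buried fraction, and that the alternative propensity P* is the buried to non-buried ratio of type a divided by the same ratio for all residues. The function names and the counts are hypothetical.

```python
import math


def classic_propensity(n_a_b, n_a_nb, n_b, n_nb):
    """Assumed classic form: buried fraction of type a over overall buried fraction."""
    f_a_b = n_a_b / (n_a_b + n_a_nb)
    f_b = n_b / (n_b + n_nb)
    return f_a_b / f_b


def partition_propensity(n_a_b, n_a_nb, n_b, n_nb):
    """Assumed P*: buried/non-buried ratio of type a relative to all residues."""
    return (n_a_b / n_a_nb) / (n_b / n_nb)


def free_energy(propensity):
    """Transfer free energy in units of kT."""
    return -math.log(propensity)


# Hypothetical counts: 80 of 100 residues of type a are buried,
# against 300 of 1000 residues overall.
n_a_b, n_a_nb, n_b, n_nb = 80, 20, 300, 700

# Swapping the buried and non-buried counts gives the reverse transfer.
dF_b = free_energy(classic_propensity(n_a_b, n_a_nb, n_b, n_nb))
dF_nb = free_energy(classic_propensity(n_a_nb, n_a_b, n_nb, n_b))
dFs_b = free_energy(partition_propensity(n_a_b, n_a_nb, n_b, n_nb))
dFs_nb = free_energy(partition_propensity(n_a_nb, n_a_b, n_nb, n_b))
```

With these counts the classic estimates for burial and exposure do not cancel, whereas the P* based estimates are equal in size and opposite in sign, because P* for the non-buried class is the reciprocal of P* for the buried class.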

Calculation of area based potential

While the contact based potential is established, some of the assumptions are particularly useful in the context of a coarse grained lattice simulation. On the other hand, the surface based potential uses the assumption that a residue is buried when less than 7% of its surface is exposed. To test the robustness of our results with regard to these assumptions, we investigated two additional potentials, based on the exposed area. The first one corresponds to the contact based potential, with very large (infinite) coordination numbers. This area based potential is calculated by comparing the amount of exposed surface area, S_r, for an amino acid type a to that of the average amino acid. (9)

S_r is the solvent accessible area, calculated with the DSSP program, and a(r) is the amino acid type of residue r; $\max {S_{a (r)}}$ is the maximum accessible area in an unfolded chain for that amino acid type.

A similar potential, but scaled with the maximum solvent accessible area, is also calculated. We will refer to this potential as the scaled area based potential, C_a,s = C_a max(S_a). The interactions of each residue are multiplied by its maximum accessible surface area. The results for this potential are very similar. Large residues have a higher interaction score when compared to smaller residues. The results for this potential are shown in S11, S12, S13 and S14 Figs.

Significance of temperature dependence

The standard error of the mean for each data point was estimated by splitting the data into five parts, each containing an equal number of structures. The potential was recalculated for each of the five parts, and the standard deviation over the five resulting values was calculated. This allows us to estimate a 95% confidence interval by taking two standard errors on each side of the mean. These are the error bars shown in the plots.
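A minimal sketch of this error estimate is given below. It assumes that the standard error is obtained as the standard deviation over the five parts divided by the square root of the number of parts; the text does not spell out this conversion, so it should be read as an assumption. The toy potential, the record layout and all names are hypothetical.

```python
import math
import random
import statistics


def split_standard_error(structures, potential, parts=5, seed=0):
    """Estimate the standard error of `potential` by splitting the data.

    `structures` is a list of per-structure records and `potential` is any
    function that maps a list of records to one free energy estimate.
    """
    shuffled = list(structures)
    random.Random(seed).shuffle(shuffled)
    # Interleaved slicing gives `parts` chunks of (nearly) equal size.
    chunks = [shuffled[i::parts] for i in range(parts)]
    values = [potential(chunk) for chunk in chunks]
    # Assumption: standard error = standard deviation / sqrt(number of parts).
    sem = statistics.stdev(values) / math.sqrt(parts)
    return sem, 2.0 * sem  # standard error and half-width of the 95% interval


def toy_potential(records):
    """Hypothetical potential: -log of the buried/exposed count ratio (kT)."""
    buried = sum(b for b, _ in records)
    exposed = sum(e for _, e in records)
    return -math.log(buried / exposed)


# Hypothetical per-structure (buried, exposed) counts for one residue class.
rng = random.Random(1)
toy_structures = [(rng.randint(20, 40), rng.randint(5, 15)) for _ in range(50)]
sem, half_width = split_standard_error(toy_structures, toy_potential)
```

The returned half-width corresponds to the two standard errors drawn on each side of a data point.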

The significance of the temperature dependence of the potentials was determined through a resampling procedure for two different temperature bins: the lowest temperature range and room temperature. We resampled our data by shuffling the temperature labels of the protein structures and recalculating the contact based and surface based potentials for a set of 1000 random samples. P-values for the difference in hydrophobicity between the two temperature bins were determined as the fraction of resampled free energy differences that were larger in size than the original calculation.
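The resampling procedure can be sketched as a label permutation test. The version below is a simplification under stated assumptions: it shuffles labels only between the two bins being compared, whereas the procedure above shuffles the temperature labels of the protein structures, and it uses a generic `potential` function in place of the contact based or surface based calculation. All names and data are hypothetical.

```python
import random
import statistics


def permutation_p_value(low_bin, room_bin, potential, n_resamples=1000, seed=0):
    """Return the observed difference and its permutation p-value.

    The p-value is the fraction of label-shuffled differences that are
    larger in size than the observed difference between the two bins.
    """
    observed = potential(low_bin) - potential(room_bin)
    pooled = list(low_bin) + list(room_bin)
    n_low = len(low_bin)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # reassign temperature labels at random
        diff = potential(pooled[:n_low]) - potential(pooled[n_low:])
        if abs(diff) > abs(observed):
            exceed += 1
    return observed, exceed / n_resamples


# Hypothetical per-structure values that differ clearly between the bins.
low = [1.0 + 0.01 * i for i in range(20)]
room = [2.0 + 0.01 * i for i in range(20)]
observed, p_value = permutation_p_value(low, room, statistics.mean)
```

For bins that are this clearly separated, almost no random relabelling reproduces a difference of the observed size, so the p-value is small.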

Fitting procedure

To obtain an estimate for the temperature dependence of the potential, we need to assign a single temperature for the structures within a temperature bin. The average temperature of the structures is taken to be the temperature of the bin. A weighted least squares fitting procedure was used to fit a parabola to the potential as a function of temperature, which is a reasonable approximation to the relation found in both theory and experiment. In a weighted least squares fit, the sum $S = \sum_{i = 1}^{n} w_{i} r_{i}^{2}$ is minimized. Here, the i indicates the index of the temperature bin, w_i is the weight, and r_i is the difference between observations and the model. The number of residues of type s in bin i was used as weight.
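A weighted parabola fit of this kind can be sketched with NumPy. Note that `numpy.polyfit` applies its weights to the unsquared residuals, so the square root of the weights is passed in order to minimise the sum of w_i r_i^2 given above. The bin temperatures, free energies and residue counts below are synthetic values chosen for illustration, and the helper names are hypothetical.

```python
import numpy as np


def fit_parabola(temperatures, energies, weights):
    """Weighted least squares fit of F(T) = a*T**2 + b*T + c.

    numpy.polyfit minimises sum((w_i * r_i)**2), so sqrt(weights) is
    passed to minimise S = sum(weights_i * r_i**2).
    """
    a, b, c = np.polyfit(temperatures, energies, deg=2, w=np.sqrt(weights))
    return a, b, c


def vertex_temperature(a, b):
    """Temperature at which the fitted parabola has its extremum."""
    return -b / (2.0 * a)


# Synthetic example: five hypothetical bin temperatures (K), free energies
# that lie exactly on a parabola peaking at 320 K, and hypothetical
# residue counts used as weights.
T = np.array([280.0, 295.0, 298.0, 303.0, 315.0])
F = -0.002 * (T - 320.0) ** 2 + 1.5
w = np.array([1200.0, 5400.0, 9100.0, 6100.0, 1500.0])

a, b, c = fit_parabola(T, F, w)
t_max = vertex_temperature(a, b)
```

Because the synthetic data lie on a parabola, the fit recovers a maximum near 320 K; with real data the position of the maximum is only as reliable as the coverage of the high temperature bins.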

Supporting Information

S1 Data. Data file containing counts.

Counts of the different parameters, for each PDB-structure, in a tab-separated format.

https://doi.org/10.1371/journal.pcbi.1004277.s001

(TXT)

S1 Text. Description raw data, contained in S1 Data.

https://doi.org/10.1371/journal.pcbi.1004277.s002

(PDF)