Research articlePopulation-based local search for protein folding simulation in the MJ energy model and cubic lattices
Introduction
Research on protein structure and folding has a history of over fifty years (Pauling and Corey, 1953, Scheraga, 1961), but the problem of how proteins self-assemble to their native state still remains unsolved. There is a continuous stream of both new theories and experimental work aiming at a final answer to this fundamental question (see Govindarajan and Goldstein, 1998, Saibil and Ranson, 2002, Mello and Barrick, 2004, Campbell et al., 2005, Weinkam et al., 2005, Deane et al., 2007, Rose et al., 2006 for a recent overview).
The research on structural proteomics goes along with attempts to solve the tertiary structure prediction problem through protein folding simulations. In order to speed-up simulations, a variety of coarse-grained models has been introduced. Even in one of the most simplified models, namely the HP model (Dill et al., 1995, Shakhnovich, 1998), protein folding has been shown to be NP-complete (Berger and Leighton, 1998), which, however, does not mean NP-completeness in off-lattice models (for example, Integer Programming is NP-complete, while Linear Programming is not). In the HP model, the amino acid chain of a protein is embedded into a two- or three-dimensional rectangular grid lattice. The quality of an embedding is measured by the number of neighbouring pairs (contacts) of hydrophobic amino acids in the lattice. Hart and Istrail (1997) have shown NP-hardness for HP-like models and generalized lattices. The hardness results motivated the study of approximation algorithms in the HP model (Hart and Istrail, 1996). However, the approximation ratio is far from giving exact predictions and consequently various heuristic methods have been proposed, such as genetic algorithms (Unger and Moult, 1993), tabu search (Lesh et al., 2003, Cebrián et al., 2008), Monte Carlo simulations (Wüst et al., 2009), simulated annealing (Hansmann and Okamoto, 1996, Albrecht et al., 2008), and quantum optimization (Perdomo et al., 2008).
In this context of complexity issues it is interesting to note that, to the best of our knowledge, there is no attempt so far of incorporating research on chaperone-mediated protein folding (Saibil and Ranson, 2002, Spiess et al., 2004) into simulation methods. From a computational point of view, chaperonin complexes seem to act like an oracle in complexity theory (Homer and Selman, 2001), where the information provided by the oracle can be used to speed-up computations or to analyse the inherent complexity of a problem subject to the oracle complexity. The utilisation of mechanisms that determine the functionality of such chaperonin folding machines (Saibil and Ranson, 2002) in local search procedures, for example, in associated neighbourhood relations, will certainly open up new avenues for more efficient protein folding simulations.
In the present paper, we devise a new population-based local search method. The method basically represents a guided random tabu search and utilises a landscape analysis method from Zahrani et al. (2008) in a pre-processing step. The population-based tabu search is applied to protein folding simulations in 3D cubic lattice structures with the Miyazawa–Jernigan energy function (Miyazawa and Jernigan, 1985). The neighbourhood relation chosen for the local search methods discussed in the present paper is the pull-move set devised by Lesh et al. (2003). The three components, i.e. the set of conformations in a given lattice structure, the associated objective function, and the neighbourhood relation define an energy landscape. As described in Albrecht et al. (2008), we utilise approximations of the maximum value D of the minimum escape height from local minima of the underlying energy landscape. In order to determine the crucial parameter D for the guided random search, a partial landscape analysis is executed prior to simulations.
In our population-based method, the search procedure traverses the energy landscape with greedy steps towards (potential) local minima, which are then followed by upward steps up to the level of Zm + D, where Zm denotes the value of the objective function of the preceding (potential) local minimum. The procedure starts with a population of randomly selected initial conformations, and after a certain number of steps, parts of the best solutions found so far plus some additional conformations are selected for a re-start with a new population. Due to the random neighbourhood function, the procedure creates a diversity of conformations.
Section snippets
Landscape analysis
We employ a landscape analysis method in order to obtain an estimation of the maximum value D of the minimum escape height over all local minima. The method has been successfully used in the HP model (Albrecht et al., 2008) and is based on short runs of simulated annealing with a logarithmic cooling schedule (LSA).
Logarithmic simulated annealing utilises Hajek's theorem (Hajek, 1988) that under natural assumption about the reversibility of the underlying Markov chain the simulated annealing
The Miyazawa–Jernigan matrix
The MJ energy function is a pairwise interaction matrix, extracted from the distribution of contacts as they occur in real proteins for which the structure has been resolved into PDB format. In the original paper (Miyazawa and Jernigan, 1985), where the energy function is introduced, there are two different interaction matrices. The first matrix, often referred to as MJa in the literature, stands for the actual energy value of each bond, while the second matrix, MJb, stands for the pairwise
Landscape analysis results
For each of the MJ sequences we performed 27 LSA-runs consisting of T = 100,000 iterations each, where each run corresponds to a single Dinit-value. In the following diagrams and tables we summarise the outcome of these experiments. In each diagram, the best energy Zbest achieved in a run is plotted against the Dinit-value used in the run. The tables illustrate the best five runs in terms of energy, which is the lower part of the Zbest vs Dinit diagrams (Fig. 1, Fig. 2, Fig. 3).
As expected, we
Results of population-based local search
We performed 10 independent runs of population-based local search (PLS) for each of the MJ-sequences with the individual parameters from Table 8. Each run was terminated if the energy minimum was reached or a maximum of 1,000,000 generations was completed. Here, a single generation accounts for one step in which all the individuals of the population perform a trial. The number of executed moves to a new feasible conformation is added to the total count of transitions executed so far. A fair
Analysis of local minima
We modified the PLS procedure in order to collect information about the distribution of local minima with respect to the value of the objective function. Instead of declaring a conformation S as being a local minimum after Ltrials transitions, see Section 2.2, all neighbours, i.e. all feasible pull-moves out of S, are examined. Local minima found in this way were stored in a file and then analysed off-line. Only structurally distinct local minima were kept, i.e. each pair of local minima with
Performance analysis
We performed a comparative analysis of PLS, LSA and Monte Carlo (MC) algorithms. For the Monte Carlo algorithm we incorporated the mean first passage time (MFPT) of simulations (see Section 6.3) from the original paper (Faisca and Plaxco, 2006) that introduced the benchmarks.
In total, PLS found native conformations for all six benchmarks in 47 out of 60 runs, whereas LSA terminated in 46 out of 60 runs with a native conformation. The results are very close, however, PLS has the advantage that
Summary
We introduced a population-based local search method for protein folding simulation in lattice models that employs an energy function based upon the Miyazawa–Jernigan Matrix. The search utilises logarithmic simulated annealing in a pre-processing step for landscape analysis. Each individual search process traverses the landscape in alternating sequences of downward and upward steps, where the limit for upward steps is defined by the outcome of the pre-processing step. Tabu lists for each
Acknowledgement
The research has been partially supported by EPSRC Grant EP/D062012/1.
References (35)
- et al.
Stochastic protein folding simulation in the three-dimensional HP-model
Comput. Biol. Chem.
(2008) - et al.
Protein structure prediction by threading. Why it works and why it does not
J. Mol. Biol.
(1998) - et al.
Contact order, transition state placement and the refolding rates of single domain proteins
J. Mol. Biol.
(1998) - et al.
The chaperonin folding machine
Trends Biochem. Sci.
(2002) Protein design: a perspective from simple tractable models
Fold. Des.
(1998)- et al.
Mechanism of the eukaryotic chaperonin: protein folding in the chamber of secrets
Trends Cell Biol.
(2004) - et al.
Genetic algorithms for protein folding simulations
J. Mol. Biol.
(1993) - et al.
Monte Carlo simulations of systems with complex energy landscapes
Comput. Phys. Comm.
(2009) - et al.
Genetic local search for multicast routing with pre-processing by logarithmic simulated annealing
Comput. Oper. Res.
(2008) - et al.
Protein folding in the hydrophobic–hydrophilic (HP) model is NP-complete
J. Comput. Biol.
(1998)
Biochemistry Illustrated: Biochemistry and Molecular Biology in the Post-genomic Era
Protein structure prediction on the face centered cubic lattice by local search
Protein folding, the Levinthal paradox andrapidly mixing Markov chains
Cotranslational protein folding—fact or fiction?
Bioinformatics
Principles of protein folding—a perspective from simple exact models
Protein Sci.
The roles of stability and contact order in determining protein folding rates
Nat. Struct. Biol.
Cooperativity and the origins of rapid, single-exponential kinetics in protein folding
Protein Sci.
Cited by (16)
Guided macro-mutation in a graded energy based genetic algorithm for protein structure prediction
2016, Computational Biology and ChemistryCitation Excerpt :The two other protein instances in Table 5 (2J61 and 2HFQ) are taken from Torres et al. (2007). In the literature we found very few works (Kapsokalivas et al., 2009; Torres et al., 2007) that used 20 × 20 MJ potential-matrix (Miyazawa and Jernigan, 1985) for protein structure prediction on 3D FCC lattice. However, Torres et al. (2007) used 3D HCP lattice and Kapsokalivas et al. (2009) used 3D cubic lattice in their works for protein mapping.
Uphill unfolding of native protein conformations in cubic lattices
2010, Journal of Computational ScienceCitation Excerpt :The simulations are executed in cubic lattices with the pull-move set as neighbourhood relation. The pull-move set was introduced by Lesh et al. [15], see also [12,23] for recent applications of the neighbourhood relation. We note that within the pull-move set some neighbourhood transitions may trigger a sequence of elementary, intermediate steps before the transition is completed.
An Efficient Ant Colony Optimization Algorithm for Protein Structure Prediction
2018, International Symposium on Medical Information and Communication Technology, ISMICTAn Enhanced Genetic Algorithm for Ab Initio Protein Structure Prediction
2016, IEEE Transactions on Evolutionary Computation