Research article
Population-based local search for protein folding simulation in the MJ energy model and cubic lattices

https://doi.org/10.1016/j.compbiolchem.2009.06.006Get rights and content

Abstract

We present experimental results on benchmark problems in 3D cubic lattice structures with the Miyazawa–Jernigan energy function for two local search procedures that utilise the pull-move set: (i) population-based local search (PLS) that traverses the energy landscape with greedy steps towards (potential) local minima followed by upward steps up to a certain level of the objective function; (ii) simulated annealing with a logarithmic cooling schedule (LSA). The parameter settings for PLS are derived from short LSA-runs executed in pre-processing and the procedure utilises tabu lists generated for each member of the population. In terms of the total number of energy function evaluations both methods perform equally well, however, PLS has the potential of being parallelised with an expected speed-up in the region of the population size. Furthermore, both methods require a significant smaller number of function evaluations when compared to Monte Carlo simulations with kink-jump moves.

Introduction

Research on protein structure and folding has a history of over fifty years (Pauling and Corey, 1953, Scheraga, 1961), but the problem of how proteins self-assemble to their native state still remains unsolved. There is a continuous stream of both new theories and experimental work aiming at a final answer to this fundamental question (see Govindarajan and Goldstein, 1998, Saibil and Ranson, 2002, Mello and Barrick, 2004, Campbell et al., 2005, Weinkam et al., 2005, Deane et al., 2007, Rose et al., 2006 for a recent overview).

The research on structural proteomics goes along with attempts to solve the tertiary structure prediction problem through protein folding simulations. In order to speed-up simulations, a variety of coarse-grained models has been introduced. Even in one of the most simplified models, namely the HP model (Dill et al., 1995, Shakhnovich, 1998), protein folding has been shown to be NP-complete (Berger and Leighton, 1998), which, however, does not mean NP-completeness in off-lattice models (for example, Integer Programming is NP-complete, while Linear Programming is not). In the HP model, the amino acid chain of a protein is embedded into a two- or three-dimensional rectangular grid lattice. The quality of an embedding is measured by the number of neighbouring pairs (contacts) of hydrophobic amino acids in the lattice. Hart and Istrail (1997) have shown NP-hardness for HP-like models and generalized lattices. The hardness results motivated the study of approximation algorithms in the HP model (Hart and Istrail, 1996). However, the approximation ratio is far from giving exact predictions and consequently various heuristic methods have been proposed, such as genetic algorithms (Unger and Moult, 1993), tabu search (Lesh et al., 2003, Cebrián et al., 2008), Monte Carlo simulations (Wüst et al., 2009), simulated annealing (Hansmann and Okamoto, 1996, Albrecht et al., 2008), and quantum optimization (Perdomo et al., 2008).

In this context of complexity issues it is interesting to note that, to the best of our knowledge, there is no attempt so far of incorporating research on chaperone-mediated protein folding (Saibil and Ranson, 2002, Spiess et al., 2004) into simulation methods. From a computational point of view, chaperonin complexes seem to act like an oracle in complexity theory (Homer and Selman, 2001), where the information provided by the oracle can be used to speed-up computations or to analyse the inherent complexity of a problem subject to the oracle complexity. The utilisation of mechanisms that determine the functionality of such chaperonin folding machines (Saibil and Ranson, 2002) in local search procedures, for example, in associated neighbourhood relations, will certainly open up new avenues for more efficient protein folding simulations.

In the present paper, we devise a new population-based local search method. The method basically represents a guided random tabu search and utilises a landscape analysis method from Zahrani et al. (2008) in a pre-processing step. The population-based tabu search is applied to protein folding simulations in 3D cubic lattice structures with the Miyazawa–Jernigan energy function (Miyazawa and Jernigan, 1985). The neighbourhood relation chosen for the local search methods discussed in the present paper is the pull-move set devised by Lesh et al. (2003). The three components, i.e. the set of conformations in a given lattice structure, the associated objective function, and the neighbourhood relation define an energy landscape. As described in Albrecht et al. (2008), we utilise approximations of the maximum value D of the minimum escape height from local minima of the underlying energy landscape. In order to determine the crucial parameter D for the guided random search, a partial landscape analysis is executed prior to simulations.

In our population-based method, the search procedure traverses the energy landscape with greedy steps towards (potential) local minima, which are then followed by upward steps up to the level of Zm + D, where Zm denotes the value of the objective function of the preceding (potential) local minimum. The procedure starts with a population of randomly selected initial conformations, and after a certain number of steps, parts of the best solutions found so far plus some additional conformations are selected for a re-start with a new population. Due to the random neighbourhood function, the procedure creates a diversity of conformations.

Section snippets

Landscape analysis

We employ a landscape analysis method in order to obtain an estimation of the maximum value D of the minimum escape height over all local minima. The method has been successfully used in the HP model (Albrecht et al., 2008) and is based on short runs of simulated annealing with a logarithmic cooling schedule (LSA).

Logarithmic simulated annealing utilises Hajek's theorem (Hajek, 1988) that under natural assumption about the reversibility of the underlying Markov chain the simulated annealing

The Miyazawa–Jernigan matrix

The MJ energy function is a pairwise interaction matrix, extracted from the distribution of contacts as they occur in real proteins for which the structure has been resolved into PDB format. In the original paper (Miyazawa and Jernigan, 1985), where the energy function is introduced, there are two different interaction matrices. The first matrix, often referred to as MJa in the literature, stands for the actual energy value of each bond, while the second matrix, MJb, stands for the pairwise

Landscape analysis results

For each of the MJ sequences we performed 27 LSA-runs consisting of T = 100,000 iterations each, where each run corresponds to a single Dinit-value. In the following diagrams and tables we summarise the outcome of these experiments. In each diagram, the best energy Zbest achieved in a run is plotted against the Dinit-value used in the run. The tables illustrate the best five runs in terms of energy, which is the lower part of the Zbest vs Dinit diagrams (Fig. 1, Fig. 2, Fig. 3).

As expected, we

Results of population-based local search

We performed 10 independent runs of population-based local search (PLS) for each of the MJ-sequences with the individual parameters from Table 8. Each run was terminated if the energy minimum was reached or a maximum of 1,000,000 generations was completed. Here, a single generation accounts for one step in which all the individuals of the population perform a trial. The number of executed moves to a new feasible conformation is added to the total count of transitions executed so far. A fair

Analysis of local minima

We modified the PLS procedure in order to collect information about the distribution of local minima with respect to the value of the objective function. Instead of declaring a conformation S as being a local minimum after Ltrials transitions, see Section 2.2, all neighbours, i.e. all feasible pull-moves out of S, are examined. Local minima found in this way were stored in a file and then analysed off-line. Only structurally distinct local minima were kept, i.e. each pair of local minima with

Performance analysis

We performed a comparative analysis of PLS, LSA and Monte Carlo (MC) algorithms. For the Monte Carlo algorithm we incorporated the mean first passage time (MFPT) of simulations (see Section 6.3) from the original paper (Faisca and Plaxco, 2006) that introduced the benchmarks.

In total, PLS found native conformations for all six benchmarks in 47 out of 60 runs, whereas LSA terminated in 46 out of 60 runs with a native conformation. The results are very close, however, PLS has the advantage that

Summary

We introduced a population-based local search method for protein folding simulation in lattice models that employs an energy function based upon the Miyazawa–Jernigan Matrix. The search utilises logarithmic simulated annealing in a pre-processing step for landscape analysis. Each individual search process traverses the landscape in alternating sequences of downward and upward steps, where the limit for upward steps is defined by the outcome of the pre-processing step. Tabu lists for each

Acknowledgement

The research has been partially supported by EPSRC Grant EP/D062012/1.

References (35)

  • P.N. Campbell et al.

    Biochemistry Illustrated: Biochemistry and Molecular Biology in the Post-genomic Era

    (2005)
  • M. Cebrián et al.

    Protein structure prediction on the face centered cubic lattice by local search

  • P. Clote

    Protein folding, the Levinthal paradox andrapidly mixing Markov chains

  • C.M. Deane et al.

    Cotranslational protein folding—fact or fiction?

    Bioinformatics

    (2007)
  • K.A. Dill et al.

    Principles of protein folding—a perspective from simple exact models

    Protein Sci.

    (1995)
  • A.R. Dinner et al.

    The roles of stability and contact order in determining protein folding rates

    Nat. Struct. Biol.

    (2001)
  • P. Faisca et al.

    Cooperativity and the origins of rapid, single-exponential kinetics in protein folding

    Protein Sci.

    (2006)
  • Cited by (16)

    • Guided macro-mutation in a graded energy based genetic algorithm for protein structure prediction

      2016, Computational Biology and Chemistry
      Citation Excerpt :

      The two other protein instances in Table 5 (2J61 and 2HFQ) are taken from Torres et al. (2007). In the literature we found very few works (Kapsokalivas et al., 2009; Torres et al., 2007) that used 20 × 20 MJ potential-matrix (Miyazawa and Jernigan, 1985) for protein structure prediction on 3D FCC lattice. However, Torres et al. (2007) used 3D HCP lattice and Kapsokalivas et al. (2009) used 3D cubic lattice in their works for protein mapping.

    • Uphill unfolding of native protein conformations in cubic lattices

      2010, Journal of Computational Science
      Citation Excerpt :

      The simulations are executed in cubic lattices with the pull-move set as neighbourhood relation. The pull-move set was introduced by Lesh et al. [15], see also [12,23] for recent applications of the neighbourhood relation. We note that within the pull-move set some neighbourhood transitions may trigger a sequence of elementary, intermediate steps before the transition is completed.

    • An Efficient Ant Colony Optimization Algorithm for Protein Structure Prediction

      2018, International Symposium on Medical Information and Communication Technology, ISMICT
    • An Enhanced Genetic Algorithm for Ab Initio Protein Structure Prediction

      2016, IEEE Transactions on Evolutionary Computation
    View all citing articles on Scopus
    View full text