Computational protein design for given backbone: recent progresses in general method-related aspects

https://doi.org/10.1016/j.sbi.2016.06.013

Highlights

  • Protein design with a high success rate for a given backbone remains an important goal.

  • Physics-based energies for polar interactions need to be balanced with desolvation.

  • New statistical energy functions have been developed and experimentally tested.

  • Several methods consider limited backbone flexibility during sequence optimization.

  • Efficient experimental approaches accelerate computational method development.

Achieving a high success rate in protein design requires a reliable sequence design method that finds amino acid sequences that fold stably into a desired backbone structure. Computational protein design addresses this problem through energy minimization. Here we review recent methodological progress related to improving the accuracy of this approach. First, the quality of the energy model is a key factor. Second, with structure-sensitive energy functions, whether and how backbone flexibility is considered can have large effects on design accuracy, although usually only small adjustments of the backbone structure itself are involved. Third, the effective accuracy of design results can be boosted by post-processing a small number of designed sequences with complementary models that may not be efficient enough for full sequence optimization. Finally, computational method development will benefit greatly from increasingly efficient experimental approaches that can provide extensive feedback.

Introduction

Across different protein design tasks, a common requirement is that the designed protein possess a thermodynamically stable and well-defined three-dimensional structure. Thus a primary goal of computational protein design is to design proteins that fold into a target structure, or that form a stable complex of well-defined structure with given molecular partners. This difficult problem may be split into two sub-questions to make it more tractable (Figure 1). The first sub-question is how to define or construct a backbone structure of high designability (meaning that a relatively large number of amino acid sequences stably form the structure). The second sub-question is how to find (some of) these amino acid sequences. A general approach to the first sub-question is still lacking, although there have been several significant recent advances in the design of particular types of backbone structures [1, 2, 3, 4, 5] (Figure 2). For α-helical coiled-coil backbones, designability has been analyzed mainly in the space of the so-called Crick parameters [1, 2, 6]. Beyond this backbone type, analyses of secondary structure and connecting loop patterns in relation to tertiary structure packing motifs have led Koga et al. [3] to design ‘idealized’ backbones, and Lin et al. [4] to design backbones of controlled secondary structure sizes and overall shapes. In another work, Park et al. demonstrated a step-wise strategy to design repeat proteins of controlled curvature [5].

The second sub-question, designing amino acid sequences for a given backbone structure, has been systematically addressed by computational protein design (CPD), and it is the main topic of this review. Sequence design is carried out via the classical approach of minimizing a structure-dependent objective function (i.e. the energy function) with respect to the amino acid sequence (Figure 3). The same approach is employed to design protein–protein [7] or protein–small molecule binding interfaces [8]. Through CPD, the number of fully automatically designed proteins that form thermodynamically stable and well-defined structures has been increasing continuously. Despite this, the energy models currently used in CPD are imperfect and may result in low success rates and inaccurate designs [9]. Improving sequence design for given backbones can make CPD a more effective tool for different applications. In addition, being able to consistently find good sequences for arbitrary types of designable backbones will accelerate progress on the first sub-question mentioned above, namely the creation of new backbones. In this review, we give an overview of recent work on CPD methods for given backbones, focusing on aspects associated with improving the accuracy or success rate of the method (see also Figure 3): energy functions, treatment of structural flexibility and/or negative design, assessment and refinement of individual sequences, and efficient experimental approaches that can provide extensive feedback for method improvement.
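
The idea of minimizing a structure-dependent energy function with respect to the amino acid sequence can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from any method covered in this review: the backbone is reduced to a hypothetical per-position burial flag plus a contact list, `toy_energy` is an invented two-term score, and the search is plain simulated annealing, which is only one of several heuristic optimizers that could be used.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWC")  # crude, illustrative classification


def toy_energy(seq, burial, contacts):
    """Hypothetical structure-dependent energy (lower is better).

    burial   -- list of bools, True if the backbone position is buried
    contacts -- list of (i, j) index pairs that are close in the backbone
    """
    energy = 0.0
    for aa, buried in zip(seq, burial):
        # Single-site term: hydrophobic residues fit buried positions,
        # polar residues fit exposed positions.
        energy += -1.0 if (aa in HYDROPHOBIC) == buried else 1.0
    for i, j in contacts:
        # Pair term: small bonus for hydrophobic-hydrophobic contacts.
        if seq[i] in HYDROPHOBIC and seq[j] in HYDROPHOBIC:
            energy -= 0.5
    return energy


def design_sequence(burial, contacts, steps=5000,
                    t_start=2.0, t_end=0.05, seed=0):
    """Simulated annealing over sequences for a fixed (toy) backbone."""
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in burial]
    energy = toy_energy(seq, burial, contacts)
    best_seq, best_energy = list(seq), energy
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / (steps - 1))
        pos = rng.randrange(len(seq))
        old = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)  # propose a point mutation
        new_energy = toy_energy(seq, burial, contacts)
        delta = new_energy - energy
        # Metropolis criterion: always accept downhill, sometimes uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            energy = new_energy
            if energy < best_energy:
                best_seq, best_energy = list(seq), energy
        else:
            seq[pos] = old  # reject the mutation
    return "".join(best_seq), best_energy


if __name__ == "__main__":
    burial = [True, False, True, True, False, False, True, False]
    contacts = [(0, 2), (2, 3), (3, 6)]
    sequence, energy = design_sequence(burial, contacts)
    print(sequence, energy)
```

Real CPD energy functions are far richer than this toy score, and they are usually evaluated over side-chain rotamers rather than residue classes, but the loop has the same shape: propose a sequence change, re-score it against the fixed backbone, and accept or reject it.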

Energy functions

Here we use ‘energy function’ to refer broadly to the objective functions for sequence optimization. CPD has borrowed heavily from the energy terms of molecular mechanics (MM) force fields, although these physics-based terms of empirical functional form were originally developed for structural modeling rather than sequence design. In CPD, the solvent has to be treated implicitly, which affects both intra-solute polar interactions and desolvation. Balancing the two types of interaction is considered to

Structural flexibility

In CPD, usually only very limited conformational flexibility around a given backbone structure is considered, so that the vast sequence space can be explored efficiently. Common simplifications include a fixed backbone and discretized side-chain rotamer states. Under such simplifications, either heuristic or deterministic [23, 24] optimization algorithms can be applied to optimize the sequence. However, these simplifications are often inconsistent with the use of high-resolution

Evaluation and fine-tuning of individual sequences

For full sequence optimization, the energy function needs to be computationally efficient, with necessary trade-offs in accuracy and comprehensiveness. To make up for the effects of these trade-offs, it may be beneficial to apply evaluation, error detection and correction to individual designed sequences after automatic sequence optimization. At this post-processing stage, more sophisticated and complementary models can be employed, as the requirements on computational efficiency are far less

Obtaining experimental feedback

Judged by the relatively small number of experimental successes accumulated over the last two decades, method improvement in CPD has been relatively slow. To a large extent this can be attributed to the lack of adequate experimental feedback. Method improvement requires experimental results covering not a few but a large variety of designs, including failed ones. In addition, we need to know the effects of sequence perturbations, that is, which changes lead to

Conclusions

The general framework of computational protein design (CPD) comprises computationally optimizing the sequence with respect to a structure-dependent energy function. We summarize recent studies within this framework that span several general aspects. First, while physics-based terms in the energy function may benefit from better-balanced intra-solute polar interactions and desolvation, new ways to derive energy terms based on statistical analyses of known sequence and structural data have been

Conflict of interest

The authors declare that they have no conflict of interest related to this work.

References and recommended reading

Papers of particular interest, published within the period of review, have been highlighted as:

  • • of special interest

  • •• of outstanding interest

Acknowledgement

This work has been supported by Chinese Natural Science Foundation (Grant numbers 31370755 and 31570719).

References (50)

  • O. Khersonsky et al. Bridging the gaps in design methodologies by evolutionary optimization of the stability and proficiency of designed Kemp eliminase KE59. Proc Natl Acad Sci U S A (2012)
  • J. DeBartolo et al. Predictive Bcl-2 family binding models rooted in experiment or structure. J Mol Biol (2012)
  • P.S. Huang et al. High thermodynamic stability of parametrically designed helical bundles. Science (2014)
  • A.R. Thomson et al. Computational design of water-soluble alpha-helical barrels. Science (2014)
  • N. Koga et al. Principles for designing ideal protein structures. Nature (2012)
  • Y.R. Lin et al. Control over overall shape and size in de novo designed proteins. Proc Natl Acad Sci U S A (2015)
  • K. Park et al. Control of repeat-protein curvature by computational protein design. Nat Struct Mol Biol (2015)
  • Z. Li et al. Energy functions in de novo protein design: current challenges and future prospects. Annu Rev Biophys (2013)
  • P.B. Stranges et al. A comparison of successful and failed protein interface designs highlights the challenges of designing buried hydrogen bonds. Protein Sci (2013)
  • E.L. Jackson et al. Amino-acid site variability among natural and designed proteins. PeerJ (2013)
  • D.N. Kim et al. Boosting protein stability with the computational design of beta-sheet surfaces. Protein Sci (2016)
  • D.H. Pike et al. Empirical estimation of local dielectric constants: toward atomistic design of collagen mimetic peptides. Biopolymers (2015)
  • P. Xiong et al. Protein design with a comprehensive statistical energy function and boosted by experimental selection for foldability. Nat Commun (2014)
  • P. Mitra et al. An evolution-based approach to de novo protein design and case study on Mycobacterium tuberculosis. PLoS Comput Biol (2013)
  • J.R. Brender et al. Predicting the effect of mutations on protein–protein binding interactions through structure-based interface profiles. PLoS Comput Biol (2015)