Computational protein design for given backbone: recent progresses in general method-related aspects

doi:10.1016/j.sbi.2016.06.013

Current Opinion in Structural Biology

Volume 39, August 2016, Pages 89-95


Highlights

• High success rate protein design for a given backbone remains an important goal.
• Physics-based energies for polar interactions need to be balanced with desolvation.
• New statistical energy functions have been developed and experimentally tested.
• Several methods consider limited backbone flexibility during sequence optimization.
• Efficient experimental approaches accelerate computational method development.

Achieving a high success rate in protein design requires a reliable sequence design method that finds amino acid sequences which fold stably into a desired backbone structure. Computational protein design addresses this problem through energy minimization. Here we review recent methodological progress related to improving the accuracy of this approach. First, the quality of the energy model is a key factor. Second, with structure-sensitive energy functions, whether and how backbone flexibility is considered can have large effects on design accuracy, although usually only small adjustments of the backbone structure itself are involved. Third, the effective accuracy of design results can be boosted by post-processing a small number of designed sequences with complementary models that may not be efficient enough for full sequence optimization. Finally, computational method development will benefit greatly from increasingly efficient experimental approaches that can provide extensive feedback.

Introduction

Across different protein design tasks, a common requirement is that the designed protein possess a thermodynamically stable and well-defined three-dimensional structure. Thus a primary goal of computational protein design is to design proteins that fold into a target structure, or that form a stable complex of well-defined structure with given molecular partners. This difficult problem may be split into two sub-questions to make it more tractable (Figure 1). The first sub-question is how to define or construct a backbone structure of high designability, meaning that a relatively large number of amino acid sequences stably form the structure. The second sub-question is how to find (some of) these amino acid sequences. A general approach to the first sub-question is still lacking, although there have been several significant recent advances in the design of particular types of backbone structures [1, 2, 3, 4, 5] (Figure 2). For α-helical coiled-coil backbones, designability has been analyzed mainly in the space of the so-called Crick parameters [1, 2, 6]. Beyond this backbone type, analyses of secondary structure and connecting loop patterns in relation to tertiary structure packing motifs have led Koga et al. [3] to design ‘idealized’ backbones, and Lin et al. [4] to design backbones of controlled secondary structure sizes and overall shapes. In another work, Park et al. demonstrated a step-wise strategy to design repeat proteins of controlled curvature [5].

The second sub-question, designing amino acid sequences for a given backbone structure, has been systematically addressed by computational protein design (CPD), and it is the main topic of the current review. Sequence design is carried out via the classical approach of minimizing a structure-dependent objective function (i.e. the energy function) with respect to the amino acid sequence (Figure 3). The same approach is employed to design protein–protein [7] or protein–small molecule binding interfaces [8]. Through CPD, the number of fully automatically designed proteins that form thermodynamically stable and well-defined structures has been increasing continuously. Despite this, the energy models currently used in CPD are imperfect and may result in low success rates and inaccurate designs [9•]. Improvements in sequence design for given backbones can make CPD a more effective tool for different applications. In addition, being able to consistently find good sequences for arbitrary types of designable backbones would accelerate progress on the first sub-question mentioned above, namely the creation of new backbones. In this review, we give an overview of recently published work on CPD methods for given backbones, focusing on aspects associated with improving the accuracy or success rate of the method (see also Figure 3): energy functions, treatment of structural flexibility and/or negative design, assessment and refinement of individual sequences, and efficient experimental approaches that can provide extensive feedback for method improvement.
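As a minimal illustration of minimizing an energy function with respect to the amino acid sequence, the sketch below runs simulated annealing over sequences for a fixed set of positions. It is not any of the reviewed methods: the energy function (`toy_energy`, which only rewards hydrophobic residues at buried positions and polar residues at exposed ones), the burial pattern, the hydrophobic residue set and all parameter values are hypothetical stand-ins chosen so that the example is self-contained.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWC")  # assumed classification, for illustration only


def toy_energy(sequence, buried):
    """Hypothetical energy: -1 for each position whose residue type matches
    its burial state (hydrophobic if buried, polar if exposed), +1 otherwise.
    Lower is better."""
    energy = 0.0
    for residue, is_buried in zip(sequence, buried):
        matches = (residue in HYDROPHOBIC) == is_buried
        energy += -1.0 if matches else 1.0
    return energy


def design_sequence(buried, steps=5000, t_start=2.0, t_end=0.05, seed=0):
    """Simulated annealing over sequences for a fixed burial pattern.
    Returns the lowest-energy sequence visited and its energy."""
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in buried]
    energy = toy_energy(seq, buried)
    best_seq, best_energy = list(seq), energy
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        # Propose a single-site mutation.
        pos = rng.randrange(len(seq))
        old = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)
        new_energy = toy_energy(seq, buried)
        delta = new_energy - energy
        # Metropolis criterion: always accept downhill, sometimes uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            energy = new_energy
            if energy < best_energy:
                best_seq, best_energy = list(seq), energy
        else:
            seq[pos] = old  # reject the move and restore the residue
    return "".join(best_seq), best_energy


if __name__ == "__main__":
    pattern = [True, False, True, True, False, False, True, False]
    designed, score = design_sequence(pattern)
    print(designed, score)
```

In an actual CPD setting the toy energy would be replaced by a structure-dependent function evaluated on the target backbone, and the search would typically also sample side-chain conformations; the accept/reject loop is only one of several possible optimization strategies.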

Section snippets

Energy functions

Here we use ‘energy function’ to refer broadly to the objective functions used for sequence optimization. CPD has borrowed heavily from the energy terms of molecular mechanics (MM) force fields, although these physics-based terms with empirical functional forms were originally developed for structural modeling rather than sequence design. In CPD, one has to treat solvent implicitly, which affects both intra-solute polar interactions and desolvation. Balancing the two types of interaction is considered to

Structural flexibility

In CPD, usually only very limited conformational flexibility around a given backbone structure is considered, so that the vast sequence space can be explored efficiently. Common simplifications include a fixed backbone and discretized side-chain rotamer states. Under such simplifications, either heuristic or deterministic [23, 24] optimization algorithms can be applied to optimize the sequence. However, these simplifications are often inconsistent with the use of high-resolution

Evaluation and fine-tuning of individual sequences

For full sequence optimization, the energy function needs to be computationally efficient, with necessary trade-offs in accuracy and comprehensiveness. To compensate for the effects of these trade-offs, it may be beneficial to apply evaluation, error detection and correction to individual designed sequences after automatic sequence optimization. At this post-processing stage, more sophisticated and complementary models can be employed, as the requirements on computational efficiency are far less

Obtaining experimental feedback

Judged by the relatively small number of experimental successes accumulated over the last two decades, method improvement in CPD has been relatively slow. To a large extent this can be attributed to the lack of proper experimental feedback. For method improvement, we need experimental results on not just a few but a large variety of designs, including failed designs. In addition, we need to know the effects of sequence perturbations, that is, which changes lead to

Conclusions

The general framework of computational protein design (CPD) comprises computationally optimizing the sequence with respect to a structure-dependent energy function. We summarize recent studies within this framework that span several general aspects. First, while physics-based terms in the energy function may benefit from better-balanced intra-solute polar interactions and desolvation, new ways to derive energy terms based on statistical analyses of known sequence and structural data have been

Conflict of interest

The authors declare that they have no conflict of interest related to this work.

References and recommended reading

Papers of particular interest, published within the period of review, have been highlighted as:

• of special interest
•• of outstanding interest

Acknowledgement

This work was supported by the Chinese Natural Science Foundation (grant numbers 31370755 and 31570719).

References (50)

G. Grigoryan et al. Probing designability via a generalized model of helical bundle geometry. J Mol Biol (2011)
N. London et al. An accurate binding interaction model in de novo computational protein design of interactions: if you build it, they will bind. J Struct Biol (2014)
K. Feldmeier et al. Computational protein design of ligand binding and catalysis. Curr Opin Chem Biol (2013)
T. Simonson et al. Computational protein design: the Proteus software and selected applications. J Comput Chem (2013)
Y.C. Zhou et al. An efficient parallel algorithm for accelerating computational protein design. Bioinformatics (2014)
N. Ollikainen et al. Coupling protein side-chain and backbone flexibility improves the re-design of protein-ligand specificity. PLoS Comput Biol (2015)
J.A. Davey et al. Improving the accuracy of protein stability predictions with multistate design using a variety of backbone ensembles. Proteins: Struct Funct Bioinform (2014)
S. Lanouette et al. Discovery of substrates for a SET domain lysine methyltransferase predicted by multistate computational protein design. Structure (2015)
J.K. Kim et al. BetaVoid: molecular voids via beta-complexes and Voronoi diagrams. Proteins: Struct Funct Bioinform (2014)
K.P. Tan et al. TSpred: a web server for the rational design of temperature-sensitive mutants. Nucleic Acids Res (2014)

High thermodynamic stability of parametrically designed helical bundles. Science
Computational design of water-soluble alpha-helical barrels. Science
Principles for designing ideal protein structures. Nature
Control over overall shape and size in de novo designed proteins. Proc Natl Acad Sci U S A
Control of repeat-protein curvature by computational protein design. Nat Struct Mol Biol
Energy functions in de novo protein design: current challenges and future prospects. Annu Rev Biophys
A comparison of successful and failed protein interface designs highlights the challenges of designing buried hydrogen bonds. Protein Sci
Amino-acid site variability among natural and designed proteins. PeerJ
Boosting protein stability with the computational design of beta-sheet surfaces. Protein Sci
Empirical estimation of local dielectric constants: toward atomistic design of collagen mimetic peptides. Biopolymers
Protein design with a comprehensive statistical energy function and boosted by experimental selection for foldability. Nat Commun
An evolution-based approach to de novo protein design and case study on Mycobacterium tuberculosis. PLoS Comput Biol
Predicting the effect of mutations on protein–protein binding interactions through structure-based interface profiles. PLoS Comput Biol