Computational protein design for given backbone: recent progresses in general method-related aspects
Introduction
Across protein design tasks, the designed protein is commonly required to possess a thermodynamically stable and well-defined three-dimensional structure. A primary goal of computational protein design is therefore to design proteins that fold into a target structure, or that form a stable complex of well-defined structure with given molecular partners. This difficult problem may be split into two sub-questions to make it more tractable (Figure 1). The first sub-question is how to define or construct a backbone structure of high designability, meaning that a relatively large number of amino acid sequences stably form the structure. The second sub-question is how to find (some of) these amino acid sequences. A general approach to the first sub-question is still lacking, although there have been several significant recent advances in the design of particular types of backbone structures [1, 2, 3, 4, 5] (Figure 2). For α-helical coiled-coil backbones, designability has been analyzed mainly in the space of the so-called Crick parameters [1, 2, 6]. Beyond this backbone type, analyses of secondary structure and linking loop patterns in relation to tertiary structure packing motifs have led Koga et al. [3] to design ‘idealized’ backbones, and Lin et al. [4] to design backbones of controlled secondary structure sizes and overall shapes. In another work, Park et al. demonstrated a step-wise strategy to design repeat proteins of controlled curvatures [5].
On the other hand, the sub-question of designing amino acid sequences for a given backbone structure has been systematically addressed by computational protein design (CPD), and it is the main topic of this review. Sequence design is carried out via the classical approach of minimizing a structure-dependent objective function (i.e. the energy function) with respect to the amino acid sequence (Figure 3). The same approach is employed to design protein–protein [7] or protein–small molecule binding interfaces [8]. Through CPD, the number of fully automatically designed proteins that form thermodynamically stable and well-defined structures has been increasing continuously. Despite this, current energy models used in CPD are not yet perfect and may result in low success rates and inaccurate designs [9•]. Improvements in sequence design for given backbones can make CPD a more effective tool for different applications. In addition, being able to consistently find good sequences for arbitrary types of designable backbones would accelerate progress on the first sub-question mentioned above, namely the creation of new backbones. In this review, we give an overview of recently published work on CPD methods for given backbones, focusing on aspects associated with improving the accuracy or success rate of the method (see also Figure 3): energy functions, treatment of structural flexibility and/or negative design, assessment and refinement of individual sequences, and efficient experimental approaches that can provide extensive feedback for method improvement.
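The idea of minimizing a structure-dependent energy with respect to the sequence can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the reviewed work: the function names, the burial flags, the contact list, and especially the toy energy, which only rewards hydrophobic residues at buried positions and polar residues at exposed ones. Simulated annealing is used here as one common heuristic; real CPD programs use far richer energy functions and search strategies.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWC")  # crude, illustrative classification


def toy_energy(sequence, buried, contacts):
    """Hypothetical structure-dependent energy (lower is better).

    buried[i] is True if position i lies in the core of the fixed backbone;
    contacts lists (i, j) position pairs that are close in space.
    """
    energy = 0.0
    for i, aa in enumerate(sequence):
        # One-body term: hydrophobic in the core, polar on the surface.
        energy += -1.0 if (aa in HYDROPHOBIC) == buried[i] else 1.0
    for i, j in contacts:
        # Two-body term: reward hydrophobic-hydrophobic contacts.
        if sequence[i] in HYDROPHOBIC and sequence[j] in HYDROPHOBIC:
            energy -= 0.5
    return energy


def design_sequence(buried, contacts, steps=5000, t_start=2.0, t_end=0.05, seed=0):
    """Minimize toy_energy over sequences by simulated annealing."""
    rng = random.Random(seed)
    n = len(buried)
    current = [rng.choice(AMINO_ACIDS) for _ in range(n)]
    e_current = toy_energy(current, buried, contacts)
    best, e_best = list(current), e_current
    for step in range(steps):
        # Geometric cooling schedule from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        pos = rng.randrange(n)
        old = current[pos]
        current[pos] = rng.choice(AMINO_ACIDS)  # single-site mutation
        e_new = toy_energy(current, buried, contacts)
        delta = e_new - e_current
        # Metropolis criterion: always accept downhill, sometimes uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            e_current = e_new
            if e_current < e_best:
                best, e_best = list(current), e_current
        else:
            current[pos] = old  # reject the mutation
    return "".join(best), e_best


if __name__ == "__main__":
    buried = [True, False, True, False, True, False]
    contacts = [(0, 2), (2, 4)]
    print(design_sequence(buried, contacts))
```

The backbone enters only through `buried` and `contacts`, which stay fixed while the sequence varies; this mirrors, in a very reduced form, the separation between a given backbone and the sequence search described above.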
Energy functions
Here we use ‘energy function’ to refer broadly to the objective functions used for sequence optimization. CPD has borrowed heavily from the energy terms of molecular mechanics (MM) force fields, although these physics-based terms with empirical functional forms were originally developed for structural modeling rather than sequence design. In CPD, the solvent has to be treated implicitly, which affects both intra-solute polar interactions and desolvation. Balancing the two types of interaction is considered to
Structural flexibility
CPD usually considers only very limited conformational flexibility around a given backbone structure, so that the vast sequence space can be explored efficiently. Common simplifications in this respect include a fixed backbone and discretized side-chain rotamer states. Under such simplifications, either heuristic or deterministic [23, 24] optimization algorithms can be applied to optimize the sequence. However, these simplifications are often inconsistent with the use of high-resolution
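To make the fixed-backbone, discrete-rotamer simplification concrete, the sketch below assumes a pairwise-decomposable energy stored in lookup tables and applies dead-end elimination with the Goldstein criterion, followed by exhaustive enumeration of the surviving states. Dead-end elimination is chosen here only as a well-known example of a deterministic approach; it is not necessarily what references [23, 24] describe. The table layout (`self_e[i][r]`, `pair_e[(i, j)][r][s]`) and all function names are hypothetical.

```python
import itertools


def total_energy(assignment, self_e, pair_e):
    """Energy of one state (amino acid + rotamer) per position."""
    n = len(assignment)
    energy = sum(self_e[i][assignment[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            energy += pair_e[(i, j)][assignment[i]][assignment[j]]
    return energy


def pair_term(pair_e, i, r, j, s):
    """Look up the pair energy regardless of index order."""
    return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]


def dominates(i, t, r, alive, self_e, pair_e):
    """Goldstein criterion: True if state t at position i always beats r."""
    gap = self_e[i][r] - self_e[i][t]
    for j in range(len(self_e)):
        if j != i:
            gap += min(
                pair_term(pair_e, i, r, j, s) - pair_term(pair_e, i, t, j, s)
                for s in alive[j]
            )
    return gap > 0


def goldstein_dee(self_e, pair_e):
    """Repeatedly prune states that cannot be part of the global minimum."""
    alive = [set(range(len(states))) for states in self_e]
    changed = True
    while changed:
        changed = False
        for i in range(len(self_e)):
            for r in sorted(alive[i]):  # snapshot, since alive[i] may shrink
                if any(t != r and dominates(i, t, r, alive, self_e, pair_e)
                       for t in alive[i]):
                    alive[i].discard(r)
                    changed = True
    return alive


def design(self_e, pair_e):
    """Prune with DEE, then enumerate the remaining combinations."""
    alive = goldstein_dee(self_e, pair_e)
    candidates = itertools.product(*(sorted(a) for a in alive))
    best = min(candidates, key=lambda a: total_energy(a, self_e, pair_e))
    return best, total_energy(best, self_e, pair_e)
```

Because a state is removed only when another surviving state is strictly better in every context, the pruning cannot discard the global minimum of the tabulated energy. That guarantee holds only within the model: it depends on the backbone being fixed and the rotamers being discrete, which are exactly the simplifications discussed above.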
Evaluation and fine-tuning of individual sequences
For full sequence optimization, the energy function needs to be computationally efficient, with necessary trade-offs in accuracy and comprehensiveness. To compensate for these trade-offs, it may be beneficial to apply evaluation, error detection and correction to individual designed sequences after automatic sequence optimization. At this post-processing stage, more sophisticated and complementary models can be employed, as the requirements on computational efficiency are far less
Obtaining experimental feedback
Judging by the relatively small number of experimental successes accumulated over the last two decades, method improvement in CPD has been relatively slow. To a large extent this can be attributed to the lack of proper experimental feedback. For method improvement, we need experimental results on not just a few but a large variety of designs, including failed designs. In addition, we need to know the effects of sequence perturbations, that is, which changes lead to
Conclusions
The general framework of computational protein design (CPD) comprises computationally optimizing the sequence with respect to a structure-dependent energy function. We summarize recent studies within this framework that span several general aspects. First, while physics-based terms in the energy function may benefit from better-balanced intra-solute polar interactions and desolvation, new ways to derive energy terms based on statistical analyses of known sequence and structural data have been
Conflict of interest
The authors declare that they have no conflict of interest regarding this work.
References and recommended reading
Papers of particular interest, published within the period of review, have been highlighted as:
• of special interest
•• of outstanding interest
Acknowledgement
This work has been supported by the Chinese Natural Science Foundation (grant numbers 31370755 and 31570719).
References (50)
- et al. Probing designability via a generalized model of helical bundle geometry. J Mol Biol (2011)
- et al. An accurate binding interaction model in de novo computational protein design of interactions: if you build it, they will bind. J Struct Biol (2014)
- et al. Computational protein design of ligand binding and catalysis. Curr Opin Chem Biol (2013)
- et al. Computational protein design: the Proteus software and selected applications. J Comput Chem (2013)
- et al. An efficient parallel algorithm for accelerating computational protein design. Bioinformatics (2014)
- et al. Coupling protein side-chain and backbone flexibility improves the re-design of protein-ligand specificity. PLoS Comput Biol (2015)
- et al. Improving the accuracy of protein stability predictions with multistate design using a variety of backbone ensembles. Proteins: Struct Funct Bioinform (2014)
- et al. Discovery of substrates for a SET domain lysine methyltransferase predicted by multistate computational protein design. Structure (2015)
- et al. BetaVoid: molecular voids via beta-complexes and Voronoi diagrams. Proteins: Struct Funct Bioinform (2014)
- et al. TSpred: a web server for the rational design of temperature-sensitive mutants. Nucleic Acids Res (2014)