Knowledge-Based Unfolded State Model for Protein Design

  • Protocol
Computational Peptide Science

Part of the book series: Methods in Molecular Biology (MIMB, volume 2405)

Abstract

The design of proteins and miniproteins is an important challenge. Designed variants should be stable, meaning the folded/unfolded free energy difference should be large enough. Thus, the unfolded state plays a central role. An extended peptide model is often used, where side chains interact with solvent and nearby backbone, but not each other. The unfolded energy is then a function of sequence composition only and can be empirically parametrized. If the space of sequences is explored with a Monte Carlo procedure, protein variants will be sampled according to a well-defined Boltzmann probability distribution. We can then choose unfolded model parameters to maximize the probability of sampling native-like sequences. This leads to a well-defined maximum likelihood framework. We present an iterative algorithm that follows the likelihood gradient. The method is presented in the context of our Proteus software, as a detailed downloadable tutorial. The unfolded model is combined with a folded model that uses molecular mechanics and a Generalized Born solvent. It was optimized for three PDZ domains and then used to redesign them. The sequences sampled are native-like and similar to a recent PDZ design study that was experimentally validated.
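The likelihood-gradient idea in the abstract can be illustrated with a toy calculation. The sketch below is not the Proteus implementation; it assumes that the unfolded energy is a sum of per-type reference energies, that sequences follow a Boltzmann distribution of the folding energy (folded minus unfolded), and, purely to keep the example exact and deterministic, that positions are independent so the mean composition can be computed in closed form instead of by Monte Carlo. Under those assumptions the log-likelihood gradient with respect to the reference energy of a type is proportional to the difference between its target (native) count and its sampled count, so each iteration nudges the reference energies by that difference. The names `FOLDED`, `TARGET`, `mean_composition`, and `fit_reference_energies`, the toy energies, and the step size are all hypothetical.

```python
import math

# Toy folded-state energies in units of kT (hypothetical numbers):
# FOLDED[i][t] is the energy of amino acid type t at position i.
FOLDED = [
    [0.0, 1.0, 2.0],
    [1.5, 0.0, 0.5],
    [0.3, 0.8, 0.0],
    [2.0, 0.2, 1.0],
]

# Target composition: mean count of each type in the native-like
# sequences (hypothetical); it sums to the number of positions.
TARGET = [1.0, 2.0, 1.0]


def mean_composition(folded, ref, beta=1.0):
    """Expected count of each type under the Boltzmann distribution.

    Positions are treated as independent (a toy assumption), so the
    probability of type t at position i is proportional to
    exp(-beta * (folded[i][t] - ref[t])), where ref[t] is the
    unfolded-state reference energy of type t.
    """
    n_types = len(ref)
    counts = [0.0] * n_types
    for row in folded:
        weights = [math.exp(-beta * (row[t] - ref[t])) for t in range(n_types)]
        z = sum(weights)
        for t in range(n_types):
            counts[t] += weights[t] / z
    return counts


def fit_reference_energies(folded, target, step=0.2, n_iter=2000):
    """Gradient ascent on the log-likelihood of the target composition."""
    ref = [0.0] * len(target)
    for _ in range(n_iter):
        sampled = mean_composition(folded, ref)
        # Raise the reference energy of under-sampled types and lower
        # it for over-sampled ones (the likelihood gradient direction).
        ref = [e + step * (n_exp - n_sim)
               for e, n_exp, n_sim in zip(ref, target, sampled)]
        # Only differences between reference energies matter, so pin
        # the first type at zero.
        shift = ref[0]
        ref = [e - shift for e in ref]
    return ref


if __name__ == "__main__":
    ref = fit_reference_energies(FOLDED, TARGET)
    print("reference energies:", ref)
    print("sampled composition:", mean_composition(FOLDED, ref))
```

In a real design calculation the closed-form `mean_composition` would be replaced by compositions averaged over a Monte Carlo trajectory in sequence space, and the target would come from an alignment of natural homologs; the update step itself would keep the same form.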

Author information

Corresponding author

Correspondence to Thomas Simonson.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Opuu, V., Mignon, D., Simonson, T. (2022). Knowledge-Based Unfolded State Model for Protein Design. In: Simonson, T. (eds) Computational Peptide Science. Methods in Molecular Biology, vol 2405. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1855-4_19

  • DOI: https://doi.org/10.1007/978-1-0716-1855-4_19

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-1854-7

  • Online ISBN: 978-1-0716-1855-4

  • eBook Packages: Springer Protocols
