  • Protocol

Assessing protein–ligand interaction scoring functions with the CASF-2013 benchmark

Abstract

Scoring functions are a group of computational methods widely applied in structure-based drug design for fast evaluation of protein–ligand interactions. To date, a whole spectrum of scoring functions has been developed based on different assumptions or algorithms. Therefore, it is important to both the end users and the developers of scoring functions that their performance be objectively assessed. We have developed the comparative assessment of scoring functions (CASF) benchmark as an open-access solution for scoring function evaluation. The latest CASF-2013 benchmark enables evaluation of the so-called 'scoring power', 'ranking power', 'docking power', and 'screening power' of a given scoring function with a high-quality test set of 195 complexes formed between diverse protein molecules and their small-molecule ligands. Evaluation results of the standard scoring functions implemented in several mainstream software programs (including Schrödinger, MOE, Discovery Studio, SYBYL, and GOLD) are provided as a reference. This benchmark has become popular among the scoring function community since its first release. In this protocol, we provide detailed descriptions of the data files included in the CASF-2013 package and step-by-step instructions on how to conduct the performance tests with the ready-to-use computer scripts included in the package. This protocol is expected to lower the technical hurdles facing new and existing users of the CASF-2013 benchmark. On a standard desktop workstation, it takes roughly half an hour to complete the whole evaluation procedure for one scoring function, once the required inputs, i.e., the results computed on the test set, are ready to use.

Figure 1: Illustration of how the test set used in CASF-2013 was compiled.
Figure 2: Correlation between the experimental binding constant of each protein–ligand complex and its ΔSAS (i.e., buried solvent-accessible surface area of the ligand molecule upon binding) in the scoring power test.
Figure 3: Illustration of how the docking power is evaluated with decoy ligand-binding poses.
Figure 4: Information recorded in the 'CoreSet.dat' file.
Figure 5: An example output given by the scoring power test.
Figure 6: An example output given by the ranking power test.
Figure 7: An example output of the docking power test.
Figure 8: An example output of the screening power test measured by enrichment factors.
Figure 9: An example output of the screening power test measured by the success rate of finding the tightest binder.
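
The figure captions above mention a correlation measure for the scoring power test and enrichment factors for the screening power test. The following is a minimal Python sketch of how such quantities are commonly computed. It is not the script shipped in the CASF-2013 package: the function names, the use of the Pearson coefficient, and the textbook enrichment factor definition (hit rate in the top fraction divided by the overall hit rate) are assumptions made for illustration.

```python
import math


def pearson_r(predicted, experimental):
    """Pearson correlation between predicted scores and experimental values."""
    if len(predicted) != len(experimental) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_e = sum(experimental) / n
    cov = sum((p - mean_p) * (e - mean_e)
              for p, e in zip(predicted, experimental))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    if var_p == 0 or var_e == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_e)


def enrichment_factor(is_binder_ranked, fraction):
    """Enrichment factor for the top `fraction` of a ranked ligand list.

    `is_binder_ranked` holds one boolean per ligand, ordered from the
    best-scored ligand to the worst (hypothetical input layout).
    """
    n_total = len(is_binder_ranked)
    n_binders = sum(is_binder_ranked)
    if n_total == 0 or n_binders == 0 or not 0 < fraction <= 1:
        raise ValueError("need a non-empty list, at least one binder, "
                         "and 0 < fraction <= 1")
    n_top = max(1, round(fraction * n_total))  # size of the top slice
    hits_top = sum(is_binder_ranked[:n_top])   # true binders in that slice
    return (hits_top / n_top) / (n_binders / n_total)
```

With made-up inputs, `pearson_r([1, 2, 3], [2, 4, 6])` gives a perfect positive correlation, and a ranked list of ten ligands whose two true binders both sit in the top 20% gives an enrichment factor of 5.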

Acknowledgements

We are grateful to the users of the CASF benchmark for their valuable feedback. This work was financially supported by the Ministry of Science and Technology of China (National Key Research Program, grant no. 2016YFA0502302), the National Natural Science Foundation of China (grant nos. 81725022, 81430083, 21472227, 21673276, and 21402230), the Chinese Academy of Sciences (Strategic Priority Research Program, grant no. XDB20000000), and the Science and Technology Development Foundation of Macao SAR (grant no. 055/2013/A2).

Author information

Contributions

R.W. conceived and supervised the project. Y.L. designed the protocol, performed computations, and also drafted the manuscript. M.S., Z.L., J. Li, J. Liu, and L.H. helped with data processing and programming.

Corresponding author

Correspondence to Renxiao Wang.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Contents under the ‘decoys_docking/’ directory in the CASF-2013 package.

Only some of the data files under this directory are shown in this figure as a demonstration.

Supplementary Figure 2 Contents under the ‘decoys_screening/’ directory in the CASF-2013 package.

Only some of the data files under the ‘10gs/’ subdirectory are shown in this figure as a demonstration.

Supplementary Figure 3 Information of the target proteins and their known binders recorded in ‘TargetInfo.dat’.

Only some target proteins are shown in this figure. The first four-letter code in each line refers to the PDB entry from which the target protein structure is retrieved, while the remaining codes indicate the PDB entries containing the known binders to this target protein. All known binders to the target protein are listed in descending order of binding affinity, i.e., the tightest binder comes first.
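
The line layout described above (target code first, known binders after it, tightest binder first) can be read with a few lines of Python. This is a minimal sketch rather than the parser used in the CASF-2013 package: it assumes whitespace-separated codes and treats lines starting with '#' as comments, and the function names and the four-character codes in the usage note are made-up placeholders, not real PDB entries.

```python
def parse_target_info(text):
    """Map each target code to its list of known binders.

    Assumes one target per line, whitespace-separated codes, and '#'
    comment lines (assumptions, not confirmed by the file description).
    """
    targets = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and assumed comment lines
        codes = line.split()
        # First code is the target; the rest keep their file order,
        # so index 0 of the list is the tightest binder.
        targets[codes[0]] = codes[1:]
    return targets


def finds_tightest_binder(ranked_ligands, known_binders, top_n):
    """True if the tightest known binder is among the top_n ranked ligands."""
    if not known_binders:
        return False
    return known_binders[0] in ranked_ligands[:top_n]
```

For example, parsing the placeholder text "tgt1 lig1 lig2" yields {"tgt1": ["lig1", "lig2"]}, and a scoring function that ranks lig2 ahead of lig1 finds the tightest binder within the top two ligands but not within the top one.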

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–3 and Supplementary Tables 1–9. (PDF 1359 kb)

About this article

Cite this article

Li, Y., Su, M., Liu, Z. et al. Assessing protein–ligand interaction scoring functions with the CASF-2013 benchmark. Nat Protoc 13, 666–680 (2018). https://doi.org/10.1038/nprot.2017.114

  • DOI: https://doi.org/10.1038/nprot.2017.114
