Rescoring Docking Hit Lists for Model Cavity Sites: Predictions and Experimental Testing

https://doi.org/10.1016/j.jmb.2008.01.049Get rights and content

Abstract

Molecular docking computationally screens thousands to millions of organic molecules against protein structures, looking for those with complementary fits. Many approximations are made, often resulting in low “hit rates.” A strategy to overcome these approximations is to rescore top-ranked docked molecules using a better but slower method. One such is afforded by molecular mechanics–generalized Born surface area (MM–GBSA) techniques. These more physically realistic methods have improved models for solvation and electrostatic interactions and conformational change compared to most docking programs. To investigate MM–GBSA rescoring, we re-ranked docking hit lists in three small buried sites: a hydrophobic cavity that binds apolar ligands, a slightly polar cavity that binds aryl and hydrogen-bonding ligands, and an anionic cavity that binds cationic ligands. These sites are simple; consequently, incorrect predictions can be attributed to particular errors in the method, and many likely ligands may actually be tested. In retrospective calculations, MM–GBSA techniques with binding-site minimization better distinguished the known ligands for each cavity from the known decoys compared to the docking calculation alone. This encouraged us to test rescoring prospectively on molecules that ranked poorly by docking but that ranked well when rescored by MM–GBSA. A total of 33 molecules highly ranked by MM–GBSA for the three cavities were tested experimentally. Of these, 23 were observed to bind—these are docking false negatives rescued by rescoring. The 10 remaining molecules are true negatives by docking and false positives by MM–GBSA. X-ray crystal structures were determined for 21 of these 23 molecules. In many cases, the geometry prediction by MM–GBSA improved the initial docking pose and more closely resembled the crystallographic result; yet in several cases, the rescored geometry failed to capture large conformational changes in the protein. Intriguingly, rescoring not only rescued docking false positives, but also introduced several new false positives into the top-ranking molecules. We consider the origins of the successes and failures in MM–GBSA rescoring in these model cavity sites and the prospects for rescoring in biologically relevant targets.

Introduction

Molecular docking computationally screens large databases of small molecules against a macromolecular binding site of defined structure. The technique is often used to find novel ligands for drug discovery. Notwithstanding important successes,1, 2, 3, 4, 5, 6, 7, 8 docking continues to struggle with many methodological deficits. Many approximations are made to screen many molecules in a timely fashion. These include using only one conformation of the protein, neglecting the internal energies of the docking molecules, using simplified models of ligand solvation energies, typically ignoring protein desolvation, and ignoring most entropic terms entirely. These and other shortcuts lead to the high false-positive and false-negative rates for which docking screens are notorious. Docking methods are unreliable for affinity prediction and, except in domains of highly related compounds, even for rank ordering the likely hits that emerge from the virtual screens.

To overcome these deficits, several groups have combined disparate scoring functions in a consensus fashion to capitalize on the strengths and overcome the deficiencies of individual methods.9, 10, 11, 12 This “consensus scoring” approach is attractive when it has worked, but its theoretical underpinnings are slim.13 An alternative approach involves using a higher level of theory to rescore the docking hit lists after the docking calculation has completed. The goal is to reevaluate the top docking hits for energetic complementarity to the target after including more terms and degrees of freedom than modeled by the docking program. Because more terms are considered, rescoring is typically much slower than docking, so much so that only the top-scoring docking pose of the best scoring docked molecules are often considered. This approach has been adopted by versions of the program GLIDE.14 Here ligands are first docked using simplified and relaxed criteria and are then refined by more sophisticated and stringent evaluation of the energies of binding. Similarly, Wang et al. used a hierarchical technique that begins with initial database screening and progresses to molecular mechanics–Poisson–Boltzmann surface area (MM–PBSA) rescoring to find HIV-1 reverse transcriptase inhibitors.15 The combination of an initial docking screen with subsequent rescoring by a molecular mechanics–generalized Born surface area (MM–GBSA) method has been used to improve enrichment of known ligands for several enzymes in retrospective studies and even to identify substrates.16, 17, 18, 19, 20

Such MM–PBSA and MM–GBSA methods involve minimization and often dynamic sampling of the protein–ligand complexes, and include ligand and receptor conformational energies and strain. They evaluate the electrostatics and solvation components of the binding energy by PB or GB methods, including both ligand and receptor desolvation. The MM–GBSA binding energy is determined by Ecomplex  Ereceptor  Eligand where E is an MM–GBSA estimate and solute configurational entropy effects are ignored. In this article, we focus on relative binding energies of different ligands to the same receptor, so the free receptor energy (Ereceptor) does not affect the results. Because the MM–GBSA function includes both internal energies and solvation free energies, and because we explicitly subtract complex (Ecomplex) and ligand (Eligand) contributions, desolvation effects upon complex formation for both the ligand and the receptor are included, at least in principle. There are three main limitations: (1) the force fields and solvation energies are not uniformly accurate; (2) for reasons of computational efficiency, only a small part of configuration space near the DOCK starting pose is really explored; and (3) configurational entropy effects are ignored. Notwithstanding these limitations, the MM–GBSA methods represent a substantially higher level of theory than that encoded by most docking programs and are attractive alternatives to a more complete treatment of the energies of interaction by free-energy perturbation and thermodynamic integration,21 which remain the gold standard but are very slow.

In this study, we set out to test MM–GBSA rescoring of docking hit lists in simple model cavity sites. These sites have been engineered into the buried cores of proteins and bind multiple small organic molecules. In contrast to most drug targets, these cavities are small (150–180 Å3), buried from bulk solvent, and are dominated by a single interaction term. The L99A (Leu99  Ala) cavity in T4 lysozyme22 is almost entirely apolar, the L99A/M102Q (Leu99  Ala/Met102  Gln)23 cavity in the same protein has a single hydrogen-bond acceptor (the introduced Gln102), whereas the W191G (Trp191  Gly) cavity in cytochrome c peroxidase (CCP)24, 25 has a single anionic residue, Asp235 (Fig. 1). The ligands recognized by these sites correspond to these features: the hydrophobic L99A binds small, typically aromatic nonpolar molecules; the slightly polar L99A/M102Q binds not only both apolar molecules but also those bearing one or two hydrogen-bond donors, whereas the anionic W191G cavity almost exclusively binds small monocations. The simplicity of these sites is conducive to disentangling the energetic terms of ligand binding, which are so often convoluted in drug targets with their larger, more complex binding sites. It should be noted that previous work with solvent-exposed sites has suggested that a major advantage of MM–GBSA scoring functions is calculating partial receptor desolvation upon ligand binding.17 This benefit with complex solvent-exposed binding sites may be less relevant in the buried cavity sites, especially the hydrophobic L99A and polar L99A/M102Q sites, which are mostly desolvated. (It is our experience that the cavity sites, in fact, impose a greater strain on the GBSA solvent models to fully desolvate the pockets.)

In the cavity sites, as in other simplified sites,27 an incorrect prediction is often informative, identifying a single problematic term in a scoring function; we have used these cavities as model binding sites to identify problems in molecular docking23, 28, 29, 30 and, more recently, thermodynamic integration.21 Others have found them attractive test systems for method development studies.31, 32, 33, 34 An important advantage of these cavity sites is that they are experimentally tractable for detailed, prospective testing of ligand predictions. Because the ligands they bind are small—in the 70- to 150-amu range—many possible ligands are readily available commercially, which is rarely true of drug targets.35 The binding of these predicted ligands may be tested by direct binding assays, and the structures of the ligand–protein complexes may be routinely determined by X-ray crystallography to resolutions better than 2 Å. Extensive study in the Matthews, Goodin, and our own laboratories has resulted in many tens of diverse ligands for each cavity, as well as tens of “decoys,” which are molecules that were predicted to bind to the sites but for which no binding was observed at concentrations as high as 10 mM on experimental testing.21, 23, 28, 29, 30

We thus used these three simple model cavity sites, L99A, L99A/M102Q, and W191G, as templates to measure the strengths and weaknesses of MM–GBSA rescoring of docking hit lists. We used two rescoring programs: Protein Local Optimization Program (PLOP),36, 37 with binding-site side-chain rotamer search and minimization, and AMBERDOCK, using short molecular dynamics (MD) steps and minimization of binding-site residues (Materials and Methods). Molecular docking was used to screen compound libraries that contained between 5000 and 60,231 fragment-like molecules from the Available Chemicals Directory (ACD); the library size was chosen to partly mitigate issues of size and charge bias from the library alone and to be consistent with earlier studies in these sites (Results).28, 29 The single best pose for each compound that ranked among the top 5000 or 10,000 compounds by docking was then rescored by both MM–GBSA programs. Multiple known ligands and decoys were among the molecules rescored for all three sites' rescored sets. In retrospective calculations, MM–GBSA rescoring improved the separation of ligands from decoys in each of the cavities. We then tested 33 new ligands that were predicted to bind by the MM–GBSA methods that docking alone ranked poorly, generally much worse than the top 500. To investigate the detailed basis of the MM–GBSA predictions, we determined crystal structures for 21 of these new ligands and compared them to the geometries predicted by theory. These studies suggest areas where MM–GBSA methods can contribute to the success of virtual screening and areas where this method faces important challenges.

Section snippets

Retrospective docking and rescoring in the hydrophobic cavity

Approximately 60,000 small molecules were docked into the hydrophobic cavity L99A using DOCK3.5.5423, 38 (Fig. 1a). The compounds in this set were selected from a much larger library so as not to exceed 25 non-hydrogen atoms, as previously described.29 This reduced the enrichment-factor bias that would have otherwise occurred by the trivial ability of the docking program to remove compounds that were simply too large to fit in the cavities. We note that reducing the number of molecules to

Discussion

In principle, the most important improvements of MM–GBSA over docking, certainly over the program used in this study, DOCK3.5.54, are the better representation of electrostatic interactions, ligand and protein desolvation energies, and relaxation of the ligand–protein complex. The simplicity of the model cavity sites allows us to explore how these terms influence docking results in detail and to make prospective predictions for ligands that we can, in fact, acquire and test. Many investigators

Docking against cavity sites

DOCK3.5.5423, 38 was used to dock a multiconformer database of small molecules into the model cavity sites. The receptors, grids, spheres, and ligand databases were prepared as described for the T4 Lysozyme23 and CCP28 cavities, respectively. Briefly, to sample ligand orientations, ligand, receptor, and overlap bins were set to 0.2 Å; the distance tolerance for matching ligand atoms to receptor was set to 0.75 Å. Each docking pose was evaluated for steric fit. Compounds passing this filter were

Acknowledgements

This work was supported by GM59957 (to B.K.S.), AI035707 (to M.P.J.) and GM56531 (to D.A.C.). M.P.J. is a consultant to Schrodinger Inc. We thank Niu Huang for advice on using PLOP; Michael Keiser and Jerome Hert for help with chemical similarity calculations; Michael Mysinger, Veena Thomas, and Michael Keiser for reading this manuscript; and MDL for providing the ACD database.

References (63)

  • Z. Otwinowski et al.

    Processing of X-ray diffraction data collected in oscillation mode

  • A. Evers et al.

    Successful virtual screening for a submicromolar antagonist of the neurokinin-1 receptor based on a ligand-supported homology model

    J. Med. Chem.

    (2004)
  • B.A. Grzybowski et al.

    Combinatorial computational method gives new picomolar ligands for a known enzyme

    Proc. Natl Acad. Sci. USA

    (2002)
  • S. Li et al.

    A computer screening approach to immunoglobulin superfamily structures and interactions: discovery of small non-peptidic CD4 inhibitors as novel immunotherapeutics

    Proc. Natl Acad. Sci. USA

    (1997)
  • N. Huang et al.

    Identification of non-phosphate-containing small molecular weight inhibitors of the tyrosine kinase p56 Lck SH2 domain via in silico screening against the pY + 3 binding site

    J. Med. Chem.

    (2004)
  • H. Song et al.

    A low-molecular-weight compound discovered through virtual database screening inhibits Stat3 function in breast cancer cells

    Proc. Natl Acad. Sci. USA

    (2005)
  • P.D. Lyne et al.

    Identification of compounds with nanomolar binding affinity for checkpoint kinase-1 using knowledge-based virtual screening

    J. Med. Chem.

    (2004)
  • P.S. Charifson et al.

    Consensus scoring: a method for obtaining improved hit rates from docking databases of three-dimensional structures into proteins

    J. Med. Chem.

    (1999)
  • C. Bissantz et al.

    Protein-based virtual screening of chemical databases. 1. Evaluation of different docking/scoring combinations

    J. Med. Chem.

    (2000)
  • M. Stahl et al.

    Detailed analysis of scoring functions for virtual screening

    J. Med. Chem.

    (2001)
  • R.X. Wang et al.

    How does consensus scoring work for virtual library screening? An idealized computer experiment

    J. Chem. Inf. Comput. Sci.

    (2001)
  • R.A. Friesner et al.

    Extra precision glide: docking and scoring incorporating a model of hydrophobic enclosure for protein–ligand complexes

    J. Med. Chem.

    (2006)
  • J. Wang et al.

    Hierarchical database screenings for HIV-1 reverse transcriptase using a pharmacophore model, rigid docking, solvation docking, and MM-PB/SA

    J. Med. Chem.

    (2005)
  • C. Kalyanaraman et al.

    Virtual screening against highly charged active sites: identifying substrates of alpha–beta barrel enzymes

    Biochemistry

    (2005)
  • N. Huang et al.

    Physics-based scoring of protein–ligand complexes: enrichment of known inhibitors in large-scale virtual screening

    J. Chem. Inf. Model.

    (2006)
  • Emanuele Perola

    Minimizing false positives in kinase virtual screens

    Proteins: Struct., Funct., Bioinf.

    (2006)
  • M.R. Lee et al.

    Improving docking accuracy through molecular mechanics generalized Born optimization and scoring

    J. Chem. Theory Comput.

    (2007)
  • P.D. Lyne et al.

    Accurate prediction of the relative potencies of members of a series of kinase inhibitors using molecular docking and MM–GBSA scoring

    J. Med. Chem.

    (2006)
  • A. Morton et al.

    Energetic origins of specificity of ligand binding in an interior nonpolar cavity of T4 lysozyme

    Biochemistry

    (1995)
  • M.M. Fitzgerald et al.

    Small molecule binding to an artificially created cavity at the active site of cytochrome c peroxidase

    Biochemistry

    (1994)
  • W.L. DeLano

    The PyMOL Molecular Graphics System

    (2002)
  • Cited by (0)

    A.P.G. and D.M.S. contributed equally to this work.

    2

    Present address: D. M. Shivakumar, Institute for Molecular Pediatric Sciences, University of Chicago, 929 E. 57th Street, Chicago, IL 60637, USA.

    View full text