Co-supervised Pre-training of Pocket and Ligand

  • Conference paper
Machine Learning and Knowledge Discovery in Databases: Research Track (ECML PKDD 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14169)

Abstract

Can we inject pocket-ligand complementarity knowledge into a pre-trained model and jointly learn the chemical spaces of pockets and ligands? Pre-training on molecules and proteins has attracted considerable attention in recent years, but most approaches learn only one of the two chemical spaces and do not consider their complementarity. We propose a co-supervised pre-training (CoSP) framework that learns 3D pocket and ligand representations simultaneously. We use a gated geometric message passing layer to model 3D pockets and ligands, taking into account each node’s chemical features, geometric position, and direction. To learn meaningful biological embeddings, we inject pocket-ligand complementarity into the pre-training model via a ChemInfoNCE loss, combined with a chemical similarity-enhanced negative sampling strategy that improves representation learning. Extensive experiments show that CoSP achieves competitive results in pocket matching, molecule property prediction, and virtual screening.
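The abstract does not give the ChemInfoNCE formula, so the following is only a minimal sketch of the general idea it builds on: a standard InfoNCE contrastive loss in which each pocket embedding is pulled toward the embedding of its bound ligand and pushed away from the other ligands in the batch. The function names, the cosine similarity, the `temperature` value, and the optional `neg_weights` argument (a stand-in for some chemical-similarity-based reweighting of negatives) are all assumptions made for illustration, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(pockets, ligands, temperature=0.1, neg_weights=None):
    """Mean InfoNCE loss over a batch of (pocket, ligand) embedding pairs.

    pockets[i] is assumed to bind ligands[i]; every other ligand in the
    batch acts as a negative for pocket i. neg_weights[i][j] is a
    hypothetical per-negative weight (for example, derived from a chemical
    similarity score); it defaults to 1.0 for all negatives.
    """
    n = len(pockets)
    total = 0.0
    for i in range(n):
        logits = [cosine(pockets[i], ligands[j]) / temperature for j in range(n)]
        pos = math.exp(logits[i])
        neg = 0.0
        for j in range(n):
            if j != i:
                w = 1.0 if neg_weights is None else neg_weights[i][j]
                neg += w * math.exp(logits[j])
        total += -math.log(pos / (pos + neg))
    return total / n


# Toy batch: two pockets with 2-D embeddings.
pockets = [[1.0, 0.0], [0.0, 1.0]]
matched = [[1.0, 0.0], [0.0, 1.0]]     # ligand i aligned with pocket i
mismatched = [[0.0, 1.0], [1.0, 0.0]]  # ligands swapped

loss_matched = info_nce(pockets, matched)
loss_mismatched = info_nce(pockets, mismatched)
```

In this toy batch the loss is small when each pocket embedding lines up with its own ligand and large when the ligands are swapped, which is the behavior a complementarity-aware objective is meant to reward.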

Z. Gao and C. Tan—Equal Contribution.

Acknowledgements

We thank the authors of previous studies for open-sourcing their code. This work was supported by the National Key R&D Program of China (Project 2022ZD0115100), the National Natural Science Foundation of China (Project U21A20427), and the Research Center for Industries of the Future (Project WU2022C043).

Author information

Corresponding author

Correspondence to Stan Z. Li.

Ethics declarations

Ethical Statement

Our submission does not involve any ethical issues, including but not limited to privacy, security, etc.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Gao, Z., Tan, C., Xia, J., Li, S.Z. (2023). Co-supervised Pre-training of Pocket and Ligand. In: Koutra, D., Plant, C., Gomez Rodriguez, M., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Research Track. ECML PKDD 2023. Lecture Notes in Computer Science, vol 14169. Springer, Cham. https://doi.org/10.1007/978-3-031-43412-9_24

  • DOI: https://doi.org/10.1007/978-3-031-43412-9_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43411-2

  • Online ISBN: 978-3-031-43412-9

  • eBook Packages: Computer Science, Computer Science (R0)
