Co-supervised Pre-training of Pocket and Ligand

Gao, Zhangyang; Tan, Cheng; Xia, Jun; Li, Stan Z.

doi:10.1007/978-3-031-43412-9_24

Zhangyang Gao¹²,
Cheng Tan¹²,
Jun Xia¹² &
…
Stan Z. Li¹²

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 14169))

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

1333 Accesses

Abstract

Can we inject the pocket-ligand complementarity knowledge into the pre-trained model and jointly learn their chemical space? Pre-training molecules and proteins have attracted considerable attention in recent years, while most of these approaches focus on learning one of the chemical spaces and lack the consideration of their complementarity. We propose a co-supervised pre-training (CoSP) framework to learn 3D pocket and ligand representations simultaneously. We use a gated geometric message passing layer to model 3D pockets and ligands, where each node’s chemical features, geometric position, and direction are considered. To learn meaningful biological embeddings, we inject the pocket-ligand complementarity into the pre-training model via ChemInfoNCE loss, cooperating with a chemical similarity-enhanced negative sampling strategy to improve the representation learning. Through extensive experiments, we conclude that CoSP can achieve competitive results in pocket matching, molecule property prediction, and virtual screening.

Z. Gao and C. Tan—Equal Contribution.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Altalib, M.K., Salim, N.: Similarity-based virtual screen using enhanced siamese deep learning methods. ACS omega 7(6), 4769–4786 (2022)
Article Google Scholar
Ballester, P.J., Mitchell, J.B.: A machine learning approach to predicting protein-ligand binding affinity with applications to molecular docking. Bioinformatics 26(9), 1169–1175 (2010)
Article Google Scholar
Batista, J., Hawkins, P.C., Tolbert, R., Geballe, M.T.: Sitehopper-a unique tool for binding site comparison. J. Cheminform. 6(1), 1–1 (2014)
Google Scholar
Batzner, S., et al.: E (3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nature Commun. 13(1), 1–11 (2022)
Article Google Scholar
Boström, J., Hogner, A., Schmitt, S.: Do structurally similar ligands bind in a similar fashion? J. Med. Chem. 49(23), 6716–6725 (2006)
Article Google Scholar
Brandstetter, J., Hesselink, R., van der Pol, E., Bekkers, E., Welling, M.: Geometric and physical quantities improve e (3) equivariant message passing. arXiv preprint arXiv:2110.02905 (2021)
Chaffey, N.: Alberts, B., Johnson, A., Lewis, J., Raff, M., Roberts, K., Walter, P. Molecular Biology of the Cell. 4th edn. (2003)
Google Scholar
Chartier, M., Najmanovich, R.: Detection of binding site molecular interaction field similarities. J. Chem. Inform. Model. 55(8), 1600–1615 (2015)
Article Google Scholar
Chuang, C.Y., Robinson, J., Lin, Y.C., Torralba, A., Jegelka, S.: Debiased contrastive learning. Adv. Neural Inform. Process. Syst. 33, 8765–8775 (2020)
Google Scholar
Cohen, T., Welling, M.: Group equivariant convolutional networks. In: International conference on machine learning, pp. 2990–2999. PMLR (2016)
Google Scholar
Dankwah, K.O., Mohl, J.E., Begum, K., Leung, M.Y.: Understanding the binding of the same ligand to gpcrs of different families. In: 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 2494–2501 (2021). https://doi.org/10.1109/BIBM52615.2021.9669761
Desaphy, J., Azdimousa, K., Kellenberger, E., Rognan, D.: Comparison and druggability prediction of protein-ligand binding sites from pharmacophore-annotated cavity shapes (2012)
Google Scholar
Desaphy, J., Raimbaud, E., Ducrot, P., Rognan, D.: Encoding protein-ligand interaction patterns in fingerprints and graphs. J. Chem. Inform. Model. 53(3), 623–637 (2013)
Article Google Scholar
Durrant, J.D., McCammon, J.A.: Nnscore: a neural-network-based scoring function for the characterization of protein- ligand complexes. J. Chem. Inform. Model. 50(10), 1865–1871 (2010)
Article Google Scholar
Ehrt, C., Brinkjost, T., Koch, O.: A benchmark driven guide to binding site comparison: an exhaustive evaluation using tailor-made data sets (prospeccts). PLoS Comput. Biol. 14(11), e1006483 (2018)
Article Google Scholar
Fang, X.: Geometry-enhanced molecular representation learning for property prediction. Nature Mach. Intell. 4(2), 127–134 (2022)
Article Google Scholar
Fang, Y., Yang, H., Zhuang, X., Shao, X., Fan, X., Chen, H.: Knowledge-aware contrastive molecular graph learning. arXiv preprint arXiv:2103.13047 (2021)
Francoeur, P.G.: Three-dimensional convolutional neural networks and a cross-docked data set for structure-based drug design. J. Chem. Inform. Model. 60(9), 4200–4215 (2020)
Article Google Scholar
Fuchs, F., Worrall, D., Fischer, V., Welling, M.: Se (3)-transformers: 3d roto-translation equivariant attention networks. Adv. Neural Inform. Process. Syst. 33, 1970–1981 (2020)
Google Scholar
Ganea, O.E., et al.: Independent se (3)-equivariant models for end-to-end rigid protein docking. arXiv preprint arXiv:2111.07786 (2021)
Gao, Z., Tan, C., Li, S., et al.: Alphadesign: A graph protein design method and benchmark on alphafolddb. arXiv preprint arXiv:2202.01079 (2022)
Guan, J., Qian, W.W., Ma, W.Y., Ma, J., Peng, J., et al.: Energy-inspired molecular conformation optimization. In: International Conference on Learning Representations (2021)
Google Scholar
Hu, W., et al.: Strategies for pre-training graph neural networks. arXiv preprint arXiv:1905.12265 (2019)
Jing, B., Eismann, S., Suriana, P., Townshend, R.J., Dror, R.: Learning from protein structure with geometric vector perceptrons. arXiv preprint arXiv:2009.01411 (2020)
Kinnings, S.L., Liu, N., Buchmeier, N., Tonge, P.J., Xie, L., Bourne, P.E.: Drug discovery using chemical systems biology: repositioning the safe medicine comtan to treat multi-drug and extensively drug resistant tuberculosis. PLoS Comput. Biol. 5(7), e1000423 (2009)
Article Google Scholar
Konc, J., Janežič, D.: Probis algorithm for detection of structurally similar protein binding sites by local structural alignment. Bioinformatics 26(9), 1160–1168 (2010)
Article Google Scholar
Krotzky, T., Grunwald, C., Egerland, U., Klebe, G.: Large-scale mining for similar protein binding pockets: with RAPMAD retrieval on the fly becomes real. J. Chem. Inform. Model. 55(1), 165–179 (2015)
Article Google Scholar
Landrum, G.: Rdkit: Open-source cheminformatics software (2016). https://github.com/rdkit/rdkit/releases/tag/Release_2016_09_4
Li, P., et al.: An effective self-supervised framework for learning expressive molecular global representations to drug discovery. Brief. Bioinform. 22(6), bbab109 (2021)
Google Scholar
Liu, S., Demirel, M.F., Liang, Y.: N-gram graph: simple unsupervised representation for graphs, with applications to molecules. Adv. Neural Inform. Process. Syst. 32 (2019)
Google Scholar
Liu, S., Wang, H., Liu, W., Lasenby, J., Guo, H., Tang, J.: Pre-training molecular graph representation with 3D geometry-rethinking self-supervised learning on structured data
Google Scholar
Liu, S., Wang, H., Liu, W., Lasenby, J., Guo, H., Tang, J.: Pre-training molecular graph representation with 3D geometry. arXiv preprint arXiv:2110.07728 (2021)
Lu, A.X., Zhang, H., Ghassemi, M., Moses, A.: Self-supervised contrastive learning of protein representations by mutual information maximization. BioRxiv (2020)
Google Scholar
Mysinger, M.M., Carchia, M., Irwin, J.J., Shoichet, B.K.: Directory of useful decoys, enhanced (dud-e): better ligands and decoys for better benchmarking. J. Med. Chem. 55(14), 6582–6594 (2012)
Article Google Scholar
Nguyen, T., Le, H., Quinn, T.P., Nguyen, T., Le, T.D., Venkatesh, S.: Graphdta: predicting drug-target binding affinity with graph neural networks. Bioinformatics 37(8), 1140–1147 (2021)
Article Google Scholar
Oord, A.v.d., Li, Y., Vinyals, O.: Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748 (2018)
Pu, L., Govindaraj, R.G., Lemoine, J.M., Wu, H.C., Brylinski, M.: Deepdrug3d: classification of ligand-binding pockets in proteins with a convolutional neural network. PLoS Comput. Biol. 15(2), e1006718 (2019)
Article Google Scholar
Ragoza, M., Hochuli, J., Idrobo, E., Sunseri, J., Koes, D.R.: Protein-ligand scoring with convolutional neural networks. J. Chem. Inform. Model. 57(4), 942–957 (2017)
Article Google Scholar
Robinson, J., Chuang, C.Y., Sra, S., Jegelka, S.: Contrastive learning with hard negative samples. arXiv preprint arXiv:2010.04592 (2020)
Rong, Y.: Self-supervised graph transformer on large-scale molecular data. Adv. Neural Inform. Process. Syst. 33, 12559–12571 (2020)
Google Scholar
Satorras, V.G., Hoogeboom, E., Welling, M.: E (n) equivariant graph neural networks. In: International Conference on Machine Learning, pp. 9323–9332. PMLR (2021)
Google Scholar
Schalon, C., Surgand, J.S., Kellenberger, E., Rognan, D.: A simple and fuzzy method to align and compare druggable ligand-binding sites. Proteins: Struct., Funct., Bioinform. 71(4), 1755–1778 (2008)
Google Scholar
Schmitt, S., Kuhn, D., Klebe, G.: A new method to detect related function among proteins independent of sequence and fold homology. J. Mol. Biol. 323(2), 387–406 (2002)
Article Google Scholar
Shrivastava, A.D., Kell, D.B.: Fragnet, a contrastive learning-based transformer model for clustering, interpreting, visualizing, and navigating chemical space. Molecules 26(7), 2065 (2021)
Article Google Scholar
Shulman-Peleg, A., Nussinov, R., Wolfson, H.J.: Siteengines: recognition and comparison of binding sites and protein-protein interfaces. Nucleic Acids Res. 33(suppl_2), W337–W341 (2005)
Google Scholar
Simonovsky, M., Meyers, J., Meyers, J.: Deeplytough: learning structural comparison of protein binding sites. J. Chem. Inform. Model. 60(4), 2356–2366 (2020)
Article Google Scholar
Stärk, H., et al.: 3d infomax improves GNNs for molecular property prediction. arXiv preprint arXiv:2110.04126 (2021)
Sturmfels, P., Vig, J., Madani, A., Rajani, N.F.: Profile prediction: An alignment-based pre-training task for protein sequence models. arXiv preprint arXiv:2012.00195 (2020)
Sun, M., Xing, J., Wang, H., Chen, B., Zhou, J.: Mocl: data-driven molecular fingerprint via knowledge-aware contrastive learning from molecular graph. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 3585–3594 (2021)
Google Scholar
Thomas, N., Smidt, T., Kearnes, S., Yang, L., Li, L., Kohlhoff, K., Riley, P.: Tensor field networks: Rotation-and translation-equivariant neural networks for 3D point clouds. arXiv preprint arXiv:1802.08219 (2018)
Torng, W., Altman, R.B.: Graph convolutional neural networks for predicting drug-target interactions. J. Chem. Inform. Model. 59(10), 4131–4149 (2019)
Article Google Scholar
Tosco, P., Stiefl, N., Landrum, G.: Bringing the MMFF force field to the RDKit: implementation and validation. J. Cheminform. 6(1), 1–4 (2014)
Article Google Scholar
Vina, A.: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading trott, oleg; olson, arthur j. J. Comput. Chem 31(2), 455–461 (2010)
Google Scholar
Wang, R., Fang, X., Lu, Y., Yang, C.Y., Wang, S.: The pdbbind database: methodologies and updates. J. Med. Chem. 48(12), 4111–4119 (2005)
Article Google Scholar
Wang, S., Shan, P., Zhao, Y., Zuo, L.: Gandti: a multi-task neural network for drug-target interaction prediction. Comput. Biol. Chem. 92, 107476 (2021)
Article Google Scholar
Wang, Y., Wang, J., Cao, Z., Barati Farimani, A.: Molecular contrastive learning of representations via graph neural networks. Nature Mach. Intell. 4(3), 279–287 (2022)
Article Google Scholar
Weber, A., et al.: Unexpected nanomolar inhibition of carbonic anhydrase by cox-2-selective celecoxib: new pharmacological opportunities due to related binding site recognition. J. Med. Chem. 47(3), 550–557 (2004)
Article Google Scholar
Weill, N., Rognan, D.: Alignment-free ultra-high-throughput comparison of druggable protein- ligand binding sites. J. Chem. Inform. Model. 50(1), 123–135 (2010)
Article Google Scholar
Willmann, D., et al.: Impairment of prostate cancer cell growth by a selective and reversible lysine-specific demethylase 1 inhibitor. Int. J. Cancer 131(11), 2704–2709 (2012)
Article Google Scholar
Wood, D.J., Vlieg, J.d., Wagener, M., Ritschel, T.: Pharmacophore fingerprint-based approach to binding site subpocket similarity and its application to bioisostere replacement. J. Chem. Inform. Model. 52(8), 2031–2043 (2012)
Google Scholar
Wu, Z., et al.: Moleculenet: a benchmark for molecular machine learning. Chem. Sci. 9(2), 513–530 (2018)
Article Google Scholar
Xie, L., Bourne, P.E.: Detecting evolutionary relationships across existing fold space, using sequence order-independent profile-profile alignments. Proc. National Acad. Sci. 105(14), 5441–5446 (2008)
Article Google Scholar
Xiong, Z., et al.: Pushing the boundaries of molecular representation for drug discovery with the graph attention mechanism. J. Med. Chem. 63(16), 8749–8760 (2019)
Article Google Scholar
Yang, J., Roy, A., Zhang, Y.: Biolip: a semi-manually curated database for biologically relevant ligand-protein interactions. Nucleic Acids Res. 41(D1), D1096–D1103 (2012)
Article Google Scholar
Yang, K., et al.: Analyzing learned molecular representations for property prediction. J. Chem. Inform. Model. 59(8), 3370–3388 (2019)
Article Google Scholar
Yang, K.K., Lu, A.X., Fusi, N.: Convolutions are competitive with transformers for protein sequence pretraining. In: ICLR2022 Machine Learning for Drug Discovery (2022)
Google Scholar
Yang, Y., et al.: Computational discovery and experimental verification of tyrosine kinase inhibitor pazopanib for the reversal of memory and cognitive deficits in rat model neurodegeneration. Chem. Sci. 6(5), 2812–2821 (2015)
Article Google Scholar
Yazdani-Jahromi, M., et al.: Attentionsitedti: an interpretable graph-based model for drug-target interaction prediction using NLP sentence-level relation classification. Brief. Bioinform. 23(4), bbac272 (2022)
Google Scholar
Yeturu, K., Chandra, N.: Pocketmatch: a new algorithm to compare binding sites in protein structures. BMC Bioinform. 9(1), 1–17 (2008)
Article Google Scholar
Zhang, N., et al.: Ontoprotein: Protein pretraining with gene ontology embedding. arXiv preprint arXiv:2201.11147 (2022)
Zhang, S., Hu, Z., Subramonian, A., Sun, Y.: Motif-driven contrastive learning of graph representations. arXiv preprint arXiv:2012.12533 (2020)
Zhang, Y., Skolnick, J.: Tm-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 33(7), 2302–2309 (2005)
Article Google Scholar
Zhang, Z., Liu, Q., Wang, H., Lu, C., Lee, C.K.: Motif-based graph self-supervised learning for molecular property prediction. Adv. Neural Inform. Process. Syst. 34, 15870–15882 (2021)
Google Scholar
Zhang, Z., et al.: Protein representation learning by geometric structure pretraining. arXiv preprint arXiv:2203.06125 (2022)
Zheng, S., Li, Y., Chen, S., Xu, J., Yang, Y.: Predicting drug-protein interaction using quasi-visual question answering system. Nature Mach. Intell. 2(2), 134–140 (2020)
Article Google Scholar
Zhou, G., et al.: Uni-mol: A universal 3d molecular representation learning framework (2022)
Google Scholar

Download references

Acknowledgements

We thank the open-sourced codes of previous studies. This work was supported by the National Key R &D Program of China (Project 2022ZD0115100), the National Natural Science Foundation of China (Project U21A20427), the Research Center for Industries of the Future (Project WU2022C043).

Author information

Authors and Affiliations

Zhejiang University AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou, China
Zhangyang Gao, Cheng Tan, Jun Xia & Stan Z. Li

Authors

Zhangyang Gao
View author publications
You can also search for this author in PubMed Google Scholar
Cheng Tan
View author publications
You can also search for this author in PubMed Google Scholar
Jun Xia
View author publications
You can also search for this author in PubMed Google Scholar
Stan Z. Li
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Stan Z. Li .

Editor information

Editors and Affiliations

University of Michigan, Ann Arbor, MI, USA
Danai Koutra
University of Vienna, Vienna, Austria
Claudia Plant
Max Planck Institute for Software Systems, Kaiserslautern, Germany
Manuel Gomez Rodriguez
Politecnico di Torino, Turin, Italy
Elena Baralis
CENTAI, Turin, Italy
Francesco Bonchi

Ethics declarations

Ethical Statement

Our submission does not involve any ethical issues, including but not limited to privacy, security, etc.

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Gao, Z., Tan, C., Xia, J., Li, S.Z. (2023). Co-supervised Pre-training of Pocket and Ligand. In: Koutra, D., Plant, C., Gomez Rodriguez, M., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Research Track. ECML PKDD 2023. Lecture Notes in Computer Science(), vol 14169. Springer, Cham. https://doi.org/10.1007/978-3-031-43412-9_24

Download citation

DOI: https://doi.org/10.1007/978-3-031-43412-9_24
Published: 17 September 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43411-2
Online ISBN: 978-3-031-43412-9
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Societies and partnerships

the ECML PKDD community (opens in a new tab)

Co-supervised Pre-training of Pocket and Ligand