Abstract
Protein kinases regulate various cellular functions and hold significant pharmacological promise in cancer and other diseases. Although kinase inhibitors are one of the largest groups of approved drugs, much of the human kinome remains unexplored but potentially druggable. Computational approaches, such as machine learning, offer efficient solutions for exploring kinase–compound interactions and uncovering novel binding activities. Despite the increasing availability of three-dimensional (3D) protein and compound structures, existing methods predominantly focus on exploiting local features from one-dimensional protein sequences and two-dimensional molecular graphs to predict binding affinities, overlooking the 3D nature of the binding process. Here we present KDBNet, a deep learning algorithm that incorporates 3D protein and molecule structure data to predict binding affinities. KDBNet uses graph neural networks to learn structure representations of protein binding pockets and drug molecules, capturing the geometric and spatial characteristics of binding activity. In addition, we introduce an algorithm to quantify and calibrate the uncertainties of KDBNet’s predictions, enhancing its utility in model-guided discovery in chemical or protein space. Experiments demonstrated that KDBNet outperforms existing deep learning models in predicting kinase–drug binding affinities. The uncertainties estimated by KDBNet are informative and well-calibrated with respect to prediction errors. When integrated with a Bayesian optimization framework, KDBNet enables data-efficient active learning and accelerates the exploration and exploitation of diverse high-binding kinase–drug pairs.
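The ensemble-style uncertainty estimation and Bayesian-optimization-style acquisition described in the abstract can be sketched as follows. This is an illustrative toy only, not KDBNet's implementation: the "ensemble" is a set of random linear functions and the feature vector is synthetic.

```python
import numpy as np

# Toy stand-in for an ensemble of trained affinity predictors: each "model"
# is a random linear function of an 8-dimensional feature vector.
def make_model(seed):
    w = np.random.default_rng(seed).normal(size=8)
    return lambda x: float(x @ w)

ensemble = [make_model(seed) for seed in range(5)]
x = np.random.default_rng(42).normal(size=8)  # features of one kinase-drug pair

preds = np.array([model(x) for model in ensemble])
mean_affinity = preds.mean()       # point prediction
uncertainty = preds.std(ddof=1)    # ensemble disagreement as predictive uncertainty

# Upper-confidence-bound acquisition, a common choice in Bayesian optimization:
# favour pairs with high predicted affinity and/or high uncertainty.
kappa = 1.0
acquisition = mean_affinity + kappa * uncertainty
print(f"mean={mean_affinity:.3f}, std={uncertainty:.3f}, UCB={acquisition:.3f}")
```

In an active-learning loop, the candidate pair with the highest acquisition score would be selected for the next experiment, trading off exploitation (high mean) against exploration (high uncertainty).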
Data availability
The two kinase–drug binding-affinity datasets, Davis27 and KIBA28, were curated by, and are available from, the Therapeutics Data Commons benchmark73. The PDBbind dataset (v.2020) was downloaded from http://www.pdbbind.org.cn/. The PDB codes of representative structures of kinases were obtained from the Kincore database50 (http://dunbrack.fccc.edu/kincore/home). The binding pocket structure of each kinase was downloaded from the KLIFS database20 (https://klifs.net/). Full AA sequences of kinases were obtained from UniProt56 (https://www.uniprot.org/). The 3D molecular structures were downloaded from PubChem57 (https://pubchem.ncbi.nlm.nih.gov/). Our processed version of the binding-affinity datasets and the identifier list of kinases and drugs are available on our GitHub repository (https://github.com/luoyunan/KDBNet).
Code availability
The source code of KDBNet is available at https://github.com/luoyunan/KDBNet and has been deposited to Zenodo74 at https://doi.org/10.5281/zenodo.7959829. KDBNet was developed using Python v.3.9, PyTorch v.1.16, PyTorch Geometric v.2.2, RDKit v.2022.03.2, NumPy v.1.23.4 and SciPy v.1.9.3.
References
Oprea, T. I. et al. Unexplored therapeutic opportunities in the human genome. Nat. Rev. Drug Discov. 17, 317–332 (2018).
Attwood, M. M., Fabbro, D., Sokolov, A. V., Knapp, S. & Schiöth, H. B. Trends in kinase drug discovery: targets, indications and inhibitor design. Nat. Rev. Drug Discov. 20, 839–861 (2021).
Cohen, P., Cross, D. & Jänne, P. A. Kinase drug discovery 20 years after imatinib: progress and future directions. Nat. Rev. Drug Discov. 20, 551–569 (2021).
Hanson, S. M. et al. What makes a kinase promiscuous for inhibitors? Cell Chem. Biol. 26, 390–399 (2019).
Arrowsmith, C. H. et al. The promise and peril of chemical probes. Nat. Chem. Biol. 11, 536–541 (2015).
Cichońska, A. et al. Crowdsourced mapping of unexplored target space of kinase inhibitors. Nat. Commun. 12, 3307 (2021).
Bleakley, K. & Yamanishi, Y. Supervised prediction of drug–target interactions using bipartite local models. Bioinformatics 25, 2397–2403 (2009).
Cobanoglu, M. C., Liu, C., Hu, F., Oltvai, Z. N. & Bahar, I. Predicting drug–target interactions using probabilistic matrix factorization. J. Chem. Inf. Model. 53, 3399–3409 (2013).
Zheng, X., Ding, H., Mamitsuka, H. & Zhu, S. Collaborative matrix factorization with multiple similarities for predicting drug-target interactions. In Proc. 19th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (eds Ghani, R. et al.) 1025–1033 (ACM, 2013).
Cichonska, A. et al. Computational-experimental approach to drug-target interaction mapping: a case study on kinase inhibitors. PLoS Comput. Biol. 13, e1005678 (2017).
Luo, Y. et al. A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information. Nat. Commun. 8, 573 (2017).
Öztürk, H., Özgür, A. & Ozkirimli, E. DeepDTA: deep drug–target binding affinity prediction. Bioinformatics 34, i821–i829 (2018).
Karimi, M., Wu, D., Wang, Z. & Shen, Y. DeepAffinity: interpretable deep learning of compound–protein affinity through unified recurrent and convolutional neural networks. Bioinformatics 35, 3329–3338 (2019).
Tsubaki, M., Tomii, K. & Sese, J. Compound–protein interaction prediction with end-to-end learning of neural networks for graphs and sequences. Bioinformatics 35, 309–318 (2019).
Jiang, M. et al. Drug–target affinity prediction using graph neural network and contact maps. RSC Adv. 10, 20701–20712 (2020).
Nguyen, T. et al. GraphDTA: predicting drug–target binding affinity with graph neural networks. Bioinformatics 37, 1140–1147 (2021).
Hie, B., Bryson, B. D. & Berger, B. Leveraging uncertainty in machine learning accelerates biological discovery and design. Cell Syst. 11, 461–477 (2020).
Rose, P. W. et al. The RCSB protein data bank: integrative view of protein, gene and 3D structural information. Nucleic Acids Res. 45, D271–D281 (2017).
Van Linden, O. P., Kooistra, A. J., Leurs, R., De Esch, I. J. & De Graaf, C. KLIFS: a knowledge-based structural database to navigate kinase–ligand interaction space. J. Med. Chem. 57, 249–277 (2014).
Kanev, G. K., de Graaf, C., Westerman, B. A., de Esch, I. J. & Kooistra, A. J. KLIFS: an overhaul after the first 5 years of supporting kinase research. Nucleic Acids Res. 49, D562–D569 (2021).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Jing, B., Eismann, S., Suriana, P., Townshend, R. J. & Dror, R. Learning from protein structure with geometric vector perceptrons. Paper presented at the International Conference on Learning Representations (ICLR) (eds Oh, A., Murray, N. & Titov, I.) (2021); https://openreview.net/forum?id=1YLJDvSx6J4
Gainza, P. et al. Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nat. Methods 17, 184–192 (2020).
Lakshminarayanan, B., Pritzel, A. & Blundell, C. Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in Neural Information Processing Systems (NeurIPS) Vol 30 (eds Guyon, I. et al.) 6402–6413 (Curran Associates, Inc., 2017).
Zeng, H. & Gifford, D. K. Quantification of uncertainty in peptide-MHC binding prediction improves high-affinity peptide selection for therapeutic design. Cell Syst. 9, 159–166 (2019).
Soleimany, A. P. et al. Evidential deep learning for guided molecular property prediction and discovery. ACS Cent. Sci. 7, 1356–1367 (2021).
Davis, M. I. et al. Comprehensive analysis of kinase inhibitor selectivity. Nat. Biotechnol. 29, 1046–1051 (2011).
Tang, J. et al. Making sense of large-scale kinase inhibitor bioactivity data sets: a comparative and integrative analysis. J. Chem. Inf. Model. 54, 735–743 (2014).
Pahikkala, T. et al. Toward more realistic drug–target interaction predictions. Brief. Bioinform. 16, 325–337 (2015).
Goldman, S., Das, R., Yang, K. K. & Coley, C. W. Machine learning modeling of family wide enzyme-substrate specificity screens. PLoS Comput. Biol. 18, e1009853 (2022).
Singh, R., Sledzieski, S., Bryson, B., Cowen, L. & Berger, B. Contrastive learning in protein language space predicts interactions between drugs and protein targets. Proc. Natl Acad. Sci. 120, e2220778120 (2023).
Jiménez, J., Skalic, M., Martinez-Rosell, G. & De Fabritiis, G. KDEEP: protein–ligand absolute binding affinity prediction via 3D-convolutional neural networks. J. Chem. Inf. Model. 58, 287–296 (2018).
Townshend, R., Bedi, R., Suriana, P. & Dror, R. End-to-end learning on 3D protein structure for interface prediction. In Advances in Neural Information Processing Systems Vol 32 (eds Wallach, H. et al.) 15616–15625 (Curran Associates, Inc., 2019).
Townshend, R. J. et al. Atom3D: tasks on molecules in three dimensions. Preprint at https://arxiv.org/abs/2012.04035 (2020).
Li, S. et al. Structure-aware interactive graph neural networks for the prediction of protein-ligand binding affinity. In Proc. 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (eds Zhu, F., Ooi, B. C. & Miao, C.) 975–985 (ACM, 2021).
Liu, Z. et al. PDB-wide collection of binding data: current status of the PDBbind database. Bioinformatics 31, 405–412 (2015).
Lim, J. et al. Predicting drug–target interaction using a novel graph neural network with 3D structure-embedded graph representation. J. Chem. Inf. Model. 59, 3981–3988 (2019).
Zheng, L., Fan, J. & Mu, Y. OnionNet: a multiple-layer intermolecular-contact-based convolutional neural network for protein–ligand binding affinity prediction. ACS Omega 4, 15956–15965 (2019).
Zhou, J. et al. Distance-aware molecule graph attention network for drug-target binding affinity prediction. Preprint at https://arxiv.org/abs/2012.09624 (2020).
Hassan-Harrirou, H., Zhang, C. & Lemmin, T. RoseNet: improving binding affinity prediction by leveraging molecular mechanics energies with an ensemble of 3D convolutional neural networks. J. Chem. Inf. Model. 60, 2791–2802 (2020).
Li, S. et al. MONN: a multi-objective neural network for predicting compound-protein interactions and affinities. Cell Syst. 10, 308–322 (2020).
Kuleshov, V., Fenner, N. & Ermon, S. Accurate uncertainties for deep learning using calibrated regression. In Proc. International Conference on Machine Learning (PMLR) (eds Dy, J. & Krause, A.) 2796–2804 (ACM, 2018).
Tran, K. et al. Methods for comparing uncertainty quantifications for material property predictions. Mach. Learn. Sci. Technol. 1, 025006 (2020).
Ali, K. et al. Inactivation of PI3K p110δ breaks regulatory T-cell-mediated immune tolerance to cancer. Nature 510, 407–411 (2014).
Angelopoulos, A. N. & Bates, S. Conformal prediction: a gentle introduction. Found. Trends Mach. Learn. 16, 494–591 (2023).
Bosc, N. et al. Large scale comparison of QSAR and conformal prediction methods and their applications in drug discovery. J. Cheminform. 11, 4 (2019).
Levi, D., Gispan, L., Giladi, N. & Fetaya, E. Evaluating and calibrating uncertainty prediction in regression tasks. Sensors 22, 5540 (2022).
Song, H., Diethe, T., Kull, M. & Flach, P. Distribution calibration for regression. In Proc. International Conference on Machine Learning (PMLR) (eds Chaudhuri, K. & Salakhutdinov, R.) 5897–5906 (ACM, 2019).
PubChem3D release notes. PubChem https://pubchemdocs.ncbi.nlm.nih.gov/pubchem3d (2019).
Modi, V. & Dunbrack, R. Kincore: a web resource for structural classification of protein kinases and their inhibitors. Nucleic Acids Res. 50, D654–D664 (2022).
Zhou, G. et al. Uni-Mol: a universal 3D molecular representation learning framework. In Proc. of the 11th International Conference on Learning Representations (eds Nickel, M. et al.) (OpenReview, 2023).
Lu, W. et al. TankBind: trigonometry-aware neural networks for drug-protein binding structure prediction. In Advances in Neural Information Processing Systems Vol 35 (eds Koyejo, S. et al.) 7236–7249 (Curran Associates, Inc., 2022).
Luo, Y., Peng, J. & Ma, J. Next decade’s AI-based drug development features tight integration of data and computation. Health Data Sci. 2022, 9816939 (2022).
Burley, S. K. et al. RCSB protein data bank: powerful new tools for exploring 3D structures of biological macromolecules for basic and applied research and education in fundamental biology, biomedicine, biotechnology, bioengineering and energy sciences. Nucleic Acids Res. 49, D437–D451 (2021).
Modi, V. & Dunbrack, R. L. Defining a new nomenclature for the structures of active and inactive kinases. Proc. Natl Acad. Sci. 116, 6818–6827 (2019).
The UniProt Consortium. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 49, D480–D489 (2021).
Kim, S. et al. PubChem in 2021: new data content and improved web interfaces. Nucleic Acids Res. 49, D1388–D1395 (2021).
Liu, Y., Palmedo, P., Ye, Q., Berger, B. & Peng, J. Enhancing evolutionary couplings with deep convolutional neural networks. Cell Syst. 6, 65–74 (2018).
Ingraham, J., Garg, V., Barzilay, R. & Jaakkola, T. Generative models for graph-based protein design. In Proc. Advances in Neural Information Processing Systems 32 (eds Wallach, H. et al.) 15820–15831 (Curran, 2019).
Rives, A. et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl Acad. Sci. 118, e2016239118 (2021).
Luo, Y. et al. ECNet is an evolutionary context-integrated deep learning framework for protein engineering. Nat. Commun. 12, 5743 (2021).
Shaw, P., Uszkoreit, J. & Vaswani, A. Self-attention with relative position representations. In Proc. of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers) (eds Walker, M., Ji, H. & Stent, A.) 464–468 (Association for Computational Linguistics, 2018).
Vaswani, A. et al. Attention is all you need. In Proc. Advances in Neural Information Processing Systems 30 (eds Guyon, I. et al.) 5998–6008 (Curran, 2017).
Shi, Y. et al. Masked label prediction: unified message passing model for semi-supervised classification. In Proc. Thirtieth International Joint Conference on Artificial Intelligence (IJCAI, 2021).
Maas, A. L., Hannun, A. Y. & Ng, A. Y. Rectifier nonlinearities improve neural network acoustic models. In Proc. 30th International Conference on Machine Learning (ICML) (eds Dasgupta, S. & McAllester, D.) 3–8 (JMLR, 2013).
Ashukha, A., Lyzhov, A., Molchanov, D. & Vetrov, D. Pitfalls of in-domain uncertainty estimation and ensembling in deep learning. Paper presented at the 8th International Conference on Learning Representations (ICLR) (eds Song, D., Cho, K. & White, M.) (2020).
Eyke, N. S., Green, W. H. & Jensen, K. F. Iterative experimental design based on active machine learning reduces the experimental burden associated with reaction screening. React. Chem. Eng. 5, 1963–1972 (2020).
Roy, A. G. et al. Does your dermatology classifier know what it doesn’t know? Detecting the long-tail of unseen conditions. Med. Image Anal. 75, 102274 (2021).
Busk, J. et al. Calibrated uncertainty for molecular property prediction using ensembles of message passing neural networks. Mach. Learn. Sci. Technol. 3, 015012 (2021).
Chung, Y., Char, I., Guo, H., Schneider, J. & Neiswanger, W. Uncertainty toolbox: an open-source library for assessing, visualizing, and improving uncertainty quantification. Preprint at https://arxiv.org/abs/2109.10254 (2021).
Brent, R. P. An algorithm with guaranteed convergence for finding a zero of a function. Comput. J. 14, 422–425 (1971).
Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).
Huang, K. et al. Therapeutics data commons: machine learning datasets and tasks for drug discovery and development. In Proc. Neural Information Processing Systems Track on Datasets and Benchmarks (eds Vanschoren, J. & Yeung, S.) (Conference on Neural Information Processing Systems, 2021).
Luo, Y. KDBNet: release v.0.1. Zenodo https://zenodo.org/record/7959829 (2023).
Acknowledgements
Y. Luo is supported in part by the National Institute of General Medical Sciences of the National Institutes of Health under award R35GM150890, the 2022 Amazon Research Award and the Seed Grant Program from the NSF AI Institute: Molecule Maker Lab Institute (grant no. 2019897) at the University of Illinois Urbana-Champaign. This work used the Delta GPU Supercomputer at NCSA of the University of Illinois Urbana-Champaign through allocation CIS230097 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) programme, which is supported by NSF grant nos. 2138259, 2138286, 2138307, 2137603 and 2138296. The authors acknowledge the computational resources provided by Microsoft Azure through the Cloud Hub program at GaTech IDEaS and the Microsoft Accelerate Foundation Models Research (AFMR) program.
Author information
Authors and Affiliations
Contributions
Y. Luo and J.P. conceived the research project. J.P. and Y. Luo supervised the project. Y. Luo developed the computational method and implemented the software. Y. Luo and Y. Liu performed the evaluation analyses. All authors analysed the results and participated in the interpretation. Y. Luo wrote the manuscript with support from all other authors.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks Tero Aittokallio and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Jacob Huth, in collaboration with the Nature Machine Intelligence team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Prediction performance evaluation on KIBA dataset.
(a) Four train–test split settings used for evaluation, in which the model is tested on data with unseen drugs (‘new-drug split’), unseen proteins (‘new-protein split’), both unseen (‘both-new split’), or unseen proteins with low (<50%) sequence identity (‘seq-id split’). (b) Comparison of the prediction performance of KDBNet with KronRLS, DeepDTA, GraphDTA, DGraphDTA, EnzPred and ConPLex on the KIBA dataset under the four train–test split settings. The performance of GP is not shown because GP was not evaluated in the original study17 and because the high memory footprint of kernel computation makes GP computationally costly at the scale of the KIBA dataset. Performance was evaluated using three metrics: Pearson correlation, Spearman correlation and mean squared error (MSE) between predicted and true KIBA scores28. All bar plots represent the mean ± s.d. of evaluation results over five random train/test splits. Abbreviation: seq. id., sequence identity.
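For reference, the three evaluation metrics named in the caption can be computed with standard SciPy and NumPy calls. The score values below are made-up illustrative numbers, not data from the study:

```python
import numpy as np
from scipy import stats

# Toy predicted vs. true KIBA-style scores (illustrative values only)
y_true = np.array([11.2, 12.5, 10.1, 13.0, 11.8, 12.1])
y_pred = np.array([11.0, 12.9, 10.4, 12.6, 11.5, 12.3])

pearson_r, _ = stats.pearsonr(y_true, y_pred)     # linear correlation
spearman_rho, _ = stats.spearmanr(y_true, y_pred)  # rank correlation
mse = np.mean((y_true - y_pred) ** 2)              # mean squared error

print(f"Pearson r = {pearson_r:.3f}")
print(f"Spearman rho = {spearman_rho:.3f}")
print(f"MSE = {mse:.3f}")
```

Pearson correlation rewards linear agreement in the raw scores, Spearman correlation rewards correct ranking of pairs (the quantity that matters for prioritizing candidates), and MSE penalizes absolute deviation.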
Supplementary information
Supplementary Information
Supplementary Notes, Tables 1–3 and Figs. 1–5.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Luo, Y., Liu, Y. & Peng, J. Calibrated geometric deep learning improves kinase–drug binding predictions. Nat Mach Intell 5, 1390–1401 (2023). https://doi.org/10.1038/s42256-023-00751-0
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1038/s42256-023-00751-0