Affinity prediction using deep learning based on SMILES input for D3R grand challenge 4

Lim, Sangrak; Lee, Yong Oh; Yoon, Juyong; Kim, Young Jun

doi:10.1007/s10822-022-00448-3

Published: 22 March 2022

Journal of Computer-Aided Molecular Design, Volume 36, pages 225–235 (2022)

Sangrak Lim (ORCID: orcid.org/0000-0001-5112-7907)¹, Yong Oh Lee¹˒², Juyong Yoon¹ & Young Jun Kim¹

Abstract

Modern molecular docking comprises two tasks: pose prediction and affinity prediction. When the three-dimensional coordinates of a ligand are not provided, its docking pose must be predicted before its affinity can be estimated. However, existing methods require extensive feature engineering. In addition, because predicted ligand positions are subject to probabilistic deviation, a model that chains pose prediction and affinity prediction must be robust to that deviation. We propose a pipeline that uses a bipartite graph neural network and transfer learning trained on a re-docking dataset. We evaluated our model on the data released for the Drug Design Data Resource Grand Challenge 4 (D3R GC4). The data for the two target proteins provided by the challenge show different patterns. On the BACE target from stage 2, the model outperformed the best participant by 9%. Further, our model showed competitive performance on the CatS target.
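
The abstract names a bipartite graph neural network but gives no architectural details. As a rough illustration of the general idea only, the sketch below connects ligand atoms to protein atoms that lie within a distance cutoff (edges never join two atoms of the same molecule, which is what makes the graph bipartite), performs one mean-aggregation message-passing step from protein nodes to ligand nodes, and sum-pools the ligand nodes into a scalar score. The function names `bipartite_edges`, `message_pass`, and `readout`, the 4.0 Å cutoff, the residual update, and the linear readout are hypothetical choices made for this example; it is not the authors' model.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Features = List[List[float]]


def bipartite_edges(lig_xyz: Sequence[Vec3], prot_xyz: Sequence[Vec3],
                    cutoff: float = 4.0) -> List[Tuple[int, int]]:
    """Hypothetical helper: list (ligand_idx, protein_idx) pairs closer than `cutoff` angstroms.

    Only ligand-protein pairs are considered, so the resulting graph is bipartite.
    """
    edges = []
    for i, a in enumerate(lig_xyz):
        for j, b in enumerate(prot_xyz):
            if math.dist(a, b) < cutoff:
                edges.append((i, j))
    return edges


def message_pass(lig_feat: Features, prot_feat: Features,
                 edges: List[Tuple[int, int]]) -> Features:
    """Hypothetical layer: each ligand atom adds the mean of its protein neighbours' features.

    Ligand atoms with no protein neighbour keep their original features.
    """
    dim = len(lig_feat[0])
    sums = [[0.0] * dim for _ in lig_feat]
    counts = [0] * len(lig_feat)
    for i, j in edges:
        counts[i] += 1
        for k in range(dim):
            sums[i][k] += prot_feat[j][k]
    updated = []
    for i, h in enumerate(lig_feat):
        n = counts[i]
        agg = [s / n for s in sums[i]] if n else [0.0] * dim
        updated.append([h[k] + agg[k] for k in range(dim)])
    return updated


def readout(node_feat: Features, weights: Sequence[float], bias: float = 0.0) -> float:
    """Hypothetical readout: sum-pool node features, then apply a linear layer."""
    pooled = [sum(col) for col in zip(*node_feat)]
    return sum(w * x for w, x in zip(weights, pooled)) + bias


if __name__ == "__main__":
    lig = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    prot = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (20.0, 0.0, 0.0)]
    edges = bipartite_edges(lig, prot)  # [(0, 0), (0, 1)]
    h = message_pass([[1.0, 0.0], [0.0, 1.0]],
                     [[2.0, 0.0], [0.0, 2.0], [5.0, 5.0]], edges)
    print(readout(h, [1.0, 1.0]))  # 4.0
```

In this toy input only the first ligand atom has protein neighbours within the cutoff, so only its features change before pooling. A trained network would learn its feature transformations and readout weights rather than using the fixed values shown here.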

Data availability

All data are publicly available. Please refer to the code availability section for details.

Code availability

The model code and data are available at https://github.com/arwhirang/affinity_prediction_BGNN.

Funding

This study was supported by a National Research Council of Science & Technology (NST) grant funded by the Korea government (MSIP) (No. CAP-17-01-KIST Europe).

Author information

Authors and Affiliations

KIST Europe, Campus E7 1, 66123 Saarbrücken, Germany
Sangrak Lim, Yong Oh Lee, Juyong Yoon & Young Jun Kim
Industrial and Data Engineering Department of Hongik University, Seoul, Republic of Korea
Yong Oh Lee

Contributions

The study was designed by SL and YL. SL wrote the code and performed the analysis. The original manuscript was written by SL and YL. All authors (SL, YL, JY, and YK) reviewed and edited the manuscript. YL and YK acquired the funding. All authors approved the final version of the manuscript.

Corresponding author

Correspondence to Sangrak Lim.

Ethics declarations

Conflict of interest

We declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 2774 KB)

About this article

Cite this article

Lim, S., Lee, Y.O., Yoon, J. et al. Affinity prediction using deep learning based on SMILES input for D3R grand challenge 4. J Comput Aided Mol Des 36, 225–235 (2022). https://doi.org/10.1007/s10822-022-00448-3

Received: 14 June 2021
Accepted: 08 March 2022
Published: 22 March 2022
Issue Date: March 2022
DOI: https://doi.org/10.1007/s10822-022-00448-3