Abstract
Modern molecular docking comprises two tasks: pose prediction and affinity prediction. When the three-dimensional coordinates of the ligand are not provided, docking poses must be predicted before affinity can be estimated. Existing methods, however, require a large amount of feature engineering. In addition, because predicted ligand positions deviate probabilistically, a robust model is needed to combine pose and affinity prediction sequentially. We propose a pipeline using a bipartite graph neural network and transfer learning, trained on a re-docking dataset. We evaluated the model on data released by the Drug Design Data Resource Grand Challenge 4 (D3R GC4). The data for the two target proteins provided by the challenge show different patterns. The model outperformed the best participant by 9% on the BACE target protein in stage 2 and showed competitive performance on the CatS target protein.
Data availability
All data are publicly available. Please refer to the Code availability section for details.
Code availability
The model code and data are available at: https://github.com/arwhirang/affinity_prediction_BGNN.
Funding
This study was supported by a National Research Council of Science & Technology (NST) grant from the Korea government (MSIP) (No. CAP-17-01-KIST Europe).
Contributions
The study was designed by SL and YL. SL wrote the code and performed the analysis. The original manuscript was written by SL and YL. All authors (SL, YL, JY, and YK) reviewed and edited the manuscript. YL and YK acquired the funding. All authors have approved the final version of the manuscript.
Ethics declarations
Conflict of interest
We declare no conflict of interest.
About this article
Cite this article
Lim, S., Lee, Y.O., Yoon, J. et al. Affinity prediction using deep learning based on SMILES input for D3R grand challenge 4. J Comput Aided Mol Des 36, 225–235 (2022). https://doi.org/10.1007/s10822-022-00448-3
DOI: https://doi.org/10.1007/s10822-022-00448-3