Improving quantitative structure–activity relationship models using Artificial Neural Networks trained with dropout

Mendenhall, Jeffrey; Meiler, Jens

doi:10.1007/s10822-016-9895-2

Published: 01 February 2016

Volume 30, pages 177–189 (2016)

Journal of Computer-Aided Molecular Design

Abstract

Dropout is an Artificial Neural Network (ANN) training technique that has been shown to improve ANN performance across canonical machine learning (ML) datasets. Quantitative Structure–Activity Relationship (QSAR) datasets, which relate chemical structure to biological activity in Ligand-Based Computer-Aided Drug Discovery, pose unique challenges for ML techniques, such as heavily biased dataset composition and a large number of descriptors relative to the number of actives. To test the hypothesis that dropout also improves QSAR ANNs, we conduct a benchmark on nine large QSAR datasets. Use of dropout improved both enrichment false positive rate and log-scaled area under the receiver-operating characteristic curve (logAUC) by 22–46% over conventional ANN implementations. Optimal dropout rates are found to be a function of the signal-to-noise ratio of the descriptor set, and relatively independent of the dataset. Dropout ANNs with 2D and 3D autocorrelation descriptors outperform conventional ANNs as well as optimized fingerprint similarity search methods.
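For readers unfamiliar with the technique, the following is a minimal sketch of dropout applied to one vector of descriptor values or hidden activations. It assumes the common "inverted" formulation, in which surviving values are rescaled during training so that nothing needs to change at prediction time; the abstract does not state which formulation the authors used. The function name `dropout`, the example values, and the rate of 0.5 are illustrative assumptions, not taken from the article.

```python
import random


def dropout(values, rate, rng, training=True):
    """Inverted dropout (an assumed variant, not necessarily the authors').

    During training, each value is zeroed with probability `rate`, and the
    survivors are divided by (1 - rate) so the expected value is unchanged.
    At prediction time the input is passed through untouched.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


# Hypothetical descriptor values for a single molecule.
rng = random.Random(0)
descriptors = [0.2, -1.3, 0.7, 1.1, -0.4, 0.9]

dropped = dropout(descriptors, rate=0.5, rng=rng)                     # training pass
unchanged = dropout(descriptors, rate=0.5, rng=rng, training=False)   # prediction pass
```

Because a fresh random mask is drawn on every training pass, the network cannot rely on any single descriptor or hidden unit, which is the intuition usually given for why dropout reduces overfitting.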

Author information

Authors and Affiliations

Department of Chemistry, Center for Structural Biology, Institute of Chemical Biology, Vanderbilt University, 7330 Stevenson Center, Station B 351822, Nashville, TN, 37235, USA
Jeffrey Mendenhall & Jens Meiler

Corresponding author

Correspondence to Jens Meiler.

About this article

Cite this article

Mendenhall, J., Meiler, J. Improving quantitative structure–activity relationship models using Artificial Neural Networks trained with dropout. J Comput Aided Mol Des 30, 177–189 (2016). https://doi.org/10.1007/s10822-016-9895-2

Received: 15 September 2015
Accepted: 15 January 2016
Published: 01 February 2016
Issue Date: February 2016
DOI: https://doi.org/10.1007/s10822-016-9895-2
