Abstract
Quantitative structure–activity relationship (QSAR) modeling is a branch of computer aided drug discovery that relates chemical structures to biological activity. Two well-established and related QSAR descriptors are two- and three-dimensional autocorrelation (2DA and 3DA). These descriptors encode the relative positions of atoms or atom properties by measuring the separation between each atom pair as either the number of bonds (2DA) or the Euclidean distance (3DA). The values computed for all atom pairs in a given small molecule are summed by separation and collected in a histogram. Atom properties can be included by weighting each pair with a coefficient equal to the product of the two atoms' property values. This procedure can lead to information loss when signed atom properties such as partial charge are considered. For example, the product of two positive charges is indistinguishable from the product of two equivalent negative charges. In this paper, we present variations of 2DA and 3DA called 2DA_Sign and 3DA_Sign that avoid this information loss by splitting unique sign pairs into individual histograms. We evaluate these variations with models trained on nine datasets spanning a range of drug target classes. Both 2DA_Sign and 3DA_Sign significantly increase model performance across all datasets when compared with traditional 2DA and 3DA. Lastly, we find that limiting 3DA_Sign to maximum atom pair distances of 6 Å instead of 12 Å further increases model performance, suggesting that conformational flexibility may hinder performance with longer 3DA descriptors. Consistent with this finding, limiting the number of bonds in 2DA_Sign, which does not depend on conformation, from 11 to 5 fails to improve performance.
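The information loss described above, and the sign-splitting idea that addresses it, can be illustrated with a minimal Python sketch. This is not the authors' BCL implementation: the function names (`bond_distances`, `autocorr_2d`, `autocorr_2d_sign`), the histogram labels, the choice to count each unordered atom pair once, the omission of self-pairs, and the treatment of a zero property value as positive are all assumptions made for illustration.

```python
from collections import deque


def bond_distances(n_atoms, bonds):
    """All-pairs shortest path lengths in bonds, via BFS from each atom."""
    adj = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    dist = [[None] * n_atoms for _ in range(n_atoms)]
    for start in range(n_atoms):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[start][v] is None:
                    dist[start][v] = dist[start][u] + 1
                    queue.append(v)
    return dist


def autocorr_2d(props, bonds, max_bonds):
    """Plain 2DA sketch: index k holds the sum of property products
    over atom pairs separated by k + 1 bonds."""
    dist = bond_distances(len(props), bonds)
    hist = [0.0] * max_bonds
    for i in range(len(props)):
        for j in range(i + 1, len(props)):
            d = dist[i][j]
            if d is not None and d <= max_bonds:
                hist[d - 1] += props[i] * props[j]
    return hist


def autocorr_2d_sign(props, bonds, max_bonds):
    """Sign-split sketch: one histogram per sign pair (assumed labels).
    A zero property value is treated as positive (assumption)."""
    dist = bond_distances(len(props), bonds)
    hists = {key: [0.0] * max_bonds for key in ("neg_neg", "pos_pos", "pos_neg")}
    for i in range(len(props)):
        for j in range(i + 1, len(props)):
            d = dist[i][j]
            if d is None or d > max_bonds:
                continue
            p, q = props[i], props[j]
            if p < 0 and q < 0:
                key = "neg_neg"
            elif p >= 0 and q >= 0:
                key = "pos_pos"
            else:
                key = "pos_neg"
            hists[key][d - 1] += p * q
    return hists


# Toy two-atom "molecules" with one bond and equal-magnitude charges.
bond = [(0, 1)]
plain_pos = autocorr_2d([0.5, 0.5], bond, max_bonds=1)
plain_neg = autocorr_2d([-0.5, -0.5], bond, max_bonds=1)
signed_pos = autocorr_2d_sign([0.5, 0.5], bond, max_bonds=1)
signed_neg = autocorr_2d_sign([-0.5, -0.5], bond, max_bonds=1)
```

In this toy case `plain_pos` and `plain_neg` are identical, whereas `signed_pos` and `signed_neg` differ because the contribution lands in the `pos_pos` histogram in one case and the `neg_neg` histogram in the other. A 3D analogue would presumably replace the bond count with a binned Euclidean distance; the bin width is not stated here, so it is left out of the sketch.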
Abbreviations
- 2DA: 2D autocorrelation
- 3DA: 3D autocorrelation
- ANN: Artificial neural network
- BCL: BioChemical library
- CADD: Computer aided drug discovery
- GPCR: G-protein coupled receptor
- HTS: High-throughput screen
- LB-CADD: Ligand-based CADD
- logAUC: Area under the logarithmic ROC curve
- LOO: Leave-one-out
- QSAR: Quantitative structure–activity relationship
- RDF: Radial distribution function
- ROC: Receiver operating characteristic
- VDW: Van der Waals
Acknowledgments
Work in the Meiler laboratory is supported through NIH (R01 GM080403, R01 GM099842, R01 DK097376, R01 HL122010, R01 GM073151, U19 AI117905) and NSF (CHE 1305874).
Cite this article
Sliwoski, G., Mendenhall, J. & Meiler, J. Autocorrelation descriptor improvements for QSAR: 2DA_Sign and 3DA_Sign. J Comput Aided Mol Des 30, 209–217 (2016). https://doi.org/10.1007/s10822-015-9893-9
DOI: https://doi.org/10.1007/s10822-015-9893-9