Review: Informatics
Descriptors and their selection methods in QSAR analysis: paradigm for drug design
Introduction
Differentiating drug-like from non-drug-like molecules is essential to reduce the cost associated with failed drug development. Various in silico approaches have shown potential for screening chemical databases against desired biological targets to develop new leads [1]. Among them, ligand-based virtual screening has become popular because it can rapidly screen millions of molecules from available chemical databases [2]. QSAR modeling is an important approach in drug discovery that correlates molecular structure with biological and pharmaceutical activities [3]. Such 2D methods rely on calculating and comparing molecular properties to identify molecules that are similar to the query molecule. Compared with 3D (or structure-based) methods, 2D approaches require substantially less calculation time and are therefore mostly used as preliminary filters to reduce the number of compounds carried forward for further screening in later stages of drug development [4]. These 2D approaches are widely used in academia, industry, and research institutions worldwide. Developing a QSAR model involves: (i) understanding the fundamental chemistry of the set of analogs, including any outliers; (ii) quantitatively correlating and summarizing the relations between alterations in chemical structure and the corresponding changes in biological endpoint, to determine the chemical properties that are the most likely determinants of the biological activity of the drug candidate; (iii) optimizing existing leads to improve their biological activities; and (iv) predicting the biological activities of untried compounds.
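The similarity comparison behind 2D ligand-based screening can be sketched as follows. This is a minimal illustration, not the method of any particular tool: it assumes each molecule is already encoded as a set of "on" fingerprint bits and uses the Tanimoto coefficient, one commonly used similarity measure. The fingerprints, molecule names, and the 0.5 threshold are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def screen(query_fp, library, threshold=0.5):
    """Return (name, similarity) pairs at or above the threshold, most similar first."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical fingerprints: each set lists the structural-key bits that are on.
query = {1, 4, 7, 9, 12}
library = {
    "mol_A": {1, 4, 7, 9, 12},  # identical to the query
    "mol_B": {1, 4, 7, 20},     # shares three bits with the query
    "mol_C": {30, 31, 32},      # shares no bits with the query
}

hits = screen(query, library, threshold=0.5)
```

Because each comparison is only a set intersection and a division, a filter of this kind is cheap enough to run over a large library before any slower 3D method is applied.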
Different QSAR approaches have been developed over the past few decades [5, 6, 7, 8]. These approaches can determine reliable relations between variations in the values of calculated descriptors and the biological activity of a series of chemical molecules, so that they can be used to predict the activity of untried or newly synthesized compounds. The chemical structures used in QSAR model building are encoded by a substantial number of molecular descriptors, but the model is built using only a few descriptors that are valid for closely related compounds. Most learning algorithms become computationally intractable when the number of features is large, both in training and in production steps. High-throughput data used in statistical modeling pose a challenge to accurate prediction: given the large amount of inherent noise and variation in samples and their high dimensionality, there is a risk of overfitting [9]. Thus, descriptor selection is needed to improve model performance and avoid overfitting.
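The idea of relating a calculated descriptor to activity and then predicting an untried compound can be sketched with the simplest possible model, a one-descriptor ordinary least-squares line. This is an illustration only: the descriptor values and activities below are made up, and real QSAR models typically use several descriptors and proper external validation.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Fraction of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Made-up training series: one descriptor value and one activity per analog.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
activity = [5.1, 5.9, 7.2, 7.8, 9.0]

slope, intercept = fit_line(descriptor, activity)
fit_quality = r_squared(descriptor, activity, slope, intercept)

# Predict an untried analog whose descriptor value lies inside the training range.
predicted = slope * 3.5 + intercept
```

The prediction is made inside the range of the training descriptors on purpose, in line with the point that such models are valid for closely related compounds.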
Descriptor selection methods provide a way of reducing computation time, improving prediction performance, and providing a better understanding of the data in machine learning. Descriptor selection is an important step for several reasons [10]: (i) using only a few descriptors increases the interpretability and understanding of the resulting models; (ii) it can reduce the risk of overfitting from noisy, redundant molecular descriptors; (iii) it can provide faster and more cost-effective models; and (iv) it removes the activity cliff. However, noisy, redundant, or irrelevant descriptors should be removed in such a way that the dimension of the input space is reduced without any loss of significant information [3]. In this review, we provide an update on, and a brief explanation of, commonly used descriptors, with a particular emphasis on their selection approaches for the development of more reliable, predictable, and generalized QSAR models.
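One simple way to remove uninformative and redundant descriptors is a filter that drops near-constant columns and then drops any descriptor that is highly correlated with one already kept. The sketch below assumes this filter approach; it is only one of several families of selection methods, and the descriptor table, names, and thresholds are hypothetical.

```python
from statistics import pvariance


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length value lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def select_descriptors(table, min_variance=1e-6, max_abs_corr=0.95):
    """Filter a {descriptor name: values per compound} table, keeping input order."""
    kept = []
    for name, values in table.items():
        if pvariance(values) <= min_variance:
            continue  # near-constant: carries no information about activity
        if any(abs(pearson(values, table[k])) > max_abs_corr for k in kept):
            continue  # redundant: nearly collinear with a descriptor already kept
        kept.append(name)
    return kept


# Hypothetical descriptor table for five compounds.
table = {
    "mol_weight": [180.2, 206.3, 151.2, 230.3, 194.2],
    "mol_weight_scaled": [1.802, 2.063, 1.512, 2.303, 1.942],  # same information, rescaled
    "n_rings": [1, 1, 1, 1, 1],                                # constant across the series
    "logP": [1.2, 3.5, 0.5, 2.1, -0.1],
}

kept = select_descriptors(table)
```

A filter like this never looks at the activity values, so it cannot judge relevance; it only reduces the input dimension while keeping one representative of each group of near-duplicate descriptors.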
Molecular descriptors
Despite great advances in the field of drug design, the use of descriptors to define the molecular structure of biologically active compounds remains the main method used to discover new lead molecules. Descriptors are the chemical characteristics of a molecule expressed in numerical form and are used in QSAR/QSPR studies. Fig. 1 depicts the basic definition of these descriptors. Mathematical representation of these descriptors has to be invariant to the size of the molecule and the number of atoms it contains
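As a concrete example of turning a structure into a number, the sketch below computes the Wiener index, a classic topological descriptor that appears in the literature cited by this review. It assumes the molecule is supplied as a hydrogen-suppressed graph (an adjacency list of heavy atoms, with hypothetical atom numbering) and sums the shortest-path bond counts over all atom pairs.

```python
from collections import deque


def wiener_index(adjacency):
    """Sum of shortest-path distances (in bonds) over all unordered atom pairs."""
    total = 0
    for start in adjacency:
        # Breadth-first search gives shortest bond counts from this atom.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for neighbor in adjacency[atom]:
                if neighbor not in dist:
                    dist[neighbor] = dist[atom] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
    return total // 2  # every pair was counted once from each end


# Carbon skeletons only; hydrogens are omitted.
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}      # C-C-C-C chain
isobutane = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}     # central carbon with three neighbors

w_chain = wiener_index(n_butane)      # 10
w_branched = wiener_index(isobutane)  # 9
```

The two isomers have the same formula but different values, which shows how a single graph-derived number can encode branching.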
Concluding remarks
Molecular descriptors are an essential part of the methodological toolbox used to study structure–property correlations and are widely used to optimize the characteristics of compounds in molecular design. Reliable prediction of these descriptors is significant for the development of predictable QSAR models, because accurate predictions can limit the number of expensive and time-consuming experiments required to synthesize the active novel hits with optimized pharmacodynamic and pharmacokinetic
Acknowledgments
We acknowledge the Department of Biotechnology (DBT), Government of India, for its support, and the internal facilities of the department. This work was supported by internal funds from the Biotechnology Unit, AMU; by ICMR grant AMR/5/2011-ECD-1; and by DBT grants BT/PR8281/BID/7/448/2013 and BT/HRD/NBA/34/01/2012 to A.U.K.
References (79)
et al. Structure based virtual screening to discover putative drug candidates: necessary considerations and successful case studies. Methods (2015)
Machine-learning approaches in drug discovery: methods and applications. Drug Discov. Today (2015)
Highly discriminating distance-based topological index. Chem. Phys. Lett. (1982)
Hydrogen bonding descriptors in the prediction of human in vivo intestinal permeability. J. Mol. Graph. Model. (2003)
et al. A wrapper method for feature selection using support vector machines. Inf. Sci. (2009)
et al. A GA-based feature selection and parameters optimization for support vector machines. Expert Syst. Appl. (2006)
Hybrid-genetic algorithm based descriptor optimization and QSAR models for predicting the biological activity of Tipranavir analogs for HIV protease inhibition. J. Mol. Graph. Model. (2010)
Similarity-based virtual screening using 2D fingerprints. Drug Discov. Today (2006)
Analysis and comparison of 2D fingerprints: insights into database screening performance using eight fingerprint methods. J. Mol. Graph. Model. (2010)
The connectivity index 25 years after. J. Mol. Graph. Model. (2001)
Novel molecular description for structure–property studies. Chem. Phys. Lett.
Wiener index revisited. Chem. Phys. Lett.
A novel set of Wiener indices. J. Mol. Graph. Model.
Virtual screening strategies: a state of art to combat with multiple drug resistance strains. MOJ Proteomics Bioinform.
Descriptor selection methods in quantitative structure–activity relationship studies: a review study. Chem. Rev.
In silico virtual screening approaches for anti-viral drug discovery. Drug Discov. Today
QSAR modeling: where have you been? Where are you going to? J. Med. Chem.
Activity prediction and identification of mis-annotated chemical compounds using extreme descriptors. J. Chemometrics
Comparative analysis of QSAR-based vs. chemical similarity based predictors of GPCR binding affinity. Mol. Inf.
Top scoring pairs for feature selection in machine learning and applications to cancer outcome prediction. BMC Bioinform.
Feature selection methods in QSAR studies. J. AOAC Int.
Graph Theory
Topological descriptors in drug design and modeling studies. Mol. Diver.
Structural determination of paraffin boiling points. J. Am. Chem. Soc.
On characterization of molecular branching. J. Am. Chem. Soc.
Indexes of molecular shape from chemical graphs. Acta Pharm. Jugosl.
The first Zagreb index 30 years after. MATCH Commun. Math. Comput. Chem.
Molecular Connectivity in Structure Activity Analysis
Generalized molecular descriptors. J. Math. Chem.
TMACC: interpretable correlation descriptors for quantitative structure–activity relationships. J. Chem. Inf. Model.
Interpretable correlation descriptors for quantitative structure–activity relationships. J. Cheminf.
Handbook of Molecular Descriptors
Atomic physicochemical parameters for three-dimensional structure-directed quantitative structure–activity relationships I. Partition coefficients as a measure of hydrophobicity. J. Comput. Chem.
Lipophilicity indices for drug development. J. Appl. Biopharm. Pharmacokinet.
The parameterization of lipophilicity and other structural properties in drug design. Adv. Drug Res.
Lipophilicity in drug discovery. Expert Opin. Drug Discov.
Calculation of hydrophobic constant (log P) from pi. and f constants. J. Med. Chem.
Intramolecular hydrogen bonding to improve membrane permeability and absorption in beyond rule of five chemical space. Med. Chem. Commun.
Cell permeability beyond the rule-of-5. Adv. Drug Deliv.