Computer aided drug design

Computer-based methods can aid the discovery of lead compounds and can potentially eliminate the chemical synthesis and screening of many irrelevant compounds, saving both time and cost. Molecular modeling systems are powerful tools for building, visualizing, analyzing, and storing models of complex molecular structures, and they help interpret structure-activity relationships. The use of molecular mechanics and dynamics techniques and software in computer-aided drug design, together with statistical analysis, is a powerful tool that lets the medicinal chemist design and synthesize effective therapeutic drugs with minimal side effects.


Introduction
Drugs are chemicals that prevent, treat, and diagnose disease, restoring health to diseased individuals. As such, they play a central role in modern medicine. For many years the strategy for discovering new drugs consisted of taking a lead structure and developing a chemical programme for finding analog molecules exhibiting the desired biological properties. The process involved several trial-and-error cycles patiently developed and analyzed by medicinal chemists, who used their experience and chemical intuition to ultimately select a candidate analog for further development. The entire process is laborious, expensive, and, viewed from today's perspective, conceptually inelegant [1]. Medicinal chemistry is the branch of science that provides these drugs through discovery or design. An ever-increasing understanding of the nature of disease, of how cells work, and of how drugs influence these processes has, in the last two decades, led more and more to the purposeful design, synthesis, and evaluation of candidate drug molecules [2]. The strategies of drug design have altered significantly within the past few decades. Whereas chemistry, biological activity hypotheses, and animal experiments dominated drug research, especially in its "golden age" from the sixties to the eighties of the last century, many new technologies have developed over the past 20 years [3].
It was recognized in the 1960s that computer-based methods could help in the discovery of leads and potentially eliminate the chemical synthesis and screening of many irrelevant compounds, thereby saving time as well as cost [4][5][6]. Molecular modeling has opened the way to the discovery of lead structures by a rational approach, and its central role in rational drug design has become fully apparent. A putative ligand can either be extracted from a large library of compounds [7][8] or be obtained by joining molecular fragments [9][10][11] or atoms computationally. A second prerequisite is an accurate prediction of binding affinities, which is a much more difficult task to achieve. Computational methods that attempt to design leads differ in nature and in the degree of simplifying assumptions they use [12].

Computer Aided Drug Design
Computer-aided drug design (CADD) can contribute not only to the design of potent compounds but to many of the steps of going 'from concept to clinic.' In the development of a new drug, CADD methodologies and technology are used to calculate molecular properties that aid the drug design process [14]. CADD methods aim to use the information contained in the three-dimensional structure of the unliganded target to design completely new lead compounds de novo, as well as to construct large virtual combinatorial libraries of compounds that can then be screened computationally before going to the effort and expense of actually synthesizing and testing them [15]. Some examples of the successful use of CADD in drug discovery:
• Design of thymidylate synthase inhibitors as anticancer agents [16]
• HIV protease inhibitors as antiviral agents [17]
• Neutrophil elastase inhibitors [18]
• Carbonic anhydrase inhibitors as antiglaucoma agents [19]
• Discovery of novel sweeteners [20]

Molecular modeling
Molecular modeling [21] has become a well-established research area during the last decade owing to advances in computer hardware and software. Molecular modeling systems are powerful tools for building, visualizing, analyzing, and storing models of complex molecular structures that can help interpret structure-activity relationships.

Molecular modeling functions
There are many possible methods and applications of molecular modeling.

1) Structure generation or retrieval
Molecular structures may be generated by a variety of procedures. If a crystal structure exists, it can be found in the Cambridge crystallographic data file and turned into molecular coordinates by a standard method. The CONCORD [22] programme is used to convert 2D structures into 3D molecular coordinates. LUDI [23] is a programme that generates novel structures quickly according to an algorithm.
2) Structure visualization
One of the most popular uses of a molecular modeling system is to visualize molecular structures and interactions by different methods [24]. Representation of the structure is done with stick, ball-and-stick, space-filling, and surface models.

3) Conformation generation
Molecular modeling can calculate the possible conformations of a molecule and deduce which conformation(s) is important for the property of interest. Monte Carlo techniques produce conformations that can be analyzed statistically or energetically [25].
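As a toy illustration of Monte Carlo conformational sampling, the sketch below applies the Metropolis criterion to a single torsion angle. The threefold potential, its barrier height, and the temperature are illustrative assumptions, not values taken from the text:

```python
import math
import random

def torsion_energy(phi_deg):
    """Toy threefold torsional potential (kcal/mol) with minima near
    60, 180, and 300 degrees; the functional form is an assumption."""
    phi = math.radians(phi_deg)
    return 2.0 * (1.0 + math.cos(3.0 * phi))

def metropolis_mc(n_steps=20000, temperature=300.0, seed=1):
    """Metropolis Monte Carlo sampling of a single torsion angle."""
    rt = 0.0019872 * temperature          # RT in kcal/mol
    rng = random.Random(seed)
    phi = 0.0                             # start at a high-energy (eclipsed) geometry
    energy = torsion_energy(phi)
    samples = []
    for _ in range(n_steps):
        trial = (phi + rng.uniform(-30.0, 30.0)) % 360.0
        trial_energy = torsion_energy(trial)
        delta = trial_energy - energy
        # Metropolis criterion: always accept downhill moves,
        # accept uphill moves with Boltzmann probability exp(-dE/RT)
        if delta <= 0.0 or rng.random() < math.exp(-delta / rt):
            phi, energy = trial, trial_energy
        samples.append(phi)
    return samples

if __name__ == "__main__":
    samples = metropolis_mc()
    near_minima = sum(1 for p in samples
                      if min(abs(p - m) for m in (60, 180, 300)) < 30)
    print(f"fraction of samples near low-energy angles: {near_minima / len(samples):.2f}")
```

The resulting ensemble of angles can then be analyzed statistically (populations of the wells) or energetically (well depths), as the text describes.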

4) Deriving bioactive conformations
Given a biologically active molecule, it is necessary to determine the conformation(s) of the molecule associated with its activity. If one assumes that two or more molecules bind in a common mode, then a search for a common conformation may be made. One approach to searching for the bioactive conformer involves energy calculations, since only low-energy conformations are relevant; a second approach involves generating possible conformations in a systematic way, so that all possible binding patterns can be explored on the basis of criteria of interest, including energy, allowed bonding and nonbonding distances, torsional angle increments, and other factors. Gund et al. performed systematic conformational searching coupled with distance and energy parameters to determine the bioactive conformations of a series of semi-rigid nicotinic agonists [26].


5) Molecule superposition and alignment
Computing properties of molecules often involves comparisons across a homologous series, and that is done by superimposing or aligning the molecules so that their differences become obvious and interpretable. When molecules possess a large, rigid common substructure, their alignment is relatively easy. In the general case, the molecules are sufficiently different in structure or conformation that their alignment is not obvious and perhaps not unique [27]. Molecular modeling programs such as DISCO and Aladdin are used for superimposition [28,29]. Crippen [29] superimposed molecules of different structure using a 3D quantitative structure-activity relationship (QSAR) method based on distance geometry.

6) Deriving the pharmacophoric pattern
A pharmacophore may be defined as the essential geometric arrangement of atoms or functional groups necessary to produce a given biological response. A set of biologically active molecules that produce activity by the same mechanism is assumed to share the same essential pharmacophore. Pharmacophores have been described as topological (graph-theoretic or connectivity-based structural fragments) and topographic (geometric, usually 3D, patterns) [29,30].
Automatic generation of pharmacophoric patterns is the goal of several modeling packages [31]. Automated pharmacophore mapping is now available in programs from Biocad, Biosyn, Chemical Design Ltd., and Tripos.

7) Receptor mapping
When the structure of the receptor is not known, a "receptor map" may be constructed using the principle of receptor-ligand complementarity.
A number of algorithms are available to compute unions of molecular volumes. Pseudoelectron density functions calibrated to reproduce van der Waals radii have been mapped to a 3D grid to compute the union, intersection, and subtraction of volumes. The analytical volume representation by Connolly may be an alternative that would allow optimization of volume overlap [32,33].

8) Estimating biological activities
In reality, drugs vary in activity from being very active or potent to being inactive. QSAR is a technique that quantifies the relationship between structure and biological data and is useful for optimizing the groups that modulate the potency of a molecule [34].
Hansch et al. suggested that if one employed cluster analysis on all possible substituents and included one member from each cluster of similar substituents, the designed ligands would show more independent variation in physical properties [35].

9) Molecular interactions
Modeling the interaction of a drug with its receptor is a complex problem: there are many degrees of freedom and insufficient knowledge of the effect of solvent on the binding association. Binding is stereospecific, determined by the fit of the molecule to the receptor. Many forces are involved in the intermolecular association: hydrophobic, dispersion or van der Waals, hydrogen bonding, and electrostatic [36]. This type of interaction is determined by the structure, or fit, of the drug molecule to the receptor site and induces a common biological response.

10) Calculation of molecular properties
Molecular properties are important indicators of the utility of various molecules and may be categorized as physical (electronic, thermodynamic, physical state), chemical (reactivity, solubility, dynamics, explosivity), or biological (enzyme inhibition, receptor toxicity, metabolism).

11) Energy calculations
Geometry optimization and energy minimization of substrates can be carried out using quantum chemical ab initio and/or semi-empirical methods. For larger systems, molecular mechanics methods (force field techniques) can be employed. The three major theoretical computational methods for calculating properties of molecules are the empirical, semi-empirical, and ab initio (quantum mechanical) methods. Molecular mechanics [37] and semi-empirical methods rely on embedded empirical parameters, whereas ab initio quantum mechanical methods are potentially capable of reproducing an experiment without such parameters.
a) Molecular mechanics
The molecular mechanics method is less complicated, fast, and able to handle very large systems, including enzymes [38,39]. Molecular mechanics methods are widely used to give accurate structures and energies for molecules. Molecular mechanics energy minimization involves successive iterative computations in which an initial conformation is submitted to full geometry optimization. All parameters defining the geometry of the system are modified by small increments until the overall structural energy reaches a local minimum. The local minimum, however, may not be the global minimum.
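The iterative minimization just described, and the distinction between a local and a global minimum, can be sketched with a toy one-dimensional potential; the potential, step size, and starting points are invented for illustration:

```python
def energy(x):
    """Toy 1-D potential with two minima of different depth (illustrative only)."""
    return x**4 - 4.0 * x**2 + x

def gradient(x):
    """Analytical derivative of the toy potential."""
    return 4.0 * x**3 - 8.0 * x + 1.0

def minimize(x0, step=0.01, tol=1e-8, max_iter=100000):
    """Steepest-descent minimization: repeatedly move downhill in small
    increments until the gradient (and thus the energy change) vanishes."""
    x = x0
    for _ in range(max_iter):
        g = gradient(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x

if __name__ == "__main__":
    local = minimize(2.0)    # starting in the shallower right-hand well
    best = minimize(-2.0)    # starting in the deeper left-hand well
    print(f"right-well minimum: x={local:.3f}, E={energy(local):.3f}")
    print(f"left-well minimum:  x={best:.3f}, E={energy(best):.3f}")
```

Both runs converge to a stationary point, but only the second finds the global minimum: the minimizer stops in whichever well the starting conformation lies, exactly the caveat noted in the text.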
b) Semi-empirical calculation
The semi-empirical [40] methods use a mathematical formulation of the wave function which describes hydrogen-like orbitals. Semi-empirical calculations differ in the approximations that are made concerning repulsion between electrons in different orbitals. The approximations are adjusted by parameterizing values to correspond to either ab initio data or available experimental data. The earliest methods used were extended Hückel theory and CNDO. Improved methods include MINDO/3, MNDO, and AM1. MOPAC contains the MINDO/3 and MNDO programs and has been parameterized.

c) Quantum mechanics: ab initio
The solution of the Schrödinger equation with approximations is the basis of semi-empirical calculation, whereas the solution of the Schrödinger equation without approximations is the basis of the highest-level quantum mechanical calculations, the so-called "ab initio" methods. Ab initio methods are most useful when there is no experimental data to draw from, but they suffer from the disadvantage that much computer power is needed, and therefore they are not routine for systems with more than about 50 heavy atoms [41].
Two major modeling strategies are currently used in the conception of new drugs:
• Direct drug design
• Indirect drug design

Direct drug design
In this approach, the three-dimensional features of a known receptor site, obtained from X-ray crystallography or NMR analysis, are directly considered for the design of the lead structure. X-ray crystal structures of macromolecules and their complexes with ligands may be obtained from the Protein Data Bank. If a protein structure is known, it can be used for explicit docking of ligands.

Docking
One goal of computational chemistry is to predict the binding interactions of molecules. Docking is a direct drug design approach that deals with the study of interactions between drug and receptor [42]. Docking studies may help to increase ligand specificity, and a better therapeutic index can also be achieved. Docking plays a role in QSAR methods and homology modeling. Computational docking always requires two components, which may be briefly characterized as "searching" and "scoring." The search aims to find the orientation and conformation of the interacting molecules corresponding to the global minimum of the free energy of binding. "Scoring" [43] refers to the fact that any docking procedure must evaluate and rank the configurations generated by the search process. Scoring is actually composed of three different aspects relevant to docking and design:
• Ranking the configurations generated by the docking search for one ligand interacting with a given protein; this aspect is essential to detect the binding mode best approximating the experimentally observed situation.
• Ranking different ligands with respect to their binding to one protein, that is, prioritizing ligands according to their affinity; this aspect is essential in virtual screening.
• Ranking one or different ligands with respect to their binding affinity to different proteins; this aspect is essential for the consideration of selectivity and specificity [44].
Various approaches to the scoring aspects [45] that try to capture the essential elements of protein-ligand interactions have been described by several workers, as outlined in Table 1.

Homology modeling
Homology modeling [47,48,49] involves taking a known sequence of an unknown structure and mapping it against the known structure of one or several similar (homologous) proteins. It is expected that two proteins of similar origin and function will have reasonable structural similarity. Therefore, it is possible to use the known structure as a template for modeling the unknown structure [48]. The homology modeling approach consists of the steps [49,50] shown in Figure 2.

Indirect drug design
The design is based on comparative analysis of the structural features of known active and inactive molecules that are complementary to a hypothetical receptor site. A successful application of this technique was in the design of the drug sildenafil by the Pfizer laboratory, England [51].

Quantitative structure activity relationship (QSAR)
Quantitative structure-activity relationship (QSAR) analysis, a method introduced in the 1960s, is a useful tool for establishing quantitatively the relationship between various physicochemical properties and the biological activity of compounds. One of the most important drivers of the new age of QSAR as an integral part of drug design and discovery is the rapid growth of biomolecular databases, which contain data on chemical structure and, in some cases, the biological activity of chemicals [52].

Role of QSAR in design of better drugs
• QSAR allows a quantitative prediction of the potency of new analogs [53].
• It is used for the design of a series based on a lead molecule.
• One can also find a new region on the receptor and thus start a whole new series.
• QSAR helps in deciding when to stop synthesis in a series.
• It helps to find the mechanism of action of the molecule and thus complements receptor mapping techniques.
• The QSAR result can be used to understand interactions between functional groups in the molecules of greatest activity and those of their target.
• Quantifying the relationship between structure and activity provides an understanding of the effect of structure on activity.

2) Free Wilson analysis (additivity model or de novo approach)
It is a true structure-activity relationship model [56]. It is based on the following assumptions:
• All the drugs listed should have the same parent structure.
• The substitution pattern in the various derivatives has to be the same.
The equation is solved by multilinear regression using the presence (1) or absence (0) of the different substituents as independent dummy parameters, while the measured activity serves as the dependent variable.
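Under the stated assumptions, and for a balanced (complete) substitution design, the Free Wilson group contributions reduce to mean deviations from the grand mean, which is equivalent to the dummy-variable regression. The sketch below uses invented substituent names and activity values:

```python
# Hypothetical, balanced Free Wilson data set: log(1/C) activities for all
# four combinations of substituents at two positions (values are invented).
data = [
    ({"R1": "H",  "R2": "H"},   5.0),
    ({"R1": "Cl", "R2": "H"},   5.8),
    ({"R1": "H",  "R2": "CH3"}, 5.3),
    ({"R1": "Cl", "R2": "CH3"}, 6.1),
]

def free_wilson(data):
    """For a balanced design, each group contribution is simply the mean
    activity of the compounds carrying that substituent minus the grand mean."""
    grand_mean = sum(act for _, act in data) / len(data)
    contributions = {}
    groups = {(pos, sub) for subs, _ in data for pos, sub in subs.items()}
    for pos, sub in groups:
        acts = [act for subs, act in data if subs[pos] == sub]
        contributions[(pos, sub)] = sum(acts) / len(acts) - grand_mean
    return grand_mean, contributions

def predict(subs, grand_mean, contributions):
    """Additivity model: activity = grand mean + sum of group contributions."""
    return grand_mean + sum(contributions[(p, s)] for p, s in subs.items())

if __name__ == "__main__":
    mean, contrib = free_wilson(data)
    print(f"grand mean: {mean:.2f}")
    print(f"Cl at R1 contributes {contrib[('R1', 'Cl')]:+.2f} log units")
```

For unbalanced data sets, the same contributions would instead be obtained by the multilinear regression on 0/1 dummy variables described above.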

3) Mixed approach
This is based on the intercorrelation between Free Wilson parameters and the physicochemical properties of the substituents used [57]. From the general formulation of the linear Hansch equation (where Φij is a physicochemical property j of the substituent Xi), a group contribution ai can be derived for each substituent under consideration (Eq. 1). It is a powerful tool for the quantitative description of large and structurally diverse data sets.

Descriptors used in QSAR
The various physicochemical properties of a compound can be represented by means of descriptors in QSAR [58]. Some descriptors used in QSAR are shown in Table 2.

Different approaches that can be used in QSAR
1) 1D QSAR
1D QSAR [59] makes use of structural descriptors only, such as functional groups or atom-centered fragments, and is empirical in nature. It uses indicator variables as sole parameters: one for the presence of a feature and zero for its absence. It can be used in conjunction with other 2D or 3D variables. A QSAR model is generated by a statistical method; if a parameter contributes positively to the biological activity, the corresponding functional group is essential, and if it contributes negatively, it is not essential to the biological activity.
Biological activity = f (molecular or fragment descriptors)

3) 3D QSAR
3D QSAR [67][68] is a well-recognized method. At the molecular level, the interactions that produce an observed biological response are usually non-covalent, and such steric and electrostatic interactions can account for many of the observed molecular properties. Extensions of the traditional QSAR approaches have been developed that explicitly use the 3D geometry of the structures during the development of the QSAR model. The fundamental problems when trying to develop a good and predictive 3D QSAR model are the identification of the bioactive conformation(s) of the investigated compounds and how to align them. 3D QSAR models are quantitative models that relate the biological activity of small molecules to their properties calculated in 3D space.
A variety of techniques can be used in 3D QSAR:

a. Comparative molecular field analysis (CoMFA)
CoMFA developed slowly from the first attempts to place molecules in a grid and to correlate properties of the molecules with biological activities. CoMFA [69][70][71], introduced in 1988, depends on the fact that biological activity is especially sensitive to spatially localized differences in molecular field intensities. It is the most common approach for determining the steric and electronic features of a series, and it also identifies regions in space that are favorable or unfavorable for ligand-receptor interactions. Molecules are described by molecular interaction fields, which are analyzed by partial least squares (PLS) with cross-validation [72,73], and the output is displayed as contours superimposed on the molecules. Contour maps are helpful in suggesting new compounds likely to have higher property values and potential for synthesis.

b. Molecular shape analysis
Molecular shape analysis (MSA) was introduced in 1980. Since the shape of a molecule is important in receptor interaction, MSA is used to incorporate shape data in QSAR equations. The basic goal of MSA [74] is to identify the biologically relevant conformation without knowledge of the receptor geometry and, in a quantitative fashion, to explain the activity of a series of analogues using only the structure-activity table, comparing a reference structure with the compounds of the data set.

c. 3D QSAR based on molecular similarity matrices
The basic underlying principle of both active subset selection and structure-activity relationship studies assumes that compounds similar to biologically active ones should also be active, and vice versa. This approach uses N x N similarity matrices as the input [76]. During the process of molecule alignment, one of the compounds is a very potent ligand, and the others are superimposed on it by maximizing the degree of steric and electronic similarity [77][78].
The matrix can be generated by pairwise compound shape similarity comparison, in which each matrix entry is a measure of the shape similarity between the corresponding pair of molecules. The analysis can be done using PLS.

d. Distance geometry method
Distance geometry represents a molecule in terms of interatomic distances rather than Cartesian coordinates [79][80][81][82][83]. Because the distances between atoms do not change with rotation and translation of the whole molecule, analysis of the distance matrix can detect whether and how certain atoms of two or more molecules overlap without performing a superposition. Distance geometry offers not only the projection of important intramolecular distances that relate to biological activity but also allows one to postulate critical intermolecular binding distances that may be involved in the ligand-receptor interaction.

e. Quantitative binding site model, COMPASS
The COMPASS algorithm automatically deduces inter-chemotype relationships and generates predictive quantitative models of receptor binding based solely on structure-activity data [84]. It predicts the bioactive conformation, alignment, and binding affinities of a series of ligands in an automated procedure based on surface-type properties. COMPASS uses steric, hydrogen bond donor, and hydrogen bond acceptor distances [85].

f. Receptor surface model
The receptor surface model (RSM) [86] generates a surface loosely enclosing the common volume of the most potent ligands. Points on this surface are described by the average partial charges, electrostatic potential, and hydrogen bonding ability, or by the average hydrophobicity, of the most active compounds. With this method, the user decides which active compounds to use and how close to the van der Waals envelope of the overlapped molecules the surface should be. Genetic function algorithms [87] can be used to detect which set of molecules produces the most predictive models.

A pharmacophore model is a collection of chemical features distributed in 3D space that is intended to represent the groups in a molecule that participate in important binding interactions between drugs and their receptor [89]. A pharmacophore model is thus a spatial arrangement of atoms or functional groups believed to be responsible for biological activity [90].

i. 3D QSAR based on the intermolecular contributions to binding energy
This method uses the intermolecular binding energies of the ligand-receptor interaction as independent variables and biological activity values as dependent variables to derive a significant QSAR model for predicting the binding potency of other ligands.

j. Comparative binding energy analysis
Comparative binding energy analysis (COMBINE) [92][93] can be regarded as the receptor-dependent analogue of CoMFA. The different regions of the receptor serve as probes for the elucidation of the major interaction sites. For each ligand, the intermolecular and intramolecular energies are calculated for the ligand-receptor complex, the unbound ligand, and the receptor.

k. De novo ligand design approaches
De novo ligand design algorithms [94] are applicable when the structure of the receptor is available; they generally involve predicting a ligand complementary to the receptor's active site by using a geometry- or partial-force-field-based scoring function.

l. Hypothetical active site lattice
The hypothetical active site lattice (HASL) was described in 1988, the same year as CoMFA. HASL is related to the CoMFA methodology and also to MSA [95,96]. The HASL approach represents the shape of each molecule as a collection of 3D grid points. It determines the number of lattice points that represent a molecule and also the resolution of the generated receptor map.

m. Genetically evolved receptor models
Genetically evolved receptor models (GERM) are based on the principle of producing atomic-level models of receptor sites from a trial set of ligands [97]. The method optimizes the correlation of the biological potency of the molecules with the potential energy of interaction of the ligands with probes placed on the union surface of the superimposed molecules. The novel feature of GERM is that the character of the probes is not fixed but rather evolves during the calculation using a genetic algorithm [98,99].

4) 4D QSAR
4D QSAR adds a fourth dimension: the possibility of representing each molecule by an ensemble of conformations, orientations, and protonation states, thereby reducing the bias associated with ligand alignment [100].

5) 5D QSAR
The concept of 4D QSAR (implemented in the software Quasar) has been extended to 5D QSAR [101] by an additional degree of freedom, the fifth dimension, which allows for multiple representations of the topology of the quasi-atomistic receptor surrogate. While this entity may be generated using up to six different induced-fit protocols, the simulated evolution converges to a single model, and 5D QSAR, because the model selection may vary throughout the entire simulation, yields less biased results than 4D QSAR, where only a single induced-fit model can be evaluated at a time.

Simple linear regression
Simple linear regression (SLR) [102] describes the relationship between a single independent variable (X) and a single dependent variable (Y), assuming a linear (straight-line) relationship between the two:

Y = a + bX ± e

where a is the intercept, b is the slope, and e is an error term.

Multiple linear regressions
Multiple linear regression (MLR) [102] calculates a QSAR equation using multiple variables in a single equation. The variables should be independent, to minimize the possibility of chance correlations, and the number of independent variables should not be more than one fifth of the number of compounds in the training set. MLR describes the relationship between two or more (m) independent variables (X1, X2, ..., Xm) and a single dependent variable (Y), assuming a linear (straight-line) relationship between the dependent variable and the m independent variables:

Y = a + b1X1 + b2X2 + ... + bmXm ± e

where a is the intercept, the b's are the slopes, and e is an error term.
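A minimal pure-Python sketch of MLR via the normal equations; the descriptor values and activities below are invented and constructed so that the known coefficients are recovered exactly (this illustrates the method, not a production regression routine):

```python
def solve(a, b):
    """Solve the square linear system a.x = b by Gaussian elimination with
    partial pivoting (sufficient for the small normal equations used here)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def mlr(X, y):
    """Ordinary least squares for Y = a + b1*X1 + ... + bm*Xm: solve the
    normal equations (D^T D) beta = D^T y with an intercept column in D."""
    D = [[1.0] + row for row in X]          # design matrix with intercept
    cols = len(D[0])
    dtd = [[sum(r[i] * r[j] for r in D) for j in range(cols)] for i in range(cols)]
    dty = [sum(r[i] * yi for r, yi in zip(D, y)) for i in range(cols)]
    return solve(dtd, dty)                  # [a, b1, ..., bm]

if __name__ == "__main__":
    # Invented descriptors (e.g. logP and a steric parameter) for 5 compounds,
    # with activities constructed so that Y = 1 + 2*X1 + 3*X2 exactly.
    X = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0], [0.5, 2.0]]
    y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X]
    a, b1, b2 = mlr(X, y)
    print(f"Y = {a:.2f} + {b1:.2f}*X1 + {b2:.2f}*X2")
```

With noisy real data the coefficients would only approximate the underlying relationship, and the compounds-to-descriptors rule of thumb above limits how many X columns can safely be used.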

Stepwise multiple linear regressions
Stepwise regression calculates a QSAR equation by adding one variable at a time and testing each addition for significance; only variables found to be significant are used in the final QSAR equation. It is a multiple regression procedure in which independent variables (Xi) are entered into the multiple regression equation one by one. Each independent variable is tested to see whether it meets specific selection criteria before it is allowed to enter (forward stepwise) or remain in (backward stepwise) the regression.

Sequential linear multiple regression
It is a process in which all permutations and combinations of the parameters are tested sequentially to find the best set of n parameters that contribute significantly and optimally to the dependent variable.

Partial least square analysis
Partial least squares (PLS) is among the most promising newer approaches in multivariate statistics. Hundreds or even thousands of independent variables (the X block) can be correlated with one or several dependent variables (the Y block). PLS is used when the X data contain collinearities or when N is less than 5M, where N is the number of compounds and M is the number of descriptors. Apparently perfect correlations can be obtained in PLS analysis simply because of the large number of X variables, so the results must be validated. PLS avoids the MLR requirements that the ratio of compounds to descriptors be greater than five and that the descriptors not be intercorrelated; it can handle numerous and even collinear variables.
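A minimal one-component PLS1 sketch in the NIPALS style: the data are projected onto a single latent variable, which is what lets PLS tolerate collinear X columns. The data set is invented, with the second descriptor constructed to be uncorrelated with the activity:

```python
def pls1_one_component(X, y):
    """Single-component PLS1 sketch: centre the data, form the weight vector
    w proportional to X^T y, project onto the latent score t = X.w, and
    regress y on t. Unlike MLR, this remains stable for collinear X columns."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    yc = [yi - y_mean for yi in y]
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(m)]
    norm = sum(wj * wj for wj in w) ** 0.5
    w = [wj / norm for wj in w]                       # unit weight vector
    t = [sum(Xc[i][j] * w[j] for j in range(m)) for i in range(n)]
    b = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)

    def predict(row):
        score = sum((row[j] - x_mean[j]) * w[j] for j in range(m))
        return y_mean + b * score
    return predict

if __name__ == "__main__":
    # Invented data: activity depends only on the first descriptor; the second
    # is constructed to be uncorrelated with activity, so its weight is zero.
    X = [[-1.0, 1.0], [0.0, -2.0], [1.0, 1.0]]
    y = [-2.0, 0.0, 2.0]
    predict = pls1_one_component(X, y)
    print(f"prediction for [2.0, 0.0]: {predict([2.0, 0.0]):.2f}")
```

Real PLS implementations extract several such components in sequence, deflating X and y after each; one component is enough to show the latent-variable idea.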
Discriminant analysis
Discriminant analysis [103] separates objects with different properties, e.g. active and inactive compounds, by deriving a function of other features that gives the best separation of the individual classes. It thus gives a bioactivity classification such as active vs. inactive.


Cluster analysis
Cluster analysis is a nonstatistical method used to study the relationships between compounds based on their physical and biological properties. Clusters are found on the basis of the distances between compounds in a space formed by the physical or biological properties; each compound is a point in this space [104].
Principal component analysis
Principal component analysis (PCA) [104] has been used to describe the biological activities and molecular diversity of heterocyclic aromatic ring fragments. The aim of such a study is to identify principal components that correlate the chemical structures with biological activities and to enable medicinal chemists to rationally select which heterocyclic rings to synthesize in order to optimize biological activities.

The coefficient of determination is the fraction of the variability in the dependent variable that is explained by the variability in the independent variable(s). The explanatory power of the regression is summarized by its "R-squared" value, computed from the sums-of-squares terms as

R² = SSR/SST = 1 − SSE/SST

If the regression is "perfect," all residuals are zero, SSE is zero, and R² is 1. If the regression is a total failure, SSE equals SST, no variance is accounted for by the regression, and R² is zero.
c) Adjusted coefficient of determination (R²adj)
It provides a less biased estimate of R². The R² value for a regression can be made arbitrarily high simply by including more and more predictors in the model; the adjusted R² is one of several statistics that attempt to compensate for this artificial increase in accuracy. The adjusted R² is given by

R²adj = 1 − (1 − R²)(n − 1)/(n − p − 1)
where n = sample size and p = number of predictors in the model, not counting the constant term. The R²adj value is always lower than R².

ii) t statistic
The t value for each coefficient tests the null hypothesis that the true value of the coefficient does not differ from zero, i.e. that there is no relationship between the dependent variable and the independent variable associated with the given coefficient.
Looking for a t-ratio greater than 2 in absolute value is a common rule of thumb for judging significance because it approximates the 0.05 significance level.
The t statistic for the regression slope tests whether the slope differs significantly from 0 (or from any particular value b); it is calculated as

t(n−k−1) = (b̂ − b) / (Se/√Σ(x − x̄)²)

where b̂ is the estimated slope, Se is the standard error of the estimate, and k is the number of predictors.
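The slope t statistic can be computed directly from a simple linear fit; the descriptor/activity values in the sketch below are invented, with a strong built-in trend so the slope is clearly significant:

```python
import math

def slr_with_t(x, y):
    """Fit y = a + b*x by least squares and compute the t statistic for the
    slope against the null hypothesis b = 0 (t = b / standard error of b)."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_mean - b * x_mean
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(sse / (n - 2))          # standard error of the estimate
    t = b / (se / math.sqrt(sxx))          # t statistic with n-2 degrees of freedom
    return a, b, t

if __name__ == "__main__":
    # Invented descriptor/activity pairs with an underlying slope near 2
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [2.1, 3.9, 6.2, 8.0, 9.9]
    a, b, t = slr_with_t(x, y)
    print(f"Y = {a:.2f} + {b:.2f}*X, t(slope) = {t:.1f}")
```

Here the t value far exceeds the rule-of-thumb threshold of 2, so the slope would be judged significant at the 0.05 level.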

It is closely related to the Fisher ratio (F).
The best model will present the highest value of the FIT function. In the R²pred calculation, the predicted value is replaced by the estimated value.

5) Cross-Validation
a) Leave One Out
In leave-one-out (LOO) cross-validation [106,107], data points are sequentially removed one at a time. The training dataset is divided into subsets (number of subsets = number of data points) of equal size. The model is built using these subsets, and the value of the data point that was not included in the subset (the predictand) is determined; this is the predicted value. The mean of the predictand will be the same for R² and the LOO q², since every data point is in turn treated as the predictand in the LOO scheme.

b) Cross-validated correlation coefficient (q²)
It is obtained by the LOO method, in which a model is built with N−1 compounds and the Nth compound is predicted. Each compound is left out of the model derivation and predicted in turn.
In the LOO calculation, the predicted value is replaced by the LOO-predicted value. The condition q² > 0.5 is the basic requirement for declaring a model to be valid.
c) Predicted residual sum of squares
The predicted residual sum of squares (PRESS) procedure is equivalent to leave-one-out cross-validation, as described previously. The PRESS statistic is defined as the sum of the squared residuals e(i), where e(i) is the residual for observation i, computed as the difference between the observed value of the predictand and the prediction from a regression model calibrated on the set of n−1 observations from which observation i was excluded:

PRESS = Σ e(i)²

d) Standard error of prediction
The standard error of prediction (SPRESS) is the sample estimate of the standard deviation of the regression's predicted residuals.

SPRESS = √(PRESS/(n − k − 1))
e) Standard deviation of prediction
The standard deviation of prediction (SDEP) is defined as

SDEP = √(PRESS/n)
The internal consistency of the model is supported by SPRESS and SDEP, which are calculated from LOO cross-validation. Lower values of these parameters indicate better predictability of the model. The acceptable value depends upon the variability (log range) in the observed values of the predictand (e.g. for a range of two log units it should be below 0.3).
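The LOO statistics above (PRESS, SPRESS, SDEP, q²) can be sketched for a simple one-descriptor model; the data values are invented:

```python
import math

def fit_slr(x, y):
    """Least-squares fit of y = a + b*x (helper used inside the LOO loop)."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    b = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y)) / \
        sum((xi - xm) ** 2 for xi in x)
    return ym - b * xm, b

def loo_statistics(x, y):
    """Leave-one-out cross-validation: refit the model n times, each time
    predicting the excluded point, then form PRESS, SPRESS, SDEP and q2."""
    n = len(x)
    press = 0.0
    for i in range(n):
        xs = x[:i] + x[i + 1:]          # leave observation i out
        ys = y[:i] + y[i + 1:]
        a, b = fit_slr(xs, ys)
        press += (y[i] - (a + b * x[i])) ** 2
    y_mean = sum(y) / n
    sst = sum((yi - y_mean) ** 2 for yi in y)
    k = 1                               # one predictor in this sketch
    s_press = math.sqrt(press / (n - k - 1))
    sdep = math.sqrt(press / n)
    q2 = 1.0 - press / sst
    return press, s_press, sdep, q2

if __name__ == "__main__":
    # Invented data with a strong linear descriptor/activity relationship
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    y = [2.1, 3.9, 6.2, 8.0, 9.9, 12.1]
    press, s_press, sdep, q2 = loo_statistics(x, y)
    print(f"PRESS={press:.3f}  SPRESS={s_press:.3f}  SDEP={sdep:.3f}  q2={q2:.3f}")
```

For this well-behaved data set q² comfortably exceeds the 0.5 validity threshold stated above; a poor model would push PRESS toward SST and q² toward zero or below.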

f) Bootstrapping
Bootstrapping analysis is performed to further assess the robustness and statistical confidence of a model. In bootstrapping, sub-samples of the data are repetitively analyzed. Each sub-sample is a random sample, with replacement, from the full sample: one data point can be represented more than once or not at all, but the total number of data points remains constant.
The bootstrapping analysis gives an overview of the contribution of individual molecules to the QSAR model.
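A bootstrap sketch under these rules: compounds are resampled with replacement (sample size unchanged), the model is refitted on each sub-sample, and the R² values are averaged. The one-descriptor data set is invented:

```python
import random

def r_squared(x, y):
    """R^2 of a least-squares straight line fitted to (x, y)."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ym - b * xm
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - ym) ** 2 for yi in y)
    return 1.0 - sse / sst

def bootstrap_r2(x, y, n_boot=200, seed=7):
    """Average R^2 over bootstrap sub-samples: resample indices with
    replacement, keeping the total number of data points constant."""
    rng = random.Random(seed)
    n = len(x)
    values = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        if len(set(xs)) < 2:            # degenerate resample: no variance in x
            continue
        values.append(r_squared(xs, ys))
    return sum(values) / len(values)

if __name__ == "__main__":
    # Invented descriptor/activity data with a strong linear relationship
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    y = [2.1, 3.9, 6.2, 8.0, 9.9, 12.1]
    print(f"average bootstrap R2: {bootstrap_r2(x, y):.3f}")
```

A model dominated by one or two influential compounds would show a bootstrap average well below the conventional R², flagging instability.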
i. Bootstrapping squared correlation coefficient
The bootstrapping squared correlation coefficient (R²bs) is the average squared correlation coefficient calculated during the validation procedure, computed from the sub-samples of data points used in turn for the validation.

g) Chance
The probability of a fortuitous correlation is tested with the help of the chance statistic. It is evaluated as the ratio of the number of equivalent regression equations (with R² equal to or greater than the conventional R²) to the total number of randomized sets. In each randomized set, the independent parameter values are frozen and the dependent parameter values are shuffled; R² values for these randomized sets are then calculated. A chance value of 0.001 corresponds to a 0.1% chance of fortuitous correlation.

Chance = (number of R² ≥ conventional R²) / (number of randomized sets)
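The chance statistic can be sketched by shuffling the activities of an invented data set while freezing the descriptor values (Y-randomization):

```python
import random

def r_squared(x, y):
    """R^2 of a least-squares straight line fitted to (x, y)."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ym - b * xm
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - ym) ** 2 for yi in y)
    return 1.0 - sse / sst

def chance_statistic(x, y, n_sets=100, seed=3):
    """Y-randomization: freeze the descriptors, shuffle the activities, refit,
    and count how often the randomized R^2 reaches the conventional R^2."""
    rng = random.Random(seed)
    conventional = r_squared(x, y)
    hits = 0
    for _ in range(n_sets):
        shuffled = y[:]
        rng.shuffle(shuffled)           # dependent values shuffled, x frozen
        if r_squared(x, shuffled) >= conventional:
            hits += 1
    return hits / n_sets

if __name__ == "__main__":
    # Invented descriptor/activity data with a genuine linear relationship
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    y = [2.1, 3.9, 6.2, 8.0, 9.9, 12.1]
    print(f"chance = {chance_statistic(x, y):.2f}")
```

For a genuine structure-activity relationship the chance value stays near zero; a model whose shuffled R² values frequently match the conventional R² is likely a fortuitous correlation.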
