Non-Stochastic and Stochastic Atom-based 3D-Chiral Linear Indices and their Applications to Central Chirality Codification

Non-stochastic and stochastic 2D linear indices have been generalized to codify chemical structure information for chiral drugs, making use of a trigonometric 3D-chirality correction factor. These descriptors circumvent the inability of conventional 2D non-stochastic linear indices to distinguish σ-stereoisomers. To test the potential of this novel approach in drug design, we modelled the angiotensin-converting enzyme inhibitory activity of the combinatorial library of perindoprilate σ-stereoisomers. Two linear discriminant analysis models, using non-stochastic and stochastic linear indices, were obtained. The models showed an accuracy of 100% and 96.65% for the training set, and 88.88% and 100% for the external test set, respectively. Canonical regression analysis corroborated the statistical quality of these models (Rcan of 0.78 and 0.77) and was also used to compute biological activity canonical scores for each compound. Next, the σ-receptor antagonist activity of chiral 3-(3-hydroxyphenyl)piperidines was predicted by multiple linear regression analysis. Two statistically significant QSAR models were obtained when non-stochastic (R = 0.982 and s = 0.157) and stochastic (R = 0.941 and s = 0.267) 3D-chiral linear indices were used. The predictive power was assessed by leave-one-out cross-validation, yielding values of q = 0.982 (scv = 0.186) and q = 0.90 (scv = 0.319), respectively. Finally, the corticosteroid-binding globulin binding affinity of a set of steroids was predicted. The best cross-validation results obtained with non-stochastic (q = 0.904) and stochastic (q = 0.88) 3D-chiral linear indices are similar to those of most 3D-QSAR approaches reported so far. The method was validated by comparison with previous reports on the same data set. The non-stochastic and stochastic 3D-chiral linear indices provide a powerful alternative to 3D-QSAR.


Introduction
Asymmetry of atomic configurations is a very important feature in determining the physical, chemical and biological properties of chemical substances [1]. Non-superimposable mirror-image isomers are called enantiomers, but may also be referred to as enantiomorphs, optical isomers or optical antipodes [2]. Molecules with identical 2D structural formulas containing more than one asymmetric atom are referred to as σ-diastereomers [3]. Most of the physical and chemical properties of chiral molecules are similar. At the same time, it is well known that many biological molecules are chiral and that chirality plays an essential role in defining biological activity [1]. The case of thalidomide is an example of a problem that was, at least, complicated by ignorance of stereochemical effects [4]. Thus, whenever a drug is to be obtained in a variety of chemically equivalent forms (such as a racemate), it is both good science and good sense to explore the potential for in vivo differences between these forms. In this connection, Food & Drug Administration (FDA) regulations require a detailed study of both enantiomers [5].
Several quantitative measures of chirality have been developed in the past and have been extensively reviewed [6-8]. Buda and Mislow distinguished between two classes of measures [6]. In the first class 'the degree of chirality expresses the extent to which a chiral object differs from an achiral reference object'. In the second one 'it expresses the extent to which two enantiomorphs differ from one another'. These methods yield a single real value, usually an absolute quantity that is the same for both enantiomorphs. A different idea was to incorporate R/S labels into conventional topological indices (TIs) [9]. The derived chirality descriptors were correlated with biological activity by Julián-Ortiz et al. [10], Golbraikh et al. [1] and, more recently, González-Díaz et al. [11]. These indices are referred to as chirality TIs (CTIs). The main purpose of developing these descriptors is to account for chiral molecules, which are well known to play an important role in medicinal chemistry. Very few of these descriptors have been reported in the literature to date, although the need for a more serious effort in this direction has been recognized by researchers in the area [12].
Recently, a novel scheme for rational in silico molecular design and QSAR/QSPR, TOMOCOMD (acronym of TOpological MOlecular COMputer Design), has been introduced by one of the present authors. It calculates several new families of molecular descriptors. In this sense, quadratic and linear indices have been defined in analogy to the quadratic and linear mathematical maps [13,14]. This approach has been successfully employed in QSPR [13,15-17] and QSAR [14,18-22] studies, including studies related to nucleic acid-drug interactions [23,24] and central chirality codification [25]. Finally, an alternative formulation of our approach for the structural characterization of proteins was carried out recently [26,27].
The main aim of the present paper is to extend the 2D linear indices of the "molecular pseudograph's atom adjacency matrix" in order to codify chirality-related structural features. The classification of ACE (Angiotensin-Converting Enzyme) inhibitors, the prediction of σ-receptor antagonist activities, and the prediction of the corticosteroid-binding globulin binding affinity of Cramer's steroid data set were selected as illustrative examples of the method's applications. These examples are also used for comparison with other CTIs, 3D descriptors and quantum chemical descriptors.

2D non-Stochastic and Stochastic linear indices
The atom, atom-type and total 2D non-stochastic and stochastic linear indices of the "molecular pseudograph's atom adjacency matrix" for small-to-medium sized organic compounds have been explained in some detail elsewhere [13,14,20]. However, an overview of this approach is given here.
For a given molecule composed of n atoms, the "molecular vector" (X) is constructed and the kth atom linear indices, $f_k(x_i)$, are calculated as linear maps on $\Re^n$ [$f_k(x_i)$: $\Re^n \rightarrow \Re^n$; thus $f_k(x_i)$ is an endomorphism on $\Re^n$] in the canonical basis, as shown in Eq. 1:

$$f_k(x_i) = \sum_{j=1}^{n} {}^{k}a_{ij} X_j \qquad (1)$$

where ${}^{k}a_{ij} = {}^{k}a_{ji}$ (symmetric square matrix), n is the number of atoms of the molecule, and $X_1, \ldots, X_n$ are the coordinates or components of the "molecular vector" (X) in a system of canonical basis vectors of $\Re^n$. The components of the "molecular vector" are numeric values, which can be considered as weights (atom labels) for the vertices of the pseudograph. Certain atomic properties (electronegativity, density, atomic radius, etc.) can be used for this purpose. In this work, Pauling electronegativities were selected as atom weights [28].
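The coefficients ${}^{k}a_{ij}$ are the elements of the kth power of the symmetric square matrix M(G) of the molecular pseudograph (G) and are defined as follows [14,16,20,22]:

$$a_{ij} = \begin{cases} P_{ij} & \text{if } i \neq j \text{ and } e_k \in E(G) \\ L_{ii} & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$$

where E(G) represents the set of edges of G, $P_{ij}$ is the number of edges (bonds) between vertices (atoms) $v_i$ and $v_j$, and $L_{ii}$ is the number of loops in $v_i$.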
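As an illustration of Eq. 1 and the definition of the coefficients, the following Python sketch builds M(G) for a small pseudograph and evaluates the atom linear indices of order 0 to 2. It is a minimal sketch, not the TOMOCOMD-CARDD implementation: the function names, the hydrogen-suppressed representation of 2-chloropropionaldehyde and the atom ordering are assumptions made for this example. The atom weights are Pauling electronegativities (C 2.55, Cl 3.16, O 3.44).

```python
# Pauling electronegativities used as atom weights.
PAULING = {"C": 2.55, "Cl": 3.16, "O": 3.44}


def adjacency_matrix(n_atoms, bonds, loops=None):
    """Return M(G): entry (i, j) is the bond multiplicity P_ij,
    entry (i, i) is the loop count L_ii, all other entries are 0."""
    m = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j, multiplicity in bonds:
        m[i][j] = m[j][i] = multiplicity  # symmetric by construction
    for i, count in (loops or {}).items():
        m[i][i] = count
    return m


def mat_power(m, k):
    """Return M^k by repeated multiplication (M^0 is the identity)."""
    n = len(m)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = [[sum(result[i][t] * m[t][j] for t in range(n))
                   for j in range(n)] for i in range(n)]
    return result


def atom_linear_indices(m, x, k):
    """Return [f_k(x_1), ..., f_k(x_n)], f_k(x_i) = sum_j a_ij^(k) * X_j."""
    mk = mat_power(m, k)
    return [sum(a * xj for a, xj in zip(row, x)) for row in mk]


# Hypothetical atom ordering: CH3 (0), CH (1), Cl (2), CHO carbon (3), O (4).
atoms = ["C", "C", "Cl", "C", "O"]
bonds = [(0, 1, 1), (1, 2, 1), (1, 3, 1), (3, 4, 2)]  # (i, j, bond order)

M = adjacency_matrix(len(atoms), bonds)
X = [PAULING[a] for a in atoms]

for k in range(3):
    print(k, [round(v, 2) for v in atom_linear_indices(M, X, k)])
```

For k = 0 the indices are simply the atom weights, because M^0 is the identity matrix; for k = 1 each atom receives the bond-order-weighted sum of its neighbours' weights.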
Note that the matrices of the linear indices, $M^k$, are graph-theoretic electronic-structure models, like an "extended Hückel MO model". The $M^1$ matrix considers all valence-bond electrons (σ- and π-networks) in one step, and its powers (k = 0, 1, 2, 3…) can be considered as an interacting-electron chemical-network model in k steps. This model can be seen as intermediate between the quantitative quantum-mechanical Schrödinger equation and classical chemical bonding ideas [10].
The present approach is based on a simple model for the intramolecular movement of all outer-shell electrons. Let us consider a hypothetical situation in which a set of atoms is free in space at an arbitrary initial time ($t_0$). At this time, the electrons are distributed around the atomic nuclei.
Alternatively, these electrons can be distributed around cores in discrete intervals of time t k . In this sense, the electron in an arbitrary atom i can move to other atoms at different discrete time periods t k (k = 0, 1, 2, 3,…) throughout the chemical-bonding network.
The kth stochastic molecular pseudograph's atom adjacency matrix [$S^k(G)$] can be obtained from $M^k$ as follows:

$${}^{k}s_{ij} = \frac{{}^{k}a_{ij}}{{}^{k}SUM_i} = \frac{{}^{k}a_{ij}}{{}^{k}\delta_i}$$

where ${}^{k}a_{ij}$ are the elements of the kth power of M, and the sum of the ith row of $M^k$ is named the k-order vertex degree of atom i, ${}^{k}\delta_i$. The elements ${}^{k}s_{ij}$ are the transition probabilities with which the electrons move from atom i to atom j in the discrete time periods $t_k$. Note that the element ${}^{k}s_{ij}$ takes into consideration the molecular topology in k steps throughout the chemical-bonding (σ- and π-) network.

Table 1 depicts the calculation of the linear indices of the molecular pseudograph's atom adjacency matrix for 2-chloro-propionaldehyde. In the definition of *X, the chiral molecular vector, the chemical symbol of the element is used to indicate the corresponding electronegativity value plus the 3D-chirality factor. That is, writing O means χ(O) (the Pauling electronegativity of oxygen) + $\sin((\omega_A + 4\Delta)\pi/2)$. Therefore, if we use the canonical basis of $\Re^5$, the coordinates of any vector *X coincide with the components of that chiral molecular vector. The term $\sin((\omega_A + 4\Delta)\pi/2)$ is the trigonometric chirality correction factor, which takes different values in order to codify specific stereochemical information such as chirality. The 3D-chiral descriptors reduce to the simple (2D) linear indices for molecules without specific 3D characteristics.
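The row normalization that defines the stochastic matrix can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, and the adjacency matrix is an assumed hydrogen-suppressed representation of 2-chloropropionaldehyde with the atom ordering CH3, CH, Cl, CHO carbon, O. Returning a row of zeros when a row sum is zero is a convention chosen here, not something stated in the text.

```python
def mat_power(m, k):
    """Return M^k by repeated multiplication (M^0 is the identity)."""
    n = len(m)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = [[sum(result[i][t] * m[t][j] for t in range(n))
                   for j in range(n)] for i in range(n)]
    return result


def stochastic_matrix(m, k):
    """Row-normalise M^k: s_ij = a_ij / delta_i, where delta_i is the
    sum of row i of M^k (the k-order vertex degree of atom i)."""
    rows = []
    for row in mat_power(m, k):
        delta = sum(row)
        # Assumed convention: an isolated atom gets a row of zeros.
        rows.append([a / delta if delta else 0.0 for a in row])
    return rows


# Assumed adjacency matrix (off-diagonal entries are bond orders).
M = [
    [0, 1, 0, 0, 0],  # CH3
    [1, 0, 1, 1, 0],  # CH
    [0, 1, 0, 0, 0],  # Cl
    [0, 1, 0, 0, 2],  # CHO carbon
    [0, 0, 0, 2, 0],  # O
]

S1 = stochastic_matrix(M, 1)
S2 = stochastic_matrix(M, 2)
for row in S1:
    print([round(s, 3) for s in row])
```

Each row of the result sums to 1, as expected for transition probabilities. Unlike $M^k$, the normalized matrix is in general not symmetric: in this example the entry for CH3 to CH is 1, whereas the entry for CH to CH3 is 1/3.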

[Table 1. Column headings: 3D (R)-stereoisomer; 'classical' 2D indices; 3D (S)-stereoisomer. Rows: local and total non-stochastic chiral linear indices of order 0-]
On the other hand, the defining equation (1) for $f_k(x_i)$ may be written as the single matrix equation:

$$[f_k(x_1), \ldots, f_k(x_n)]^t = M^k [X]$$

where [X] is a column vector (an n×1 matrix) of the coordinates of X in the canonical basis of $\Re^n$ and $M^k$ is the kth power of the matrix M of the molecular pseudograph (the map's matrix).
Total (whole-molecule) linear indices are linear functionals (some mathematicians use the term linear form, which means the same as linear functional) on $\Re^n$. That is, the kth total linear index is a linear map from $\Re^n$ to the scalar field $\Re$ [$f_k(x)$: $\Re^n \rightarrow \Re$]. The mathematical definition of these molecular descriptors is the following:

$$f_k(x) = \sum_{i=1}^{n} f_k(x_i)$$

where n is the number of atoms and $f_k(x_i)$ are the atom linear indices (linear maps) obtained by Eq. 1. Then, a linear form $f_k(x)$ can be written in matrix form for each molecular vector $X \in \Re^n$:

$$f_k(x) = [u]^t M^k [X]$$

where $[u]^t$ is an n-dimensional unitary row vector (all of its entries are equal to 1). As can be seen, the kth total linear index is calculated by summing the local (atom) linear indices of all atoms in the molecule.
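The equivalence between summing the atom linear indices and evaluating the product of the unitary row vector, the kth power of M and [X] can be checked numerically. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, and the matrix and weights describe an assumed hydrogen-suppressed 2-chloropropionaldehyde (ordering CH3, CH, Cl, CHO carbon, O) with Pauling electronegativities as atom weights.

```python
def local_indices(m, x, k):
    """Return M^k [X], i.e. the list of atom linear indices f_k(x_i)."""
    v = list(x)
    for _ in range(k):
        v = [sum(a * b for a, b in zip(row, v)) for row in m]
    return v


def total_index_by_sum(m, x, k):
    """Total linear index as the sum of the atom linear indices."""
    return sum(local_indices(m, x, k))


def total_index_by_row_vector(m, x, k):
    """Total linear index as [u]^t M^k [X], with u a vector of ones.
    The row vector [u]^t M^k is built first, then dotted with [X]."""
    n = len(m)
    w = [1.0] * n
    for _ in range(k):
        w = [sum(w[i] * m[i][j] for i in range(n)) for j in range(n)]
    return sum(wi * xi for wi, xi in zip(w, x))


# Assumed adjacency matrix and Pauling electronegativity weights.
M = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 2],
    [0, 0, 0, 2, 0],
]
X = [2.55, 2.55, 3.16, 2.55, 3.44]

for k in range(4):
    a = total_index_by_sum(M, X, k)
    b = total_index_by_row_vector(M, X, k)
    print(k, round(a, 2), round(b, 2))
```

Both routes give the same value for every k. For k = 1 the row vector $[u]^t M$ contains the vertex degrees, so the total index is the degree-weighted sum of the atom weights.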

3D-Chiral linear indices.
The total and local linear indices, as defined above, cannot codify any information about 3D molecular structure. In order to solve this problem, we introduced a trigonometric 3D-chirality correction factor in the molecular vector X [25]. In this sense, a chiral molecular vector (*X) is obtained, where the components of X (for instance, the Pauling electronegativity $\chi_A$ [28] of atom A) are replaced by the term [$\chi_A + \sin((\omega_A + 4\Delta)\pi/2)$].
$$\omega_A = \begin{cases} 1 \text{ and } \Delta \text{ is an odd number} & \text{when A has R (rectus), E (entgegen), or a (axial) notation according to Cahn-Ingold-Prelog rules} \\ 0 \text{ and } \Delta \text{ is an even number} & \text{if A does not have a specific 3D environment} \\ -1 \text{ and } \Delta \text{ is an odd number} & \text{when A has S (sinister), Z (zusammen), or e (equatorial) notation according to Cahn-Ingold-Prelog rules} \end{cases} \qquad (8)$$

Thus, the 3D-chirality factor $\sin((\omega_A + 4\Delta)\pi/2)$ takes different values in order to codify specific stereochemical information such as chirality, Z/E isomerism, and so on. This factor therefore takes values in the order 1 > 0 > -1 for atoms that have specific 3D environments. The chemical idea here is not that the attraction of electrons by an atom depends on its chirality, since experience shows that chirality does not change the electronegativities of atoms in a molecule in an isotropic environment in an observable way [29]. This correction is principally mathematical in meaning and should not be a source of misunderstanding. That is to say, this approach can be seen as a simplification of molecular structure. However, such procedures have also been used at other levels of theoretical chemistry. As Dewar recalled almost 20 years ago, the Schrödinger equation is not exact; it is only an approximation in which electron spin is incorporated in the results only as an artifact [30].
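A short numerical check of the trigonometric factor may help here. The sketch below evaluates sin((ω_A + 4Δ)π/2) for the three values of ω_A and several values of Δ. The function name is hypothetical, and the rounding is added only to suppress floating-point noise. Since 4Δ(π/2) = 2πΔ is a whole number of periods of the sine function for any integer Δ, the result depends on ω_A alone.

```python
import math


def chirality_factor(omega, delta):
    """Trigonometric 3D-chirality correction sin((omega + 4*delta)*pi/2).

    Rounded to 12 decimals to hide floating-point noise; adding 0.0
    turns a possible -0.0 into 0.0 for cleaner printing."""
    return round(math.sin((omega + 4 * delta) * math.pi / 2), 12) + 0.0


# omega = 1 (R, E, a) with odd delta, omega = 0 (no specific 3D
# environment) with even delta, omega = -1 (S, Z, e) with odd delta.
cases = {
    "R / E / a": (1, [1, 3, 5]),
    "non-chiral": (0, [0, 2, 4]),
    "S / Z / e": (-1, [1, 3, 5]),
}

for label, (omega, deltas) in cases.items():
    values = [chirality_factor(omega, d) for d in deltas]
    print(label, values)

# Chiral component of the molecular vector for a carbon atom
# (Pauling electronegativity 2.55) with R, no, or S configuration.
chi_c = 2.55
print([round(chi_c + chirality_factor(w, d), 2)
       for w, d in [(1, 1), (0, 0), (-1, 1)]])
```

The factor is 1, 0 and -1 for the three classes regardless of the Δ chosen, so a carbon atom contributes 3.55, 2.55 or 1.55 to the chiral molecular vector in this illustration.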
A severe limitation of the Golbraikh-Bonchev-Tropsha (GBT) approach is the existence of different chirality corrections, and we had great difficulty in selecting one of them. In this sense, González-Díaz et al. [11] introduced an exponential chirality factor ($\exp(\omega_A \Delta)$), which eliminated the indetermination in the selection of chirality and 3D scales for stochastic topological indices.
Unfortunately, this exponential factor does not solve the problem in GBT-like approaches. In this connection, the present trigonometric 3D-chiral correction factor is invariant with respect to the selection of other chirality scales for all such chiral topological indices (GBT-like ones). Table 2 depicts the values of the trigonometric 3D-chirality correction factor for all allowed values of $\omega_A$ and Δ (the GBT-like chirality scale and other alternative chirality scales). Table 2 clearly shows that the trigonometric 3D-chirality factor is invariant with respect to the selection of all possible real scales. That is to say, the factor always takes the values 1, 0 and -1 for R, non-chiral and S atoms, respectively. As outlined above, the demonstration of invariance for this factor with respect to other 3D features such as a/e substitutions and Z/E or π-isomers is straightforward by homology. Henceforth, we do not need to answer the question regarding the best value for the chirality correction, at least for linear scales [1,10,11].
A very interesting point is that the present 3D-chiral descriptors reduce to the simple (2D) linear indices for molecules without specific 3D characteristics, because $\sin((0 + 4\Delta)\pi/2) = 0$ when Δ is zero or any even number. That is, when none of the atoms in the molecule is chiral, the TOMOCOMD-CARDD (Computer-Aided 'Rational' Drug Design) molecular descriptors, or any GBT-like chiral topological index, do not change upon the introduction of this factor. This means that *X = X and thus *$f_k(x)$ = $f_k(x)$.
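The reduction to the 2D case, and the separation of two enantiomers, can be illustrated with a small Python sketch. Everything specific in it is an assumption made for illustration: the function names, the hydrogen-suppressed graph of 2-chloropropionaldehyde, the atom ordering (CH3, CH, Cl, CHO carbon, O, with the CH carbon as the stereocentre) and the choice Δ = 1 for chiral atoms and Δ = 0 otherwise.

```python
import math

# Assumed (omega, delta) pairs for each stereo label.
OMEGA_DELTA = {"R": (1, 1), "S": (-1, 1), None: (0, 0)}


def chiral_vector(weights, labels):
    """Return *X: each weight plus sin((omega + 4*delta)*pi/2)."""
    vector = []
    for chi, label in zip(weights, labels):
        omega, delta = OMEGA_DELTA[label]
        factor = round(math.sin((omega + 4 * delta) * math.pi / 2), 12)
        vector.append(chi + factor)
    return vector


def total_linear_index(m, x, k):
    """Sum of the components of M^k [X]."""
    v = list(x)
    for _ in range(k):
        v = [sum(a * b for a, b in zip(row, v)) for row in m]
    return sum(v)


M = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 2],
    [0, 0, 0, 2, 0],
]
X = [2.55, 2.55, 3.16, 2.55, 3.44]  # Pauling electronegativities

flat = chiral_vector(X, [None] * 5)            # no stereo labels
r_form = chiral_vector(X, [None, "R", None, None, None])
s_form = chiral_vector(X, [None, "S", None, None, None])

for name, vec in [("2D", flat), ("R", r_form), ("S", s_form)]:
    print(name, [round(total_linear_index(M, vec, k), 2) for k in range(3)])
```

With no stereo labels the chiral vector equals X, so the indices coincide with the 2D ones, while the R and S forms give values shifted in opposite directions from the 2D value.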

Chemometric analysis
Statistical analysis was carried out with the STATISTICA software [32]. The tolerance parameter (the proportion of variance that is unique to the respective variable) was set to the default value for minimum acceptable tolerance, which is 0.01. A forward stepwise procedure was used for variable selection. The principle of parsimony (Occam's razor) was taken into account for model selection. Accordingly, we selected the model with high statistical significance but with as few parameters ($a_k$) as possible. Finally, the percentage of global good classification (accuracy) and Matthews' correlation coefficient (MCC) in the training and test sets were calculated to assess the model [33]. The MCC always lies between -1 and +1. A value of -1 indicates total disagreement (all-false predictions) and +1 total agreement (perfect predictions). The MCC is 0 for completely random predictions and therefore allows easy comparison with a random baseline. That is to say, the MCC quantifies the strength of the linear relation between the molecular descriptors and the classifications [33], and it may often provide a much more balanced evaluation of the prediction than, for instance, the percentages.
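For reference, accuracy and the MCC can be computed from the four cells of a two-class confusion matrix, as in the following Python sketch. The function names and the counts in the example are hypothetical and are not results from this work. Returning 0 when the denominator vanishes is a common convention, assumed here.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Percentage of global good classification."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a two-class problem:
    (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    if denominator == 0:
        return 0.0  # assumed convention when a row or column is empty
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts, for illustration only.
examples = {
    "perfect": (10, 10, 0, 0),
    "all wrong": (0, 0, 10, 10),
    "chance-like": (5, 5, 5, 5),
    "imbalanced": (1, 17, 1, 1),
}

for name, counts in examples.items():
    print(name, round(accuracy(*counts), 1), round(matthews_cc(*counts), 3))
```

The imbalanced example shows why the MCC can be more informative than the percentage: most compounds are classified correctly, yet the coefficient is much lower because only half of the minority class is recovered.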

Linear Discriminant Analysis
We also carried out linear discriminant canonical analysis, checking the following statistics: the canonical regression coefficient ($R_{can}$), Chi-squared and its p-level [p(χ²)] [34].
On the other hand, Multiple Linear Regression (MLR) was carried out to predict the σ-receptor antagonist activities of 3-(3-hydroxyphenyl)piperidines and the corticosteroid-binding globulin (CBG) binding affinity of a steroid data set. The quality of the models was determined by examining the statistical parameters of the regression and of the cross-validation procedures [35,36]: the determination coefficient (also known as the squared regression coefficient; R²), the p-level of the Fisher ratio [p(F)], the standard deviation of the regression (s) and the leave-one-out (LOO) press statistics (q², $s_{cv}$) [35,37].
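The leave-one-out statistic q² can be sketched in plain Python as follows. This is a minimal illustration and not the STATISTICA procedure: the function names and the data are hypothetical, the regression is solved through the normal equations with Gaussian elimination, and q² is taken as 1 - PRESS/SS, with SS computed around the mean of the full response vector (other conventions exist).

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_mlr(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...]."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def loo_q2(rows, y):
    """q2 = 1 - PRESS/SS; each compound is predicted by a model
    fitted without it."""
    press = 0.0
    for i in range(len(y)):
        coef = fit_mlr(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, rows[i])) ** 2
    mean_y = sum(y) / len(y)
    ss = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - press / ss


# Hypothetical one-descriptor data, for illustration only.
descriptors = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
activity = [1.1, 2.9, 5.2, 6.8, 9.1, 11.0]
print(round(loo_q2(descriptors, activity), 3))
```

Because every prediction comes from a model that has not seen the predicted compound, q² cannot exceed 1 and is usually lower than the fitted R² for the same data.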

QSAR Applications and comparison with other theoretical studies
To evaluate the effectiveness of the 3D-chiral linear indices, we tested their ability to predict pharmacological properties in groups with a known stereochemical influence. First, a data set of 32 perindoprilate stereoisomers, which are angiotensin-converting enzyme (ACE) inhibitors, was used to test the applicability of the method [11,38]. ACE acts in plasma and blood vessels, removing the C-terminal dipeptide of the decapeptide angiotensin I to produce the potent blood-vessel-constricting octapeptide angiotensin II. In addition, ACE inactivates the hypotensive nonapeptide bradykinin. For these reasons, ACE is the biological target of many important antihypertensive drugs called ACE inhibitors (ACEIs) [38]. In this study, an active compound is taken to mean one that has an IC50 value no higher than 110 nM.
Next, a short data set of seven pairs of chiral N-alkylated 3-(3-hydroxyphenyl)piperidines that bind to σ-receptors was also selected as an illustrative example of the application of the 3D-chiral linear indices. The σ-receptors mediate severe side effects induced by various dopamine antagonists [10].
Finally, in order to further validate the 3D-chiral linear indices in QSAR studies, we selected a molecular set that is well known to QSAR researchers, the so-called Cramer's steroid data set. This data set was introduced by Cramer et al. in 1988 [39] using the Comparative Molecular Field Analysis (CoMFA) methodology and has since become a benchmark for the assessment of novel QSAR methods [40,41]. Various groups have used this data set to compare the quality of their 3D-QSAR methodologies. Hence, it has become one of the most often discussed data sets and can be seen as a point of reference for novel molecular descriptors [42]. Even though this data set is not the ideal 3D benchmark [42], it was used for the sake of comparability [43]. We use this molecular set because all of its compounds contain chiral atoms and their binding affinities are available [39]. Some structures of these compounds were drawn incorrectly in the original paper and were corrected in a recent work [41].

Classification of the ACE inhibitory activity of 32 perindoprilate's σ-stereoisomers
We [...] In Table 3 [...] [11] and the same number of variables that Marrero-Ponce et al. [25] used to develop their model with other 3D-chiral TOMOCOMD-CARDD descriptors. However, the accuracy of model 9 for the training set is the best of all equations for this data set. For model 10, the accuracy for the training and test sets is equal to that obtained when the 3D-chiral quadratic indices [25] were used, and both are better than that obtained by González-Díaz et al. [11] (see Table 4). On the other hand, canonical analysis is used here to test the ability of the 3D-chiral non-stochastic and stochastic linear indices to discriminate between the two groups of stereoisomers and also to order these compounds according to their stability profile.
The principal canonical roots for ACE-inhibitory activity, obtained by LDA with the 3D-chiral total non-stochastic and stochastic linear indices, are given below:

When LDA is applied to a two-group classification problem, two classification functions are always obtained. However, we cannot use these two classification functions to evaluate all the compounds and obtain a bivariate stability map, because they are not orthogonal [34]. To solve this problem we used canonical analysis; in this case, the dimensional reduction produced by canonical analysis makes it possible to obtain a one-dimensional stability map [34]. That is, all compounds can be ordered according to their canonical scores. The canonical scores of all stereoisomers of perindoprilate appear in Table 3. For example, an overall ascending tendency can be detected in the canonical scores of equation (11) when they are plotted in the order in which IC50 increases (activity decreases). As expected, the overall mean of the canonical root scores for the group of active isomers (lowest IC50 values) has the opposite sign (-) to that of the other group [(+); highest IC50 values] [34].

Modelling σ-receptor antagonist activities of 3-(3-hydroxyphenyl)piperidines
We will now discuss the ability of the 3D-chiral linear indices to predict σ-receptor antagonist activities. 3D-linear indices are non-symmetric and reduce to the classical descriptors when symmetry is not codified (see Table 1). Moreover, González-Díaz et al. concluded that σ-receptor antagonist activities are not a pseudoscalar property [11], and we can therefore expect at least a good correlation with the 3D-linear indices.
This experiment also permitted us to compare our method with other previously reported approaches. MLR analysis was used to develop QSAR models for the σ-receptor antagonist activities. The models obtained using non-stochastic linear indices are the following:

where N is the size of the data set, R² is the squared regression coefficient (determination coefficient), s is the standard deviation of the regression, F is the Fisher ratio, and q² ($s_{cv}$) is the squared correlation coefficient (standard deviation) of the cross-validation performed by the LOO procedure. These statistics indicate that the models are appropriate for the description of the chemicals studied here. Table 5 shows the structures and the experimental and calculated log IC50 values for this data set. In the development of the first quantitative model for the description of the activities (Eq. 13), one compound was detected as a statistical outlier. Once this outlier was rejected, Eq. 14 was obtained with better statistical parameters.
When the stochastic linear indices were used, the model obtained for the σ-receptor antagonist activities is given below:

The comparison with other methods previously reported for the same activity is shown in Table 6. As can be seen, our models have slightly better statistical parameters than the models obtained with MARCH-INSIDE molecular descriptors [11] and other chiral TIs [10], and our statistics are

Prediction of the Corticosteroid-Binding Globulin (CBG) binding affinity of a Steroid family.
The training set used to validate our methodology is made up of 31 molecules. Table 7 gathers the entire studied set with the actual binding affinities, taken from Robert et al. [45]. Because the molecular structures of the studied steroids have already been depicted in several papers, they are not included here. For more details see, for example, Figure 1 in reference 39 or Figure 1 in [...]

This study also permitted us to compare our method with other 3D-QSAR methods such as MQMS, MaP, CoMMA, TQSAR and so on. MLR analysis was used to develop QSAR models for the corticosteroid-binding globulin binding affinity. The models obtained using non-stochastic linear indices are the following:

In addition, using stochastic linear fingerprints to describe the CBG binding affinity, we obtained two models, which are given below:

In the development of the quantitative model (Eq. 18), three compounds were also detected as statistical outliers. Once these chemicals (4, 10 and 20) were rejected, a new model (Eq. 19) was obtained with better statistical parameters. Notice that this new model explains more than 92% of the variance of the experimental CBG values. These two models use seven variables each to describe 31 and 28 steroids, respectively.
All these results are summarized in Table 8, where a comparison with other computational schemes can be made more easily. Notice that the present QSAR method, based on non-stochastic and stochastic 3D-chiral linear indices, obtains results comparable to those of other highly predictive QSAR models, even when the latter use more sophisticated statistical methods such as partial least squares, principal component analysis, non-linear neural network techniques and so on. Many of the models used for comparison were obtained from procedures based on quantum mechanics and/or geometric principles, as well as molecular mechanics approaches.

Final conclusions
Our studies demonstrated that 3D-chiral linear indices can be successfully applied in QSAR studies which include chiral molecules. Therefore, we suggest that 2D-QSAR methods enhanced by chirality descriptors present a powerful alternative to popular 3D-QSAR approaches.
We have shown here that the generalized TOMOCOMD-CARDD approach is able not only to discriminate between active and inactive perindoprilate stereoisomers, but also to codify information related to a pharmacological property that is highly dependent on molecular symmetry for a set of seven pairs of chiral N-alkylated 3-(3-hydroxyphenyl)piperidines that bind σ-receptors, and to predict the corticosteroid-binding globulin binding affinity of Cramer's steroid data set. This result is only a preliminary conclusion, and a deeper analysis of the potential of the 3D-chiral linear indices is necessary. However, we show that for three data sets, chiral-QSAR models that use 3D-chiral linear indices had better or similar predictive ability compared with other previously reported chiral and/or 3D-QSAR methods.