Inhibition of Tetrahymena pyriformis Growth by Aliphatic Alcohols and Amines: A QSAR Study

A Quantitative Structure-Activity Relationship (QSAR) study was undertaken to evaluate the relative toxicity of a mixed series of 21 (linear and branched-chain) alcohols and 9 normal aliphatic amines in terms of the 50% inhibitory growth concentration (IGC50) of Tetrahymena pyriformis. The simple linear regression approach applied is based on theoretical 3D (geometrical) molecular descriptors from the DRAGON package and on several calculated logP descriptors. The robustness and predictive performance of the models were verified using both internal (cross-validation by LOO and LMO; bootstrap) and external statistical validation. ClogP turned out to be the best descriptor for modelling the considered endpoint. It may be interchanged with the geometrical descriptor ADDD without relevant variations in the statistical parameters.


INTRODUCTION
The impact of the potential hazard of untested chemicals, a challenge confronting national and international regulatory agencies [1][2][3][4], can be measured by experimental investigations, but this approach is both expensive and time-consuming [5]. An alternative is to rely on QSAR (Quantitative Structure-Activity Relationship) models, which describe a mathematical relationship between the structural features of a set of chemicals and the particular activity associated with them [6,7]. Several QSAR models predicting acute chemical toxicity for the aquatic environment have been published [8][9][10][11][12]. They are based mainly on the logarithm of the octanol-water partition coefficient (logP, also referred to as logKow), as this hydrophobicity term reflects the ability of a substance to enter cells through the lipid membranes and indicates both toxicant uptake and baseline toxicity.
Although the number of compounds with a measured logP value was estimated to be 30 000 [13], which at first glance seems high, this is negligible compared to the rapidly increasing number of compounds for which logP values are desired but missing. Furthermore, the experimental determination is tedious and time-consuming and demands a high purity of the solute [14]; none of these requirements is compatible with high-throughput techniques. There is, therefore, an ongoing interest in methods for the prediction of logP values. Over recent decades, various approaches (fragmental, atom-based, and conformation-dependent methods) [15][16][17][18] have been developed, most of which are implemented and available as computer programs. However, even among these calculations it is not uncommon to find differences of several orders of magnitude [19,20]. For these reasons logP cannot be considered a univocal descriptor, which has led different authors [19][20][21][22][23][24] to propose toxicity models based exclusively on other theoretical structural molecular descriptors.
The present paper proposes predictive simple linear regression QSAR models to evaluate the relative toxicity of organic chemicals in terms of the logarithm of the inverse of the 50% inhibitory growth concentration (IGC50) of Tetrahymena pyriformis. Models based on different kinds of logP (calculated values of AlogP, MlogP and ClogP) are compared to the optimal model constructed using a single 3D (geometrical) descriptor calculated from the chemical structure alone.

Experimental Data
Two classes of toxicants were studied: a set of 21 (linear and branched-chain) alcohols and 9 normal aliphatic amines, selected to reflect diversity in chain length and branching. These toxicants, which are both nonionic and nonreactive, inhibit the growth of Tetrahymena pyriformis, the most commonly tested freshwater hymenostome ciliate, which measures approximately 50 µm in length and 30 µm in width [25]. The ciliates were grown in axenic culture, with population density measured spectrophotometrically as optical density (absorbance) at 540 nm following 48 h of incubation. The set of experimental data was taken from Schultz [26].

Estimation of the octanol/water partition coefficient
ClogP ( ≡ calculated logP) [17]
The software version of the system developed by Hansch and Leo [27], which uses Rekker's additive scheme [28], is known as ClogP (or calculated logP). It is based on fragmental constants and correction terms. Fragment constants were derived from solutes in which the fragment occurs in isolation. Furthermore, the bonding environment was taken into account (alkyl, benzyl, vinyl, styryl, and aromatic neighbors), resulting in five values per fragment. If a fragment in combination with a given bonding environment was missing, but at least two values for the same fragment with different neighbors could be found, an interpolation was attempted to derive the missing data. The correction factors were calculated from the corrections required for the specific interactions being modelled. For instance, the interaction of the two hydroxyl groups in diethylene glycol increases the logP value by 0.85 compared with two hydroxyl groups that do not interact. This value is then taken as the correction term for a two-neighbored hydroxyl group [29].
©UBMA - 2014
The decomposition of the molecular structure into fragments is performed by using a unique and simple set of rules, thus obtaining a unique solution; the fragments are either atoms or polyatomic groups.
AlogP ( ≡ Ghose and Crippen model based on an atomic increment system) [30]
Several models have been published in which the fragments are defined on a purely atomic level. This simplifies both the recognition of fragments and the calculation, as correction substructures are not applied (see Eq. (1)).
The most frequently used atomic increment system, AlogP, was developed by Ghose and Crippen [31]. Atoms are classified by their neighboring environment and, for carbon atoms, additionally by their hybridization. The estimated logP for any compound is given by:
$$\log P = \sum_i n_i a_i \qquad (1)$$
where $n_i$ is the number of occurrences of the ith atom type and $a_i$ is the corresponding hydrophobicity constant. The AlogP model implemented in DRAGON has been evaluated on a set of 2648 compounds with known experimental logP taken from the NCI Open Database. The resulting correlation coefficient r is 0.915.
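The additive scheme described above (logP as a sum of atom-type contributions) can be sketched in a few lines of Python. This is only an illustration: the atom-type labels and the hydrophobicity constants below are hypothetical placeholders, not the published Ghose-Crippen parameters, and the function name is an assumption of this sketch.

```python
from collections import Counter


def additive_logp(atom_types, contributions):
    """Estimate logP as the sum over atom types of n_i * a_i."""
    counts = Counter(atom_types)  # n_i: occurrences of each atom type
    return sum(n * contributions[atom_type] for atom_type, n in counts.items())


# Hypothetical atom types and constants, for illustration only
# (NOT the published Ghose-Crippen values).
toy_contributions = {
    "C_sp3": 0.30,
    "H_on_C": 0.10,
    "O_hydroxyl": -0.50,
    "H_on_O": -0.20,
}

# Ethanol-like toy molecule: 2 sp3 carbons, 5 H on carbon, 1 hydroxyl O, 1 H on O.
toy_atoms = ["C_sp3"] * 2 + ["H_on_C"] * 5 + ["O_hydroxyl", "H_on_O"]
toy_logp = additive_logp(toy_atoms, toy_contributions)
```

The only chemistry-specific step in a real implementation is the assignment of each atom to its type; once the types are counted, the estimate is a plain weighted sum.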

MlogP ( ≡ Moriguchi model based on structural parameters)
This model is described by a regression equation based on 13 structural parameters [32,33].
The regression coefficients were estimated on a training set of 1230 organic molecules, including general aliphatic, aromatic and heterocyclic compounds containing the following atoms: C, H, N, O, S, P, F, Cl, Br, I [30]. The statistical parameters of the model are r = 0.952, SE = 0.422 and F0(13; 1216) = 900.4.

Geometrical Descriptors Generation
The chemical structure of each compound was sketched on a PC using the HYPERCHEM program [34] and pre-optimized using the MM+ molecular mechanics method (Polak-Ribière algorithm). The final geometries of the minimum-energy conformations were obtained by the semi-empirical PM3 method at the restricted Hartree-Fock level with no configuration interaction, applying a gradient norm limit of 0.01 kcal Å⁻¹ mol⁻¹ as a stopping criterion. The resulting geometries were used as input for the generation of 74 3D (geometrical) descriptors using the DRAGON software (version 5.3) [30]. Because geometrical descriptors are defined from the three-dimensional structure of the molecule, which requires knowledge of the relative positions of the atoms in 3D space, they provide information and discrimination power even for similar molecular structures and for different conformations of a molecule.

Chemometric Methods
Models with one variable were calculated with the software MOBYDIGS [35] using the ordinary least squares (OLS) regression method. The population of 74 regression models, one for each of the 74 geometrical descriptors, was ordered by decreasing internal predictive performance, as measured by $Q^2$. The optimal non-logP model was then selected and compared to the three logP-based models.
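The screening step described above (one OLS model per descriptor, ranked by internal predictive performance) might be sketched as follows. This is not the MOBYDIGS implementation: the function names and the toy data are assumptions, and the leave-one-out residuals are obtained with the standard leverage identity for OLS, e_loo = e / (1 - h), rather than by refitting each model n times.

```python
def fit_line(x, y):
    """Closed-form OLS fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def q2_loo(x, y):
    """Leave-one-out Q2 of a one-variable OLS model via leverages."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope, intercept = fit_line(x, y)
    press = 0.0
    for xi, yi in zip(x, y):
        leverage = 1.0 / n + (xi - mean_x) ** 2 / sxx
        residual = yi - (intercept + slope * xi)
        press += (residual / (1.0 - leverage)) ** 2  # squared LOO residual
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press / tss


def rank_descriptors(descriptors, y):
    """Return descriptor names ordered by decreasing LOO Q2."""
    return sorted(descriptors, key=lambda name: q2_loo(descriptors[name], y), reverse=True)


# Toy data (hypothetical values, not taken from the paper).
toy_response = [1.0, 2.0, 3.0, 4.0, 5.2]
toy_descriptors = {
    "d_noisy": [1.0, 3.0, 2.0, 5.0, 4.0],
    "d_good": [1.0, 2.0, 3.0, 4.0, 5.0],
}
toy_ranking = rank_descriptors(toy_descriptors, toy_response)
```

With 74 candidate descriptors, the leverage shortcut avoids refitting each model once per compound while giving the same leave-one-out predictions as an explicit refit.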
The goodness of fit of the calculated models was assessed by means of the coefficient of multiple determination, $R^2$, and the standard deviation error in calculation (SDEC):
$$\mathrm{SDEC} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n}} \qquad (2)$$
Cross-validation techniques allow the assessment of internal predictivity ($Q^2_{LMO}$ cross-validation; bootstrap) in addition to the robustness of the model ($Q^2_{LOO}$ cross-validation). Cross-validation methods consist in leaving out a given number of compounds from the training set and rebuilding the model, which is then used to predict the compounds left out. This procedure is repeated for all compounds of the training set, so that a prediction is obtained for each one. If the compounds are removed one at a time, the cross-validation procedure is called the leave-one-out (LOO) technique; otherwise it is called the leave-more-out (LMO) technique. An LOO or LMO correlation coefficient, generally denoted $Q^2$, is computed by evaluating the accuracy of the predictions for these "test" compounds:
$$Q^2 = 1 - \frac{\mathrm{PRESS}}{\mathrm{TSS}} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_{i/i})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \qquad (3)$$
The "hat" on the variable y, following the usual statistical notation, indicates a predicted value of the studied property, and the subscript "i/i" indicates that the predicted value comes from a model built without the predicted compound. TSS is the total sum of squares. The predictive residual sum of squares (PRESS) measures the dispersion of the predicted values. It is used to define $Q^2$ and the standard deviation error in prediction (SDEP):
$$\mathrm{SDEP} = \sqrt{\frac{\mathrm{PRESS}}{n}} \qquad (4)$$
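The fitting and leave-one-out statistics defined in this section can be computed directly from their definitions. The sketch below is an illustration under stated assumptions: it handles a single descriptor only, refits the model explicitly for every left-out compound, and uses hypothetical function names and toy data rather than the values of the paper.

```python
import math


def fit_line(x, y):
    """Closed-form OLS fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_statistics(x, y):
    """Return R2, SDEC, LOO Q2 and SDEP for a one-variable OLS model."""
    n = len(x)
    mean_y = sum(y) / n
    tss = sum((yi - mean_y) ** 2 for yi in y)  # total sum of squares

    # Fitting: residual sum of squares of the model built on all compounds.
    slope, intercept = fit_line(x, y)
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

    # Leave-one-out: rebuild the model without compound i, then predict it.
    press = 0.0
    for i in range(n):
        x_kept, y_kept = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        slope_i, intercept_i = fit_line(x_kept, y_kept)
        press += (y[i] - (intercept_i + slope_i * x[i])) ** 2

    return {
        "R2": 1.0 - rss / tss,
        "SDEC": math.sqrt(rss / n),
        "Q2_LOO": 1.0 - press / tss,
        "SDEP": math.sqrt(press / n),
    }


# Toy data (hypothetical values, not taken from the paper).
toy_stats = fit_statistics([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 2.0, 3.0, 4.0, 5.2])
```

Because each left-out compound is predicted by a model that never saw it, PRESS cannot be smaller than RSS for an OLS model, so Q2 is at most R2 and SDEP is at least SDEC.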
A value of $Q^2 > 0.5$ is generally regarded as a good result and $Q^2 > 0.9$ as excellent [36,37]. However, studies [38,39] have indicated that while a high $Q^2$ is a necessary condition for a model to have high predictive power, it is not a sufficient one. To avoid overestimating the predictive power of the model, an LMO procedure (repeated 5000 times, with 5 objects left out at each step) was also performed ($Q^2_{L(5)O}$). In the bootstrap validation technique, K n-dimensional groups are generated by randomly repeated selection of n objects from the original data set. The model obtained on the selected objects is used to predict the values for the excluded objects, and $Q^2$ is then calculated for each model. The bootstrapping was repeated 8000 times for each validated model.
For external validation, the selected model is used to calculate the response values for the test objects, and the quality of these predictions is expressed in terms of $Q^2_{ext}$, which is defined as
$$Q^2_{ext} = 1 - \frac{\sum_{i=1}^{n_{ext}} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n_{ext}} (y_i - \bar{y}_{tr})^2} \qquad (5)$$
where the sums run over the test set objects ($n_{ext}$) and $\bar{y}_{tr}$ is the mean response of the training set.
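The leave-more-out and external validation steps can be sketched as follows for a one-descriptor model. Several choices here are assumptions of the sketch, not details confirmed by the source: the squared prediction errors are pooled over all repetitions before forming Q2, the full-set mean is used as reference in the LMO denominator, and the training-set mean is used as reference in the external Q2. Function names and toy data are hypothetical.

```python
import random


def fit_line(x, y):
    """Closed-form OLS fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def q2_lmo(x, y, n_out=5, repeats=5000, seed=0):
    """Leave-more-out Q2: n_out random objects are left out at each step."""
    rng = random.Random(seed)
    n = len(x)
    mean_y = sum(y) / n
    press = tss = 0.0
    for _ in range(repeats):
        left_out = set(rng.sample(range(n), n_out))
        kept = [i for i in range(n) if i not in left_out]
        slope, intercept = fit_line([x[i] for i in kept], [y[i] for i in kept])
        for i in left_out:
            press += (y[i] - (intercept + slope * x[i])) ** 2
            tss += (y[i] - mean_y) ** 2  # assumption: full-set mean as reference
    return 1.0 - press / tss


def q2_ext(x_train, y_train, x_test, y_test):
    """External Q2 of a model built on the training set only."""
    slope, intercept = fit_line(x_train, y_train)
    mean_train = sum(y_train) / len(y_train)  # assumption: training-set mean
    press = sum((yt - (intercept + slope * xt)) ** 2 for xt, yt in zip(x_test, y_test))
    reference = sum((yt - mean_train) ** 2 for yt in y_test)
    return 1.0 - press / reference


# Toy data (hypothetical): a nearly linear response over 12 objects.
toy_x = [float(v) for v in range(1, 13)]
toy_y = [0.5 * v + (0.05 if i % 2 else -0.05) for i, v in enumerate(toy_x)]
toy_q2_lmo = q2_lmo(toy_x, toy_y, n_out=5, repeats=5000, seed=1)
toy_q2_ext = q2_ext(toy_x[:8], toy_y[:8], toy_x[8:], toy_y[8:])
```

A fixed seed makes the random partitions reproducible, which is convenient when comparing the stability of several candidate models on the same splits.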

RESULTS AND DISCUSSION
The best one-dimensional non-logP model was obtained using the average distance/distance degree (ADDD) index. It encodes information on molecular folding [40,41], and thus about the ease of molecular diffusion through biological barriers such as membranes. Distance/distance matrices, denoted D/D, were defined as quotient matrices in terms of the geometric distances $r_{ij}$ and the topological distances $d_{ij}$:
$$[\mathbf{D/D}]_{ij} = \frac{r_{ij}}{d_{ij}} \quad (i \neq j), \qquad [\mathbf{D/D}]_{ii} = 0 \qquad (6)$$
The row sums of these matrices contain information on molecular folding: in highly folded structures they tend to be relatively small, because the interatomic distances are small while the topological distances increase with the size of the structure. The average row sum is therefore a molecular invariant called the average distance/distance degree, that is:
$$\mathrm{ADDD} = \frac{1}{A} \sum_{i=1}^{A} \sum_{j \neq i} \frac{r_{ij}}{d_{ij}} \qquad (7)$$
A being the number of atoms in the molecule.
Carbó-Dorca et al. [42] reported a QSAR study in which the same data were examined. These authors constructed a predictive model using, as a molecular descriptor, the expectation value of the interelectronic repulsion energy operator, presented as a kind of quantum self-similarity measure (QS-SM). Their correlation results, $R^2$ = 0.9240, $Q^2$ = 0.9090 and SE = 0.415, are inferior to those of the present approach.
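To make the ADDD definition concrete, the following sketch computes the average row sum of the distance/distance matrix from 3D coordinates and a bond list. It is not the DRAGON implementation: the function names are hypothetical, the molecule is assumed to be connected, and whether hydrogen atoms are included in A is left to the caller (the toy examples use three heavy atoms with unit bond lengths).

```python
import math
from collections import deque


def topological_distances(n_atoms, bonds):
    """All-pairs bond-count distances by breadth-first search."""
    adjacency = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    distances = [[0] * n_atoms for _ in range(n_atoms)]
    for source in range(n_atoms):
        seen = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        for v, d in seen.items():
            distances[source][v] = d
    return distances


def addd(coords, bonds):
    """Average row sum of the D/D matrix (geometric / topological distance)."""
    n_atoms = len(coords)
    topo = topological_distances(n_atoms, bonds)
    total = 0.0
    for i in range(n_atoms):
        for j in range(n_atoms):
            if i != j:  # diagonal entries are zero
                total += math.dist(coords[i], coords[j]) / topo[i][j]
    return total / n_atoms


# Toy three-atom chains (hypothetical geometries, unit bond lengths).
chain_bonds = [(0, 1), (1, 2)]
extended = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
bent = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
addd_extended = addd(extended, chain_bonds)
addd_bent = addd(bent, chain_bonds)
```

The two toy geometries share the same bond graph, so only the geometric distances differ; the bent chain gives the smaller value, in line with the statement that folded structures have smaller row sums.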
The value of $R^2$ attests to the good fitting performance of the model. In general, the larger the F ratio, the better the model predicts the property values in the training set. The large F ratio of 603.82 indicates that the model does an excellent job of predicting the -log IGC50 values. The model is robust: the difference between $R^2$ and $Q^2$ is small (<1%). Figure 1 shows a plot of experimental versus cross-validated -log IGC50 values. The dispersion of the points is small, although one point (1-propylamine) lies somewhat apart from the rest. SDEP is similar to SDEC, so the internal predictivity of the model is not far from its fitting power. The model shows very good stability in internal validation (the difference between $Q^2$ and $Q^2_{L(5)O}$ is 0.33%), and bootstrapping confirms its internal predictivity and stability. Although the data set is small, it underwent external statistical validation by a preliminary random splitting of the chemicals into training (20 chemicals) and validation (10 chemicals) sets. The small size of the published experimental data set [26] did not allow a more drastic splitting. The information provided by $Q^2_{ext}$ is somewhat optimistic. In fact, with small data sets (20-30 chemicals), the external predictivity for completely new chemicals can only be verified a posteriori, case by case.
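For reference, and assuming the usual definition of the F ratio for an OLS regression with p descriptors fitted to n compounds (the source does not state the formula, so this is an assumption), F and the coefficient of determination are related by:

```latex
F = \frac{R^2 / p}{(1 - R^2) / (n - p - 1)},
\qquad\text{so that for } p = 1:\quad
F = (n - 2)\,\frac{R^2}{1 - R^2}
```

Under this definition, for a fixed number of compounds a larger F ratio simply corresponds to a value of R2 closer to 1, which is why it is read as a measure of fit on the training set rather than of external predictivity.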

CONCLUSION
From the results and discussion above, we conclude that:
1. Among the logP descriptors selected to model the inhibition of Tetrahymena pyriformis growth by aliphatic alcohols and amines, ClogP is the best.
2. The geometrical descriptor ADDD (average distance/distance degree), a theoretical molecular descriptor, and ClogP can be interchanged without relevant variations in the statistical results.
3. The non-logP model obtained in this study has very good fitting performance, is robust, and has acceptable predictive power.

Figure 1. Experimental versus cross-validated activity for the training set objects.

Table 1. Relative toxicity and molecular descriptor data for the selected aliphatic alcohols and amines.

Table 2. Coefficients for the models calculated by ordinary least squares.

Table 3. Summary statistics for the one-dimensional calculated models.