Development and Validation of Quantitative Structure-Activity Relationship Models for Compounds Acting on Serotoninergic Receptors

A quantitative structure-activity relationship (QSAR) study was carried out on 20 compounds with serotonin (5-HT) receptor affinity, using thin-layer chromatographic (TLC) data and physicochemical parameters. RP2 TLC 60F254 plates (silanized) impregnated with solutions of propionic acid, ethylbenzene, 4-ethylphenol, and propionamide (used as analogues of the key receptor amino acids) and their mixtures (denoted as S1–S7 biochromatographic models) were used with two developing phases as a model of drug-5-HT receptor interaction. The semiempirical method AM1 (HyperChem v. 7.0 program) and the ACD/Labs v. 8.0 program were employed to calculate a set of physicochemical parameters for the investigated compounds. Correlation and multiple linear regression analyses were used to search for the best QSAR equations. The correlations obtained for the compounds studied represent their interactions with the proposed biochromatographic models. The good multivariate relationships (R 2 = 0.78–0.84) obtained by means of regression analysis can be used for predicting the biological activity of different compounds with 5-HT receptor affinity. “Leave-one-out” (LOO) and “leave-N-out” (LNO) cross-validation methods were used to judge the predictive power of the final regression equations.


Introduction
Serotonin (5-hydroxytryptamine, 5-HT) is a neurotransmitter in the central nervous system (CNS) that plays a significant role in migraine attacks, mood regulation, sleep, appetite, sexual function, anxiety, depression, and schizophrenia. This neurotransmitter interacts with fourteen serotoninergic receptor subtypes, which are classified into seven families (5-HT 1−7 ). To date, the functional importance of families 1-4 has been explained; the physiological and therapeutic importance of the receptors in families 5-7 is not yet known. With the exception of the 5-HT 3 receptor, which belongs to the family of ionotropic receptors, all are G protein-coupled receptors (metabotropic receptors) [1].
Many models of ligand binding describe the interaction of ligands with the binding sites of serotonin receptors, but the models differ even for the same subtype of 5-HT receptor. Based on the available literature data, the essential role in the formation of the drug-serotonin receptor complex is played by the following amino acids: aspartic acid (Asp155), serine (Ser159), phenylalanine (Phe340), asparagine (Asn333), tryptophan (Trp200, 236, 367), tyrosine (Tyr370), and threonine (Thr196) [2][3][4][5][6][7][8][9][10][11][12]. A detailed description of the model binding sites was presented in earlier works [13,14]. This information makes it possible to build an analytical model of the interaction of ligands with the 5-HT receptor and to make an initial prediction of the potential biological activity of its ligands.
The Scientific World Journal
Chromatographic systems that include chemical elements of the biological environment and simulate the conditions under which the studied compounds would act in a living organism (so-called biochromatography), and thus the data obtained from such research (column chromatography, e.g., HPLC, and thin-layer chromatography, TLC), are used very frequently in quantitative structure-activity relationship (QSAR) studies [15][16][17][18][19][20][21][22][23][24]. QSAR analysis consists of a mathematical treatment that is used to predict values of biological activity from physical characteristics of the structure of chemicals (molecular descriptors).
In the literature there are examples of biochromatographic data analyses, their implications for molecular pharmacology, and their application in predicting the pharmacological activity of drugs [25][26][27][28][29][30][31][32]. Such studies employ both parameters obtained from experimental interaction with the environment (e.g., biochromatography) and calculated physicochemical parameters, which result from the structure of the chemical compounds.
This paper is a continuation of earlier studies [13,14], whose purpose is to determine whether data obtained from thin-layer chromatography and computer-calculated physicochemical parameters can be used to build regression equations that predict the receptor binding affinity (pK i ) and the agonistic (pD 2 ) and antagonistic (pA 2 ) activity of selected compounds acting on serotonin receptors. The choice of model compounds for the binding elements in the receptor structure was based on the literature data on the use of propionic acid to mimic the aspartic acid structure in interaction with the ligands of the histamine H 3 receptor [33,34].

Materials.
Samples of the drugs 1-20 used in this work were purchased either as medical products or as standard substances [14]. All of them have biological activity toward serotoninergic receptors (see Table 1). The active substances were isolated from medical products using methods described in the relevant monographs of the Polish Pharmacopoeia; further information is available in The Merck Index [35]. Pharmacological details are given elsewhere [14].

Chromatography.
A reversed-phase thin-layer chromatography system (RP2 TLC) was used to determine the chromatographic data. The experiments were carried out twice in each variant of the stationary and mobile phase. TLC silica gel 60 RP2 F 254 glass plates (silanized; 20 × 20 cm, Merck, Darmstadt, Germany) were used as the stationary phase. Chromatograms were developed in two mobile phases, denoted as DS A and DS B : (i) acetonitrile : methanol : buffer pH 7.4 (0.02 mol/L ammonium acetate) (40 : 40 : 20, v/v/v; DS A ) and (ii) acetonitrile : methanol : methylene chloride : buffer pH 7.4 (60 : 10 : 10 : 20, v/v/v/v; DS B ). All plates were first prerun for 1.5 h with the mobile phase, dried at ambient temperature, and then impregnated with 0.03 mol/L solutions of analogues of the binding L-amino acids ((a)-(g), see below) to obtain the corresponding biochromatographic models. The models were denoted as follows: (a) propionic acid: S1, (b) ethylbenzene: S2, (c) 4-ethylphenol: S3, (d) propionamide: S4, (e) propionic acid + ethanol (1 : 1, v/v): S5, (f) propionic acid + ethanol + ethylbenzene (1 : 1 : ...
The plates were impregnated with the solutions (a)-(g) by spraying in an automatic TLC spray chamber (ChromaJet DS20, Desaga, Germany) and then air-dried. The impregnated and dried adsorbent layers were ready for chromatography. Additional plates (two for each type of mobile phase) were left clean, without solutions of the amino acid analogues, for control analysis (C).
The compounds 1-20 were weighed on an analytical balance with 0.1 mg accuracy and then dissolved in methanol to obtain 1.0 mg/mL concentrations. The compounds were applied in 1.0 μL quantities onto the previously prepared plates by means of a Desaga AS30 TLC applicator (Desaga, Germany), at 1.0 cm intervals. The distance from the lateral edges was 2 cm. The start line was set 1.5 cm from the lower edge of the plate. The chromatograms were developed in a horizontal chromatographic chamber with an eluent dispenser, DS-II-20 × 20 (CHROMDES, Lublin, Poland), to a height of 12 cm above the lower edge of the plate. The duration of chromatogram development was 35 ± 2 min and 28 ± 2 min (for eluents DS A and DS B , resp.). The developed lanes were scanned densitometrically at 280 nm by means of a Desaga CD 60 densitometer with Windows-compatible ProQuant software (Desaga, Germany). For each compound, the retention or retardation factor (R f ) value was read, and then the R M value was calculated [36]: R M = log (1/R f − 1). The R M values used for analysis were means from two reproducible experiments. In the quantitative analysis described, the R M(S1) -R M(S7) and R M(C) values for the analytes are denoted as S1-S7 and C, respectively, whereas the parameters derived from these results are denoted by the symbols C-S(1-7) and S(1-7)/C. The use of the parameters C-S(1-7) and S(1-7)/C was justified in an earlier work [14]. The results of the chromatographic analysis are presented in Table 2.
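As a minimal sketch of the R M calculation described above, the following Python fragment converts R f readings to R M values and forms the derived parameters C-S and S/C. It assumes that "log" denotes the decimal logarithm and that the two replicate runs are averaged at the level of R M ; the function names are hypothetical and no values from Table 2 are used.

```python
import math


def rm_value(rf: float) -> float:
    """Return R_M = log10(1/R_f - 1); defined only for 0 < R_f < 1."""
    if not 0.0 < rf < 1.0:
        raise ValueError("R_f must lie strictly between 0 and 1")
    return math.log10(1.0 / rf - 1.0)


def mean_rm(rf_run1: float, rf_run2: float) -> float:
    """Mean R_M of two replicate developments (assumed averaging scheme)."""
    return (rm_value(rf_run1) + rm_value(rf_run2)) / 2.0


def derived_parameters(rm_control: float, rm_model: float):
    """Return the pair (C - S, S / C) for one compound and one model S1-S7.

    The ratio is undefined when the control R_M equals zero, in which case
    Python raises ZeroDivisionError.
    """
    difference = rm_control - rm_model
    ratio = rm_model / rm_control
    return difference, ratio
```

A compound with R f = 0.5 gives R M = 0, so the S/C ratio cannot be formed for it when this occurs on the control plate; such cases would need separate handling.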

Calculation of the Molecular Descriptors.
The semiempirical method AM1 with the Polak-Ribière algorithm (HyperChem v. 7.0 program) [37] and the ACD/Labs v. 8.0 program [38] were employed to calculate a set of physicochemical parameters for the investigated compounds (see Table 3). The following set of variables was collected using (i) the HyperChem program: the total energy (E T , kcal·mol −1 ), the binding energy (E b , kcal·mol −1 ), the heat of formation (ΔH F , kcal·mol −1 ), the total dipole moment (μ, D), the energy of the highest occupied molecular orbital (ε HOMO , eV), the energy of the lowest unoccupied molecular orbital (ε LUMO , eV), and the net atomic charge on the nitrogen atom (Q N ); (ii) the QSAR Properties module (ChemPlus 2.1) included in the HyperChem software: the grid surface area (A S , Å 2 ), the molar volume (V m , Å 3 ), the hydration energy (E H , kcal·mol −1 ), the logarithm of the octanol/water partition coefficient (log P), the molar refractivity (R m , Å 3 ), the polarizability (α, Å 3 ), and the molecular weight (M W , g·mol −1 ); and (iii) the ACD/Labs 8.0 program: the distribution

Statistical Analysis.
Stepwise multiple linear regression and correlation analysis were carried out using the STATISTICA 9.0 program [39]. Values of biological activity (pK i , pD 2 , and pA 2 ) of the analyzed compounds were used as dependent variables; the chromatographic data and the calculated physicochemical descriptors were used as independent variables.
The relationships between the behaviour of compounds 1-20 in chromatographic environments (C, S1-S7), their physicochemical properties, and their biological activity gave mathematical models whose statistical quality was estimated using the following statistical indicators: the correlation coefficient (R), the squared correlation coefficient (the coefficient of determination, R 2 ), the variance ratio (F), and the standard error of estimate (s). Results were considered statistically significant at P ≤ 0.05 (see Table 4).
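The statistical indicators listed above can be illustrated with a short, self-contained Python sketch. It is not the STATISTICA procedure used in this work: it fits an ordinary least-squares model through the normal equations and then computes R, R 2 , F, and s from the residual and total sums of squares. All function names and the data in the usage note are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(x_rows, y):
    """Return [intercept, b1, ..., bk] from the normal equations."""
    design = [[1.0] + list(row) for row in x_rows]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coef, row):
    """Predicted activity for one compound."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def regression_statistics(x_rows, y):
    """R, R2, F, and s for a model with n cases and k independent variables."""
    n, k = len(y), len(x_rows[0])
    coef = fit_ols(x_rows, y)
    y_mean = sum(y) / n
    ss_res = sum((yi - predict(coef, r)) ** 2 for r, yi in zip(x_rows, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    f_ratio = ((ss_tot - ss_res) / k) / (ss_res / (n - k - 1))
    s = (ss_res / (n - k - 1)) ** 0.5
    return {"R": max(r2, 0.0) ** 0.5, "R2": r2, "F": f_ratio, "s": s, "coef": coef}
```

For example, `regression_statistics([[1.0], [2.0], [3.0], [4.0]], [1.0, 3.0, 2.0, 4.0])` fits a single-descriptor model to four made-up points. An exact fit (zero residual sum of squares) makes F undefined and is not handled here.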
The correlation between the biological activities and the various variables, as well as the intercorrelation of the descriptors, was analyzed with the help of the correlation matrix. If two descriptors showed a correlation coefficient |R| > 0.5, one of them was removed. The respective intercorrelation coefficients between the parameters occurring in the established regression models are given in Table 5. The best correlation models were evaluated by validating each model using general internal cross-validation procedures such as "leave-one-out" (LOO) and "leave-N-out" (LNO). These kinds of internal validation are recommended if the number of compounds is small [40,41]. The detailed procedures of these kinds of internal validation were described in an earlier work [14].
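The descriptor screening step can be sketched as follows. The text states only that one descriptor of an intercorrelated pair (|R| > 0.5) was removed; which one is kept is not specified, so the rule used here (keep the descriptor that correlates more strongly with the activity) is an assumption, as are the function names and the example data.

```python
def pearson_r(a, b):
    """Pearson correlation coefficient of two equally long value lists.

    A constant list has zero variance and leads to a division by zero.
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    return s_ab / (s_aa * s_bb) ** 0.5


def filter_intercorrelated(descriptors, activity, threshold=0.5):
    """Return descriptor names whose mutual |R| does not exceed the threshold.

    descriptors: dict mapping a descriptor name to its list of values.
    Assumed rule: descriptors are visited in order of decreasing |R| with the
    activity, and a descriptor is dropped if it is intercorrelated with one
    that has already been kept.
    """
    ranked = sorted(
        descriptors,
        key=lambda name: abs(pearson_r(descriptors[name], activity)),
        reverse=True,
    )
    kept = []
    for name in ranked:
        values = descriptors[name]
        if all(abs(pearson_r(values, descriptors[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical descriptor values for five compounds
example = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 11.0],   # nearly collinear with "a"
    "c": [1.0, -1.0, 0.0, -1.0, 1.0],  # almost uncorrelated with "a" and "b"
}
kept = filter_intercorrelated(example, [1.1, 2.0, 2.9, 4.2, 5.0])
```

In this example only one of the nearly collinear descriptors "a" and "b" survives, while "c" is retained.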
The cross-validated squared correlation coefficient (Q 2 ), predicted residual sum of squares (PRESS), standard deviation based on PRESS (S PRESS ), and standard deviation of error of prediction (SDEP) were used to evaluate the predictive power of the developed models. Some criteria for the reliability of prediction and the robustness of the models are suggested in the literature [42][43][44][45][46]: R 2 > 0.6 and Q 2 > 0.5;
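The cross-validation quantities named above can be sketched in plain Python. The exact procedures are described in [14] and are not reproduced here; the sketch assumes the commonly used definitions Q 2 = 1 − PRESS/Σ(y − ȳ) 2 (with ȳ the mean over all compounds), S PRESS = √(PRESS/(n − k − 1)), and SDEP = √(PRESS/n), and it forms leave-N-out groups as consecutive blocks. All names and data are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(x_rows, y):
    """Return [intercept, b1, ..., bk] from the normal equations."""
    design = [[1.0] + list(row) for row in x_rows]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coef, row):
    """Predicted activity for one compound."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def loo_groups(n):
    """Leave-one-out: each compound is left out once."""
    return [[i] for i in range(n)]


def lno_groups(n, size):
    """Leave-N-out as consecutive blocks; the last block may be shorter."""
    return [list(range(start, min(start + size, n))) for start in range(0, n, size)]


def cross_validate(x_rows, y, left_out_groups):
    """Refit the model without each group and predict the omitted compounds."""
    n, k = len(y), len(x_rows[0])
    press = 0.0
    for group in left_out_groups:
        held = set(group)
        train_x = [r for i, r in enumerate(x_rows) if i not in held]
        train_y = [v for i, v in enumerate(y) if i not in held]
        coef = fit_ols(train_x, train_y)
        press += sum((y[i] - predict(coef, x_rows[i])) ** 2 for i in group)
    y_mean = sum(y) / n
    ss_tot = sum((v - y_mean) ** 2 for v in y)
    return {
        "PRESS": press,
        "Q2": 1.0 - press / ss_tot,
        "S_PRESS": (press / (n - k - 1)) ** 0.5,
        "SDEP": (press / n) ** 0.5,
    }
```

Calling `cross_validate(x_rows, y, loo_groups(len(y)))` gives the LOO statistics, and `cross_validate(x_rows, y, lno_groups(len(y), 3))` a leave-three-out variant. Q 2 can be negative when the model predicts the omitted compounds worse than the overall mean does.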

Results and Discussion
The present work is a continuation of the previous studies [13,14], whose purpose is to determine whether data obtained from thin-layer chromatography and computer-calculated physicochemical parameters can be used to build regression equations that predict the receptor binding affinity (pK i ) and the agonistic (pD 2 ) and antagonistic (pA 2 ) activity of selected compounds acting on serotoninergic receptors. Similar studies have been carried out on compounds acting on β-adrenergic [47,48] and histamine [49][50][51][52] receptors. In this study, we took advantage of data on the structure and function of serotoninergic receptors [12][13][14][15][16][17][18][19][20][21][22]. On the basis of this information, it was established that the following amino acids located within 5-HT receptors play the most important role in ligand binding: aspartic acid (Asp155), serine (Ser159), phenylalanine (Phe340), asparagine (Asn333), tyrosine (Tyr370), threonine (Thr196), and tryptophan (Trp200, 236, 367). This information made it possible to devise a hypothetical model of drug-serotonin receptor interaction, in which amino acids were introduced into the stationary phase of the chromatographic environment [14].
The amino acids used to modify the stationary phase in previous work [14] contain amine and carboxyl groups which under biological conditions do not participate in the formation of the active complex. They remain within the protein structure forming peptide bonds. The presence of these active groups in a chromatographic system might lead to additional interactions with the compounds studied.
In further experiments with our models of interactions with 5-HT receptors, we have thus used compounds (as analogues of the key receptor amino acids) with structures corresponding to those of the crucial amino acids but without the amine and carboxyl groups that form peptide bonds in the receptor protein. For example, aspartic acid was replaced with propionic acid, phenylalanine with ethylbenzene, tyrosine with 4-ethylphenol, and asparagine with propionamide. Applying ethanol and isopropyl alcohol (as analogues of serine and threonine, resp.) individually as substances modifying the stationary phase was not meaningful; ethanol and isopropyl alcohol were used only in multicomponent solutions for the impregnation of plates. The choice of model compounds for the binding elements in the receptor structure was based on the literature data on the use of the propionic acid structure in studies of the interaction of aspartic acid with the ligands of the histamine H 3 receptor [33,34].
The correlation and the stepwise multiple linear regression analyses were carried out to answer the question whether there is any relationship between the behaviour of the compounds 1-20 in chromatographic environments S1-S7 and their biological activity (pK i , pD 2 , and pA 2 ).
First, we analysed the relationship between the biological activity data and the behaviour of the examined compounds in the chromatographic environment of the control (C: without analogues of amino acids). The calculated correlation coefficient values (R) were (the regression equations are not presented in the text) 0.13 and 0.06 (pK i , n = 19, for DS A and DS B phases, resp.), 0.04 and 0.37 (pD 2 , n = 13, for DS A and DS B phases, resp.), and 0.02 and 0.10 (pA 2 , n = 16, for DS A and DS B phases, resp.). As indicated by the analysis, there was no correlation between the serotoninergic activities of the compounds 1-20 and their C-chromatographic data. This led to the conclusion that any significant relationships depend on the specific biochromatographic environment. Distinct relationships between the values of biological activity and the interaction data of the compounds 1-20 can be observed with all the models (S1-S7) (see Table 4).
Under the conditions of the experiment with the DS A mobile phase, distinct relationships were found for the compounds with acknowledged binding affinity pK i to the 5-HT receptor ((1), Table 4) and determined agonistic activity pD 2 ((6), Table 4). The binding affinity pK i of the compounds to the 5-HT receptor was described on the basis of models S1, S2, and S3. The relationship explains 68% of the variance and simultaneously describes the potential interactions between the ligands and the amino acid residues Asp155, Phe340, and Tyr370. In the case of agonistic activity pD 2 , chromatographic models S4 and S5 have a significant effect on the regression equation, which explains 84% of the variance and simultaneously describes the potential interactions between the ligands and the amino acid residues Asn333, Asp155, and Ser159. All types of interactions between the structural elements of the receptor hydrophobic pocket and the chemical substance in the drug-receptor complex are represented in the above cases: ionic and hydrogen bonds, as well as stabilization of the aromatic rings of the ligands by hydrophobic forces. For compounds with antagonistic activity pA 2 , the correlation was not statistically significant (11).
In the case of the DS B mobile phase and for compounds with determined biological activity pK i (2) and pD 2 (7), the equations explain only 46-53% of the total variance. The analysis of correlation between the data characterizing antagonistic activity pA 2 and the chromatographic parameters yielded satisfactory results (R = 0.83; Equation (12)). The relationship explains 69% of the total variance and describes interactions of the ligands with the amino acids aspartic acid, tyrosine, serine, and phenylalanine (S1, S3, and S6 models).
At the next stage of the study, the molecular descriptors were employed as independent variables in the multiple regression analysis (Table 3). The contribution of the same descriptors to the development of regression equations was described in previous papers [13,14]. The final mathematical models for biological activity (pK i and pA 2 ) explain 69% of the total variance ((3) and (13)), and the model for pD 2 explains only 57% of the total variance (8).
Considering the role of molecular descriptors in the prediction of biological activity, these parameters were included in the regression analysis together with the chromatographic data. The share of the calculated molecular descriptors in the analysis of chromatographic models for the DS A phase is presented in the form of (4), (9), and (14). The regression analysis gave satisfactory results for compounds with pK i and pA 2 activity ((4) and (14)), where the mathematical models explain 60-68% of the total variance. For the dependent variable pD 2 , the regression model explains 86% of the total variance, but the number of cases (n = 13) is too small to justify introducing three independent variables into the analysis.
Testing correlation for the developing phase DS B yielded good results of regression analysis for biological activity pK i and pA 2 ((5) and (15)). In both cases, the mathematical models explain 80% of the total variance. For compounds with agonistic activity pD 2 , the correlation explains only 60% of the total variance ((17)-(18)). The above models show the contribution of both the chromatographic data and the molecular descriptors. It can be seen, in the case of these and previous studies [13,14], that combining chromatographic data with physicochemical parameters has improved the results of QSAR analysis. In (3)-(5), (8)-(10), and (13)-(15), we can see a clear influence of electronic (the net atomic charge on the nitrogen atom, the total dipole moment, the energy of the lowest unoccupied molecular orbital), thermodynamic (the distribution coefficient, the logarithm of the octanol/water partition coefficient, the molar volume, the heat of formation, and the molar refractivity), and structural (the number of H-bond donors) descriptors as independent variables determining biological activity.
As can be seen, Q N , log P, and V m contribute positively and μ, log D, and HD contribute negatively to 5-HT receptor binding affinity. ΔH F , R m , and log P have a positive influence on antagonistic activity, and μ has a negative influence. The descriptor log D contributes positively to agonistic activity. Moreover, all equations show the influence of the biochromatographic environments as the proposed models of drug-receptor interaction.
On the basis of such analyses, mathematical equations describing all the types of ligands interactions with 5-HT receptors can be proposed: (5), (6), and (15). The models, together with the statistical and validation parameters, are given in Table 5. Table 6 presents the correlation matrix, where it is shown that the selected descriptors from the above equations are not highly correlated. Table 7 and Figures 1, 2, and 3 report the comparison of observed and predicted values of biological activity for (5), (6), and (15).

[Correlation matrix of the variables C-S4, C-S5, C-S6, ΔH F , log P, HD, pA 2 , and pK i ; the values are not recoverable from the source.]
According to the literature [45][46][47][48][49], the conditions for a reliable model, namely Q 2 > 0.5 and R 2 > 0.6, Q 2 LOO ≤ R 2 ≥ Q 2 LNO , and Q 2 LOO ≈ Q 2 LNO , are fulfilled by the equations in Table 5. The relation R 2 adj < R 2 confirms that the models are not overparameterized. These equations can be proposed as tools for predicting the 5-HT activity of novel compounds with various structures; the models explain 78-84% of the variance in activity (R 2 = 0.78-0.84).
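The reliability conditions quoted above can be collected into a small check, sketched here in Python. The adjusted coefficient is computed with the usual formula R 2 adj = 1 − (1 − R 2 )(n − 1)/(n − k − 1); the tolerance used to decide whether Q 2 LOO ≈ Q 2 LNO is an arbitrary assumption, and the numbers in the usage note are hypothetical rather than values from Table 5.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R2 for n compounds and k independent variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def check_model(r2, q2_loo, q2_lno, n, k, tol=0.1):
    """Evaluate each reliability condition and return the outcome per condition.

    tol is an assumed tolerance for treating Q2_LOO and Q2_LNO as
    approximately equal; the text does not give a numerical value.
    """
    r2_adj = adjusted_r2(r2, n, k)
    return {
        "R2 > 0.6": r2 > 0.6,
        "Q2 > 0.5": q2_loo > 0.5 and q2_lno > 0.5,
        "Q2 <= R2": q2_loo <= r2 and q2_lno <= r2,
        "Q2_LOO ~ Q2_LNO": abs(q2_loo - q2_lno) <= tol,
        "R2_adj < R2": r2_adj < r2,
    }
```

For instance, `check_model(0.80, 0.70, 0.68, n=19, k=3)` reports every condition as satisfied, whereas a model with R 2 = 0.55 fails the first condition regardless of its cross-validation results.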
On the basis of the results described, it is clearly apparent that the simple analogues of amino acids important for ligand-receptor interaction are useful for building analytical models of the serotoninergic activity of drug candidates.

Conclusions
The QSAR models of compounds acting on serotoninergic receptors have been developed based on chromatographic data and electronic, thermodynamic, and structural descriptors. In all the types of chromatographic systems described above, regression models based on the interaction of the compounds 1-20 with the substances modifying the stationary phase have been found. The chemicals (propionic acid, ethylbenzene, 4-ethylphenol, and propionamide) used to impregnate the plates can represent the interaction of the compounds examined with the crucial amino acids (Asp, Phe, Tyr, and Asn, resp.). The proposed biochromatographic systems can describe an interaction that is possible between the ligands and the appropriate analogues of amino acids. The lack of correlation between the activity of the compounds and their behaviour in the control chromatographic environment confirmed the important role of the compounds modifying the stationary phase of the chromatographic systems in the construction of analytical drug-receptor interaction models. The predictive ability of the models was demonstrated by using "leave-one-out" and "leave-N-out" cross-validation procedures. The results indicate that these models can be successfully used to predict the activity of 5-HT receptor ligands.