Chemometric Approach in Studying of the Retention Behavior and Lipophilicity of Potentially Biologically Active N-Substituted-2-phenylacetamide Derivatives

The potential biological activity of molecules depends largely on their lipophilicity. The lipophilicity of N-substituted-2-phenylacetamide derivatives was investigated experimentally, by reversed-phase thin-layer chromatography (RP-TLC on RP 18 F254s) in the presence of ethanol and dioxane, and computationally, using software packages. Linear regression analysis and multivariate methods were used to establish the dependence between the lipophilicity values obtained in different ways. Both chemometric methods gave approximately similar groupings of the lipophilic parameters and of the tested compounds. All the results obtained confirm that the applied linear regression and multivariate analyses offer opportunities to compare the chromatographic retention data and the lipophilic parameters of the investigated phenylacetamide derivatives. The results suggest that the lipophilicity of the studied molecules depends largely on the nature of the substituents attached to the nitrogen atom and, moreover, that the chromatographic retention constants, R M 0, determined by RP-TLC are similar to the standard measure of lipophilicity, log P, which makes this method suitable for predicting lipophilicity.


Introduction
Amides, as a large group of organic molecules, show a wide range of biological activities. Among them, phenylacetamide derivatives play an important role, due to their application in human, veterinary and plant medicine.
They are known for their analgesic,[1] anti-microbial,[2] anti-convulsant,[3] anti-arrhythmic,[4] anti-tuberculotic,[5] and anti-tumor activities.[6,7] Generally, the type and intensity of biological activities of phenylacetamides depend strongly on the nature of the substituent attached to the basic molecule.[8] Understanding the relationship between the activity, structure and the physicochemical properties of the examined compound provides the opportunity to identify its potential bioactivity. The selection of structural parameters that are important for the behavior of molecules in a biological medium is facilitated by applying the quantitative structure-activity relationship (QSAR), the quantitative structure-property relationship (QSPR) and the quantitative structure-retention relationship (QSRR) models.
Lipophilicity is one of the most significant molecular descriptors of compounds; it indicates the biological activity of a substance and determines its transport through a biological system. In pharmaceutical and environmental chemistry, it is widely used for predicting the relationship between the biological activity and chemical behavior of a substance.[9] In most cases, lipophilicity can be quantitatively characterized as log P (the logarithm of the ratio of the concentrations of a solute in a saturated 1-octanol-water system).[10,11] This parameter is often used as a descriptor in QSAR studies.[12] Along with many traditional methods for the determination of lipophilicity, reversed-phase thin-layer chromatography (RP-TLC) plays an important role owing to its simplicity, the possibility of examining a large number of samples in small quantities, its reproducibility and its modest cost. The chromatographic retention constant, R M 0, obtained by reversed-phase thin-layer chromatography is widely used as a measure of lipophilicity, together with the reference lipophilic parameter log P.[13-18]
The aim of this study was to investigate the chromatographic behavior of N-substituted-2-phenylacetamide derivatives using reversed-phase thin-layer chromatography. The effects of the substituent in the tested molecules and the influence of the organic solvent used were investigated as well. In order to examine and visualize the similarities and differences between the chromatographically and mathematically obtained lipophilicity, and the grouping of the compounds based on the chemical properties of their functional groups, linear regression analysis (LR) and two multivariate analyses, cluster analysis (CA) and principal component analysis (PCA), were performed.
Cluster analysis is a convenient method for dividing a group of objects into classes (clusters) based on their similarity.[19] CA searches for objects which are close together in the variable space and puts them into the same cluster. There are many methods for searching for clusters. Hierarchical cluster analysis begins by considering each object as a cluster of size one and compares the distances between clusters. The two clusters which are closest together are joined to form a new cluster. This procedure is repeated and, if continued indefinitely, will result in grouping all the points together.
There are different ways of calculating the distance between two clusters containing more than one member (single linkage, complete linkage, average linkage, Ward's method, etc.). Successive stages of grouping can be shown on a dendrogram. The vertical axis of a dendrogram is a measure of the similarity between the objects in the obtained clusters. Non-hierarchical cluster analysis, in contrast, assigns the objects directly to a preselected number of clusters instead of building a nested hierarchy.
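The agglomerative procedure described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming Euclidean distance and single linkage (the distance between two clusters is taken as the smallest distance between their members); the four data points and the function names `euclidean`, `single_linkage` and `agglomerate` are hypothetical and are not taken from the study.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_linkage(c1, c2, points):
    """Smallest distance between any member of c1 and any member of c2."""
    return min(euclidean(points[i], points[j]) for i in c1 for j in c2)

def agglomerate(points):
    """Return the merge history [(cluster_a, cluster_b, distance), ...]."""
    clusters = [(i,) for i in range(len(points))]  # every object starts alone
    history = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = single_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        history.append((clusters[a], clusters[b], d))
        merged = tuple(sorted(clusters[a] + clusters[b]))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return history

# Hypothetical objects in a two-variable space.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 7.0)]
history = agglomerate(points)
for left, right, distance in history:
    print(left, right, round(distance, 3))
```

Each entry of `history` corresponds to one joining step of a dendrogram; the recorded distance is the height at which the two clusters would be connected.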
Principal component analysis (PCA) is a powerful technique for reducing the amount of data without much loss of information.[19] The idea of PCA is to find new variables, the principal components (PCs), which are linear combinations of the original variables. In mathematical terms, the principal components are the eigenvectors of the covariance matrix. Corresponding to each PC (eigenvector) is an eigenvalue, which gives the amount of variance in the data set that is explained by that principal component. The principal components are formed in such a way that, unlike the original variables, they are not correlated with each other. In addition, they are chosen so that the first principal component (PC1) accounts for most of the variation in the data set, the second (PC2) accounts for the next largest variation, and so on. Hence, when significant correlation occurs, the number of useful PCs is much smaller than the number of original variables. Usually, two PCs are sufficient to describe most of the variation in chromatographic retention data.[20]
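The link between principal components, eigenvectors and explained variance can be illustrated with a small standard-library sketch. It assumes a sample covariance matrix and uses power iteration with deflation to obtain the eigenvectors, which is a simple stand-in chosen for brevity and not necessarily the algorithm used by any statistical package; the data set and the names `mat_vec`, `normalize` and `pca` are hypothetical.

```python
import math

def mat_vec(mat, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vec)) for row in mat]

def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def pca(rows, n_components, iters=500):
    """Return ([(eigenvalue, eigenvector), ...], total_variance).

    Assumes every requested component has non-zero variance.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix of the centered variables.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    total_variance = sum(cov[i][i] for i in range(p))
    work = [row[:] for row in cov]
    components = []
    for _component in range(n_components):
        vec = normalize([float(k + 1) for k in range(p)])  # arbitrary start
        for _step in range(iters):
            vec = normalize(mat_vec(work, vec))
        # Rayleigh quotient gives the eigenvalue of the converged vector.
        eigenvalue = sum(x * y for x, y in zip(vec, mat_vec(work, vec)))
        components.append((eigenvalue, vec))
        # Deflation: remove the variance already explained by this PC.
        work = [[work[i][j] - eigenvalue * vec[i] * vec[j] for j in range(p)]
                for i in range(p)]
    return components, total_variance

# Hypothetical data: four objects described by two correlated variables.
data = [(2.0, 2.0), (-2.0, -2.0), (1.0, -1.0), (-1.0, 1.0)]
components, total_variance = pca(data, n_components=2)
explained = [eigenvalue / total_variance for eigenvalue, _ in components]
print([round(share, 3) for share in explained])
```

The ratios in `explained` play the same role as the percentages of total variance reported for each PC: each eigenvalue divided by the sum of the variances of the original variables.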

Experimental
The structures of the examined phenylacetamide derivatives are presented in Figure 1.
Pre-coated RP-18W/UV 254 plates (Macherey-Nagel GmbH and Co., Düren, Germany) were used as the stationary phase. Solutions for chromatographic investigations were prepared by dissolving the compounds in ethanol at a concentration of 2 mg mL-1. Mixtures of liquid chromatography (LC) grade organic modifiers (J.T. Baker, Deventer, the Netherlands) and filtered bi-distilled water were used as the mobile phase. Aliquots of 0.2 μL of freshly prepared solutions were spotted on the plates and the migration distance was 5 cm. Plates were developed in normal unsaturated chambers at room temperature by the ascending technique with aqueous solutions of two organic modifiers: ethanol (φ = 0.36-0.52, v/v) and dioxane (φ = 0.30-0.52, v/v). After development and drying, the plates were examined under UV light at λ = 254 nm, where the compounds appeared as dark spots. At least three chromatograms were developed for each solvent-solute combination and the average R f values were calculated.
The obtained experimental data were processed with the software package Origin, version 6.1. Standard lipophilicity values, log P, were calculated using the Virtual Computational Chemistry Laboratory, VCCLAB.[21] The CA and PCA procedures were performed with Statistica v.12 software (StatSoft Inc., Tulsa, OK, USA).

Determination of the retention behavior of the investigated N-substituted-2-phenylacetamide derivatives
In order to determine the retention behavior of the tested N-substituted-2-phenylacetamide derivatives experimentally, high performance thin-layer chromatography (HPTLC) on reversed phase RP 18 F254s was used in the presence of one protic solvent (ethanol) and one aprotic solvent (dioxane). The retention behavior of the investigated compounds in the selected mixture of solvents is presented in Table 1.
From the results presented in Table 1 it is evident that the retention behavior of the investigated N-substituted-2-phenylacetamide derivatives is affected by the nature of the substituent -R attached to the nitrogen atom of the amide group, as well as by the organic modifier used. This can be explained by the different interactions that occur between the tested compounds and the applied organic solvent during chromatographic analysis. Two kinds of interactions are dominant in the case of the investigated compounds: hydrophobic interactions of the substituent -R with the nonpolar stationary phase and polar interactions of the amide group with the mobile phase. Comparing the R f values obtained for the same compound in the different organic modifiers shows no noticeable difference between them. This phenomenon can be explained by the more pronounced interactions between the solute and the polar mobile phase compared with the interaction of the solute with the non-polar stationary phase.
The retention data in Table 1 show that the retention of the compounds depends more on the nature of the substituent on the nitrogen atom than on the selected organic solvent. Of all the physicochemical properties that may affect the chromatographic behavior of these compounds, the polarity of the substituent -R attached to the nitrogen atom of the amide group and its electron-moving ability are the parameters most responsible for the different behavior of the investigated molecules in the analyzed systems. The polarity of the substituent -R influences the ability of the molecule to interact with the stationary chromatographic phase. Therefore, this physicochemical property has the greatest impact on the retention of the investigated derivatives under the analyzed conditions. For example, compound 11 exhibits the lowest retention in both applied modifiers, because it has benzenesulfonic acid as a substituent, which makes it more polar than the other compounds. Unlike [...][22] Consequently, the amide group in the presence of a heterocyclic substituent interacts differently with the polar mobile phase than in the presence of aromatic rings, which causes the separation of these compounds during TLC. This phenomenon becomes visible in the correlation of the retention properties of the investigated compounds with the standard measure of lipophilicity.
The experimental lipophilicity of the studied N-substituted-2-phenylacetamide derivatives was determined by reversed-phase thin-layer chromatography in the presence of different amounts of the solvents used. Based on the experimentally determined R f values for each composition of the mixture, R M values were calculated using equation 1:
$$R_M = \log\left(\frac{1}{R_f} - 1\right) \qquad (1)$$
The calculated R M values were extrapolated to zero concentration of organic modifier using the following equation:
$$R_M = R_M^0 + m\varphi$$
where φ is the volume fraction of the organic solvent in the mobile phase, m is the slope of the linear plot and the intercept is the retention constant, R M 0. These equations were applied individually to each compound in both modifiers and good linear relationships were obtained. The results of the obtained correlations are presented in Table 2. The validity of the linear dependences over the selected range for all tested organic modifiers is confirmed by the high values of the correlation coefficients, r.
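As a worked illustration of this procedure, the sketch below converts R f values into R M values and extrapolates them to zero modifier content by ordinary least squares. It assumes the standard definition R M = log10(1/R f - 1) and the linear relation R M = R M 0 + mφ given in the caption of Table 2; the R f values and the helper names `rm_from_rf` and `fit_line` are hypothetical and serve only to show the arithmetic.

```python
import math

def rm_from_rf(rf):
    """Convert a retention factor R_f (0 < R_f < 1) into R_M."""
    return math.log10(1.0 / rf - 1.0)

def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns (intercept, slope, r), where r is the correlation coefficient.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r

# Hypothetical R_f values of one compound at five volume fractions (phi)
# of organic modifier; these are NOT data from the study.
phi = [0.36, 0.40, 0.44, 0.48, 0.52]
rf = [0.21, 0.27, 0.34, 0.42, 0.50]

rm = [rm_from_rf(value) for value in rf]
rm0, m, r = fit_line(phi, rm)  # rm0 is the intercept at phi = 0
print(round(rm0, 3), round(m, 3), round(r, 4))
```

Because R f increases with the modifier content, the slope m comes out negative and the intercept `rm0` corresponds to the retention the compound would show in pure water.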

Determination of the lipophilicity of N-substituted-2-phenylacetamide derivatives using mathematical methods
The standard measure of lipophilicity, log P, of the investigated compounds was calculated by using VCCLAB.
Data shown in Table 3 indicate different values of the partition coefficient, log P, of the investigated compounds, probably because of the different ways of calculating this parameter. The highest value of the partition coefficient was obtained for compound 2, as was also registered chromatographically, and the lowest value in most cases was obtained for compound 8. This differs from the experimental results, where the lowest lipophilicity was registered for compound 11.

Correlation between the retention constant, R M 0, and the standard measure of lipophilicity, log P
Given that the retention constant, R M 0, describes the overall effect of the intermolecular interactions of the compound with the stationary and mobile phases, it has been confirmed that this chromatographic parameter can be used to express and determine lipophilicity. In order to establish the dependence between the standard measures of lipophilicity (log P) calculated in different ways and the experimentally determined lipophilicity (chromatographic retention constants, R M 0), these two sets of values were correlated using linear regression analysis and different multivariate analyses.[31][32] The correlation results obtained by linear regression analysis for one of the calculated lipophilicity values, AClog P, and the chromatographic retention constants, R M 0, obtained in dioxane are presented in Figure 2.
As Figure 2 shows, the investigated compounds fall into two groups. One group comprises molecules with an aromatic group as the substituent (2, 9, 10 and 11), while the second group consists of phenylacetamide derivatives that have a heterocyclic ring as the substituent (1, 3, 4, 5, 6, 7 and 8) attached to the nitrogen atom of the amide group. A similar distribution was observed for the chromatographic retention constants determined in ethanol. The correlation matrix calculated for the various log P and R M 0 values of the investigated compounds using linear regression analysis is shown in Table 4.
The results presented in Table 4 confirm that good linear relationships exist between the retention constants, R M 0, obtained by reversed-phase thin-layer chromatography and the computationally obtained log P coefficients. A much better correlation is registered in the case of compounds which have an aromatic ring as a substituent. In most cases, the best correlation was obtained between R M 0 in both modifiers and kowwin, and the weakest between R M 0 and Mlog P (Table 4). The good linear relationships registered between the chromatographic retention constants, R M 0, in both modifiers and the partition coefficients, log P, as a standard measure of the lipophilicity of the examined compounds, confirm that the results obtained by reversed-phase thin-layer chromatography can be used successfully as a measure of the lipophilicity of the newly synthesized N-substituted-2-phenylacetamide derivatives.
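A correlation matrix of this kind can be computed as in the following sketch, which evaluates the Pearson correlation coefficient for every pair of lipophilicity columns. All numerical values are hypothetical placeholders and not data from Tables 2 to 4, and the column labels and the function names `pearson` and `correlation_matrix` are likewise only illustrative.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Return {name_a: {name_b: r}} for every pair of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}

# Hypothetical lipophilicity values for five compounds (one list per parameter).
lipophilicity = {
    "RM0_dioxane": [1.10, 2.35, 1.60, 1.85, 2.05],
    "AClogP": [0.90, 2.60, 1.50, 1.95, 2.10],
    "MlogP": [1.20, 2.10, 1.90, 1.70, 2.40],
}

matrix = correlation_matrix(lipophilicity)
for name, row in matrix.items():
    print(name, {other: round(r, 3) for other, r in row.items()})
```

To reproduce the separate correlations for the aromatic and heterocyclic subsets, the same function would simply be applied to the rows belonging to each subset.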
In addition to linear regression, the correlation of the measured lipophilicity parameters was also performed by applying different multivariate methods: cluster analysis (CA) and principal component analysis (PCA).
The CA and PCA procedures were performed on the data matrix, where the rows (cases) correspond to the phenylacetamide derivatives, whereas the columns (variables) correspond to the lipophilicity calculated in different ways. In the cluster analysis, the Euclidean distance was used as the measure of dissimilarity between objects, and Ward's method was applied as the linkage rule.
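Ward's method can be sketched as follows: at each step, the two clusters whose fusion causes the smallest increase in the within-cluster sum of squared Euclidean distances are merged. The code below is a minimal standard-library illustration of that criterion, not the implementation of any particular statistical package; the data matrix (rows as compounds, columns as lipophilicity parameters) and the function names are hypothetical.

```python
def centroid(points):
    """Mean point of a list of equally long tuples."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]

def ward_increase(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    squared_distance = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * squared_distance

def ward_clustering(rows):
    """Return the merge history [(members_a, members_b, increase), ...]."""
    clusters = [[i] for i in range(len(rows))]  # clusters hold row indices
    merges = []
    while len(clusters) > 1:
        cost, a, b = min(
            (ward_increase([rows[i] for i in clusters[a]],
                           [rows[i] for i in clusters[b]]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        merges.append((sorted(clusters[a]), sorted(clusters[b]), cost))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges

# Hypothetical data matrix: four compounds (rows) described by
# two lipophilicity parameters (columns).
data_matrix = [(1.1, 0.9), (1.2, 1.0), (2.4, 2.6), (2.3, 2.5)]
for left, right, increase in ward_clustering(data_matrix):
    print(left, right, round(increase, 4))
```

Clustering the variables instead of the compounds, as in a dendrogram of lipophilicity parameters, amounts to passing the transposed data matrix to the same function.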
The results obtained by using cluster analysis for different parameters of lipophilicity are presented in Figure 3, and for the investigated compounds in Figure 4.
According to the results presented in Figure 3, the cluster analysis has grouped the different lipophilic parameters into two clusters. The first cluster includes lipophilic parameters calculated in different ways (xlog P, Alog P, Mlog P and ABlog P), obtained mainly by atom-based methods, with the exception of ABlog P.
The second cluster contains the remaining lipophilic parameters, obtained mainly by fragmental methods, with the exception of AClog P.
The deviation of ABlog P can be explained by the fact that this lipophilic parameter, in contrast to the other fragmental log P values, combines the advantages of the reductionistic and constructionistic approaches.[33] AClog P belongs to the atom-based methods, in which the obtained result is highly dependent on the atom type parameterization. The separation of AClog P from the other atom-based lipophilic parameters can be explained by Actelion's AClog P definition strategy, which this log P follows.[34] What is important to emphasize is that the second cluster can be divided into two sub-clusters: one sub-cluster contains the lipophilic parameters obtained experimentally (R M 0 obtained in each organic modifier used), and the other sub-cluster contains the lipophilic parameters milog P, AClog P and kowwin.
The grouping of the studied lipophilic parameters in this way indicates that the cluster analysis is sensitive enough to distinguish between the experimental and the computational values of the lipophilic parameters. This distribution of parameters also suggests that the experimental values agree best with the computed log P values with which they share a cluster.
Figure 4 shows that the application of the CA procedure to the phenylacetamide derivatives resulted in a dendrogram with three clearly defined clusters. The first cluster contains the two most lipophilic molecules (2 and 5), the second cluster comprises the compounds with the most polar character (8, 9 and 11) and the third cluster contains the compounds with moderate lipophilicity.
Principal component analysis is a very useful technique for investigations that involve a lot of data obtained in different ways, because the redundancy of the data can be eliminated and the data volume reduced by recognizing the basic components. PCA decomposes the original retention data matrix into products of loading (retention data) and score (investigated compounds) vectors,[35] whereby new variables, the so-called principal components (PCs), are obtained. The first principal component (PC1) should account for a maximum of the total variance, the second PC should be uncorrelated with the first one and should account for a maximum of the residual variance, and so on until the total variance is accounted for.
In our investigation, the first three principal components described 98% (PC1 85.44%, PC2 10.94% and PC3 1.62%) of the total variance in the data.
Loading plots, which show the similarity between the lipophilicity values calculated in different ways (variables), are presented in Figure 5 for the two dominant PCs (PC1 vs. PC2).
Two specific groups of lipophilic parameters can be distinguished in Figure 5. The first group includes the lipophilic parameters which have negative values for both PCs (AClog P, ABlog P, Mlog P, Alog P and xlog P), that is, the parameters obtained by atom-based methods together with the fragmental ABlog P. Partial separation of Mlog P, Alog P and xlog P is observed. The lipophilic parameters obtained experimentally and the two remaining fragmental partition coefficients, milog P and kowwin, formed the second group, with negative values for PC1 and positive values for PC2.
This classification is very similar to the grouping of the lipophilic parameters obtained by cluster analysis; the only difference is the grouping of AClog P. PCA, unlike CA, has recognized and classified AClog P among the other lipophilic parameters obtained by atom-based methods.
As in the case of cluster analysis, it can be concluded that greater similarity exists between the chromatographic retention constant, R M 0, and the fragmental partition coefficients (milog P and kowwin) than with the other log P values. The worst agreement can be expected between R M 0 and Mlog P, Alog P and xlog P.
The possibility of separating the tested derivatives based on differences in their lipophilicity is one of the important features of PCA. The corresponding score plots obtained by PCA for the two dominant PCs (PC1 85.44% and PC2 10.94%) are presented in Figure 6.
From Figure 6 it is evident that the first two PCs resulted in an almost perfect classification of the compounds based on the nature of the substituent attached to the nitrogen atom of the amide group. The investigated compounds were grouped more or less in accordance with their chemical structures. The differences between them are reflected in the value of PC1. Compounds with lipophilic substituents have negative values of PC1, while a positive PC1 is registered for compounds with non-lipophilic groups. The most lipophilic compound, 2, has the most negative value of PC1, as was already observed for substituted phenylacetamide derivatives.[36] The most positive PC1 was calculated for compound 8, which has the smallest log P values.
It should also be noted from Figure 6 that the compounds are separated based on the value of PC2. Compounds which contain aromatic rings as substituents have negative PC2 values, in contrast to compounds with heterocyclic substituents, which have positive PC2 values. Such a grouping of the compounds is not observed in the cluster analysis but is in full accordance with the results obtained by linear regression analysis.

Conclusions
Eleven N-substituted-2-phenylacetamide derivatives were studied in order to define their lipophilicity. The lipophilic characteristics of the investigated compounds were determined in two ways: experimentally, by reversed-phase thin-layer chromatography in the presence of different organic modifiers (ethanol and dioxane), and computationally, from the structural formula, by applying different mathematical methods within relevant software packages.
The obtained chromatographic results indicate that the retention of the investigated compounds depends on the nature of the substituent on the nitrogen atom rather than on the selected organic solvent. In order to establish the dependence between the experimentally obtained R M 0, as a measure of lipophilicity, and the partition coefficient, log P, as a standard measure of lipophilicity, classical linear regression analysis and the multivariate methods CA and PCA were performed.
The obtained results indicate that all the methods used gave an approximately similar grouping of the studied lipophilic parameters and investigated compounds. Two important things are thereby demonstrated. First, the chromatographic retention parameters of the investigated N-substituted-2-phenylacetamide derivatives obtained by RP-TLC are in good agreement with the standard measure of lipophilicity, which means that the R M 0 values of the investigated compounds could be used to describe their lipophilicity. Second, the chemometric methods used are able to reduce the dimensionality of the data and to relate large amounts of data obtained in different ways.

Figure 3. Dendrogram of the lipophilicity parameters in the space of 11 measured values.

Figure 4. Dendrogram of 11 compounds in the space of 11 lipophilicity parameters.

Figure 5. Loading plots as a result of PCA.

Figure 6. Score plots as a result of PCA.

Table 1. R f values of N-substituted-2-phenylacetamide derivatives on the C-18 RP-TLC stationary phase in a variety of mobile phases containing 40% organic modifier and 60% water

Table 2. Extrapolated R M 0 values, slope (m), correlation coefficients (r) and the standard deviation of estimation (sd) of the TLC equations R M = R M 0 + mφ

Table 3. Different computational log P values calculated for the investigated N-substituted-2-phenylacetamides

Figure 2. Relationships between AClog P values and retention constant, R M 0, in dioxane

Table 4. Correlation
a The correlation includes compounds with an aromatic substituent; b the correlation includes compounds with a heterocyclic substituent.