QSAR Study on Caffeine Derivatives Docked on Poly(A) RNA Polymerase Protein Cid1

Caffeine is the most commonly ingested alkylxanthine and is recognized as a psychostimulant. It improves some aspects of cognitive performance; however, it reduces cerebral blood flow in both animals and humans. In this paper, a QSAR study on caffeine derivatives docked on the Poly(A) RNA polymerase protein cid1 is reported. A set of forty caffeine derivatives, downloaded from PubChem, was modeled within the hypermolecule strategy; the predicted activity was LD50, and the prediction was done on similarity clusters whose leaders were chosen as the best docked ligands on the Poly(A) RNA polymerase protein cid1. It was concluded that the LD50 of the studied caffeines is not influenced by their binding to the target protein.


INTRODUCTION
Caffeine (1,3,7-trimethylxanthine) is found in varying quantities in some plants: coffee beans, tea leaves, cocoa beans, etc. [1][2][3] Caffeine selectively reverts the inhibitory effect of adenosine. [4] There is evidence that caffeine may increase hypoxic pulmonary vasoconstriction, but it is unlikely to contribute to the development of high-altitude pulmonary edema. [5,6]
The structure of Poly(A) RNA polymerase protein cid1 (see Figure 1) revealed that caffeine can be accommodated at the active site; the differences in binding among the derivatives suggest how this enzyme selects UTP (pyrimidine nucleoside triphosphate) over other nucleotides. [7]
Molecular docking has become a standard tool in computational chemistry for predicting the binding affinity and orientation of small-molecule ligands to protein targets, in order to predict the activity of the ligands. [8]
In a previous work, [9] we performed a QSAR study on a set of flavonoids by the similarity cluster prediction approach proposed by TOPO Group Cluj. [10] In this paper we continue the investigation with a docking study to identify the geometric description of a pharmacophore in the interaction of this class of ligands with Poly(A) RNA polymerase protein cid1. [11]
Quantitative structure-activity relationship (QSAR) studies relate molecular structure information to biological and other activities by developing a quantitative model. Because of their great number and positive biological effects, caffeine derivatives are a popular subject for QSAR.
In this study, clusters of similar structures (intended as quasi-congeneric subsets, for a better prediction of the toxicological activity) were chosen, with the leaders being the ligands best scored in the docking on the target protein cid1.

MOLECULAR DATA
Molecular docking was carried out with the AutoDock Vina docking software, [12][13][14] in order to explore the binding mode of caffeine derivatives (Table 1) at the binding pocket of Poly(A) RNA polymerase protein cid1 and to understand their structure-activity relationship. The protein Poly(A) RNA polymerase protein cid1 (Figure 1) was downloaded from the RCSB Protein Data Bank, under the PDB code 4FH3. [15] A set of 40 caffeine derivatives was taken from the PubChem database (as SMILES codes, Table 1). The three-dimensional structure of caffeine was downloaded in SDF format from PubChem [16] and converted to PDB format using OpenBabel 2.3.2 [17] for further use in the docking studies. For targeting the interactions of protein 4FH3, the critical binding motifs were replaced by the caffeine derivative ligands. The ligands, with their molar mass, molecular formula, and number of torsions, are given in Table 2.

COMPUTATIONAL DETAILS
In the present study, a molecular docking analysis was performed for 40 caffeine derivatives on the Poly(A) RNA polymerase protein cid1; a further QSAR study was done to predict their LD50. The structures were optimized at the HF (6-2g(p)) level of theory, in gas phase, with Gaussian 09. [18] Topological indices were computed with the TOPOCLUJ software; [19] some of them (Cluj indices: IEmax and IEmin, SD) and LD50 (in mouse, oral administration) are listed in Table 3, together with the QSAR model of highest correlation.

Docking at Poly(A) RNA Polymerase Protein Cid1
To study the interaction between the caffeine derivatives and 4FH3, the molecular modeling program AutoDock Vina was run; the data are collected in Table 4. The ligand-protein interaction is illustrated in Figures 2 and 3. A grid box with x = -13.133 Å, y = 2.669 Å, z = -10.786 Å was generated and placed at the center of the receptor binding site.
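A docking setup of this kind is usually passed to AutoDock Vina through a small configuration file. The Python sketch below only builds the text of such a file. It assumes that the x, y, z values quoted above are the coordinates of the box center (negative values cannot be edge lengths); the box edge lengths, the file names and the helper name `vina_config` are hypothetical placeholders that are not taken from this study. The number of modes is set to nine only to match the nine conformations mentioned in the caption of Table 4.

```python
def vina_config(receptor, ligand, center, size=(20.0, 20.0, 20.0),
                num_modes=9, exhaustiveness=8):
    """Return the text of an AutoDock Vina configuration file.

    center : (x, y, z) of the search box center, in Angstrom
    size   : (x, y, z) edge lengths of the search box, in Angstrom
             (the default of 20 A is an assumption, not a value from the paper)
    """
    cx, cy, cz = center
    sx, sy, sz = size
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"center_x = {cx}",
        f"center_y = {cy}",
        f"center_z = {cz}",
        f"size_x = {sx}",
        f"size_y = {sy}",
        f"size_z = {sz}",
        f"num_modes = {num_modes}",
        f"exhaustiveness = {exhaustiveness}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical file names; Vina expects receptor and ligand in PDBQT format.
config_text = vina_config("4FH3.pdbqt", "ligand_02.pdbqt",
                          center=(-13.133, 2.669, -10.786))
print(config_text)
```

The resulting text can be saved to a file and given to Vina through its `--config` option; one such file would be needed per ligand.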
To obtain a pharmacophore model that fits the receptor Poly(A) RNA polymerase protein cid1, the conformers showing the most favorable interactions with the receptor in the docking were chosen. Ligands 2, 18, 23 and 38 have the lowest binding energies, between -7.5 and -7.3; the pharmacophore was constructed from these compounds (by using the HyperChem 7.52 [20] and PyMOL [21] software programs). The resulting pharmacophore is shown in Figure 5.
It contains three pharmacophore centers:
• Nucleophilic site of the substituted imidazole nitrogen atom
• Strong nucleophilic site of the carbonyl groups
• Nitrogen atom substituted by an isobutyl group
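The selection of the best-docked ligands described above can be sketched in Python. The sketch assumes that each ligand was docked separately and that the Vina output is available as text; Vina writes one `REMARK VINA RESULT:` line per pose, with the affinity as the first number. The ligand names and affinity values in the example are illustrative placeholders, not the data of Table 4, and the helper names are hypothetical.

```python
def best_affinity(pdbqt_text):
    """Return the lowest (most favorable) affinity in a Vina output, or None."""
    scores = []
    for line in pdbqt_text.splitlines():
        if line.startswith("REMARK VINA RESULT:"):
            # Fields: REMARK, VINA, RESULT:, affinity, rmsd l.b., rmsd u.b.
            scores.append(float(line.split()[3]))
    return min(scores) if scores else None


def top_ligands(outputs, n):
    """Rank ligands by their best affinity and return the n best names."""
    scored = [(best_affinity(text), name) for name, text in outputs.items()]
    scored = [(score, name) for score, name in scored if score is not None]
    scored.sort()  # most negative affinity first
    return [name for _, name in scored[:n]]


# Illustrative placeholder outputs (not results from this study).
outputs = {
    "ligand_A": "REMARK VINA RESULT:    -7.5      0.000      0.000\n"
                "REMARK VINA RESULT:    -7.1      1.820      2.430\n",
    "ligand_B": "REMARK VINA RESULT:    -6.2      0.000      0.000\n",
    "ligand_C": "REMARK VINA RESULT:    -7.3      0.000      0.000\n",
}
leaders = top_ligands(outputs, n=2)
print(leaders)
```

Sorting on the single best pose per ligand is one possible choice; averaging over poses would be an alternative that the text does not describe.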

QSAR STUDY
This study was performed following Diudea's algorithm: [22] an alignment of the molecules over a hypermolecule [25] is performed and described by correlation-weighted local descriptors (e.g., fragment mass, partial charges, etc.), coupled with a predictive validation of the model within similarity clusters, [23] performed for each molecule in the test set.

Data Set
A set of 40 molecular structures belonging to the class of caffeine was downloaded from the PubChem database (Table 1), together with their LD50 values. The set was split into a training set (25 molecules) and a test set (15 molecules, taken as those with the lowest docking energy).
A hypermolecule (Figure 6) was built as the union of all structural features of the 40 molecules under study. The hypermolecule works like a biological receptor, over which the ligands (i.e., the caffeines) are aligned. According to this fitting, binary vectors were constructed, with 1 when an atom of the current molecule exists at a given position of the hypermolecule, and 0 otherwise. In these binary vectors, the values of 1 are next replaced by local characteristics: partial charges, mass fragments or local topological descriptors. Here, partial charges were used in building the weighted vector for every molecule; the modeled property was LD50.
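The construction of the binary and weighted vectors can be illustrated with a minimal Python sketch. It assumes that the alignment is already available as a mapping from atom labels to hypermolecule positions; the six-position hypermolecule, the atom labels and the charge values are made up for illustration, and the function names are hypothetical.

```python
def binary_vector(n_positions, alignment):
    """1 where the molecule has an atom on a hypermolecule position, else 0.

    alignment : dict mapping atom label -> hypermolecule position (0-based)
    """
    vec = [0] * n_positions
    for position in alignment.values():
        vec[position] = 1
    return vec


def weighted_vector(n_positions, alignment, local_values):
    """Replace each 1 of the binary vector by a local property of the atom.

    local_values : dict mapping atom label -> local descriptor
                   (here a partial charge)
    """
    vec = [0.0] * n_positions
    for atom, position in alignment.items():
        vec[position] = local_values[atom]
    return vec


# Made-up example: a molecule with three atoms aligned on a hypermolecule
# that has six positions.
alignment = {"N1": 0, "C2": 1, "O3": 3}
charges = {"N1": -0.4, "C2": 0.3, "O3": 0.1}

b = binary_vector(6, alignment)
w = weighted_vector(6, alignment, charges)
print(b)
print(w)
```

Each molecule then contributes one row of a descriptor matrix whose columns are the hypermolecule positions, which is the form needed for the position-wise correlation with LD50.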

Data Reduction
Before starting to build the models, the descriptors with a variance lower than 10 % and an intercorrelation larger than 0.80 were discarded. With the reduced number of descriptors, a correlation over all the positions in the hypermolecule was performed; the correlating coefficients of the statistically significant positions in the

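A descriptor filter of the kind described under Data Reduction can be sketched as follows. The text does not define how the 10 % variance threshold is measured, so the sketch assumes a relative spread (standard deviation divided by the absolute mean) below 0.10; the greedy order in which correlated descriptors are dropped is also an assumption, and the descriptor names and values are invented for illustration.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def reduce_descriptors(table, min_spread=0.10, max_r=0.80):
    """Return the names of the descriptors that survive both filters.

    table : dict mapping descriptor name -> list of values (one per molecule)
    """
    kept = []
    for name, values in table.items():
        m, sd = mean(values), pstdev(values)
        if sd == 0:
            continue  # constant descriptor carries no information
        if m != 0 and sd / abs(m) < min_spread:
            continue  # relative spread below the threshold (assumed meaning)
        if any(abs(pearson(values, table[k])) > max_r for k in kept):
            continue  # too strongly correlated with a descriptor already kept
        kept.append(name)
    return kept


# Invented descriptor table for five molecules.
table = {
    "d1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "d2": [2.1, 3.9, 6.2, 8.0, 9.9],     # nearly proportional to d1
    "d3": [10.0, 10.1, 9.9, 10.0, 10.0],  # almost constant
    "d4": [5.0, 1.0, 4.0, 2.0, 3.0],
}
selected = reduce_descriptors(table)
print(selected)
```

In a greedy filter the result depends on the order of the descriptors; ranking them first by their correlation with LD50 would be a natural refinement.
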
Model Validation

(a) Leave-one-out
The performances in the leave-one-out analysis for the models listed as the best in Table 5 are presented in Table 6. [25]

(b) External Validation
The LD50 values for the test set of caffeines were calculated by using entry 11 in Table 5. The data are listed in Table 7, and the monovariate correlation LD50 = 0.918 × LD50calc + 129.9 (n = 15; R² = 0.929; s = 153.272; F = 169.735) is plotted in Figure 7.
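The statistics quoted for a monovariate correlation (R², s, F), and a leave-one-out coefficient, can be computed as in the Python sketch below. It is a simplified illustration for a single predictor, not the multivariate procedure behind Tables 5 and 6; the function names and the four data points are hypothetical. For one predictor, F follows from R² as F = R²(n - 2)/(1 - R²).

```python
def fit_line(x, y):
    """Least-squares slope and intercept of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def f_from_r2(r2, n):
    """Fisher ratio of a one-predictor regression with n points."""
    return r2 * (n - 2) / (1 - r2)


def fit_stats(x, y):
    """Slope, intercept, R2, standard error s and F of the regression."""
    n = len(x)
    slope, intercept = fit_line(x, y)
    my = sum(y) / n
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    r2 = 1 - ss_res / ss_tot
    s = (ss_res / (n - 2)) ** 0.5
    return {"slope": slope, "intercept": intercept,
            "r2": r2, "s": s, "F": f_from_r2(r2, n)}


def loo_q2(x, y):
    """Leave-one-out Q2: refit without each point, then predict that point."""
    n = len(x)
    my = sum(y) / n
    press = 0.0
    for i in range(n):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - press / ss_tot


# Hypothetical calculated (x) and observed (y) values.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
stats = fit_stats(x, y)
q2 = loo_q2(x, y)
print(stats, q2)
```

As a consistency check, inserting the rounded values R² = 0.929 and n = 15 into `f_from_r2` gives F of about 170, in line with the reported F = 169.735.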

(c) Similarity Cluster Validation
Validation can also be performed by using similarity clusters: each of the 15 molecules in the test set (chosen as the best scored in the docking set) is the leader of its own cluster, selected by 2D similarity among the 20 structures of the learning set (each cluster comprising about 14-17 molecules). The LD50 values for the test set of caffeines were calculated by using the learning equations (with the same descriptors as in entry 11, Table 5) from each of the 15 clusters. Data are listed in Table 8 and the monovariate
QSAR study results show that the correlation obtained with the similarity cluster validation (R² = 0.951) is higher than that obtained with the external validation (R² = 0.929).
The lowest binding energy of the molecules in the test set correlates with LD50calc with R² = 0.032, which has no statistical meaning; this means that the toxicity of caffeines is not related to their interaction with the protein cid1, and more studies are necessary to find the cause of their toxicity. However, the lowest docking energy ligands were helpful in the choice of molecules for the test set, and this choice was clearly better (R² = 0.951) than a random choice (R² = 0.893; see Caffeine CEEJ, [26] computed, however, by the mass fragment description).
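The selection of a similarity cluster around each test-set leader can be sketched in Python. The text specifies only "2D similarity", so the Tanimoto coefficient on sets of structural features is used here as an assumption; the feature sets, the molecule names and the cluster size are invented for illustration.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two feature sets (shared / total features)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_cluster(leader_features, training_features, size):
    """Names of the `size` training molecules most similar to the leader.

    training_features : dict mapping molecule name -> set of 2D features
    """
    ranked = sorted(
        training_features,
        key=lambda name: tanimoto(leader_features, training_features[name]),
        reverse=True,
    )
    return ranked[:size]


# Invented feature sets (e.g. indices of fingerprint bits that are set).
training = {
    "t1": {1, 2, 3},
    "t2": {1, 2, 9},
    "t3": {7, 8},
    "t4": {1, 2, 3, 4},
}
leader = {1, 2, 3, 4}
cluster = similarity_cluster(leader, training, size=2)
print(cluster)
```

A regression equation with the chosen descriptors would then be refitted on the members of each cluster and used to predict the LD50 of the corresponding leader only.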

CONCLUSIONS
In this paper, a QSAR study on 40 caffeine derivatives docked on the protein Cid1 (4FH3) was reported. Molecular docking was performed to investigate the binding modes of the ligands toward possible targets in the poly(A) polymerase Cid1 (4FH3). A further QSAR study suggested that LD50 is not a result of the interaction of caffeines with the Cid1 protein, the docking energies being uncorrelated with the reported toxicity. However, the docking information was helpful in the choice of leaders for the similarity test set, increasing the accuracy of the predicted LD50 values.

Figure 2. Active site analysis by Ligand Explorer.

Figure 3. The interaction of caffeine with Poly(A) RNA polymerase protein cid1.

Figure 4. The free energy of binding elicited at the vicinity of the active site by the caffeine ligands.

Table 3. LD50 and topological indices computed for the caffeines in Table 1

Table 4. The final Lamarckian genetic algorithm docked state: binding energy of the ligands with the active site of the protein over nine conformations

Table 5. The best models for LD50 in the training set of caffeines in Table 1

Table 6. Leave-one-out analysis for the best LD50 models

Table 7. Calculated values of LD50 for the molecules in the test set (Table 1)

Table 8. Calculated values of LD50 by similarity clusters, for the molecules in the test set