Telomerase inhibitory activity by stabilization of G-quartet: A QSAR approach using 2D autocorrelation descriptors

A QSAR study of a dataset of 546 compounds that inhibit telomerase activity by interacting with G-quadruplex DNA was carried out using 2D autocorrelation descriptors. A linear discriminant analysis (LDA) model was built on a training set of 437 compounds and assessed with the 109 compounds of the test set. The model correctly classified 80.09% of the dataset, and the percentage of correct predictions was 78.89%. Only 5 compounds were not classified. The 2D autocorrelation descriptors are able to explain the factors that stabilize the G-quadruplex structure and, consequently, the inhibition of telomerase by this biological mechanism.


Introduction
Telomeres are repetitive DNA sequences at the ends of linear chromosomes that protect the chromosome from recombination, end-to-end fusion and nuclease degradation. 1 In human cells, the telomeric DNA is typically composed of 5-15 kb of double-stranded pairs of tandem repeats of the guanine-rich sequence TTAGGG with a single-stranded 3'-end overhang necessary to ensure complete chromosomal DNA replication. With each cell division, telomeres shorten by 50-200 bp because synthesis of the lagging strand of DNA is unable to replicate the 3'-end overhang. When the telomeres shorten to a critical length, "normal" cells stop growing and enter a state of senescence in which end-to-end fusion and chromosomal instability lead to cell death. A cell can escape from this normal cycle and become immortal by stabilising (capping) the length of its telomeres. 2,3 This happens almost always through activation of the enzyme telomerase. 4 Telomeres are believed to exist in different conformations together with several telomere-associated proteins, such as telomere repeat factors (TRF1, TRF2) and POT1. 5 The G-overhang is accessible for telomerase extension in the open state or inaccessible in a capped (or closed) conformation that involves the formation of a T-loop motif. 5 Although the T-loop structure has not been defined in detail, it may be created by the invasion of the G-overhang into the duplex region of the telomere. 6 Uncapping of the telomere ends leads to telomeric dysfunction characterized by end-to-end fusion, inappropriate recombination, anaphase bridges, and G-overhang degradation that may lead to either apoptosis or senescence. 7,8 Because of the repetition of guanines, the G-overhang is prone to formation of a four-stranded G-quadruplex structure that has been shown to inhibit telomerase activity in vitro. 9,10
The evidence that most cancer cells activate telomerase whereas normal cells are usually devoid of telomerase activity (with the exception of ongoing proliferating cells such as lymphocytes, basal keratinocytes, intestinal crypt cells, CD34-expressing peripheral blood stem cells, and germline cells) 11 has naturally led to extensive investigations to detect this protein and its activity for potential use in cancer diagnosis and prognosis, and eventually to monitor the tumor response to therapy. Finally, these data have greatly inspired the development of various strategies to target telomeres and telomerase for cancer therapy.
Small molecules that stabilize G-quadruplexes are effective as telomerase inhibitors and several series of compounds have been identified. The ligands that stabilize G-quadruplex structures include cationic porphyrins, 12 perylenes, 13 amidoanthracene-9,10-diones, 14 2,7-disubstituted amidofluorenones, 15 acridines, 16,17 ethidium derivatives, 18 disubstituted triazines, 19 fluoroquinoanthroxazines, 20 indoloquinolines, 21 dibenzophenanthrolines, 22 bisquinacridines, 23 pentacyclic acridinium, 24 telomestatin, 25 and the recently discovered bisquinolinium derivatives. 26,27 Due to the peculiar features of the quadruplex structure, as compared to classical double-stranded B-DNA, a selective recognition of telomeric G-quadruplex by small molecule ligands should be possible. 28,29 Some partial selectivity for G-quadruplex relative to duplex DNA was obtained with triazine 19 and with ethidium derivatives, 30 and selectivity was significantly enhanced with the natural product telomestatin, 25 with a new series of 2,6-pyridin-dicarboxamide derivatives, 27 and with a porphyrin derivative. 29
The search for new drugs against cancer plays a central role in the research programs of pharmaceutical companies and many governmental organizations due to the impact of this disease. Computational models that are able to predict the biological activity of compounds from their structural properties are powerful tools for designing highly active molecules. In this sense, quantitative structure-activity relationship (QSAR) studies have been successfully applied for modeling biological activities of natural and synthetic chemicals. 31 Graph-theoretical and topological methods are included in most QSAR studies. Among these methods, 2D spatial autocorrelations have been successfully used for modeling log P values 32 and biological activities, 33,34 and in pharmaceutical 35 and toxicological research. 36

2D-autocorrelation descriptors
In this work, 2D autocorrelation descriptors were used to develop the QSAR function. The mathematical details of the method have been extensively reported, 37-42 so we outline only the fundamental points.
As is well known, the binding of a substrate to its receptor depends on the shape of the substrate and on a variety of effects such as the molecular electrostatic potential, polarizability, hydrophobicity and lipophilicity. 43 [51][52] Autocorrelation vectors have several useful properties. First, a substantial reduction in data can be achieved by limiting the topological distance, l. Second, the autocorrelation coefficients are independent of the original atom numbering, so they are canonical. Finally, the length of the correlation vector is independent of the size of the molecule. 53
For the autocorrelation vectors, the H-depleted molecular structure is represented as a graph G and the physico-chemical properties of atoms as real values assigned to the vertices of G (Table 1).
These descriptors can be obtained by summing up the products of certain properties of two atoms located at given topological distances or spatial lag in G.
Three spatial autocorrelation vectors were employed for modeling the inhibitory activity.
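For reference, the three vectors are commonly written in the following textbook form. This is a sketch that assumes the usual definitions rather than the exact expressions used in this work. Here p_ki is the value of property k at atom i, \bar{p}_k is its average over the molecule and d_ij is the topological distance between atoms i and j. The symbols N (number of non-hydrogen atoms) and \Delta_l (number of ordered atom pairs at lag l) are introduced here for illustration, and software packages may apply additional scaling or pair-counting conventions.

```latex
\begin{align}
\delta(l, d_{ij}) &=
  \begin{cases} 1 & \text{if } d_{ij} = l \\ 0 & \text{otherwise} \end{cases},
  \qquad
  \Delta_l = \sum_{i=1}^{N} \sum_{j=1}^{N} \delta(l, d_{ij}) \\[4pt]
% Broto-Moreau autocorrelation coefficient
A(p_k, l) &= \sum_{i=1}^{N} \sum_{j=1}^{N} \delta(l, d_{ij}) \, p_{ki} \, p_{kj} \\[4pt]
% Moran's index
I(p_k, l) &= \frac{\dfrac{1}{\Delta_l} \sum_{i=1}^{N} \sum_{j=1}^{N}
             \delta(l, d_{ij}) \, (p_{ki} - \bar{p}_k)(p_{kj} - \bar{p}_k)}
            {\dfrac{1}{N} \sum_{i=1}^{N} (p_{ki} - \bar{p}_k)^2} \\[4pt]
% Geary's coefficient
c(p_k, l) &= \frac{\dfrac{1}{2 \Delta_l} \sum_{i=1}^{N} \sum_{j=1}^{N}
             \delta(l, d_{ij}) \, (p_{ki} - p_{kj})^2}
            {\dfrac{1}{N - 1} \sum_{i=1}^{N} (p_{ki} - \bar{p}_k)^2}
\end{align}
```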
These include Moran's index; 54 in all of them, d_ij is the topological distance or spatial lag between atoms i and j. Spatial autocorrelation measures the level of interdependence between properties, and the nature and strength of that interdependence. It may be classified as either positive or negative. In the positive case, similar values appear together, while negative spatial autocorrelation has dissimilar values appearing in close association. 54,55 In a molecule, Moran's and Geary's spatial autocorrelation analysis tests whether the value of an atomic property at one atom in the molecular structure is independent of the values of the property at the neighboring atoms. If dependence exists, the property exhibits spatial autocorrelation. 33,34
The calculation of the 2D autocorrelation descriptors was carried out with the software package DRAGON version 5.4. 56 We used atomic masses, atomic van der Waals volumes, atomic Sanderson electronegativities and atomic polarizabilities as weighting properties. Autocorrelation vectors were calculated at spatial lags l ranging from 1 to 8. The total number of computed descriptors was 96 (3 autocorrelation vectors × 4 weighting properties × 8 lags). Descriptors with constant or near-constant values were discarded.
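As a rough illustration of how such descriptors can be computed, the sketch below evaluates the Broto-Moreau, Moran and Geary values for one atomic property at one lag on an H-depleted graph, and enumerates the 96 descriptor names. It is not the DRAGON implementation. The function names are hypothetical, sums run over ordered atom pairs, and the suffix letters m, v, e and p are assumed to stand for mass, van der Waals volume, electronegativity and polarizability, following the pattern of the names cited in this paper (MATS5e, ATS4p, GATS4e).

```python
from collections import deque


def topological_distances(adjacency):
    """All-pairs shortest path lengths (in bonds) via breadth-first search."""
    n = len(adjacency)
    dist = [[0] * n for _ in range(n)]
    for start in range(n):
        seen = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        for v, d in seen.items():
            dist[start][v] = d
    return dist


def autocorrelations(adjacency, prop, lag):
    """Return (Broto-Moreau, Moran, Geary) for one property at one lag >= 1.

    Assumed conventions: sums run over ordered atom pairs (i, j) whose
    topological distance equals `lag`; packages may scale differently.
    """
    n = len(prop)
    dist = topological_distances(adjacency)
    mean = sum(prop) / n
    pairs = [(i, j) for i in range(n) for j in range(n) if dist[i][j] == lag]
    spread = sum((p - mean) ** 2 for p in prop)
    ats = float(sum(prop[i] * prop[j] for i, j in pairs))
    if not pairs or spread == 0.0:
        # No atom pair at this lag, or a constant property: ratios undefined.
        return ats, 0.0, 0.0
    cross = sum((prop[i] - mean) * (prop[j] - mean) for i, j in pairs)
    sq_diff = sum((prop[i] - prop[j]) ** 2 for i, j in pairs)
    moran = (cross / len(pairs)) / (spread / n)
    geary = (sq_diff / (2 * len(pairs))) / (spread / (n - 1))
    return ats, moran, geary


# 3 vectors x 4 weighting properties x 8 lags = 96 descriptor names.
PREFIXES = ("ATS", "MATS", "GATS")
WEIGHTS = ("m", "v", "e", "p")  # assumed meaning: mass, volume, electronegativity, polarizability
DESCRIPTOR_NAMES = [
    f"{prefix}{lag}{weight}"
    for prefix in PREFIXES
    for weight in WEIGHTS
    for lag in range(1, 9)
]
```

For a three-atom chain, `autocorrelations([[1], [0, 2], [1]], [1.0, 2.0, 3.0], 1)` evaluates the three values at lag 1. A Moran value near zero or a Geary value near one would indicate little spatial autocorrelation of the property at that lag.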
Compounds in the external prediction series were never used to develop the classification function.
One of the most important steps in the computer-aided search for novel telomerase inhibitors is to design representative, randomized training and prediction series. With this aim, we selected a large and highly variable data set of 546 telomerase inhibitors with reported activity through interaction with G-quadruplex DNA. The following table shows the different families of compounds belonging to our database.
[...] µM. 71 For this reason, we used this break point. Further, the whole data set was divided in two parts using k-means cluster analysis (k-MCA). 49,51,81-83 [...] set was carried out by randomly taking compounds belonging to each cluster. To ensure a statistically acceptable partition of the data into several clusters, we took into account the number of members in each cluster and the standard deviation of the variables in the cluster (as low as possible). We also inspected the standard deviation between and within clusters, the respective Fisher ratio and its p-level of significance, which was required to be lower than 0.05. 80,84
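The random selection of compounds from each cluster can be sketched as follows. This is a minimal illustration that assumes the cluster labels have already been obtained (for example from a k-means run). The function name, the 20% test fraction default and the seed are hypothetical choices, not settings reported in this work.

```python
import random
from collections import defaultdict


def split_by_cluster(cluster_labels, test_fraction=0.2, seed=0):
    """Draw a random test subset from every cluster; the rest is training.

    `cluster_labels[i]` is the cluster assigned to compound i. Returns two
    sorted lists of compound indices: (training, test).
    """
    rng = random.Random(seed)
    members = defaultdict(list)
    for index, label in enumerate(cluster_labels):
        members[label].append(index)

    training, test = [], []
    for label in sorted(members):
        indices = members[label][:]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        training.extend(indices[n_test:])
    return sorted(training), sorted(test)
```

Sampling inside each cluster, rather than from the pooled data, keeps every structural family represented in both series in roughly the same proportion.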

Results and Discussion
Once a representative training set had been selected, it was used to fit the discriminant function. Model selection followed the principle of parsimony: we chose a function with high statistical significance but with as few parameters (descriptors) as possible.
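The fitting step can be illustrated with a generic two-class linear discriminant. The sketch below assumes equal prior probabilities and a pooled within-class covariance matrix. It is not the software or the settings used in this study, and the function names are hypothetical. It also returns the squared Mahalanobis distance between the two class centroids.

```python
import numpy as np


def fit_two_class_lda(X, y):
    """Fit a two-class linear discriminant (assumes equal priors).

    X: (n_compounds, n_descriptors) array; y: 1 = active, 0 = inactive.
    Returns weights w, intercept b and the squared Mahalanobis distance
    between the class centroids.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    active, inactive = X[y == 1], X[y == 0]
    mean_a, mean_i = active.mean(axis=0), inactive.mean(axis=0)
    # Pooled within-class covariance with n - 2 degrees of freedom.
    scatter = (active - mean_a).T @ (active - mean_a) \
        + (inactive - mean_i).T @ (inactive - mean_i)
    pooled = scatter / (len(X) - 2)
    w = np.linalg.solve(pooled, mean_a - mean_i)
    b = -0.5 * float(w @ (mean_a + mean_i))
    d2 = float((mean_a - mean_i) @ w)
    return w, b, d2


def classify(X, w, b):
    """Label a compound active (1) when its discriminant score is positive."""
    scores = np.asarray(X, dtype=float) @ w + b
    return (scores > 0).astype(int)


def percent_correct(predicted, observed):
    """Percentage of compounds whose predicted class matches the observed one."""
    return 100.0 * float(np.mean(np.asarray(predicted) == np.asarray(observed)))
```

A parsimonious model would be obtained by fitting this function on small subsets of descriptors and keeping the smallest subset whose classification of the training and prediction series remains acceptable.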

Structural Interpretation
The factors that stabilize the G-quartet structure are known. Some of them such as [...] Equation 1 shows a positive contribution of MATS5e. This descriptor is able to explain the contribution of electronegative atoms, such as O and S, within a radius of 5 Å, although it is not possible to relate this kind of interaction to factors involved in the stabilization of the G-quartet structure at this distance.
The variable ATS4p also has a negative contribution to the inhibition of telomerase. This descriptor is related to polarizability at a distance of 4 Å. At this distance, hydrophobic interactions do not yet play a main role. For example, the quinoxaline (see Figure 2a) has three chlorine atoms, whose large size and high polarizability fall within a radius close to 4 Å. This compound has an IC50 > 1 μM. 85 The same occurs with GATS4e

Concluding Remarks
Despite some criticism, there is an increasing need for QSAR models based on topological indices in order to rationalize the drug discovery process. In this sense, the 2D autocorrelation approach has been extended to the discovery of novel drug leads, 86,87 but the QSAR models developed with these descriptors have been used only for small or homologous series of compounds. Consequently, the capacity of such models to predict the activity of compounds with different structural features is reduced. In the present paper, the 2D autocorrelation approach has been used to obtain good predictive linear models that account for telomerase inhibitory (TI) activity. Hence, we can assert that the 2D autocorrelation descriptors may be used as an efficient alternative to massive screening of drugs.
I(p_k, l), c(p_k, l) and A(p_k, l) are Moran's index, Geary's coefficient and Broto-Moreau's autocorrelation coefficient at spatial lag l, respectively; p_ki and p_kj are the values of property k for atoms i and j, respectively; p̄_k is the average value of property k; and δ(l, d_ij) is a Dirac delta function defined as: δ(l, d_ij) = 1 if d_ij = l, and 0 otherwise.
variable, which contributes negatively to the activity. The results shown in Figures 2a and 2b illustrate this behaviour. In the second case (Figure 2b), the acridone has six fluorine atoms at this distance and this compound has an IC50 > 139 μM. 69

Table 1. Representation of different molecular graphs G and topological distances or

Table 1. Some kinds of chemical compounds used in this study.
training set 80% of the whole data, with 437 compounds (183 compounds with tel IC50 ≤ 1 µM and 254 compounds with tel IC50 > 1 µM) and test set 20% of the data, with 109 compounds (47 compounds with tel IC50 ≤ 1 µM and 62 with tel IC50 > 1 µM). Selection of the training and prediction
To derive a discriminant function that permits the classification of chemicals as active (telomerase inhibitors) or inactive (non-telomerase inhibitors), we used linear discriminant analysis with the 2D autocorrelation descriptors as independent variables. The classification model obtained is given below together with the statistical parameters of the LDA:
Model 1 [...] 2 = 1.61, ρ = 26.14
In this model, U is the Wilks' statistic, D2 is the squared Mahalanobis distance, and F is the Fisher ratio. The Wilks' U-statistic for the overall discrimination can take values in the range from 0 (perfect discrimination) to 1 (no discrimination). One of the most important criteria for accepting or rejecting a discriminant model, such as Model 1, is based on the statistics for the external prediction series. For the discrimination of active/inactive compounds studied here, the model classified