MI-DRA 3 D : new model for reconstruction of US FDA drug-target network and theoretic-experimental studies of rasagiline derivatives inhibitors of AChE .

The Neurodegenerative diseases have been increasing in the last years. Many of the drug candidates to be used in the treatment of neurodegenerative disease present specific 3D structural features. One important protein in this sense is the acetylcholinesterase (AChE); which is the target of many Alzheimer's dementia drugs. Consequently, the prediction of Drug-Proteins Interactions (DPIs/nDPIs) between new drug candidates with specific 3D structure and targets it is about the major importance. For it, we can use Quantitative Structure-Activity Relationships (QSAR) models to carry out rational DPIs prediction. Unfortunately, many previous QSAR models developed to predict DPIs take into consideration only 2D structural information and codify the activity against only one target. To solve this problem we can develop one 3D multi-target QSAR (3D MtQSAR) models. In this communication, we introduce the technique MI-DRA 3D a new predictor for DPIs based two different well-known software. We use the software MARCH-INSIDE (MI) and DRAGON to calculate 3D structural parameters for drugs and targets respectively. Both classes of 3D parameters were used as input to train Artificial Neuronal Network (ANN) algorithms using as benchmark datasets the complex network (CN) formed by all DPIs between US FDA approved drugs and their targets. The entire dataset was downloaded from Drug Bank. The best 3D Mt-QSAR predictor found is one ANN of type Multilayer Perceptron (MLP) with profile MLP 37:37-24-1:1. This MLP classifies correctly 274 out of 321 DPIs (Sensitivity = 85.35%) and 1041 out of 1190 nDPIs (Specificity = 87.48%), corresponding to training Accuracy = 87.03%. We validated the model with external predicting series with Sensitivity = 84.16% (542/644 DPIs; Specificity = 87.51% (2039/2330 nDPIs) and Accuracy = 86.78%. The new CNs of DPIs reconstructed from US FDA can be used to explore large DPIs databases in order to discover both new drugs and/or targets. We carried out theoretic-experimental studies to illustrate the practical use of MI-DRA 3D. First, we reported the prediction and pharmacological assay of 22 different rasagiline derivatives with possible AChE inhibitory activity.

DPIs of one drug enabling us to re-construct large drug-target or DPIs Complex Networks (CNs).In this study, we reported the prediction and pharmacological assay of 22 different rasagiline derivatives with AChE inhibitory activity.The present work reports the attempts to calculate within unified DPIs.All this can help to design new inhibitors of AChE.A very good MI-DRA 3D QSAR model was obtained, and the subsequent combined QSAR & CN analysis may become of major importance for the prediction of the activity of new compounds against different targets or the discovery of new targets.In this sense we reported an illustrative study that combines both experiment and theory to show how to use this model in practical situations.We reported the prediction and pharmacological assay of rasagiline derivatives with AChE inhibitory activity.In Figure 1 we depict a flowchart with the main steps given in this work to train and validate the ANN classifier.

Computational methods 2.1.1 MOPAC AM1 Optimization geometry method using CS CHEM 3D.
Molecular structures of all FDA drugs were generated with CHEM 3D Ultra (version 2005).The energy of each intermediate was then minimized using the semi-empirical MOPAC method with a minimum RMS gradient of 0.100, which specifies the convergence criteria for the gradient of the potential energy surface.The geometry of the molecules was Optimized and the values of the quantum chemical descriptors of each compound were calculated using AM1.AM1 theory was used with a closed shell function.The MOPAC AM1 method was selected because it was a semi-empirical quantum chemical method and the computational time was much shorter than that needed by ab initio method.

MI-DRAGON technique
3D Parameters for drugs.The DRAGON software 4.0 [19] was utilized here to calculate the 3D parameters of drugs.It depends on whether they are computed from the chemical formula, substructure list representation, molecular graph or geometrical representation of the molecule, respectively [20,21].In this work, we calculated only GETAWAY 3D descriptors.We use these descriptors after optimized for use with 3D descriptors.3D Parameters of proteins.In previous works we have predicted protein function based on different protein structural parameters derived from a Markov matrix that account for electrostatic interactions between amino acid pairs in the 3D structure of the protein.One of the classes of parameters used was called the Shannon Entropy T θ k (R) of the Markov matrix.These values are used here as inputs to describe information about the structure of the drug target proteins (T) in order to construct the mt-QSAR models for DTPs.The detailed explanation has been published before [22][23][24][25][26][27][28][29][30] and reviewed in detail more recently [31].As follows we give the formula for T θ k (R) values and some general explanations:


Where, k p i (R) values are the absolute probabilities with which the effect of the electrostatic interaction propagates from the amino acid i th to other amino acids j th next to it and returns to i th after k-steps.These probabilities refer to: aminoacids considered isolated in the space (k = 0), interaction between aminoacids in direct contact (k = 1) or spatial (k > 1) indirect interactions between amino acids placed at a distance equal to k-times the cut-off distance (r ij = k •r cut-off ) in the residue network.Euclidean 3D space r 3 = (x, y, z) coordinates of the C α atoms of amino acids listed in protein PDB files.For calculation, all water molecules and metal ions were removed [32].All calculations were carried out with our inhouse software MARCH-INSIDE 2.0 [32].For the calculation, the MARCH-INSIDE software always uses the full matrix, never a sub-matrix, but the last summation term may run either for all amino acids are only for some specific protein regions (R) denoted as: c for core, I for inner, m for middle, and is for surface regions, respectively).Consequently, we can calculate different T θ k (R) for the amino acids contained in the regions (c, i, m, s, or t) and placed at a topological distance k each other within this orbit (k is the order) [22,23,[33][34][35].In this work, we have calculated altogether 5 (types of regions) x 6 (orders considered) = 30 T θ k (R) indices for each protein.

Statistical analysis.
Let be D θ k (G) entropy descriptors molecular that codify information about drug structure and T θ k (R) entropy descriptors that codify information about drug target proteins; we attempt to develop a simple mt-QSAR model in the form of a linear classifier with the general formula: We used Linear Discriminating Analysis (LDA) to fit this discriminant function.The model deals with the classification of a compound set with or without affinity of different receptors.A dummy variable Affinity Class (AC) was used as input to codify the affinity.This variable indicates either high (AC = 1) or low (AC = 0) affinity of the drug of the receptor.S(DTP) pred or DTP affinity predicted score is the output of the model and it is a continuous dimensionless score that sorts compounds from low to high affinity to the target coinciding DTPs with higher values of S(DTP) pred and nDTPs with lower values.In equation ( 6), b represents the coefficients of the classification function, determined by the LDA module of the STATISTICA 6.0 software package [36].We used Forward Stepwise algorithm for a variable selection.The statistical significance of the LDA model was determined calculating the p-level (p) of error with the Chi -square test.We also inspected the Specificity, Sensitivity, and total Accuracy to determine the quality-of-fit to data in training.Cases for training set were selected at random out of the cases in full data set.The remnant cases were used to validate the model.The validation of the model was corroborated with these external prediction series; these cases were never used to train the model.The ration between training/validation set was 2/1 approximately.This procedure to select training and validation sets is largely known and used to train QSAR models [37][38][39][40][41][42][43].
2.1.4ANN analysis.The non-linear mt-QSAR model was constructed using ANN analysis.All models trained were carried out in STATISTICA 6.0 [36].In so doing, we used a very simple type of ANN called Three Layers Perceptron (MLP-3) to fit this discriminant function.The model deals with the classification of a compound set with or without affinity of different receptors.A dummy variable Affinity Class (AC) was used as input to codify the affinity.This variable indicates either high (AC = 1) or low (AC = 0) affinity of the drug of the receptor.S(DTP) pred or DTP affinity predicted score is the output of the model and it is a continuous dimensionless score that sorts compounds from low to high affinity to the target coinciding DTPs with higher values of S(DTP) pred and nDTPs with lower values.In equation (2), b represents the coefficients of the LNN classification function, determined by the ANN module of the STATISTICA 6.0 software package [36].We used Forward Stepwise algorithm for a variable selection.
In addition, we can explore more complicated non-linear ANNs in order to improve the accuracy of the classifier.We processed our data with different ANNs looking for a better model.Four types of ANNs were used, namely, Probabilistic Neural Network (PNN), Radial Basic Function (RBF), Linear Neural Network (LNN), and Four Layer Perceptron (MLP-4) [44,45].The quality of all the ANNs (linear or nonlinear) was determined calculating values of Specificity, Sensitivity, and total Accuracy to determine the qualityof-fit to data in training.The validation of the model was corroborated with external prediction series.We also reported ROC-curve analysis (ROC curve can be used to select an optimum decision) for both training and validation series [44,46].
2.1.5Data set.The data set was formed by a set of marketing DPIs with a known affinity of drugs by targets.This dataset is the same benchmark data used in previous works [1,5,12,13] in this area and contains all drugs approved by the US FDA.We download this dataset from the public resource called Drug Bank [12,13,[16][17][18].The data set was formed for more than 519 drugs with their respectively 336 targets.Subsequently, we were able to collect above 4485 cases (drug-protein interactions) instead of 519 x 336 cases.In addition the data set was used to develop ANN models to perform the model.

Complex network construction.
We construct a DPIs network in order to achieve the drug and protein affinity with a network approach.Generally in this network, one node may represent a drug or a target.On the other hand, the edges represent the DPIs; express relationships between pairs of drugs with their targets [1].Anyhow, the nodes representing targets may be of at least two types.In almost all cases reported up to date each target is represented only once in the network.In this class of "static" DPIs network the target is depicted by the node corresponding to the X-ray structure of itself.In this work, we build in total two complex networks.First, we constructed the DPIs networks for the observed data and second, DPIs network predicted by the model.The common steps to construct these networks are: First, using the Excel software in a column we introduce all the proteins, the drugs used quotation marks in our database.Then in another column lists all the cases.At the beginning of this column puts the total number of vertices, there are currently two columns of the name of drugs and protein and their corresponding number of vertices.After, at the end of the columns are placed bows in the first column put the number of vertices for the drug and in another column corresponding to the protein.Then, the file was saved as a .txtformat file.After we had renamed the .txtfile as a .netfile we read it with the CentiBin software [47,48].Finally, using CentiBin we can not only represent the network but also highlight all drugs and targets (nodes) connected by a specific edge or link (DPI).Using this software we can calculate vertex centralities to analyze the relationships between drug targets.

Determinations of cholinesterases activities
The cholinesterase assay method of Ellman was used to determine the in vitro cholinesterase activity [49].The activity was measured by increase in absorbance at 412 NM due to the yellow color produced from the reaction of acetylthiocholine iodide with the dithiobisnitrobenzoate (DTNB) ion.Acetylcholinesterase from human erythrocytes, acetylcholinesterase recombinant expressed in HEK 293 cells and butyrylcholinesterase from human serum was obtained from Sigma.2.2.3 Experimental conditions and kinetics.Enzyme activity was measured using a FLUOstar Optima microplate reader.The assay medium contained phosphate buffer, pH 8.0, 20 mM DTNB, 0.01 U/ml of enzyme and 0.75 µM substrate (acetylthiocholine iodide or butyrylthiocholine iodide).The activity was determined by measuring the increase in absorbance at 412 nm at 1 min intervals for 10 min at 37 °C.In a dose-dependent inhibition studies, the substrate was added to the assay medium containing enzyme, buffer, and DTNB with inhibitor after 10 min of incubation time.All experiments were carried out in duplicate and expressed as mean ± SEM.The relative activity is expressed as the percentage ratio of enzyme activity in the absence of inhibitor, see Table 1 [50,51].We used these properties as input of our model in addition to drug molecular descriptors.The present is the first mt-QSAR model combining DRAGON and MI to predict the probability with which occurs DPIs between a drug and a protein.This type of models lie within the frontiers between classic QSAR for drugs and protein QSAR [33].Some applications of the present model are the prediction of new drugs, new protein receptors or drug targets, and drug binding sites.Based on the algorithms described in materials and methods the best linear model found was the following:   The nomenclature used in the descriptors of the equation is found in Table 2.In this equation, N is the number of cases, χ 2 is the Chi-square and p is the level of error.This model, with 10 variables, classifies correctly 256 out of 321 DPIs (Sensitivity of 79.75%) and 1014 out of 1190 nDPIs (Specificity of 85.21%).Overall Training Accuracy was 84.05%.The validation of the model was carried out by means of external predicting series.The model classifies correctly 498 out of 644 DPIs (77.33%) and 2000 out of 2330 nDPIs (85.84%) in validation series.Accuracy of validation series (predictability) was 83.99%.These results (Table 3) indicate that we developed an accurate model according to previous reports on the use of LDA in QSAR [52,53].

MI-DRA 3D ANN model.
The previous model shows good results with a relatively small number of parameters (10 parameters) and a linear equation.However, as a result of the previous section we decided to carry out an ANN analysis to seek a better model using a non-linear method.Four types of ANNs were used, namely, Probabilistic Neural Network (PNN), Radial Basic Function (RBF), Three Layers Perceptron (MLP-3), and Four Layer Perceptron (MLP-4).See, previous works about the use of these ANNs in protein QSAR [5,13].The Figure 3 depicts the network topology for some of the ANN models tested.In general, at least one ANN of every type tested was statically significant.However, one must note that the profiles of each network indicate that many of these are highly non-linear and complicated models.Models using ANN-QSAR has been demonstrated before; see, for instance, the works of Fernández and Caballero [54,55].We compare different types of networks to obtain a better model.In Table 3 we show the classification matrix of the different networks.The profiles of networks tested were RBF 1:1-1-1:1 with only one variable; LNN 227:227-1:1, which present many variables, and PNN 227:227-14797-2-2:1, which has a very high number of hidden neurons, see Table 3.After that, the simpler but more accurate ANN model found was an MLP (MLP 37:37-24-1:1) with training Accuracy = 87.03%.This was selected as the best network found because it presents both high accuracy and an adequate number of variables accounting for features relevant for DPIs.This ANN presents 37 input variables (24 d k + 13 Θ m ).This leads to 37 neurons in first or input layer (I), 24 neurons in the second layer or first hidden layer (H1) and only one neuron (DPI prediction) in the output layer (O).We depict the ROC-curve for MLP 37:37-24-1:1 to show how reliable was the network model developed, see Figure 4. Notably, the model presented had a ROC curve higher than 0.5.The model presented an area greater than 0.92.From now on we call the ANN MLP 37:37-24-1:1 as the 3D MI DRAGON predictor.

MI-DRA 3D assembly of CNs for DPIs.
The construction of multi-protein CNs that incorporates protein affinity profile for drugs or the same CNs for DPIs is relevant to drug and target screening.And is one application of this model.In order to recall the capacity of MI-DRA 3D to predict new CNs of DPIs we selected the same benchmark database used in previous works [5,13,14]; which includes US FDA approved drugs with their targets.With these goals in mind, we constructed again and manually curated the above-mentioned CN obtaining a graph with 855 vertices or nodes (drugs and proteins) and m = 1016 DPIs (edges).This CN of DPIs have D = 6.7; average topological distances D ij between all pairs of nodes.The same as before, we constructed a new CN of DPIs but connecting only pairs of nodes with DPIs predicted by MI-DRA 3D.In so doing, we obtained a value of D = 7.2 and m = 1256 DPIs.In Figure 5 we illustrated visually both CNs (observed and predicted).In the first instance, we compare this predicted network (MI-DRA 3D) with 2D MI-DRAGON predicted network [14].We compare to observing the similar or dissimilar topology (connectivity pattern structure) between them.Measured in terms of TIs such as: number of nodes (n), number of edges (m), Wiener index (W), diameter (D), the Randic connectivity index (Xr), topological distance (Dist), network average values for radiality (R), node degree (δ), eccentricity (E).In Table 4, we observe all the TIs are similar excepting n, m and w.That means both CNs has a high similarity between them.These results are very interesting, because our MI-DRA 3D model present similar results to the 2D MI-DRAGON model, which results have been published successfully before.To see how reliable and valid is our model.Not only compared to TIs to observe the similarity between both predicted networks, but we study the centrality analysis of given networks too.This type of drug screening and drug target discovery is the calculation of those nodes (drugs or proteins) which are more relevant or important (central) in the graph.
In it we can use numerical parameters that quantify the importance of a node in a graph which are called node centralities C t of type t [56].These nodes identifications using node centralities may help us to identify the most relevant drugs or proteins in analogy to similar procedures developed for PINs; networks of Protein-Protein Interactions (PPIs) [57].In Table 5 we show the predicted results of both node degree centrality (C δ ) and closeness centrality (C clo ) for proteins and drugs present in the database and compare with the predicted results of the 2D-MI-DRAGON model.The parameter C δ measures the local importance of a node by counting the number of nodes directly attached to him [57].
Conversely, C clo measures the global importance of a node in a CN by taking in consideration the inverse of the sum of D ij (C clo = 1/ΣD ij ) [58].Consequently, the higher C δ the higher is the local importance of the node but the higher C clo the lower is the global importance of the node.For instance, the protein 1HA2 is one important protein both locally and globally in this CNs with lower C clo > 4 and a C δ = 26.It means that this protein is both locally and globally important because it is the target of many drugs (high C δ ).This result is similar to obtained from 2D MI-DRAGON model.Another interesting result was simvastatin.Simvastatin is a hypolipidemic drug used to control elevated cholesterol, or hypercholesterolemia.It is a member of the statin class of pharmaceuticals.The primary use of simvastatin is for the treatment of dyslipidemia and the prevention of cardiovascular disease [59,60].Depending on our aims the more important nodes in pharmacological terms don't necessarily have to be the more central in the graph (those with higher C δ and lower C clo ), see Table 5.We show in this example, our model predicts efficiently.We found that the MI-DRA 3D model shows very similar results to the previous model, which has been published with excellent results.

Theoretic-Experimental Study is using MI-DRA 3D predictor
Finally, we illustrated in one theoretic-experimental study the practical use of MI-DRA 3D.We reported the prediction, synthesis, and pharmacological assay of 20 different rasagiline derivatives with AChE inhibitory activity.

MI-DRA 3D prediction of rasagiline derivatives vs. AChE.
In this in silico experiment we used MI-DRA 3D to predict the interaction of the rasagiline derivatives with respect to AChE.For it, we downloaded the 3D structure of AChE protein with PDB ID 1EEA and calculated their structural parameters with MI.We also generated the SMILE codes for these compounds and we use MOPAC AM1 Optimization geometry method for these compounds for calculating their 3D structural parameters with DRAGON.After that, we predicted their propensity to undergo DPIs with AChE using as inputs for the MI-DRA 3D predictor the structural parameters of both the drugs and the protein.In Table 6 we confront the results obtained using this model and the outcomes of the pharmacological assay.No compounds are selective inhibitors of AChE, which is why we used as control galantamine for AChE was.We consider the observed class of active compounds OC = 1 if compound IC 50 < 10 μM this cutoff is in the similar range than other used in previous works [61,62].As we can see in this table all the compound rasagiline derivatives present some activity, But none of these compounds have inhibitory activity in the pharmacological assays.All of our compounds in the pharmacological assay (OC = 0) were inactive.MI-DRA 3D predicted as inactive all compounds, excepting 3. The model classified correctly 19 of 22 compounds tested (86.36%).In this test, our model was compared with pharmacological testing of 22 compounds synthesized by us.And we can observe the effectiveness of our model with experimental data.Also, we note that the model predicts all compounds tested as inactive, this is important because the model allows to discriminate between active and inactive compounds.However, some compounds were not tested by pharmacological assay, that compounds were predicted as inactive using MI-DRA 3D model.We discarded pharmaceutical assays of these compounds; because we consider our model reliable.This kind of model can be used to save efforts and money to perform the pharmacological tests.This is a good example of how reliable is the MI DRAGON 3D model.[14,63].It may help to find new targets for these drugs or discard possible toxicological effects depending on the other targets predicted and/or discarded for these compounds.This type of experiment is about the major importance due to the cost in terms of animal sacrifice, time, materials and human resources of the experimental assay of all compounds against all these targets, see recent reviews by Duardo-Sánchez et al. [64][65][66][67].In fact, over a decade, the US FDA has been engaged in the applied research, development, and evaluation of computational toxicology methods used to support the safety evaluation of a diverse set of regulated products.The basis for evaluating computational toxicology methods is multifactorial, including the potential for increased efficiency, reduction in the numbers of animals used, lower costs, and the need to explore emerging technologies that support the goals of the US FDA's Critical Path Initiative (e.g.To make decision support information available early in the drug review process) [68].
In this experiment, we downloaded the 3D structure of all proteins that are targets of US FDA approved drugs.Next, we calculated the structural parameters of all these proteins with MI.We also generated the SMILE codes for these compounds and and we use MOPAC AM1 Optimization geometry method for these compounds for calculating their 3D structural parameters with DRAGON.After that, we predicted their propensity to undergo DPIs with all US FDA proteins using as inputs for the MI-DRA 3D predictor the structural parameters of both the drugs and proteins.We predicted all proteins in FDA dataset vs. the 22 rasagiline derivatives.We found that most of 22 derivatives were predicted as non-active (low DPIs scores) against most proteins in the FDA database.Consequently, MI-DRA 3D predicts a high selectivity of rasagiline derivatives as AChE inhibitors.We can reach this goal because the model predicts these compounds as nonactive with respect to most proteins that are targets of FDA drugs.Using these results, we constructed a DP-CN for rasagiline derivatives and the FDA dataset (see Figure 6).As a result we obtained a CN with 87 nodes (FDA drugs, proteins, or rasagiline derivatives) and 166 DP (edges, DTPs).As In this network we can see that protein 1EEA (AChE) is predicted to interact with compound 3, this protein is an AChE target [69].These results are good because they agree with the experimental results presented in this paper where the compound 3 show low AChE activity.The use of such complex networks can help us find and predict new drugs-protein interactions, and therefore find new drugs with improved biological activity and fewer side effects, especially in neural disease.

Conclusions
The MI-DRA 3D predictor based on structural parameters of drugs calculated with DRAGON and parameters of proteins calculated with MI.It is possible to seek excellent predictors for DPIs using as input structural parameters of drugs and proteins calculated with different programs and combined with ANN models.Combining MARCH-INSIDE and DRAGON approach and ANN is possible to seek one mt-QSAR classifier to predict with Accuracy > 85% the probability of drugs to bind more than 500 different drug target proteins approved by FDA of USA.MI-DRA 3D predictor is also useful to assemble CNs of DPIs.These CNs computationally assembles offer an alternative to discover new drugs or targets, and explore the selectivity of drugs.In this work, we exemplified these conclusions through the experimental-theoretical study of the AChE activity of new rasagiline derivatives.

Acknowledgments
The authors thank sponsorships for a research position at the University of Santiago de Compostela from Angeles Alvariño, Xunta de Galicia.

Figure 1 .
Figure 1.Flowchart of all steps given in this work to develop the new model.

Figure 2 .
Figure 2. Rasagiline derivatives used in this work.

4  5  5  2 
Entropy of all aminoacids placed in the core region and all the neighbors at distance k ≤ 4 d6  core T Entropy of all aminoacids placed in the core region and all the neighbors at distance k ≤ 5 d7Entropy of all aminoacids placed in the inner region and all the neighbors at distance k Entropy of all aminoacids placed in the middle region and all the neighbors at distance k Entropy of all aminoacids placed in the surface region and all the neighbors at distance k ≤ 0 d9

Figure 3 .
Figure 3. Generic Topology of ANN models trained in this work.

Figure 6 .
Figure 6.DP-CN for rasagiline derivatives and the FDA dataset.

Table 1 .
. Inhibitory activity of different rasagiline derivatives .Common physicochemical properties like entropy have been demonstrated to be useful on protein QSAR

Table 2 .
Detailed list of the symbols and description for all parameters present in the model.

Table 3 .
Comparison of LDA and different ANNs classification models.
DPIs: Drug-Target Pairs for compounds with high affinity; nDPIs: Drug-Target Pair for compounds with nonaffinity; Stat. is statistics, Par. is parameter

Table 5 .
Results of node degree (C δ ) and closeness centrality (C clo ) for 20 proteins and drugs.

Table 6 .
Prediction of rasagiline derivatives with MI-DRA 3D predictor DRUG OC PC Score Structure

2 MI-DRA 3D complex network of rasagiline derivatives vs. US FDA proteins.
An additional use of MI-DRA 3D was to carry out the "in silico" or virtual screening of the new compounds with respect to all other targets previously approved by US FDA