Unified QSAR approach to antibacterial activity of organic drugs against different species

There are many pathogenic bacterial species with very different susceptibility profiles to different antibacterial drugs. One limitation of current QSAR models is that they describe the biological activity of drugs against only one bacterial species. In a previous paper we developed a unified Markov model to describe the biological activity of different drugs tested in the literature against some microbial species. Consequently, predicting the probability with which a drug is active against different bacterial species with a single unified model is a goal of major importance. This work develops a unified Markov model to describe the biological activity of more than 70 drugs tested in the literature against 96 bacterial species. The data were processed by Linear Discriminant Analysis (LDA), classifying drugs as active or non-active against the different bacterial species tested. The model correctly classifies 199 out of 237 active compounds (83.9%) and 168 out of 200 non-active compounds (84%). Overall training predictability was 84% (367 out of 437 cases). The model was validated by means of external predicting series, in which it correctly classified 202 out of 243 compounds (83.13%). To show how the model functions in practice, a virtual screening was carried out, in which the model recognized as active 480 out of 568 antibacterial compounds (84.5%) not used in the training or predicting series. The present work is an attempt to calculate, within a unified framework, the probabilities of antibacterial action of drugs against many different species.

*Corresponding author: gonzalezdiazh@yahoo.es or qohumbe@usc.es


Introduction
With the increase in bacterial resistance to antibiotic treatment, attention has focused on developing novel antimicrobial therapies. One approach is to exploit the natural mechanisms that mammals, including humans, use to combat microbial invaders. Modern rational drug design relies widely on building extensive QSAR (quantitative structure-activity relationship) models, which represent a substantial part of current 'in silico' research. QSAR can then be used to optimize both the activity profile of the molecule and its chemical synthesis. 1 Disappointingly, QSAR studies are generally based on databases that consider only structurally related compounds acting against a single microbial species. As a consequence, to predict the antimicrobial activity of a given series of compounds one has to use or seek as many QSAR models as there are microbial species whose drug susceptibility is to be predicted. 2 In a previous paper, we developed a unified Markov model to describe the biological activity of different drugs tested in the literature against different microbial species. In this sense, it is very important to report a single unified equation to calculate the probability of activity of a given drug against different microbial species.
Bacterial infections have increased dramatically in recent years. Bacteria have been the cause of some of the most deadly diseases and widespread epidemics of human civilization. Bacterial diseases such as tuberculosis, typhus, plague, diphtheria, typhoid fever, cholera, dysentery, and pneumonia have taken a heavy toll on humanity. Water purification, immunization (vaccination) and modern antibiotic treatment continue to reduce the morbidity and mortality of bacterial disease in the twenty-first century, at least in the developed world, where these are accepted cultural practices. However, many new bacterial pathogens have been recognized in the past 25 years, and many bacterial pathogens, such as Staphylococcus aureus and Streptococcus pneumoniae, have emerged with new forms of virulence and new patterns of resistance to antimicrobial agents. 3][10][11] In any case, none of these indices has yet been extended to encode information additional to chemical structure. Our group has introduced elsewhere a Markov Model (MM) encoding molecular backbone information, with several applications in bioorganic medicinal chemistry. The method was named the MARCH-INSIDE approach (MARkovian CHemicals IN SIlico DEsign). It allowed us to introduce matrix invariants such as stochastic entropies and spectral moments for the study of molecular properties. Specifically, the stochastic spectral moments introduced by our group have been widely used for small-molecule QSAR problems, including the design of flukicidal, anticancer and antihypertensive drugs. ][18][19][20][21][22]
In recent studies, the MARCH-INSIDE method has been extended to encompass information on the molecular environment in addition to molecular structure. This new interpretation allows the calculation of molecular thermodynamic free energies for many physicochemical and biological processes. 23,24 This approach is able to take into consideration, for instance, not only the molecular structure of the drug but also the free energy of its interaction with the specific microbial organism that the drug has to eliminate. The present study develops a single linear equation based on these previous ideas to predict the antibacterial activity of drugs against different species.

Markov model for drug-target step-by-step interaction
We consider a hypothetical situation in which a drug molecule is free in space at an arbitrary initial time ($t_0$). It is then of interest to develop a simple stochastic model for the step-by-step interaction between the atoms of a drug molecule and a molecular receptor at the onset of the pharmacological effect. For the sake of simplicity, we consider a model in which the chemical structure of the receptor is unknown or not taken into consideration.
Let the initial contribution of the j-th atom to the drug-receptor interaction be $^{0}c_{j}(s)$. In this symbol, c stands for contribution, the superscript 0 indicates that we refer to the initial atom-receptor interaction, and s indicates that the contribution depends on the specific microbial species. Next, we define the contribution $^{k}c_{ij}(s)$ of the interaction between the j-th atom and the receptor, given that the i-th atom has interacted at the previous time $t_k$. With respect to $^{1}c_{ij}(s)$, we must take into consideration that, once the j-th atom has interacted, the preferred candidates for the next interaction are those i-th atoms bound to j by a chemical bond. In particular, immediately after the first interaction ($t_0 = 0$), an interaction $^{1}c_{ij}(s)$ takes place at time $t_1 = 1$, and so on. Accordingly, we define $^{1}c_{ij}(s) = \alpha_{ij} \cdot {}^{0}c_{j}(s)$, where $\alpha_{ij} = 1$ if the j-th atom is adjacent to the i-th one and $\alpha_{ij} = 0$ otherwise. So, one can suppose that the atoms bind to the receptor at discrete time intervals $t_k$. There are several alternative ways in which such a step-by-step binding process may occur. Figure 1 illustrates this idea.

Figure 1. Stochastic drug-target step-by-step interaction.
The Markov model allowed us to derive the average contributions $^{k}C_{s}$ of the atoms in the molecule to the gradual interaction between the drug and the receptor at a specific time k in a given microbial species (s). We derive these $^{k}C_{s}$ by summing all the atomic contributions of interaction $^{0}c_{j}(s)$ premultiplied by the absolute probabilities of drug-target interaction $^{A}p_{k}(j,s)$: [23][24][25]

$$^{k}C_{s} = \sum_{j} {}^{A}p_{k}(j,s) \cdot {}^{0}c_{j}(s)$$

Such a model is stochastic per se (probabilistic step-by-step atom-receptor interaction in time) but also considers molecular connectivity (the step-by-step atom union in space throughout the chemical bonding system). The Markov model for drug-target step-by-step interaction was described in a previous paper. 26
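As a minimal sketch of how the descriptors $^{k}C_{s}$ could be computed, the Python function below propagates atom-level probabilities along the bonding graph. Two points are assumptions that the text does not state: the transition probabilities are taken as the row-normalized one-step weights $\alpha_{ij} \cdot {}^{0}c_{j}(s)$, and the initial absolute probabilities are taken as proportional to $^{0}c_{j}(s)$. The function name, the three-atom chain and the contribution values are hypothetical and serve only as an illustration.

```python
def interaction_descriptors(adjacency, c0, k_max):
    """Return [0C_s, 1C_s, ..., k_maxC_s] for one drug/species pair.

    adjacency: square 0/1 matrix alpha_ij (1 if atoms i and j are bonded).
    c0: initial atomic contributions 0c_j(s) for the species of interest.
    """
    n = len(c0)
    # One-step weights 1c_ij(s) = alpha_ij * 0c_j(s), as defined in the text.
    weights = [[adjacency[i][j] * c0[j] for j in range(n)] for i in range(n)]

    # ASSUMPTION: transition probabilities are the row-normalized weights.
    transition = []
    for row in weights:
        total = sum(row)
        transition.append([w / total if total > 0 else 0.0 for w in row])

    # ASSUMPTION: initial absolute probabilities are proportional to 0c_j(s).
    total_c0 = sum(c0)
    prob = [c / total_c0 for c in c0]

    descriptors = []
    for _ in range(k_max + 1):
        # kC_s = sum_j Ap_k(j, s) * 0c_j(s)
        descriptors.append(sum(p * c for p, c in zip(prob, c0)))
        # Advance one step: p_{k+1}(j) = sum_i p_k(i) * P[i][j]
        prob = [sum(prob[i] * transition[i][j] for i in range(n))
                for j in range(n)]
    return descriptors


if __name__ == "__main__":
    # Hypothetical three-atom chain A-B-C with made-up contributions.
    chain = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
    print(interaction_descriptors(chain, [1.0, 2.0, 3.0], k_max=2))
```

Because the contributions $^{0}c_{j}(s)$ depend on the species, the same molecule yields a different descriptor vector for each species simply by passing a different `c0`.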

Statistical analysis
As a continuation of the previous sections, we can attempt to develop a simple linear QSAR using the MARCH-INSIDE methodology, as defined previously, with the general formula given in equation (8). Here, the $^{k}C_{s}$ act as the microbial species-specific molecule-target interaction descriptors. We selected Linear Discriminant Analysis (LDA) 18 to fit the classification functions. The model deals with the classification of a set of compounds as active or not against different microbial species. A dummy variable (Actv) was used to codify the antimicrobial activity. This variable indicates either the presence (Actv = 1) or absence (Actv = -1) of antimicrobial activity of the drug against the specific species. In equation (8), $b_k$ represents the coefficients of the classification function, determined by the least squares method as implemented in the LDA module of the STATISTICA 6.0 software package. 27 Forward stepwise selection was fixed as the strategy for variable selection. 19,20 The quality of the LDA models was determined by examining Wilks' U statistic, the Fisher ratio (F), and the p-level (p). We also inspected the percentage of good classification and the ratios between the cases and the variables in the equation and the variables to be explored, in order to avoid over-fitting or chance correlation. Validation of the model was corroborated by re-substitution of cases in four predicting series. 26,27
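For orientation, one generic way to write a linear classification function of the kind described in this section is sketched below. This is an assumed form, not the equation of the original work: the intercept $b$ and the upper limit $n$ are introduced here for illustration only, while Actv, $b_k$ and $^{k}C_{s}$ are the symbols used in the text.

```latex
\mathrm{Actv} = b + \sum_{k=0}^{n} b_{k} \, {}^{k}C_{s}
```

Under this reading, a drug/species pair would be assigned to the active class when the fitted function favours Actv = 1 over Actv = -1.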

Data set
The data set consisted of marketed and/or very recently reported antibacterial drugs with MIC$_{50}$ ≤ 10 μM against different bacteria. The three data sets used were as follows: training series, 437 cases in total (237 active and 200 non-active compounds, of which 199 and 168, 367 in total, were correctly classified); predicting series, 137 + 106 = 243 in total; virtual screening, 568 active compounds. The literature reports experimental tests of each drug against some but not all species of a list of 137. Consequently, we were able to collect 1248 cases (drug/species pairs). The names or codes of all compounds, as well as the references consulted, can be obtained from the corresponding author upon request.
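The counts quoted in this paper can be tallied with a few lines of Python. The sketch below only restates numbers given in the text (correctly classified cases and series sizes); the dictionary keys and the helper name `percent_correct` are hypothetical.

```python
# Counts quoted in the text: (correctly classified, total) per series.
SERIES = {
    "training, active": (199, 237),
    "training, non-active": (168, 200),
    "training, overall": (367, 437),
    "predicting": (202, 243),
    "virtual screening": (480, 568),
}


def percent_correct(correct, total):
    """Percentage of correctly classified cases."""
    return 100.0 * correct / total


# Sizes of the three series, which together make up the drug/species cases.
total_cases = (SERIES["training, overall"][1]
               + SERIES["predicting"][1]
               + SERIES["virtual screening"][1])

if __name__ == "__main__":
    for name, (correct, total) in SERIES.items():
        print(f"{name}: {correct}/{total} = {percent_correct(correct, total):.2f}%")
    print("total cases:", total_cases)
```

The three series sizes (437, 243 and 568) add up to the 1248 drug/species cases mentioned above.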

Results and discussion
The advantage of the present stochastic approach is the possibility of deriving average contributions to the biological activity that depend on the probability of the states of the MM.4][25] Specifically, this work is the first to introduce a single linear QSAR equation to predict the antibacterial activity of drugs against different species.
For the best model found, λ is Wilks' statistic for the overall discrimination, F is the Fisher ratio, and p is the error level. In this model, the $^{k}C_{s}$ were calculated for the totality (T) of the atoms in the molecule or for specific collections of atoms. These collections are atoms with a common characteristic, for instance halogens (X), unsaturated carbon atoms (C) or heteroatom-bound hydrogen atoms (H-Het). The summary of the forward-stepwise analysis shows the variables that enter the model first (Table 1).
Table 1. Summary for the forward-stepwise analysis.
In addition, we used a ROC curve (see Figure 2) to investigate the reliability of the model; the areas under the curve were 0.86 for the predicting series and 0.82 for the training series. This indicates that the present model gives statistically significant results that are clearly different from those obtained with a random classifier (area = 0.5).
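To make the comparison with a random classifier concrete, the sketch below computes the area under the ROC curve in its rank form: the probability that a randomly chosen active case receives a higher score than a randomly chosen non-active one, with ties counted as one half. This is a generic illustration, not the procedure used to obtain the reported areas; the function name and the example scores are hypothetical, and the labels follow the Actv = 1 / Actv = -1 coding.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve from classifier scores and +1/-1 labels."""
    active = [s for s, y in zip(scores, labels) if y == 1]
    inactive = [s for s, y in zip(scores, labels) if y == -1]
    if not active or not inactive:
        raise ValueError("both classes must be present")
    wins = 0.0
    for a in active:
        for b in inactive:
            if a > b:
                wins += 1.0      # active ranked above non-active
            elif a == b:
                wins += 0.5      # ties count as half
    return wins / (len(active) * len(inactive))


if __name__ == "__main__":
    # Hypothetical scores for two active and two non-active cases.
    print(roc_auc([0.9, 0.2, 0.8, 0.1], [1, 1, -1, -1]))
```

A classifier that gives every case the same score obtains an area of 0.5 under this definition, which is the baseline referred to in the text.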
To show how to use the model in practice, we carried out a virtual screening, in which the model recognized 480 out of 568 antibacterial compounds (84.5%). These compounds were never used in the training or predicting series. The most interesting characteristic of the present model is that the $^{k}C_{s}$ used as molecular descriptors depend both on the molecular structure of the drug and on the bacterial species against which the drug has to act. The codification of the molecular structure is due, in the first place, to the use of the adjacency factor $\alpha_{ij}$ to encode atom-atom bonding, that is, molecular connectivity. The other aspect that allows molecular structural changes to be encoded is that the atomic contributions $^{0}c_{j}(s)$ are atom-class specific. Consequently, a change in the molecular structure, e.g. replacing F by O, necessarily implies a change in the interaction. In any case, the most interesting fact is that the $^{k}C_{s}$ are the first molecular descriptors reported for antimicrobial QSAR studies that are able to discern among a large number of bacterial species. This property is related to the definition of the $^{0}c_{j}(s)$. The values of these atomic contributions, reported herein for the first time for antibacterial action, are given in Table 3 for some atoms and some selected species (email the corresponding author for a detailed compilation with more than 90 species). The model can calculate atomic contributions to the antibacterial property not only for different species (see Table 3) but also for different strains of the same species. One advantage of our model is that it can distinguish strains resistant to a drug from susceptible ones. For instance, Table 3 shows the atomic contributions to antimicrobial action against susceptible and resistant strains of Staphylococcus aureus and Staphylococcus epidermidis. For the first of these two species, the regression coefficient between the atomic contributions for resistant and susceptible strains is 0.51.
Conversely, the regression coefficient is 0.82 for Staphylococcus epidermidis. This notable difference between the two regression coefficients possibly reflects how large the difference is between the respective resistant and susceptible strains. In general, the atomic contributions of different atoms to the antibacterial property against all the species studied are correlated with one another. Table 4 shows high regression coefficients for some of the contributions. Please email the corresponding author for details on the names of all the drugs used, the bacterial species tested, and detailed results for training and validation. The above-mentioned flexible definition of the present approach makes it possible to model, for the first time, the present very heterogeneous antibacterial activity data. In fact, the present work is the first reported unified model that allows one to predict the antibacterial activity of any organic compound against a very large diversity of bacterial pathogens. As a concluding remark and outlook for future research, one may note that the present QSAR methodology may be able to predict the biological activity of drugs in more general situations than traditional QSAR models.

Figure 2. Results for the ROC curve.

Table 2. Results of the model, analysis, validation and virtual screening.
Table 4. Correlation values of atomic contributions.