Structural requirements for the binding affinity of some small, non-peptide C5a receptor antagonists

Complement anaphylatoxin 5a (C5a) has been recognized as a potent therapeutic target for anti-inflammatory therapy; blocking the action of C5a on its receptors may therefore provide an effective treatment for a variety of inflammatory diseases. However, few clinically available non-peptide C5a receptor antagonists have been disclosed to date. In pursuit of better anti-inflammatory drugs, quantitative structure-activity relationship (QSAR) studies were carried out on a series of non-peptide C5a receptor antagonists with known binding activity using different physicochemical descriptors. The best conventional 2D-QSAR models were developed from a training set of 35 molecules and an external test set of 8 molecules by genetic function approximation (GFA) and stepwise multiple linear regression (Stepwise-MLR), with acceptable r^2^ of 0.773 and 0.863, r^2^~CV~ of 0.752 and 0.775, and r^2^~pred~ of 0.801 and 0.888, respectively, indicating that binding activity depends strongly on thermodynamic properties as expressed by the hydrophobicity of the molecules.


Introduction
Prolonged activation of the human complement system, the host defense system of plasma proteins, contributes significantly to amplifying the inflammatory and cellular responses to stimuli such as infectious microorganisms, chemical and physical injury, radiation, or neoplasia (Lama et al., 1992; Finch et al., 1999). It results in a cascade of proteolytic cleavages of complement proteins C1-C5 (Vlattas et al., 1994; Wong et al., 1998), assembly of the membrane attack complex capable of cell lysis (Lama et al., 1992), and release of numerous complement-derived peptides, the anaphylatoxins C3a, C4a and C5a, which interact with cellular components to propagate the inflammatory process (Lama et al., 1992; Vlattas et al., 1994). Biological activities of these anaphylatoxins (Lama et al., 1992) include the cellular release of vasoactive amines and lysosomal enzymes, contraction of smooth muscle, and enhanced vascular permeability. Although the broad features of complement activation are known, the details of pathogenesis remain largely unknown (Wong et al., 1998; Harkin et al., 2004).
As reported in the literature, C5a, a 74-amino acid peptide cleaved from C5 at sites of inflammation or infection during activation of the complement system (CS) (Blagg et al., 2008a, 2008b), is a broad pro-inflammatory molecule that binds to the G protein-coupled receptor CD88 (C5aR) (Schnatbaum et al., 2006; Barbay et al., 2008). It has been recognized as a very potent inflammatory mediator generated during complement activation (Wong et al., 1998) and as a causative or aggravating agent in a variety of inflammatory diseases including rheumatoid arthritis, inflammatory bowel disease, immune complex disease, reperfusion injury, Alzheimer's disease, ischemic heart disease, and adult respiratory distress syndrome (ARDS) (Vlattas et al., 1994; Wong et al., 1998; Finch et al., 1999; Haas et al., 2005). It possesses additional chemotactic biological activities (Lama et al., 1992; Vlattas et al., 1994) that are mediated through specific receptor-ligand interactions, including an increase in Ca^2+^ mobilization, activation of neutrophil chemotaxis and aggregation, stimulation of leukotriene and oxidative product release, induction of interleukin-1 transcription by macrophages, enhanced antibody production, and other strong pro-inflammatory responses. Thus, C5a is a very intriguing target for anti-inflammatory therapy (Haas et al., 2005), and blocking the action of C5a on its receptors may provide an effective treatment for a variety of inflammatory diseases (Lama et al., 1992; Arumugam et al., 2004; Blagg et al., 2008a, 2008b). Considerable effort has also been directed toward the discovery of small-molecule drugs capable of blocking the C5aR response in particular, but few clinically available non-peptide C5a receptor antagonists have been disclosed (Astles et al., 1997; Schnatbaum et al., 2006).
For these reasons, it is necessary and also urgent to further understand the C5a structural features important for receptor binding and activation (Vlattas et al., 1994;Finch et al., 1997).
Computational chemistry has been applied widely in the pharmaceutical industry for drug discovery, lead optimization, risk assessment, toxicity prediction and regulatory decisions (Sharma et al., 2008). Traditional computer-assisted quantitative structure-activity relationship (QSAR) studies, pioneered by Hansch et al. (1962), provide a rational basis for relating the physicochemical properties of molecules to their biological activity. They help to explain the mechanisms of biological performance and show how performance can be improved by altering the chemical structures of ligands, which increases the probability of success and reduces the time and cost involved in the modern drug discovery process (Neaz et al., 2008). In addition, QSAR methods save resources and expedite the development of new molecules and drugs.
Many QSAR studies related to modern drug design have been reported since the method was first introduced. The aim of the present work is to derive statistically significant QSAR models for the structural requirements of the binding affinity of some non-peptide C5a receptor antagonists, which would aid the search for novel, orally available non-peptide C5a receptor antagonists prior to synthesis.

Data set
A data set of small, non-peptide C5a receptor antagonists was taken from published work (Blagg et al., 2008a, 2008b). Their C5a receptor binding activity data [125 Binding affinity IC~50~ (nM)] were expressed in molar (M) units and then converted into the corresponding logarithmic values (pIC~50~) according to the formula pIC~50~ = -log IC~50~. Of the 54 reported molecules, 11 for which precise data were not available were discarded. The remaining 43 molecules (Table 1) were manually divided into training (35 molecules) and test (8 molecules) sets (Table 2) following the suggestions of Oprea et al. (1994), maintaining structural diversity and a wide range of activity in both sets for the subsequent QSAR analysis.
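The unit conversion described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function name `pic50_from_nm` and the example IC~50~ values are hypothetical and are not taken from the data set.

```python
import math


def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    ic50_molar = ic50_nm * 1e-9  # nM -> M
    return -math.log10(ic50_molar)


# Hypothetical IC50 values in nM (illustrative only, not from Table 1).
for ic50 in (1.0, 50.0, 1000.0):
    print(f"{ic50:7.1f} nM -> pIC50 = {pic50_from_nm(ic50):.2f}")
```

A lower IC~50~ therefore maps to a higher pIC~50~, so more potent binders have larger values of the dependent variable.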

Model building
All computational experiments were performed using Cerius^2^ (version 4.10) running on a Silicon Graphics O2 R5000 workstation. The molecular structures were constructed using the 3D-sketcher in the Cerius^2^ Builder, and partial charges were assigned using the Gasteiger method (Gasteiger and Marsili, 1980). Throughout the study, energy minimization with the universal force field 1.02 (Rappe et al., 1992) was employed to generate a low-energy conformation for each molecule. All structures were minimized until a root mean square derivative of 0.001 kcal/mol·Å was achieved and were then used in this study (Deokar et al., 2008).

Calculation of descriptors
Different types of physicochemical descriptors were generated for each molecule in the study table using the default settings of the QSAR+ module of Cerius^2^. There were a total of 242 nonzero descriptors, including E-state-indices, Information_content, conformational, thermodynamic, topological, electronic, structural, and spatial descriptors. Before generating models, the inter-correlation of the descriptors was examined, and descriptors with an inter-correlation coefficient above 0.7 were removed (Shen et al., 2004). The descriptors used for model generation are listed and described in Table 3.
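The inter-correlation filter can be illustrated with a short sketch. The source does not say which member of a correlated pair was discarded, so the greedy rule below (keep the first descriptor encountered, drop later ones that correlate with it) is an assumption, as are the function names and the toy descriptor values.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def drop_correlated(descriptors, threshold=0.7):
    """Greedy filter (assumed rule): keep a descriptor only if its |r| with
    every descriptor already kept does not exceed the threshold."""
    kept = []
    for name, values in descriptors.items():
        if all(abs(pearson_r(values, descriptors[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical descriptor columns for five molecules (illustrative only).
toy = {
    "desc_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "desc_b": [1.1, 2.1, 2.9, 4.2, 5.0],  # nearly collinear with desc_a
    "desc_c": [0.0, 2.0, 1.0, 0.0, 1.0],  # uncorrelated with desc_a
}
print(drop_correlated(toy))
```

With these toy values the second column is removed because its correlation with the first is far above 0.7, while the third column is retained.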

Genetic function approximation (GFA)
In the present study, genetic function approximation (GFA) was used to generate 2D-QSAR models (Kelkar et al., 2004; Deokar et al., 2008). GFA, developed by Rogers and Hopfinger (1994), combines Friedman's multivariate adaptive regression splines (MARS) (Friedman, 1991) with Holland's genetic algorithm (GA) (Holland, 1975). It is a useful statistical tool for correlating biological activity or other properties with molecular descriptors, and it also makes the resulting models considerably easier to interpret. The length of the GFA-derived equation was initially fixed at five terms including a constant, as suggested by Deokar et al. (2008). The population size was set to 100, the equation terms were set to linear polynomials, and the mutation probability was set to 50%.
After some preliminary runs, the number of GFA crossovers was set to 10000 and the smoothing parameter d to 2.0 to give reasonable convergence. Other settings were kept at their default values.
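The role of the smoothing parameter d can be illustrated with Friedman's lack-of-fit (LOF) score, which GFA uses to rank candidate equations. The source does not state the formula; the sketch below assumes the form commonly given for GFA, LOF = LSE / (1 - (c + d·p)/M)^2, where c is the number of non-constant terms, p the number of descriptors in the equation and M the number of training compounds. The LSE value of 0.1 is hypothetical, and how LSE is normalized is left to the caller.

```python
def friedman_lof(lse: float, c: int, d: float, p: int, m: int) -> float:
    """Lack-of-fit score in the assumed form LSE / (1 - (c + d*p)/M)**2."""
    penalty = (c + d * p) / m
    if penalty >= 1.0:
        raise ValueError("model too complex for the number of samples")
    return lse / (1.0 - penalty) ** 2


# Settings taken from the text: five terms including a constant (so four
# linear terms, four descriptors), d = 2.0 and 35 training compounds.
# The LSE of 0.1 is a made-up value used only to show the penalty.
print(friedman_lof(lse=0.1, c=4, d=2.0, p=4, m=35))
print(friedman_lof(lse=0.1, c=4, d=1.0, p=4, m=35))  # smaller d, weaker penalty
```

Under this form, a larger d penalizes each additional descriptor more heavily, which favors shorter equations for the same fitting error.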

Stepwise multiple linear regression (Stepwise-MLR)
The stepwise multiple linear regression (Stepwise-MLR) procedure was also employed for model selection because of the large number of descriptors used in this study. Multiple linear regression with stepwise selection calculates QSAR equations by adding one variable at a time and testing each addition for significance (Jung et al., 2007). Only variables found to be significant are used in the final QSAR equation. This regression method is especially useful when the number of variables is large and the key descriptors for the activity are not known. The forward regression mode was selected in this study because backward regression can lead to overfitting. The maximum number of steps in the calculation was set to 100, a limit that can be specified to avoid hysteresis. An F value of 4.000 was used to evaluate the significance of a variable when it is added to or deleted from the equation. If the F value of a variable falls below the specified value, the variable is removed.
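The forward selection described above can be sketched in plain Python. This is a simplified illustration rather than the Cerius^2^ implementation: it only adds variables (the removal check is omitted), it uses the partial F statistic as the entry criterion, it assumes the candidate descriptors are not collinear, and the data are synthetic. All function names are hypothetical.

```python
import random


def solve(a, b):
    """Solve a square linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def rss(columns, y):
    """Residual sum of squares of an OLS fit of y on columns plus intercept."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(design, y))


def forward_stepwise(x_cols, y, f_enter=4.0, max_steps=100):
    """Add, one at a time, the descriptor with the largest partial F value,
    stopping when no candidate reaches f_enter."""
    selected, current_rss = [], rss([], y)
    for _ in range(max_steps):
        dof = len(y) - (len(selected) + 1) - 1  # residual degrees of freedom
        if dof <= 0:
            break
        best = None
        for j in range(len(x_cols)):
            if j in selected:
                continue
            new_rss = rss([x_cols[k] for k in selected + [j]], y)
            f = float("inf") if new_rss <= 1e-12 else (current_rss - new_rss) / (new_rss / dof)
            if best is None or f > best[0]:
                best = (f, j, new_rss)
        if best is None or best[0] < f_enter:
            break
        selected.append(best[1])
        current_rss = best[2]
    return selected


# Synthetic data: the response depends only on columns 0 and 2.
rng = random.Random(0)
n = 30
x_cols = [[rng.uniform(0, 1) for _ in range(n)] for _ in range(4)]
y = [2.0 + 3.0 * x_cols[0][i] - 2.0 * x_cols[2][i] + rng.gauss(0, 0.1) for i in range(n)]
selected = forward_stepwise(x_cols, y)
print(selected)
```

The two informative columns enter the equation because their partial F values are far above 4.0; an irrelevant column can still enter occasionally by chance, which is one reason the resulting models are validated separately.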

GFA-derived QSAR model
The number of descriptors necessary and adequate in the GFA-derived QSAR equations was investigated first. Because the conventional squared correlation coefficient (r^2^) can easily be increased by adding terms to the equation, the cross-validated r^2^ (r^2^~CV~) was selected as the limiting factor for the number of descriptors in the equation (Nair and Sobhia, 2008). As shown in
where N (training set) is the number of compounds in the training set; LOF is Friedman's lack of fit score (Deokar et al., 2008); r^2^ is the squared correlation coefficient; r^2^~adj~ is the adjusted squared correlation coefficient; F-test is the variance-related statistic; LSE is the least square error; r is the correlation coefficient; r^2^~CV~ is the squared correlation coefficient generated during the cross-validation procedure; bootstrap r^2^ (r^2^~BS~) (Deokar et al., 2008) is the average squared correlation coefficient calculated during the validation procedure; PRESS, the predicted residual sum of squares, is the sum over all compounds of the squared differences between the actual and predicted values of the dependent variable; N (test set) is the number of compounds in the test set; and r^2^~pred~ is the predictive power of the model.

[Insert Figure 1]
The inter-correlation of the descriptors appearing in the best model above was examined, and the descriptors were found to be reasonably orthogonal. The values of the main descriptors appearing in 2D-QSAR model-I for the training set and test set molecules are shown in Table 2.
[Insert Table 4]
The full cross-validation tests and randomization tests were employed to determine the reliability and significance of the generated models. The full cross-validation tests (Fan et al., 2001) encompass the entire algorithm, including both the choice of descriptors and the optimization of the regression coefficients. The full cross-validated r^2^ (r^2^~CV~) was computed using the values of the left-out molecules predicted by the models obtained from the remaining compounds in the data set.
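The leave-n-out idea can be sketched for a one-descriptor linear model. This is a simplification under stated assumptions: the full procedure in the text also repeats descriptor selection in every fold, whereas the sketch only refits the coefficients; compounds are left out in contiguous blocks; and r^2^~CV~ is taken as 1 - PRESS/SS~total~ with the overall mean as reference. The function names and data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


def r2_cv(x, y, n_out=1):
    """Leave-n-out cross-validated r^2 = 1 - PRESS / SS_total."""
    n = len(y)
    press = 0.0
    for start in range(0, n, n_out):
        held = range(start, min(start + n_out, n))  # compounds left out
        train_x = [x[i] for i in range(n) if i not in held]
        train_y = [y[i] for i in range(n) if i not in held]
        slope, intercept = fit_line(train_x, train_y)
        press += sum((y[i] - (slope * x[i] + intercept)) ** 2 for i in held)
    mean_y = sum(y) / n
    ss_total = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - press / ss_total


# Hypothetical descriptor values and activities (illustrative only).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y_noisy = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.8, 17.2]
print(r2_cv(x, y_noisy, n_out=1))
print(r2_cv(x, y_noisy, n_out=2))
```

Because every prediction is made for compounds the fitted model has not seen, r^2~CV~ cannot be raised simply by adding terms, which is why it is a stricter criterion than the conventional r^2^.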
The results based on the "leave-n-out" rules for n = 1 to 10 are shown in Table 4, indicating that the results obtained were not due to chance correlation. The randomization tests (Deswal and Roy, 2006; Nair and Sobhia, 2008) were performed at the 90% (9 trials), 95% (19 trials), 98% (49 trials) and 99% (99 trials) confidence levels by repeatedly permuting the dependent variable set. The results of the randomization tests in Table 5 showed that none of the permuted data sets produced a random r comparable to the nonrandom r of 0.879, suggesting that the value obtained for the original GFA model was significant. The predictive power of model-I was also evaluated with the external test set molecules and was calculated as r^2^~pred~ = (SD - PRESS)/SD (Deokar et al., 2008; Deswal and Roy, 2006; Nair and Sobhia, 2008), where SD is the sum of squared deviations between the pIC~50~ of each molecule in the test set and the mean pIC~50~ of the molecules in the training set, and PRESS is the sum of squared deviations between the predicted and actual pIC~50~ values for every molecule in the test set. The high r^2^~pred~ value of 0.801 for the test set indicates good predictive ability. The developed QSAR model-I was thus robust and satisfactory for predicting the activities of the test set (Table 2). In Table 2, molecule 17 turned out to have a high residual because of its high activity in comparison with the other compounds.
[Insert Table 5 and Table 6]
According to model-I, the observed C5a receptor binding activity of these non-peptide C5a receptor antagonists is principally influenced by Atype_C_25, Atype_H_52, AlogP and logP, which is confirmed by the high frequency with which these descriptors were used during model generation (Table 6). All of the descriptors in model-I are thermodynamic descriptors. LogP is the partition coefficient (Deswal and Roy, 2006), which represents the lipophilicity of a molecule.
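Both checks described above can be sketched briefly. The randomization test below is shown for a one-descriptor correlation only, and the confidence level is taken as 1 - (k + 1)/(n + 1) for n permutations of which k match or exceed the original |r|, which reproduces the 9, 19, 49 and 99 trial levels quoted in the text. The r^2^~pred~ function follows the formula (SD - PRESS)/SD, with SD taken over the test set relative to the training-set mean. All names and values are hypothetical.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def randomization_test(x, y, trials=19, seed=0):
    """Permute the dependent variable and count how often |r| of the
    scrambled data reaches |r| of the original data."""
    rng = random.Random(seed)
    r_true = abs(pearson_r(x, y))
    shuffled = list(y)
    exceed = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= r_true:
            exceed += 1
    confidence = 1.0 - (exceed + 1) / (trials + 1)
    return r_true, exceed, confidence


def r2_pred(test_actual, test_predicted, train_mean):
    """External predictivity: (SD - PRESS) / SD."""
    sd = sum((a - train_mean) ** 2 for a in test_actual)
    press = sum((a - p) ** 2 for a, p in zip(test_actual, test_predicted))
    return (sd - press) / sd


# Hypothetical descriptor values and activities (illustrative only).
x = [float(i) for i in range(10)]
y = [5.0, 5.6, 5.9, 6.6, 7.0, 7.4, 8.1, 8.5, 9.0, 9.4]
r_true, exceed, confidence = randomization_test(x, y, trials=19)
print(r_true, exceed, confidence)

# Hypothetical test-set activities, predictions and training-set mean.
print(r2_pred([6.0, 7.0, 8.0], [6.2, 6.9, 7.7], train_mean=7.0))
```

When no scrambled data set reaches the original correlation, 19 trials correspond to the 95% confidence level, and more trials are needed to claim a higher level.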
The negative slope of logP in this equation indicates that activity decreases with an increase in the lipophilicity of the molecule, as can be seen in Table 2. Thus, substituents that increase the lipophilicity of the compound should be avoided. Atype_C_25 and Atype_H_52 are atom-type AlogP descriptors used to characterize the hydrophobicity (logP) of molecules. The contribution of individual atom types toward the overall hydrophobicity of molecules was proposed by Ghose and Crippen (1987), who classified carbon, hydrogen, oxygen, nitrogen, sulfur and halogens into 120 atom types (Deswal and Roy, 2006; Nair and Sobhia, 2008). Hydrogen and halogens are classified by the hybridization and oxidation state of the carbon they are bonded to; carbon atoms are classified by their hybridization state and the chemical nature of their neighboring atoms. The fact that there are 44 carbon types alone attests to the complexity of the classification procedure. The positive slopes of Atype_C_25 and Atype_H_52 in model-I indicate that activity increases with an increase in lipophilicity related to the C_25 and H_52 atom types for these molecules. The atom type C_25 (Ghose and Crippen, 1987; Nair and Sobhia, 2008) is C in :R--CR--R and Atype_H_52 is H that is unused, where R represents any group linked through carbon and -- represents aromatic bonds as in benzene or delocalized bonds such as the N-O bond in a nitro group. Hydrophobicity associated with a C atom that is part of an aromatic ring or N-O bonded in a nitro group is favorable for C5a receptor binding activity (pIC~50~).

Stepwise-MLR-derived QSAR model
where F is the ratio between the regression and residual variances (Song et al., 2008). The inter-correlation of the descriptors appearing in the best model above was examined, and the descriptors were found to be reasonably orthogonal. Model-II contains more significant descriptors than model-I. The high r^2^~pred~ value of 0.888 for the test set indicates good predictive ability.
Model-II explains 86.3% of the variance in binding activity in the training set (r^2^ = 0.863) and predicts 88.8% of it in the test set (r^2^~pred~ = 0.888), as can be seen from the predictions for the test set (Table 2). The residuals of model-II are also much smaller than those of model-I (Table 2). Thus the binding activity (pIC~50~) should be considered in terms of the various descriptors of each molecule.
Compared with model-I, model-II has the same descriptors Atype_C_25, Atype_H_52 and AlogP with positive coefficients and logP with a negative coefficient. According to model-II, the C5a receptor binding activity (pIC~50~) is also affected by the descriptors Atype_C_8, Chiral Centers and S_dssC. The negative slope of Atype_C_8 indicates that activity decreases with an increase in lipophilicity related to the C_8 atom type for these molecules. The atom type C_8 (Ghose and Crippen, 1987) is C in :CHR~2~X, where R represents any group linked through carbon and X represents any heteroatom (O, N, S, and halogens). Chiral Centers is the count of chiral centers (R or S) present in a molecule. It is positively correlated with binding activity, indicating that the more chiral centers a molecule has, the higher its C5a receptor binding activity.
S_dssC is an E-state index descriptor representing the atom type =C< in aliphatic hydrocarbons, where S stands for the sum of the E-state values for a given atom type in a molecule, d denotes a double bond and s denotes a single bond. The E-state indices (Hall and Kier, 1995) encode information about both the topological environment of an atom and its electronic interactions with all other atoms in the molecule. An increasing presence of these features in a molecule contributes more towards binding activity.

Conclusions
On the basis of the present study, it is concluded that the described 2D-QSAR analysis contributes to the identification of the physicochemical parameters that are important in explaining the variation in activity in a set of 43 molecules. The 2D-QSAR models derived by the GFA and Stepwise-MLR methods have moderate internal and external predictivity, as shown by r^2^~CV~ values of 0.752 and 0.775 and r^2^~pred~ values of 0.801 and 0.888, respectively, highlighting the importance of the hydrophobicity of the molecules. The statistical significance and robustness of the models have been confirmed by the full cross-validation tests and the randomization tests. Hence the models can be useful in optimizing activity in this class of molecules and in the further design of novel, orally available non-peptide C5a receptor antagonists.