Graphlet signature-based scoring method to estimate protein–ligand binding affinity

Over the years, various computational methodologies have been developed to understand and quantify receptor–ligand interactions. Protein–ligand interactions can also be explained in the form of a network and its properties. The ligand binding at the protein-active site is stabilized by formation of new interactions like hydrogen bond, hydrophobic and ionic. These non-covalent interactions when considered as links cause non-isomorphic sub-graphs in the residue interaction network. This study aims to investigate the relationship between these induced sub-graphs and ligand activity. Graphlet signature-based analysis of networks has been applied in various biological problems; the focus of this work is to analyse protein–ligand interactions in terms of neighbourhood connectivity and to develop a method in which the information from residue interaction networks, i.e. graphlet signatures, can be applied to quantify ligand affinity. A scoring method was developed, which depicts the variability in signatures adopted by different amino acids during inhibitor binding, and was termed as GSUS (graphlet signature uniqueness score). The score is specific for every individual inhibitor. Two well-known drug targets, COX-2 and CA-II and their inhibitors, were considered to assess the method. Residue interaction networks of COX-2 and CA-II with their respective inhibitors were used. Only hydrogen bond network was considered to calculate GSUS and quantify protein–ligand interaction in terms of graphlet signatures. The correlation of the GSUS with pIC50 was consistent in both proteins and better in comparison to the Autodock results. The GSUS scoring method was better in activity prediction of molecules with similar structure and diverse activity and vice versa. This study can be a major platform in developing approaches that can be used alone or together with existing methods to predict ligand affinity from protein–ligand complexes.


Summary
Over the years, various computational methodologies have been developed to understand and quantify receptor-ligand interactions. Protein-ligand interactions can also be explained in the form of a network and its properties. The ligand binding at the protein-active site is stabilized by formation of new interactions like hydrogen bond, hydrophobic and ionic. These non-covalent interactions when considered as links cause non-isomorphic subgraphs in the residue interaction network. This study aims to investigate the relationship between these induced sub-graphs and ligand activity. Graphlet signature-based analysis of networks has been applied in various biological problems; the focus of this work is to analyse protein-ligand interactions in terms of neighbourhood connectivity and to develop a method in which the information from residue interaction networks, i.e. graphlet signatures, can be applied to quantify ligand affinity. A scoring method was developed, which depicts the variability in signatures adopted by different amino acids during inhibitor binding, and was termed as GSUS (graphlet signature uniqueness score). The score is specific for every individual inhibitor. Two wellknown drug targets, COX-2 and CA-II and their inhibitors, were considered to assess the method. Residue interaction networks of COX-2 and CA-II with their respective inhibitors were used. Only hydrogen bond network was considered to calculate GSUS and quantify protein-ligand interaction in terms of graphlet signatures. The correlation of the GSUS with pIC 50 was consistent in both proteins and better in comparison to the Autodock results. The GSUS scoring method was better in activity prediction of molecules with similar structure and diverse activity and vice versa. This study can be a major platform in developing approaches that can be used alone or together with existing methods to predict ligand affinity from protein-ligand complexes.
2014 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.

Introduction
Understanding and quantifying receptor-ligand interactions forms the core of computer-aided drug discovery methods [1]. One such method, molecular docking, has been widely used owing to its high speed and performance. The approximations in the scoring methods are one of the limitations of these docking programs. Recently, to improve the performance of virtual screening experiments, approaches like free energy perturbation methods, pharmacophore modelling, post docking consensus scoring/tuned scoring, etc. are used in combination with docking methods [2]. There is a need for development of more such methods that can improve the authenticity of virtual screening findings when used alone or together with the existing methods.
System biology is an emerging discipline used to analyse and interpret various kinds of biological networks [3]. Various graph theory methods, especially residue interaction networks, have been extensively used to analyse protein structure and dynamics [4]. In the residue interaction network, amino acids (AAs) are represented as nodes and edges represent the interactions among them. Reports suggest that the protein-ligand interactions can also be explained in the form of network. The impact of a ligand binding on protein network can be measured in terms of its network properties and it has substantial effects on the closeness centrality of network [5].
The purpose of this study was to analyse the changes in local connectivity of each node (active site residues) present in protein-active site after ligand binding in the residue interaction networks and to relate these changes to compound activity. Each node present in protein-active site has its own neighbourhood and local connectivity. Binding of ligand/substrate with active site residues is mediated by the creation of new interactions. These new interactions will change the local connectivity and neighbourhood of the active site residues and induce non-isomorphic sub-graphs in the proteinactive site with respect to an active site residue. Small connected non-isomorphic induced sub-graphs of large network are termed as graphlets [6]. Graphlet signature-based analysis of biological networks has been successfully applied extensively in various biological problems [7,8]. The importance of graphlet signatures in networks led to the development of combinatorial approaches for graphlet counting [9]. This study focuses on the identification of induced sub-graphs in the protein-active site after ligand binding employing graphlet signature-based analysis of residue interaction networks and their application to estimate binding affinity of various ligands.

Material and methods
The aim of the present study was to evaluate the efficiency of graphlet signatures in inhibitor activity prediction. For this study two well-known drug targets, COX-2 and CA-II, were considered. Thirty inhibitors for each of the targets were chosen randomly. The crystal structures of CA-II complexed with various inhibitors have been obtained from the Protein Data Bank (PDB) [10]. Similarly, COX-2 crystal structure co-complexed with Celecoxib was selected for the studies [11].
For the inhibitors which are not co-crystallized with these enzymes, docking procedure was employed to identify the favourable conformations at the active site. Structures in PDB, 2POU and 3LN1 were employed for docking studies with CA-II and COX-2, respectively. Prior to docking, all the potential ligands were prepared in DG-AMMOS [12] using AMMP force field sp4. Autodock program was used for docking [13]. During the docking process, maximum number of conformer generation was set to 100 and other parameters were set to default values.
In this study, residue interaction networks for COX-2 (3LN1) and CA-II (2POU) [11,14] were obtained from RING server [15]. Furthermore, the networks were visualized and analysed in Cytoscape [16]. The hydrogen bond pattern in protein structure was identified using the HB explore tool in RING server [17] and was analysed in RINalyzer [18]. Further, graphlet counter was used to examine the signature patterns made by different inhibitors with the active-site residues [19] (figure 1). Ligand binding at the protein-active site is stabilized by the formation of non-covalent interactions. Hydrogen bond interactions play a major role in proteins [20]. The residues present in protein-active site have hydrogen bond connectivity with their neighbours and create a local hydrogen bond network. Ligand binding leads to change in the graphlet signatures of the residues at the active site. In this study, only the hydrogen bond network of the protein was considered and the effects of ligand binding on the signatures at the active site were studied. The signature analysis method that provides an opportunity to analyse the residues that are not in direct contact with inhibitors and are in secondary shell of active site is also considered in this approach.  The graphlet signatures corresponding to every AA in the active site were analysed before and after ligand binding. Unique signatures, i.e. the signatures that are formed at the active site only after ligand binding were identified. These are the signatures which exist only after ligand binding and are nonexistent in apoprotein structure.
A pool of all the unique signatures formed by the 30 inhibitors with respect to the residues present in protein-active site was created. Furthermore, the information from these graphlet signatures was employed to quantify affinity of every inhibitor in the dataset. Uniqueness score for each inhibitor was measured in terms of variability in signatures adopted by different AAs during inhibitor binding. The uniqueness at these signatures was quantified as follows: where GSUS = graphlet signature uniqueness score for ligand j; , X i * k represent the total number of signature in absence of ligand with respect to ith AA, X ijk represent the total number of signature in presence of ligand with respect to ith AA in particular orbit k, and S ij represents the number of unique signatures made by ith AA with inhibitor j; S t = total number of unique signatures made by all the ligands with all AAs (signature pool); S i = L j=1 S ij and it represents the total number of unique signatures made by ith AA with all the ligands; L i = L j=1 min(S ij , 1) and it represents the number of ligands forming unique signatures with ith AA; and L = total number of ligands used in the dataset.
The correlation between the biological activity and predicted activity (GSUS and Autodock score with default settings) for the compounds in the dataset was performed using Pearson's correlation coefficient. The pair-wise diversity of the compound dataset was measured. The dependency of the method on diversity was illustrated to check if the methods hold good for various scaffolds. The similarity between each pair of molecules was measured using Tanimoto coefficient in OpenBabel [21].

Results and discussion
Studies on the application of network properties to differentiate ligand selectivity have been reported [22]. Graphlet signature methods have been successfully applied to identify functional residues in protein-active site and ligand-binding site predictions [23,24]. This study focuses on the knowledge of induced sub-graphs in protein-active site and their application to estimate and quantify protein-ligand interactions is novel and promising.

Studies on COX-2
RING server was used to generate the residue interaction network of COX-2. All the AAs within 10 Å radius from the centre of the active site were selected (electronic supplementary material, table S1). From the hydrogen bond interaction networks formed, we analysed signatures with varying orbits present at each selected residue. Collectively, these were termed as native signatures of active site as they are present in the native structure of protein, i.e. non-inhibitor bound form. To analyse the hydrogen bond interactions between inhibitor and enzyme, the complexes of 30 inhibitors with COX-2 were used. The respective hydrogen bonds formed by each of the inhibitors at the active site were identified (electronic supplementary material, table S2). Prior to the graphlet signature analysis, the individual inhibitors were added to the protein-hydrogen bond network manually and the contacts were created between the inhibitor and the respective hydrogen bond forming AAs. The graphlet signatures of each individual inhibitor with the active site AAs were studied and unique signatures, i.e. the new signature formed from non-existence at each of the AAs, were identified.
In some cases, it was observed that the same AA was involved in signatures with different orbits more than once. Each of such situations was counted as unique signatures. All the unique signatures formed with each of the selected AAs by every inhibitor were identified as listed in electronic supplementary material, table S3. The total number of unique signatures for all the inhibitors with all the AAs was found to be 761. This collection was termed as signature pool, S t . All the quantified features were further applied in the calculation of GSUS for each inhibitor using equation             but their activity against COX-2 is almost the same. GSUS method was more accurate in the activity prediction of these molecules and the results show clearly that GSUS is more efficient in differentiating similar structure molecules with varied activity and diverse structure molecules with similar activity.

Studies on CA-II
The unique signature selection was performed using the same procedure as we used in COX-2. The total number of unique signatures was found to be 1201 collectively for all the inhibitors (electronic supplementary material, table S4). All the quantified features were further applied in the calculation of GSUS for each inhibitor using equation (3.1) and it was observed that topiramate had the highest GSUS of 0.3, and lowest value was 0.002 for 2-hydroxy-3-methylbenzoic acid (table 2). Correlation coefficient has been calculated for pIC 50 value and GSUS. Correlation coefficient was 0.40 for all the 30 molecules and was significant at the 0.05 level (two tailed). In the dataset of CA-II inhibitors considered, three pairs of inhibitors, 2-aminobenzenesulfonamide/2-hydrazinylbenzenesulfonamide, 2-hydroxy-3-methylbenzoic acid/4-amino-2-hydroxybenzoic acid and 4-amino-6-chlorobenzene-1,3disulfonamide/dichlorophenamide, showed high structural similarity of Tanimoto coefficient greater than 0.7 with activity difference of twofold, sixfold and twofold, respectively (electronic supplementary material, table S7). The performance of GSUS was better in distinguishing the activities of these molecules with similar structure. The performance of GSUS method in CA-II was consistent with that in COX-2, unlike Autodock which showed great difference in correlation coefficient in both of these enzymes. The results in this study hint that GSUS method is promising, and it can be further improved into a more applicable and reliable method. Its performance in distinguishing activities of structurally related and unrelated compounds shows that it can be a part of virtual screening experiments employing multi-layered screening methods.