Supplementary Information

Supplementary Figures

Supplementary Figure 1: Quantile scores computed from RRINs for caspase-1 and CheY at different cutoff radii. Surface mapping of the residue quantile scores $p_R$ of caspase-1 and CheY for RRINs generated with cutoff radii from 6 Å to 10 Å. The active-site ligand is shown in green sticks and the allosteric site is circled. The allosteric site in caspase-1 is not identified at 6, 7, or 8 Å. It is identified at 10 Å, but the signal is weaker than when using an atomistic graph. In contrast, for CheY the allosteric site is identified as significant across the full range of cutoffs.


Supplementary Tables
Supplementary Table 1: Propensities computed from RRINs. Values of $p_{R,\mathrm{allo}} - \langle p_{R,\mathrm{site}} \rangle_{\mathrm{surr}}$ for residue-residue interaction networks with four cut-off radii from 6 Å to 10 Å. The propensity scores are shown in bold if they are greater than 0, and starred if they lie above the 95% confidence interval computed by a bootstrap with 10000 resamples. The comparable statistic computed from the all-atom network is also presented, as well as the summary of the four bond statistics for each protein from Supplementary Table 4.

Supplementary Note 1

Propensities from residue-residue interaction networks
The computational efficiency of our method allows us to analyse all-atom networks without many of the restrictions on system size inherent to other methods. Proteins or protein complexes of hundreds of thousands of atoms can be analysed in a few minutes on a standard desktop. We can thus keep atomistic detail at the single-bond level without restricting the scope of the analysis. Hence there is a less acute need to seek computational savings by obtaining coarse-grained representations of proteins at the level of residue interactions. However, it is still instructive to consider propensity measures computed from residue-level networks (RRINs) [1]. We have undertaken this comparison for all 20 proteins in our test set and report the results below.
As discussed in the main text (Sections IIA and IIB), in some cases (e.g., caspase-1, Fig. 1b) we found that the additional information contained in the atomistic network leads to increased signal in the detection of the allosteric site, whereas in other cases (e.g., CheY), RRINs already capture well the site connectivity that reveals the presence of the allosteric site. Our analysis of the full test set (Supplementary Table 1) confirms that the results from RRINs depend on the protein analysed, and also vary substantially with the choice of cut-off distance (a tunable parameter that must be chosen when generating the coarse-grained RRINs).
The coarse-grained RRINs for each of the 20 proteins in the test set were obtained by submitting the corresponding PDB files to the oGNM server [2]. We obtained RRINs at four different cut-off radii: 6 Å, 7 Å, 8 Å and 10 Å. The cut-off radius is a tunable parameter required to generate an RRIN from a PDB file; it establishes how close two residues must be in order to be connected in the RRIN. A range of different cut-off radii has been used throughout the literature. However, the usual radius is around 6.7-7.0 Å, which corresponds to the first coordination shell [3].
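As a rough illustration of how the cut-off radius shapes an RRIN, the sketch below connects two residues whenever their representative coordinates lie within the cut-off. The function name `build_rrin`, the use of a single point per residue (for instance a Cα position) and the toy coordinates are assumptions made for illustration; they are not taken from the oGNM server.

```python
import math
from itertools import combinations

def build_rrin(coords, cutoff):
    """Return the set of residue pairs (i, j), i < j, within `cutoff` (in Å).

    `coords` maps a residue identifier to one representative (x, y, z)
    position; one point per residue is an assumption of this sketch.
    """
    edges = set()
    for (i, xi), (j, xj) in combinations(sorted(coords.items()), 2):
        if math.dist(xi, xj) <= cutoff:
            edges.add((i, j))
    return edges

# Toy chain of four residues spaced 3.8 Å apart (hypothetical coordinates).
toy = {k: (3.8 * k, 0.0, 0.0) for k in range(4)}
for cutoff in (6.0, 7.0, 8.0, 10.0):
    print(cutoff, len(build_rrin(toy, cutoff)))
```

Even in this toy chain the number of edges changes as the cut-off grows, which is the kind of sensitivity to the cut-off discussed in this note.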
Supplementary Table 1 shows the propensity score of the allosteric site, $p_{R,\mathrm{allo}} - \langle p_{R,\mathrm{site}} \rangle_{\mathrm{surr}}$, computed from RRINs obtained at four cut-offs (between 6 Å and 10 Å) for the 20 proteins in the allosteric test set. For comparison purposes, we also report the same score obtained from the all-atom network. It is important to note that this is just one of four scores obtained from the all-atom network, reflecting only the averaged behaviour over the residues. This score is complemented by the three other bond-based statistics, which can pick up inhomogeneities in the propensities of the bonds in the allosteric site, as given by the All-atom Summary column carried over from Supplementary Table 4.
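The score and its significance marks can be sketched as follows. This is one plausible reading only: resampling the surrogate-site scores with replacement and taking a percentile interval of their mean is an assumption, as are all function names and the numbers used; the procedure behind Supplementary Table 1 may differ in detail.

```python
import random
import statistics

def propensity_score(p_allo, surrogate_scores):
    """Allosteric-site quantile score minus the mean surrogate-site score."""
    return p_allo - statistics.fmean(surrogate_scores)

def bootstrap_ci(surrogate_scores, n_resamples=10000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean surrogate-site score."""
    rng = random.Random(seed)
    n = len(surrogate_scores)
    means = sorted(
        statistics.fmean(rng.choices(surrogate_scores, k=n))
        for _ in range(n_resamples)
    )
    lo = means[int((1 - level) / 2 * n_resamples)]
    hi = means[min(n_resamples - 1, int((1 + level) / 2 * n_resamples))]
    return lo, hi

# Hypothetical numbers, for illustration only.
rng = random.Random(1)
surrogates = [rng.random() for _ in range(200)]
p_allo = 0.9
lo, hi = bootstrap_ci(surrogates, n_resamples=2000)
score = propensity_score(p_allo, surrogates)
bold = score > 0       # positive score: shown in bold
starred = p_allo > hi  # above the upper end of the interval: starred
print(round(score, 3), bold, starred)
```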
Our results indicate broad consistency between RRINs and the all-atom network. However, the RRIN results vary widely depending on the choice of cut-off radius in the generation of the network. Moreover, this variability with respect to the cut-off behaves differently for each of the proteins. As an illustration, the allosteric site of caspase-1 (2HBQ) was not found to be significant in the RRINs with cut-off radii of 6 Å, 7 Å and 8 Å, and only weakly significant for 10 Å, whereas 1LTH and 2BRG are both detected only in RRINs with a cut-off radius of 6 Å but not for larger radii. Our results are consistent with previous studies that found that allosteric pathway identification in RRINs is dependent on the chosen cut-off [4]. For the different cut-offs, the number of proteins with $p_{R,\mathrm{allo}} > p_{R,\mathrm{rest}}$ varies between 11/20 (at 7, 8, and 10 Å) and 13/20 (at 6 Å), and only 8/20 proteins have $p_{R,\mathrm{allo}} > p_{R,\mathrm{rest}}$ for the RRINs at all the cut-off radii. This compares with 15/20 proteins for the atomistic network.
Even when the allosteric site is detected in the RRIN, the signal when using the atomistic network is considerably higher in a number of proteins (e.g., 1V4S, 1YP3, 7GPB, 1I2D, 2HBQ). In other cases (e.g., 1EYI, 4PFK), the RRIN loses the detectability of the allosteric site altogether, even if the cut-off is adjusted. This observation suggests that these are proteins where the specific chemistry of intra-protein bonds is important for the allosteric communication.
On the other hand, there are several other cases (e.g., 3ORZ, 1D09, 1HOT, 1PTY, 1LTH) where the RRIN can provide similar results to the atomistic network, yet still with some variability depending on the choice of cut-off. Interestingly, there are also some proteins (specifically 1F4V, 1YBA, 3K8Y and 2BRG) in which the propensity score is higher for RRINs than for the atomistic network. In these cases, there tends to be a large heterogeneity in the propensities of the bonds in the allosteric site (see Figure 7 in the main text), with some bonds having large negative values and others large positive values. Our bond statistical measures can account for some of this variability. Indeed, both 1F4V and 1YBA are detected by all four of our bond measures, and 3K8Y is picked up by the measure based on the distributions of $p_b$. Intriguingly, only 2BRG (corresponding to CHK1) cannot be detected by our bond measures. This suggests other areas of future research, in which the importance of averaging at the level of pathways could be used to enrich the findings presented here.

bonds are defined as any weak interactions formed by an allosteric residue. Full details of the proteins and allosteric site residues are shown in Supplementary Table 3.

Summary of results on the allosteric test set
As explained in the main text (Section IIE and Materials and Methods, Section IVD), for each of the 20 proteins in the test set, we analyse the propensities of all bonds with respect to the active site of the bound structure, using the ligands shown in Fig. 5 as the source for the bond-to-bond propensity calculations. For each protein, we obtain the propensity $\Pi_b$ of every weak bond and its associated quantile score ($p_b$). To establish their statistical significance, the bond quantile scores $p_b$ (and residue-averaged quantile scores $p_R$) of the allosteric site are compared against an ensemble of randomly generated surrogate sites from each protein. The ensemble of surrogate sites is constructed at random by picking sites that satisfy two structural constraints: (i) they have the same number of residues as the allosteric site; and (ii) their diameter (the maximum distance between any two atoms in the site) is no larger than that of the allosteric site. The sites are generated using Algorithm 1, with pseudocode given below.
Algorithm 1 Pseudocode for surrogate site sampling
1: site ← ∅
2: while # residues in site < # residues in allosteric site do

The propensities averaged over the ensemble of surrogate sites are then used for statistical comparison with the allosteric site. We also obtain absolute propensity scores for each bond ($p_b^{\mathrm{ref}}$) by comparing against the reference SCOP ensemble of 100 proteins. These quantities are defined in the main text (Materials and Methods, Section IVD). Using all these scores we obtain our four statistical measures of significance summarised in Supplementary Table 4. These numerical results are also presented in the form of a graph in Figure 7 of the main text.
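Only the first steps of Algorithm 1 are reproduced above, so the following sketch shows one possible way to sample a site that satisfies constraints (i) and (ii). The greedy growth with restarts, the function names and the data layout (a mapping from residue identifier to a list of atom coordinates) are assumptions made for illustration and should not be read as the original algorithm.

```python
import math
import random

def site_diameter(site, atoms):
    """Maximum distance between any two atoms of the residues in `site`."""
    pts = [xyz for res in site for xyz in atoms[res]]
    return max(
        (math.dist(a, b) for i, a in enumerate(pts) for b in pts[i + 1:]),
        default=0.0,
    )

def sample_surrogate_site(atoms, n_residues, max_diameter, rng=random,
                          max_restarts=1000):
    """Grow a random site with `n_residues` residues and bounded diameter."""
    residues = list(atoms)
    for _ in range(max_restarts):
        site = []
        while len(site) < n_residues:
            # Residues that can be added without violating the diameter bound.
            candidates = [
                r for r in residues
                if r not in site
                and site_diameter(site + [r], atoms) <= max_diameter
            ]
            if not candidates:
                break  # dead end: discard this site and start again
            site.append(rng.choice(candidates))
        if len(site) == n_residues:
            return site
    raise RuntimeError("no admissible site found; relax the constraints")

# Hypothetical protein: ten single-atom residues on a line, 1 Å apart.
atoms = {i: [(float(i), 0.0, 0.0)] for i in range(10)}
site = sample_surrogate_site(atoms, n_residues=3, max_diameter=2.0,
                             rng=random.Random(0))
print(sorted(site), site_diameter(site, atoms))
```

Repeating the call with different seeds would give an ensemble of surrogate sites over which quantile scores can be averaged.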

Construction of the atomistic protein network
As discussed in Materials and Methods (Section IVE), the protein network is constructed by assigning edges between atoms which interact covalently and non-covalently. Each edge is weighted by the strength of the interaction. Covalent bond strengths are obtained from tables assuming standard bond lengths. We include three types of non-covalent interactions: hydrophobic interactions, hydrogen bonds, and electrostatic interactions. The assignment of bonds in the graph follows from the well-established FIRST framework [7,8]. In more detail:
• Covalent bonds: Covalent bonds are weighted according to standard bond dissociation energies given in Ref. [9].
• Hydrophobic tethers: Hydrophobic tethers are assigned between C-C or C-S pairs based on proximity: two atoms have a hydrophobic tether if their van der Waals radii are within 2 Å. The hydrophobic tethers are identified using FIRST [10], which does not assign them an energy; the energy is then determined from the double-well potential of mean force introduced by Lin et al. [11], which gives an energy of ≈ −0.8 kcal/mol for atoms within 2 Å.
• Hydrogen bonds: The energies of hydrogen bonds were calculated using the same formula as the program FIRST [10], which is based on the potential introduced by Mayo et al. [12].
• Electrostatic interactions: Important electrostatic interactions between ions and ligands, as defined in the LINK entries of the PDB file, are added with energies derived from a Coulomb potential, $E_{\mathrm{Coul}} \propto q_1 q_2 / (\epsilon r)$, where $q_1$ and $q_2$ are the atom charges, $r$ is the distance between them, and $\epsilon$ is the dielectric constant, which is set to $\epsilon = 4$ as in Ref. [13]. Atom charges for standard residues are obtained from the OPLS-AA force field [14], whereas charges for ligands and non-standard residues are found using the PRODRG web-server [15].
An extended discussion of the construction of the atomistic graph can be found in Refs. [16,17,18].
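To make the electrostatic edge weight concrete, the sketch below evaluates a Coulomb energy with the dielectric constant set to 4. The unit convention (charges in units of the elementary charge, distances in Å, energies in kcal/mol, hence the standard conversion factor of about 332.06) and the function name are assumptions for illustration; only the dielectric constant is taken from the text.

```python
COULOMB_CONSTANT = 332.06  # kcal·Å/(mol·e²), standard conversion factor

def coulomb_energy(q1, q2, r, dielectric=4.0):
    """Coulomb energy in kcal/mol for charges in units of e and r in Å."""
    if r <= 0:
        raise ValueError("distance must be positive")
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * r)

# Hypothetical ion-ligand pair: charges +2 and -1 separated by 2.0 Å.
print(coulomb_energy(2.0, -1.0, 2.0))
```

An attractive pair gives a negative energy. How that energy is mapped to an edge weight (for example through its magnitude) is not specified here and is left open in this sketch.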

Supplementary Table 2: Details of X-ray structures of CheY analysed. The conformations correspond to different stages of activation.

Supplementary Table 3: Proteins in the allosteric test set. The active site and allosteric site bound structures for each of the 20 test set proteins. If the protein is allosterically activated then the PDB ID for both states will be the same. The ligand identifier is that used in the PDB file. Exceptions to this are CheY and caspase-1. As the ligand in these proteins is a peptide, the name and chain ID of the peptide are given instead.

Supplementary Table 4: Allosteric site quantile scores in test set proteins.

Supplementary Table 5: Robustness of propensity scores to additive randomness. Mean (± standard deviation) of propensity scores $p_{R,\mathrm{allo}} - \langle p_{R,\mathrm{site}} \rangle_{\mathrm{surr}}$ computed from randomisations of the protein networks of the allosteric test set obtained by adding Gaussian noise to the edge weights (bond energies). The noise level varies between 1kT and 4kT (corresponding to the standard deviation of the added Gaussian) and at each noise level the results were calculated from 10 randomised graphs. The differences between the allosteric site average quantile score and the average surrogate site score, for both residues and bonds, are shown in bold if they are greater than 0, and starred if they lie above the 95% confidence interval computed by a bootstrap with 10000 resamples. The unperturbed result is also shown for comparison.

Supplementary Table 6: Robustness of propensity scores to deletion of weak bonds. The propensity score $p_{R,\mathrm{allo}} - \langle p_{R,\mathrm{site}} \rangle_{\mathrm{surr}}$ for networks obtained by deleting all bonds below two energy thresholds. The results are shown in bold when they are greater than 0 and starred if they lie above the 95% confidence interval computed by a bootstrap with 10000 resamples. The unperturbed score is also reported for comparison.
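The two robustness perturbations described in these captions (additive Gaussian noise on the edge weights and deletion of bonds below an energy threshold) can be sketched as below. The edge-weight dictionary, the value of kT (about 0.592 kcal/mol near 298 K), the threshold value and the function names are assumptions for illustration. In particular, the sketch does not address how weights that become negative after adding noise were handled.

```python
import random

def add_gaussian_noise(weights, sigma, rng):
    """Return a copy of the edge weights with N(0, sigma) noise added."""
    return {edge: w + rng.gauss(0.0, sigma) for edge, w in weights.items()}

def delete_weak_bonds(weights, threshold):
    """Return a copy keeping only the edges with energy >= threshold."""
    return {edge: w for edge, w in weights.items() if w >= threshold}

kT = 0.592  # kcal/mol near 298 K (assumed temperature)

# Hypothetical bond energies (kcal/mol), stored as positive strengths.
weights = {("A", "B"): 83.0, ("B", "C"): 0.8, ("C", "D"): 2.5}

rng = random.Random(0)
# Ten randomised graphs at each noise level from 1kT to 4kT.
noisy_graphs = {
    level: [add_gaussian_noise(weights, level * kT, rng) for _ in range(10)]
    for level in (1, 2, 3, 4)
}
pruned = delete_weak_bonds(weights, threshold=1.0)
print(len(noisy_graphs[1]), sorted(pruned))
```

The propensity score would then be recomputed on each perturbed graph and summarised by its mean and standard deviation per noise level.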