Calculating Ensemble Averaged Descriptions of Protein Rigidity without Sampling

Previous works have demonstrated that protein rigidity is related to thermodynamic stability, especially under conditions that favor formation of native structure. Mechanical network rigidity properties of a single conformation are efficiently calculated using the integer body-bar Pebble Game (PG) algorithm. However, thermodynamic properties require averaging over many samples from the ensemble of accessible conformations to accurately account for fluctuations in network topology. We have developed a mean field Virtual Pebble Game (VPG) that represents the ensemble of networks by a single effective network. That is, the number of distance constraints (or bars) that can form between a pair of rigid bodies is replaced by its ensemble average. The resulting effective network is viewed as having weighted edges, where the weight of an edge quantifies its capacity to absorb degrees of freedom. The VPG is interpreted as a flow problem on this effective network, which eliminates the need to sample. We apply the VPG to proteins for the first time, across a nonredundant dataset of 272 protein structures. Our results show, numerically and visually, that the rigidity characterizations of the VPG accurately reflect the ensemble averaged properties. This result positions the VPG as an efficient alternative for understanding the mechanical role that chemical interactions play in maintaining protein stability.


Introduction
The set of accessible conformations of a protein depends critically on the arrangement and strength of chemical interactions, which greatly influence the conformational entropy. While many different computational models are available to characterize protein dynamics [1][2][3][4], computational efficiency and relative accuracy have made network rigidity models particularly attractive [5]. When protein structure is described as a network of constraints that fix the distances between atoms (vertices), the salient feature of network rigidity is a careful characterization of the number of degrees of freedom (DOF) within the network. The number of accessible DOF is generally reduced as chemical interactions are added to the network, reflecting the reduction in phase space upon formation of an interaction. In particular, adding a distance constraint to a flexible region reduces the number of available DOF, while adding one to an already rigid region does not.
While there are a number of graph theoretic algorithms to calculate network rigidity [6][7][8][9], the efficient Pebble Game (PG) has emerged as the most popular way to account for protein flexibility [5]. Indeed, various methods based on the PG have been developed to analyze network rigidity [10][11][12], where FIRST [5] has served as the starting point for methods that explore native conformational dynamics, such as ROCK [13,14] and FRODA [15,16]. Using pebbles to represent DOF, network rigidity properties of the complete network are quickly calculated based on a strict accounting of pebbles. Once complete, the PG identifies all flexible/rigid regions within the network. Unfortunately, FIRST, ROCK and FRODA are limited by an athermal formulation, meaning fluctuations within the noncovalent interaction network are not modeled.
Within molecular networks [17], covalent bonds are modeled as quenched constraints (meaning they are ever-present), whereas noncovalent bonds fluctuate on and off. The intermittent nature of the noncovalent interactions reflecting protein dynamics further complicates the calculation of average network properties. In this direction, we have developed a statistical mechanical Distance Constraint Model (DCM) [18,19] based on a Gibbs ensemble of PG networks that uses network rigidity to account for enthalpy-entropy compensation [20,21]. The result of the DCM approach is that the give and take between protein stability and flexibility is accurately quantified [22][23][24][25][26][27]. In all works to date, solving the DCM for protein structures has required average network rigidity properties determined from Monte Carlo sampling across a large sample of network topologies. In this approach, there is a binary on/off designation based on the probability for a constraint to be present or not. This randomness leads to an astronomically large ensemble of networks consisting of 2^N possibilities for N constraints that fluctuate on or off throughout the network. Typically N ranges from a few hundred to several thousand in applications. Monte Carlo sampling works remarkably well because of the self-averaging properties of constraint networks. Nevertheless, for statistical error bars to be within acceptable limits, millions of networks usually must be sampled [19].
Because there can be more than one distance constraint placed between a pair of rigid bodies, the body-bar PG [28] represents the network as a multi-graph, where more than one edge can connect a pair of vertices. That is, each vertex represents a rigid body, having 6 DOF, and each edge represents a distance constraint. Herein, the framework of the body-bar PG algorithm is generalized, and, interestingly, requires only a minor modification in a way that preserves essentially the same implementation. That is, we have developed a Virtual Pebble Game (VPG) that allows for probabilistic descriptions of the network. The network now has only one edge between a pair of vertices, with an assigned weight that defines the capacity for it to absorb DOF. This capacity is given by the average number of constraints that can form between a pair of vertices, and thus it need not be an integer value. The VPG extends the counting of constraints and DOF to real numbers, allowing for fractional DOF, which are viewed as representing the probability to find a DOF. Through this generalization of the PG implementation, the VPG algorithm tracks the probability flow that governs where the average number of DOF pools within the effective network, rather than tracking individual DOF that fluctuate about this average. This approach leads to a dramatic computational speed-up because the PG algorithm dictates sampling over many networks to calculate equilibrium properties, whereas the VPG can probabilistically determine them from a single calculation. The approach of the VPG to calculate ensemble properties without sampling is in the same spirit as other algorithms that tackle important computational biology problems with a very large search space that otherwise would require excessive computation time [29][30][31][32].
In a recent report [33], we demonstrated that the VPG closely reproduces the ensemble averaged counting of DOF within a variety of disordered lattices. The ensemble averaged results over many PG runs are designated ⟨PG⟩. In this report, key average or consensus network rigidity metrics are directly compared across a non-redundant data set of 272 protein structures. For example, identified rigid clusters represent groups of vertices that behave as a single body. Numerically and visually we show that the VPG rigidity calculations faithfully represent an overwhelming majority of the ones performed by the PG. Varying the number of interactions present in the network allows us to identify the rigidifying effect that they have on protein structure [11]. Through a continuous increase in the number of hydrogen bond (H-bond) constraints placed in the protein, a rigidity percolation is defined where the network progressively becomes more rigid. The rigidity threshold [34] defines the point where the protein just transitions from being globally flexible to globally rigid, or vice versa. At this rigidity threshold, the greatest fluctuation in network topology occurs, leading to the greatest differences between the PG and VPG quantities. Remarkably, at the rigidity threshold, the similarity in all network rigidity metrics that were calculated using the PG and VPG is found to be quantitatively high. As we demonstrate below, the VPG is ideally positioned as a viable alternative to ensemble averaging in the characterization of protein rigidity.

Protein Structure Description
We consider a dataset composed of 272 protein structures that are nonredundant at the SCOP [35] family level. Our dataset includes one-, two- and three-domain proteins (PDB codes are listed in Table 1), ranging from 50 to 764 residues. We focus on three types of chemical interactions: intra-residue, linker and hydrogen bond. Note that salt bridges are treated as a special type of H-bond, as described previously [5]. The intra-residue interaction models the covalent bonds that exist within a residue. The linker interaction represents the peptide bond that connects the C-N terminal atoms in adjacent residues. The distinction between these two types of covalent bond reflects the number of DOF they consume: an intra-residue covalent bond (and any disulfide bond) consumes five DOF (leaving one for the dihedral angle), whereas the linker consumes six DOF (locking any possible rotation) due to the partial double bond character of the amide group. The last interaction is the H-bond, whose presence we control explicitly through the parameter 0.0 ≤ P_nat ≤ 1.0. In this fashion, all possible H-bonds within the structure are present when P_nat = 1.0, whereas no crosslinking H-bonds exist when P_nat = 0.0. An independent H-bond consumes three DOF, accounting for the distance and angular constraints it imposes. A constraint topology file (CTF) contains a list of all the possible interactions to be considered within a specific protein structure; it is constructed from the original PDB file. The CTF defines each interaction type, as well as their probabilities. Quenched covalent interactions never change from one CTF to another, whereas the probability for a H-bond to form is given by P_nat. Fig. 1 compares the PG and VPG descriptions of a toy network with eight nodes, where quenched covalent bonds are solid and H-bonds are dashed.
Two possible H-bonds exist in this example, leading to an ensemble of 2^2 PG networks. Within each realization, a H-bond is either fully present with probability P_nat or absent with probability (1 − P_nat). The ⟨PG⟩ properties are determined by averaging over the ensemble generated by Monte Carlo sampling. Conversely, the VPG requires only one probabilistic network to describe the ensemble because the presence of a H-bond is directly quantified by its probability, P_nat, to be present.
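As a check on this mean-field replacement, the 2^2 realizations of the toy network can be enumerated directly; the probability-weighted average number of bars equals the sum of the VPG edge capacities. The sketch below assumes an illustrative value P_nat = 0.7 (not taken from the text); the eight-node topology itself is not needed for this count:

```python
from itertools import product

P_nat = 0.7          # illustrative value, not from the text
BARS_PER_HBOND = 3   # each H-bond is modeled by three distance constraints

# Enumerate the 2**2 realizations of the two fluctuating H-bonds.
avg_bars = 0.0
for state in product([0, 1], repeat=2):   # (off/on, off/on)
    prob = 1.0
    for s in state:
        prob *= P_nat if s else (1.0 - P_nat)
    avg_bars += prob * BARS_PER_HBOND * sum(state)

# The VPG assigns each H-bond edge a capacity of 3 * P_nat, so the
# ensemble average is reproduced exactly: avg_bars = 2 * 3 * P_nat.
```

The same cancellation holds for any number of independent H-bonds, which is why a single weighted network suffices for the average constraint count.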

The VPG Algorithm
The three main elements of the PG algorithm are pebbles, vertices and distance constraints, which respectively represent DOF, rigid bodies and intramolecular interactions [28]. Note that the justification for the mapping between atoms and rigid bodies, and for adapting the PG from a bar-joint network to a body-bar network, is thoroughly explained in prior works [10,28]. When the body-bar pebble game initiates, all vertices are unconnected and each is ascribed six free pebbles that describe its position and orientation in 3D. When an independent interaction is placed into the network, six trivial DOF are fixed on one of its incident vertices while the number of distance constraints modeling the interaction is consumed. In the language of the PG algorithm, distance constraints are recursively added to the network, and free pebbles cover the new constraints if they are independent, accounting for DOF removal from the system. Pebbles are not always locally available, but can often be transferred from remote regions of the network. That is, network rigidity is a long-range interaction that can propagate across the network [36,37]. This pebble search is possible because pebbles provide directionality in the network, dependent upon which vertex has provided them. A constraint is redundant when a pebble cannot be transferred to cover it.
The search for pebbles in the directed network resembles a network flow problem [38,39], given that the covered capacity of any edge determines the maximal flow of pebbles through that edge. In a recursive fashion, one edge at a time is placed in the network, always following the described process. In this way the PG accomplishes its main goal: determining whether an edge is independent (fully covered) or redundant (partially covered or not covered). When a search for pebbles fails and consequently a redundant constraint is found, all the vertices that were involved in the search collapse into a single vertex (with its six trivial DOF), defining a minimally rigid graph [40], which we loosely refer to as a Laman subgraph [41] because of the analogous concept in two dimensions. This property allows the PG to run virtually in linear time with the number of vertices in protein-like networks.
The crucial difference between the VPG and the original PG algorithm is the assignment of pebble capacities to edges, and the handling of fractional pebbles. The capacity of an interaction represents the maximum number of DOF that it can consume; for linker and intra-residue interactions the capacity is six and five DOF, respectively. The capacity of a H-bond is defined as the product of its probability, P_nat, and the number of distance constraints used to model it, which is three. Consider a network consisting of vertices {v_n}, n = 1, 2, ..., N, with a list of edges {e_m}, m = 1, 2, ..., M. The capacity of the m-th edge is denoted c_m. The VPG follows these procedures and operations:

1. Initialize the graph with the set of isolated vertices {v_n}, each vertex v_i having six free DOF.
2. From the list of edges {e_m}, insert edge e_k with capacity c_k into the graph. Let v_i and v_j be the two vertices incident to edge e_k.
3. Collect 6 pebbles for vertex v_i by a breadth-first search.
4. Flag vertex v_i as visited and try to collect c_k pebbles for vertex v_j by a breadth-first search, while holding the 6 pebbles on v_i in place. If not all c_k pebbles can be found in one trial, continue to collect more pebbles by repeating the search until there are enough free pebbles on v_j to cover edge e_k, or until no new pebbles are found (a failed search).
5. If c_k or more pebbles are collected on vertex v_j, cover edge e_k with c_k pebbles. Otherwise, all the vertices visited within the failed search are condensed into a single vertex. If {e_m} is not empty, go to step 2.
6. End of VPG.
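The steps above can be sketched in code. The following is a minimal, hypothetical Python implementation for illustration only (not the authors' software): pebble bookkeeping uses floats, a depth-first `_gather` routine stands in for the breadth-first pebble search, and the condensation of a failed-search region is only recorded rather than performed:

```python
from collections import defaultdict

class VirtualPebbleGame:
    """Sketch of VPG bookkeeping: vertices carry 6.0 free pebbles (DOF)
    and edges have real-valued capacities, e.g. 3 * P_nat for an H-bond."""

    EPS = 1e-9

    def __init__(self):
        self.free = defaultdict(lambda: 6.0)  # free pebbles per vertex
        self.cover = defaultdict(float)       # cover[(u, w)]: pebbles u spends on edge {u, w}
        self.adj = defaultdict(set)
        self.redundant = []                   # edges whose pebble search failed

    def _gather(self, u, amount, visited):
        """Pull back up to `amount` free pebbles to u by re-covering
        edges from the far side (the pebble search)."""
        gained = 0.0
        for w in list(self.adj[u]):
            if gained >= amount - self.EPS:
                break
            if w in visited or self.cover[(u, w)] <= self.EPS:
                continue
            visited.add(w)
            want = min(amount - gained, self.cover[(u, w)])
            if self.free[w] < want:
                self._gather(w, want - self.free[w], visited)
            t = min(want, self.free[w])
            self.free[w] -= t            # w takes over t units of coverage,
            self.cover[(w, u)] += t      # releasing u's pebbles on the edge
            self.cover[(u, w)] -= t
            self.free[u] += t
            gained += t
        return gained

    def _collect(self, v, target, blocked):
        """Repeat the search until v holds `target` free pebbles, or a
        trial finds no new pebbles (a failed search)."""
        while self.free[v] < target - self.EPS:
            if self._gather(v, target - self.free[v], set(blocked) | {v}) <= self.EPS:
                return False
        return True

    def insert_edge(self, i, j, capacity):
        self.adj[i].add(j)
        self.adj[j].add(i)
        self._collect(i, 6.0, set())              # step 3: hold 6 trivial DOF on v_i
        if not self._collect(j, capacity, {i}):   # step 4: gather c_k on v_j
            self.redundant.append((i, j))         # step 5: region would be condensed
        spent = min(capacity, self.free[j])
        self.free[j] -= spent                     # cover the edge from v_j's side
        self.cover[(j, i)] += spent

    def total_dof(self, vertices):
        """Remaining free pebbles; exactly six means globally rigid."""
        return sum(self.free[v] for v in vertices)
```

As a consistency check, three bodies joined pairwise by six bars each retain exactly the six trivial DOF with the last edge flagged redundant, and a single H-bond edge of capacity 1.5 (P_nat = 0.5) leaves 10.5 of the initial 12 DOF.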

Rigid Cluster Decomposition
After all the constraints have been placed, the PG and VPG algorithms determine the number of DOF left in the network. The trivial case is when exactly six DOF remain, indicating that the network is globally rigid and all vertices are contained in a single rigid cluster. When more than six DOF remain, pebble locations identify which regions of the protein network are flexible or rigid. Excess pebbles identify flexible regions, whereas rigid regions occur when no free pebbles beyond the trivial six are accessible. From this information, it is possible to apply a Rigid Cluster Decomposition (RCD) to localize groups of vertices that move together as a rigid body. A rigid cluster is a subgraph in which all vertices are mutually rigid.
The process of finding rigid clusters in the VPG proceeds as follows: for any pair of vertices, add a hypothetical edge and try to cover it with ε > 0 DOF while six DOF are fixed on one of the incident vertices. If an excess number of DOF is found, the two vertices do not belong to the same rigid cluster; otherwise a failed search is declared and all the vertices involved in the search are part of the same rigid cluster. Fig. 2 presents two example RCD cases. Notice that all the edges have been covered and that they have different capacities. In the first case (Fig. 2a), there is a total of 7.4 available DOF. Therefore, for any pair of vertices in the network it will always be possible to gather 6 + ε DOF, with ε > 0 representing excess DOF, which indicates that all constraints are independent and the network is globally flexible. In the second case (Fig. 2b), the number of available DOF is exactly six (on vertex four). Therefore, no excess DOF (i.e. ε = 0) will be found under any circumstance, and this condition indicates that a rigid cluster is present that includes all five vertices.
It is worth emphasizing that the point of these examples is to show that the data structure for the VPG is essentially the same as for the PG. Note that the edge pebble capacities are not shown in Fig. 2 for either example. Nevertheless, it can be surmised that the capacities of the edges in example (a) are precisely equal to the numbers assigned to the edges (i.e. 5, 5 and 0.6). If this were not the case, say one of these edges had a greater capacity, then some or all of the 1.4 excess DOF currently remaining would be used to cover the edges. Conversely, because there are no excess DOF in example (b), it is clear that either all the edges are covered at their maximum capacity, or their capacity is larger than the total number of pebbles covering the edge (pebbles on both sides must be added together). If the capacity of an edge is larger than the total number of pebbles covering it, the edge is redundant. In that case, many vertices could be collapsed into a single vertex, as explained above in regard to failed pebble searches and the creation of Laman subgraphs.
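The pairwise membership test can be turned into a full decomposition. A schematic sketch follows, with hypothetical helper names: `excess_dof_fn(u, v)` is assumed to wrap the pebble search that pins six DOF on u and gathers excess DOF on v:

```python
def same_rigid_cluster(excess_dof_fn, u, v, eps=1e-9):
    # u and v are co-rigid iff no excess DOF (epsilon > 0) can be
    # gathered on v while the six trivial DOF are held fixed on u
    return excess_dof_fn(u, v) <= eps

def rigid_cluster_labels(vertices, excess_dof_fn):
    """Greedy RCD: assign each vertex a cluster id via the pairwise test."""
    labels = {}
    next_label = 0
    for v in vertices:
        for u, lab in list(labels.items()):
            if same_rigid_cluster(excess_dof_fn, u, v):
                labels[v] = lab   # v joins u's rigid cluster
                break
        else:
            labels[v] = next_label
            next_label += 1
    return labels
```

In practice a failed search labels all visited vertices at once, which is far cheaper than testing every pair; the sketch only conveys the logic of the ε-test.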

Network Similarity Metrics
We employ two distinct metrics to compare the networks identified by the VPG and PG algorithms. To quantify rigid cluster similarity, we employ the Rand Measure (RM) [42], a metric well suited to comparing clusterings of a network. In the case of rigid cluster decomposition, both the body-bar PG and its VPG counterpart assign a unique label to each vertex to indicate the cluster it belongs to within the network. The network will generally consist of many rigid clusters. The RM is a combinatorial count, over all possible pairs of vertices, of the cluster-composition coincidences between the two decompositions. If both networks have exactly the same rigid cluster decomposition, then the RM is equal to 1. In general the RM ranges from 0 to 1; zero is only possible if one network consists of all vertices within one cluster while in the other network every vertex is in its own separate cluster (each vertex has its own unique label).
For a specific pair of vertices, there are two cases in which a match is found between the two networks. In the first case, the two vertices in network 1 belong to the same cluster (they have the same cluster label) and likewise, in network 2 both vertices belong to the same cluster. In the second case, the two vertices have different labels in network 1, indicating they belong to two different clusters, and likewise, in network 2 the two vertices belong to two different clusters. On the other hand, a match is not found if in one network the two vertices belong to the same cluster, while in the other network they belong to two different clusters.
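The pair counting just described is compact in code (a sketch; cluster labels are assumed to be given per vertex in a common vertex order):

```python
from itertools import combinations

def rand_measure(labels1, labels2):
    """Fraction of vertex pairs on which two rigid cluster
    decompositions agree (same/same or different/different)."""
    n = len(labels1)
    matches = sum(
        (labels1[i] == labels1[j]) == (labels2[i] == labels2[j])
        for i, j in combinations(range(n), 2)
    )
    return matches / (n * (n - 1) // 2)
```

Note that only co-membership matters, so relabeling the clusters of either decomposition leaves the RM unchanged.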
The RM is calculated as the total number of matches divided by the total number of possible pairs; a RM greater than 3/4 is a strong indicator of good agreement. A formal definition, as given in [42], is: given N points X_1, X_2, ..., X_N and two clusterings of them, Y = {Y_1, ..., Y_k1} and Y' = {Y'_1, ..., Y'_k2}, define RM = (a + b) / [N(N − 1)/2], where a is the number of pairs of points placed in the same cluster in both Y and Y', and b is the number of pairs placed in different clusters in both.

We also compare the rigidity assessment from the majority vote of the PG realizations to the VPG assignment for each non-linker torsion bond in a protein. This provides a very sensitive metric for how well the VPG reflects the consensus results from a large sampling of PG runs. That is, we count the number of times that the two approaches agree in their rigid versus flexible assessment, normalized by the total number of comparisons. This calculation leads to an agreement measure (AM) that ranges from −1 to 1. When the rigidity estimate from the VPG matches the PG designation in every realization (i.e., a rotatable torsion by consensus in the PG corresponds to a flexible torsion in the VPG, and a locked torsion by consensus in the PG corresponds to a rigid torsion in the VPG), the AM equals 0. When the VPG fails to match PG designations, the AM moves toward ±1 (−1 = flexible and +1 = rigid). The deviation from 0 indicates the proportion of disagreement. For the n-th torsion, the AM is defined as AM = −N_wrong/N_total if the VPG deems the torsion rigid, and AM = +N_wrong/N_total if the VPG deems it flexible, where N_wrong is the number of PG realizations that disagreed with the VPG, N_agree is the number that matched the VPG, and N_total = N_agree + N_wrong is the total number of PG realizations. For instance, if a particular torsion has AM = −1, the VPG assesses the torsion to be rigid, whereas the PG indicates the opposite (flexible) in all of the realizations.
When AM = 0, this is considered perfect agreement; when 0 < |AM| < 0.1, a small minority of the PG realizations disagree with the VPG prediction, a difference comparable to the intrinsic sampling error bars.
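The per-torsion bookkeeping reads, in sketch form (rigid/flexible given as booleans, with one entry in `pg_rigid_votes` per PG realization; the sign convention follows the reconstruction above):

```python
def agreement_measure(vpg_rigid, pg_rigid_votes):
    """AM for one torsion: 0 means every PG realization matches the VPG;
    the magnitude is the fraction of realizations that disagree, and the
    sign is negative when the VPG deems the torsion rigid."""
    n_total = len(pg_rigid_votes)
    n_wrong = sum(1 for rigid in pg_rigid_votes if rigid != vpg_rigid)
    frac = n_wrong / n_total
    return -frac if vpg_rigid else frac
```

For example, a torsion the VPG calls rigid but every PG realization calls flexible gives AM = −1, while a flexible VPG call contradicted by 2 of 10 realizations gives AM = +0.2.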

Rigidity Profiles
To complement the analysis above, we also graphically compare two additional descriptions of network rigidity that resemble contact maps. The Rigid Cluster Map (RCM) is an N × N symmetric matrix that identifies co-rigid α-carbon pairs within the protein structure. By definition the main diagonal is rigid (an α-carbon is rigid with respect to itself). When constructing the RCM matrix, if a pair of α-carbons belong to the same rigid cluster, a value of 1 is assigned to the matrix element specified by the two vertices (the row/column of the matrix); otherwise 0 is assigned. For one run of the PG, the RCM is a binary plot simply highlighting co-rigid residue pairs. The ⟨PG⟩ RCM plots are based on a majority rule across the ensemble: if 50% or more of the realizations are co-rigid, a 1 is assigned; otherwise 0 is assigned. For the VPG, there is only one run, and the output is 1 or 0. Since the RCM is symmetric, the lower triangle shows the VPG results, while the upper triangle shows the ⟨PG⟩ results.
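Building the RCM from cluster labels can be sketched as follows (pure Python; `pg_label_runs` holds one label list per PG realization, and the 50%-or-more majority rule fills the upper triangle):

```python
def rigid_cluster_map(vpg_labels, pg_label_runs):
    """N x N RCM: single VPG run in the lower triangle, PG majority
    vote in the upper triangle; the diagonal is rigid by definition."""
    n = len(vpg_labels)
    rcm = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i):
            # lower triangle: the single VPG run (binary by construction)
            rcm[i][j] = int(vpg_labels[i] == vpg_labels[j])
            # upper triangle: 1 if 50% or more realizations are co-rigid
            votes = sum(run[i] == run[j] for run in pg_label_runs)
            rcm[j][i] = int(votes >= len(pg_label_runs) / 2)
    return rcm
```

Plotting this matrix with a binary colormap reproduces the split-triangle comparison used in the figures.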
Further, we also employ Mechanical Coupling Maps (MCM) to characterize how flexibility propagates throughout the structure. The MCM quantifies the degree of flexibility of each α-carbon in a protein relative to a reference α-carbon, which serves as a rigid body anchor to eliminate the trivial rigid body translations and rotations. To calculate the MCM, the maximum number of excess DOF shared between the α-carbon of interest and the reference α-carbon must be determined. Operationally, this is accomplished by first fixing the trivial six DOF on the reference α-carbon, and then launching a pebble search on the other α-carbon to gather the maximum number of internal DOF. Note that the result does not depend on which α-carbon is selected as the reference, as it depends only on the α-carbon pair. For normalization purposes, the number of internal DOF found is divided by six (the maximum number of DOF that an α-carbon can have). This information is presented using a color code scheme in the MCM that ranges from 0 to 1. Since the number of DOF found in the VPG can be fractional (not binary like the RCM), the proper comparison requires the MCM values from the PG runs to be averaged across the ensemble. The MCM thus provides a more nuanced view of network rigidity than the RCM. Because the MCM is also symmetric, the lower triangle again shows the VPG results, while the upper triangle shows the ⟨PG⟩ results.

Quantifying Rigid Cluster Similarity
Characterizing the rigid clusters offers a unique view of the role that chemical interactions play within proteins. In prior work, we have used rigid cluster decomposition of protein structure to provide a statistically significant description of thermodynamic coupling within double mutant cycles [43]. Moreover, there have been many investigations characterizing the loss of rigidity that occurs upon protein unfolding using a H-bond dilution model [11,34,[44][45][46][47][48]. Finally, PG characterizations of rigidity have been used to explain the increased stability of thermophilic proteins [49], RNA function [50], the effects of ligand binding [5] and the identification of critical interactions [51]. In these works, an energy cutoff is used to identify which H-bonds are present. As the energy cutoff is lowered, fewer H-bonds are included in the structure, thus increasing flexibility. Equivalently, as the energy cutoff is raised, more H-bonds are included in the structure, thus increasing rigidity.
In an analogous way, we vary P_nat from 0 to 1 in order to control the number of H-bonds present in the network. One technical difference here is that we treat all H-bonds as equivalent and ignore their energies altogether. In the previous works mentioned above, the H-bond energy was used to characterize the strength of a H-bond, so that the weakest H-bonds could be removed before the strongest H-bonds to study protein unfolding. The reason for intentionally treating all H-bonds as equivalent in this work is that we are interested in testing how well the VPG can represent the PG over the ensemble. If some H-bonds are almost always present (lowest energy H-bonds) while others are almost always broken (highest energy H-bonds), the VPG results will be closer to the PG results because fewer fluctuations will occur in the H-bond network, meaning the comparisons herein correspond to the worst-case scenario. We have tested and verified this dependence on the fluctuations present in the H-bond network, and we will publish a more physically realistic H-bond dilution protocol elsewhere that models protein unfolding. However, the interest in this report is to show that the VPG provides an excellent approach to characterize protein flexibility/rigidity properties even in the extreme case of uniform H-bond strength.
Under this H-bond dilution strategy, the capacity 3 × P_nat is the number of DOF removed by the VPG when an H-bond is independent (recall each H-bond is described by three distance constraints). For the PG counterpart, a random number between zero and one is assigned to each possible H-bond in each PG realization to determine whether it is present (the H-bond is present if RAND(0,1) < P_nat). Note that, empirically, we find that for a given P_nat an ensemble of 200 realizations is typically good enough to make robust predictions across our protein dataset. Since we are using the PG to define the exact answer against which the VPG is compared, we run the PG 1000 times to reduce statistical error bars by √5 relative to what is found in actual applications. As discussed above, the Rand Measure [42] (RM) compares the rigid cluster decompositions from the VPG and ⟨PG⟩. Fig. 3a presents the RM calculation across the range of P_nat values for four exemplar protein structures. The four example proteins span a range of sizes (from 64 to 315 residues) and topological architectures. Specifically, they are the chemotaxis receptor methyltransferase CheR structure (pdbid = 1AF7) [52], the FLAP endonuclease from M. jannaschii (pdbid = 1A76) [53], a small scorpion protein toxin (pdbid = 1AHO) [54] and the disulfide oxidoreductase from P. furiosus (pdbid = 1A8L) [55]. In each, there is a region where the RM decreases sharply, which corresponds to the worst-case situation where the fluctuations are maximized. For most networks the point of maximum fluctuation identifies the rigidity threshold, representing the transition from flexible to rigid. A similar pattern was detected across our entire dataset, appearing at values as low as P_nat = 0.60 and ending at values as high as P_nat = 0.95. To calculate the RM at each P_nat, each one of the 1000 PG realizations is compared to the single VPG description. The ⟨PG⟩ RM value is simply the average of the RM over the 1000 realizations.
The P_nat at the minimum in the RM, designated P_RM, identifies the worst-case scenario.
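The PG-side sampling described above amounts to one Bernoulli draw per candidate H-bond per realization. A small sketch (the H-bond list, seed and sizes are illustrative, not from the text):

```python
import random

def sample_hbond_network(hbond_list, p_nat, rng):
    """One PG realization: an H-bond is present iff RAND(0,1) < P_nat."""
    return [hb for hb in hbond_list if rng.random() < p_nat]

rng = random.Random(0)
p_nat = 0.7
n_real, n_hbonds = 1000, 50
present = [len(sample_hbond_network(range(n_hbonds), p_nat, rng))
           for _ in range(n_real)]
mean_fraction = sum(present) / (n_real * n_hbonds)
# The mean occupancy converges to P_nat. Error bars on ensemble
# averages shrink as 1/sqrt(n_real), hence the sqrt(5) gain in going
# from 200 to 1000 realizations.
```

Each realization would then be fed to the integer PG, with the ⟨PG⟩ metrics averaged over the realizations.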
The high RM values indicate that the rigid cluster decompositions are very similar across the PG and VPG. To emphasize this point, we identify the PG realization that yields the median RM score across the entire ensemble, which is called PG_med. The rigid cluster decomposition for this realization is plotted (using the same four proteins) in the first column of Fig. 4. Color differences indicate different rigid clusters, whereas grey indicates a flexible region. The middle column identifies the rigid clusters identified by the VPG. While the similarity is apparent by just qualitatively comparing the rigid cluster decompositions from each algorithm, the difference plots in the third column are the most compelling. Red coloring identifies regions that disagree in rigid cluster composition, whereas grey indicates agreement.
Expanding to our entire dataset, Fig. 3b plots a histogram of all RM scores at the respective P_RM value for each protein. The worst-case RM scores are encouragingly large (>80%) for an overwhelming majority of the proteins, thus indicating that the rigid clusters identified by the two algorithms are quite similar. Slight shifts of the considered P_nat to just above or below P_RM negligibly affect the histograms. Note that when P_nat → 0 or P_nat → 1, the fluctuations within the network are suppressed, meaning the algorithms become identical. Consequently, the mechanical descriptions converge. Fig. 3a typifies this result, where the two approaches produce identical results (RM = 1) at small and large values of P_nat.

Figure 4. Rigid cluster visualizations for four example proteins. The first column highlights the rigid clusters identified within the PG realization that corresponds to the median RM value, designated PG_med. The middle column corresponds to the rigid clusters identified by the VPG. Finally, differences between the two algorithms are highlighted in the third column. doi:10.1371/journal.pone.0029176.g004

Over versus Under Prediction of Rigidity
The RM indicates that differences in the rigid cluster decomposition between the PG and VPG occur, but it does not characterize where and how the differences take place. Therefore, it is important to determine whether the VPG tends to systematically over- or under-estimate rigidity in the protein. To determine how often each type of error appears, we quantify similarity within the rigidity of all rotatable backbone (φ and ψ) dihedral angles (torsions). The agreement measure (AM) described above is applied here to three specific proteins, and across the entire dataset. Fig. 5a, b and c present histograms of AM values for three of the proteins from above. Panel (a) shows that the VPG slightly overestimates the amount of rigidity within the methyltransferase CheR structure, whereas panel (b) indicates that it slightly underestimates the amount of rigidity within the FLAP endonuclease. Panel (c) shows that the VPG overestimates the amount of rigidity within the disulfide oxidoreductase. Fig. 3c presents a histogram for the entire dataset. Clearly, the overwhelming majority (>92%) of torsions are in close agreement within the statistical uncertainty of ⟨PG⟩. Strong disagreement (|AM| > 0.1) between the two algorithms is minimal, especially considering that the comparison is made at the P_RM of each protein, which defines the worst case. Nonetheless, it is interesting to identify when discrepancies are most likely to occur.

Figure 6. Rigid cluster maps (RCM) of the chemotaxis receptor methyltransferase CheR structure, plotted at two different P_nat values. Red coloring identifies residue pairs that are co-rigid. PG results are presented in the upper triangle, whereas the VPG is presented in the lower. At P_nat = 0.60, the protein is mostly flexible due to a lack of crosslinking H-bonds. However, the structure becomes increasingly rigid as H-bonds are added to the network. At P_nat = 0.75, the VPG slightly under-predicts the extent of rigidity. For this protein P_RM = 0.80. doi:10.1371/journal.pone.0029176.g006
A survey of the differences reveals that they generally occur in loop regions and edges of secondary structures, as typified in Fig. 5d.
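The over/under sign convention can be made concrete with a small sketch. The AM is defined earlier in the paper; here we assume, purely for illustration, that it is the per-torsion difference between the VPG rigidity value and the PG ensemble-averaged rigidity probability, so that positive values correspond to VPG over-estimation:

```python
import numpy as np

def agreement_measure(pg_rigid_samples, vpg_rigid):
    """Signed per-torsion agreement between the PG ensemble and the VPG.

    pg_rigid_samples : (n_samples, n_torsions) boolean array, True where
        a backbone torsion is rigid in one PG realization.
    vpg_rigid : (n_torsions,) VPG rigidity values in [0, 1].

    Positive entries mean the VPG over-estimates rigidity relative to the
    PG ensemble average; negative entries mean under-estimation. (This
    specific form of the AM is an assumption for illustration.)
    """
    pg_mean = pg_rigid_samples.mean(axis=0)  # ensemble-averaged rigidity
    return vpg_rigid - pg_mean

# Toy example: 4 torsions, 1000 PG snapshots drawn with known rigidity
# probabilities, compared against hypothetical VPG values.
rng = np.random.default_rng(0)
pg = rng.random((1000, 4)) < np.array([0.9, 0.5, 0.1, 0.0])
vpg = np.array([0.95, 0.50, 0.05, 0.00])
am = agreement_measure(pg, vpg)
```

A histogram of such per-torsion values over a whole protein (or the whole dataset) reproduces the kind of comparison shown in Fig. 5.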

Rigidity Profile Similarity
We use Rigid Cluster Maps (RCM) to visually highlight pairwise mechanical couplings within a structure, placing a red mark wherever an α-carbon pair falls within the same rigid cluster and no mark otherwise. For ease of comparison, the PG results are presented in the upper triangle, whereas the VPG results are presented in the lower triangle. By construction, the protein backbone corresponding to the RCM diagonal is always rigid in both variants. Using the methyltransferase structure from above as a typical case, the two panels in Fig. 6 correspond to two different values of P_nat, ranging from a completely flexible (unfolded) structure with few crosslinking H-bonds to a predominantly rigid structure with many crosslinking H-bonds. As one can see, the VPG and PG algorithms give very similar results.
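The RCM layout described above can be sketched directly. In this minimal construction the cluster-label inputs are hypothetical (label 0 is assumed to mean "not in any rigid cluster"); the upper triangle is filled from PG labels and the lower triangle from VPG labels:

```python
import numpy as np

def rigid_cluster_map(pg_labels, vpg_labels):
    """Binary rigid cluster map comparing PG and VPG decompositions.

    pg_labels, vpg_labels : (n_residues,) integer cluster labels per
        alpha-carbon (hypothetical encoding; label 0 marks residues in
        no rigid cluster and is treated as never co-rigid here).

    Returns an (n, n) 0/1 matrix: PG co-rigidity in the upper triangle,
    VPG co-rigidity in the lower triangle, as in the RCM plots.
    """
    pg = np.asarray(pg_labels)
    vpg = np.asarray(vpg_labels)
    n = len(pg)
    co_pg = (pg[:, None] == pg[None, :]) & (pg[:, None] != 0)
    co_vpg = (vpg[:, None] == vpg[None, :]) & (vpg[:, None] != 0)
    rcm = np.zeros((n, n), dtype=int)
    iu, il = np.triu_indices(n), np.tril_indices(n, -1)
    rcm[iu] = co_pg[iu].astype(int)
    rcm[il] = co_vpg[il].astype(int)
    return rcm

# Four residues: PG places residue 2 in no cluster, VPG merges it with 3.
rcm = rigid_cluster_map([1, 1, 0, 2], [1, 1, 2, 2])
```

Plotting `rcm` with red marks for the 1 entries reproduces the split upper/lower presentation of Fig. 6.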
Going a step further, Fig. 7 presents the RCMs of our four example proteins near P_RM, so that P_nat corresponds to the critical region. The presented values are slightly shifted from the exact P_RM values to highlight interesting features. Note that these shifts in P_nat actually make the RCM plots appear more dissimilar. The large square region along the backbone corresponds to a rigid α-helix. A similar pattern is observed in the disulfide oxidoreductase, which also has few crosslinking interactions at this value of P_nat. Conversely, the off-diagonal features are mostly conserved in the methyltransferase CheR structure, but the VPG slightly overestimates the extent of rigidity within the core region (residues ≈75–150). The FLAP endonuclease example provides the most interesting visual differences between the two approaches, where the VPG underestimates the PG predictions. That is, the VPG fails to identify rigid clusters present within the PG. However, the differences are found to be much less severe on closer inspection of the number of available DOF. While there are no free pebbles within the PG in these regions, the probabilistic VPG identifies a tiny nonzero fraction (3×10⁻³). Clearly, this difference is negligible, but the binary RCM makes the difference appear much larger than it actually is.
The Mechanical Coupling Maps (MCM) provide a more nuanced view of the rigid cluster decomposition. Unlike the binary RCMs, MCMs are based on a continuous scale that identifies the fractional number of pebbles shared between each α-carbon pair. In this sense, the MCMs are similar to the cooperativity correlation plots calculated by our statistical mechanical DCM [18,19,22–25]. Fig. 8 compares the MCMs for the same four proteins as in Fig. 7, using the same P_nat values. The rigidity overprediction by the VPG in the methyltransferase example is again clear. However, there is appreciable co-rigidity within the residue pairs contained within the ranges of residues ≈60–80 and ≈80–150, which the RCM identified as flexible. Additionally, the MCMs reveal a more interesting set of similarities throughout the plots. In the same way, the similarity within FLAP endonuclease is also more pronounced, although the VPG again somewhat overestimates the extent of rigidity. Conversely, the MCMs actually show more dissimilarity within the two examples without any off-diagonal RCM components. In both, there is marginal co-rigidity identified by the PG (the reddish shadowing) due to rigidity fluctuations throughout the ensemble that are suppressed by the VPG.
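A continuous map of this kind is straightforward to build from an ensemble of PG decompositions. The sketch below uses the fraction of realizations in which a residue pair is co-rigid as a proxy coupling measure; the paper's MCM is defined via fractional shared pebbles, which would require pebble counts not modeled here, and the label encoding is assumed:

```python
import numpy as np

def mechanical_coupling_map(label_samples):
    """Continuous coupling map from an ensemble of PG decompositions.

    label_samples : (n_samples, n_residues) integer cluster labels, with
        0 meaning 'not in any rigid cluster' (an assumed encoding).

    Returns, for each residue pair, the fraction of realizations in which
    the pair is co-rigid -- a proxy for the paper's pebble-based MCM.
    """
    L = np.asarray(label_samples)
    co = (L[:, :, None] == L[:, None, :]) & (L[:, :, None] != 0)
    return co.mean(axis=0)

# Two snapshots of three residues: the pair (0, 1) is co-rigid in the
# first snapshot only, the pair (1, 2) in the second only.
mcm = mechanical_coupling_map([[1, 1, 2], [1, 2, 2]])
```

Values between 0 and 1 capture exactly the marginal co-rigidity ("reddish shadowing") that a binary RCM suppresses.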
Expanding across the entire dataset, Fig. 3d provides a histogram of the Pearson correlation coefficient between the MCM matrices calculated by the PG and the VPG. Clearly, the VPG is consistently a good estimator of the PG behavior. This point is strengthened by Fig. 9, which compares the Pearson correlation coefficient of the MCM of each unique PG realization to the ensemble-averaged PG result for the same four proteins considered above. That is, the boxplots describe the intrinsic variability across the PG ensemble. Within each boxplot, the horizontal grey line indicates the median value across the PG distribution, whereas the top and bottom of the box indicate the upper and lower quartiles. The whiskers describe the rest of the distribution, and the dots identify outliers (corresponding to the default settings of R). The red line corresponds to the similarity between the PG and VPG MCMs, which is encouragingly strong. In fact, the VPG similarity to the PG behavior is better than the third quartile in all cases. This result clearly indicates that the VPG approximates the PG behavior better than the vast majority of single PG realizations. These comparisons are calculated at the same P_RM values as above, meaning they again correspond to the critical region.
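The comparison underlying Fig. 9 can be sketched as follows: correlate each single PG realization's MCM with the ensemble-averaged PG MCM, summarize that distribution by quartiles, and compare the VPG's correlation against it. Which matrix entries enter the correlation is not stated in the text, so restricting to the off-diagonal upper triangle is an assumption:

```python
import numpy as np

def mcm_correlation(mcm_a, mcm_b):
    """Pearson correlation between two coupling maps, computed over their
    flattened off-diagonal upper triangles (an assumed convention)."""
    a, b = np.asarray(mcm_a), np.asarray(mcm_b)
    iu = np.triu_indices_from(a, k=1)
    return np.corrcoef(a[iu], b[iu])[0, 1]

def vpg_vs_realizations(pg_mcm_samples, vpg_mcm):
    """Rank the VPG's similarity to the ensemble-averaged PG MCM against
    the similarity of each single PG realization (cf. Fig. 9)."""
    pg_mean = pg_mcm_samples.mean(axis=0)
    per_real = np.array([mcm_correlation(m, pg_mean) for m in pg_mcm_samples])
    vpg_corr = mcm_correlation(vpg_mcm, pg_mean)
    q1, med, q3 = np.percentile(per_real, [25, 50, 75])
    return vpg_corr, (q1, med, q3)

# Synthetic check: PG snapshots are a common map plus noise, and the
# hypothetical VPG estimate lies close to the common map.
rng = np.random.default_rng(1)
base = rng.random((10, 10))
base = (base + base.T) / 2
samples = base[None, :, :] + 0.3 * rng.standard_normal((50, 10, 10))
vpg = base + 0.01 * rng.standard_normal((10, 10))
vpg_corr, (q1, med, q3) = vpg_vs_realizations(samples, vpg)
```

In this synthetic setting the VPG correlation exceeds the third quartile of the per-realization correlations, mirroring the paper's finding that the VPG tracks the ensemble average better than most individual PG snapshots.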

The Rigidity Transition
Following earlier works [18,19,22,25,34], we define P_t (t for transition) as the peak in the rigid cluster susceptibility (RCS) curve, where the RCS is defined as the reduced second moment of the rigid cluster sizes. That is, the peak in the RCS identifies the point at which the rigid cluster sizes are maximally fluctuating, indicating a transition from a globally flexible to a globally rigid network. Twelve examples (including the four proteins discussed above) of RCS curves using the PG and VPG approaches are shown in Fig. 10, all of which are qualitatively similar. As shown by the scatter plot in Fig. 11a, the rigidity transitions identified by the PG and VPG algorithms are highly correlated. In addition, the Average Cluster Size (ACS) at P_t is also highly correlated across the two algorithms (Fig. 11b). Since P_t identifies the P_nat value with maximal variability within the rigid cluster sizes, it is expected to be related to P_RM because the single VPG mean-field calculation suppresses fluctuations. This is indeed the case, as indicated by the strong correlation between P_t and P_RM for both the PG and VPG algorithms (Fig. 11c-d).
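The RCS calculation can be sketched compactly. The text says only "reduced second moment", so excluding the single largest cluster per sample (the standard percolation convention) is an assumption here:

```python
import numpy as np

def rigid_cluster_susceptibility(cluster_sizes_per_sample):
    """Reduced second moment of rigid cluster sizes, averaged over an
    ensemble of decompositions.

    For each sample, the largest cluster is excluded (an assumed
    convention, standard in percolation theory) and
        chi = sum_s s^2 / sum_s s
    is computed over the remaining cluster sizes s.
    """
    chis = []
    for sizes in cluster_sizes_per_sample:
        s = sorted(sizes)[:-1]  # drop the single largest cluster
        total = sum(s)
        chis.append(sum(x * x for x in s) / total if total else 0.0)
    return float(np.mean(chis))

def transition_point(p_values, chi_values):
    """P_t is the P_nat value at which the RCS curve peaks."""
    return p_values[int(np.argmax(chi_values))]
```

Sweeping P_nat, evaluating the RCS at each value, and taking the argmax reproduces the P_t extraction used for Figs. 10 and 11; for the VPG, a single decomposition per P_nat value replaces the ensemble.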

Conclusions
In this report, we demonstrate that ensemble-averaged PG properties, which require sampling, are approximated well by a single mean field calculation. That is, the probabilistic VPG accurately reproduces a number of ensemble-averaged network rigidity properties. The high values of the RM clearly indicate that the rigid cluster decompositions are very similar, especially for |P_nat − P_RM| > 0.2. The AM and the structural comparisons of the rigid clusters provide further quantitative support for this point. Comparisons of the RCM and MCM rigidity profiles between the PG and VPG variants also indicate that the calculated rigidity properties are highly similar. In fact, the PG and VPG MCMs are much more similar to each other than the intrinsic variability across the ensemble of PG snapshots would suggest. Finally, the mechanical transitions identified by the peaks in the rigid cluster susceptibility curves are highly correlated across the two variants. Taken together, these results demonstrate the utility and power of the virtual pebble game, which deals with the probability of finding pebbles rather than with the pebbles themselves.

Author Contributions
Conceived and designed the experiments: DRL DJJ. Performed the experiments: LCG. Analyzed the data: LCG. Contributed reagents/materials/analysis tools: LCG HW. Wrote the paper: LCG DRL DJJ.

Figure 11. Rigidity transition effects. (a) The rigidity transition (P_t) is compared across the PG and VPG algorithms. (b) Similarly, the average cluster sizes (ACS) at their respective P_t values are compared across the two algorithms. The value of P_nat with the worst RM (called P_RM) is compared to P_t calculated using the (c) VPG and (d) PG. The linear relationships occur because the mean field approximation is maximally inaccurate in this range. Note that a few proteins do not have complete peaks in their rigid cluster susceptibility curves because the protein never crosses the rigidity transition; these have been excluded from panels (c) and (d). doi:10.1371/journal.pone.0029176.g011