The dynamics and flexibility of protein disulphide-isomerase (PDI): predictions of experimentally-observed domain motions

We have studied the mobility of the folding catalyst, protein disulphide-isomerase (PDI), by molecular dynamics and by a rapid approach based on flexibility. We analysed our simulations using measures of backbone movement, relative positions and orientations of domains, and distances between functional sites. Despite their different assumptions, the two methods are surprisingly consistent. Both methods agree that motion of domains is dominated by hinge and rotation motion of the a and a' domains relative to the central b-b' domain core. We identify the a' domain as showing the greatest intra-domain mobility. The flexibility method, which requires 10^4-fold less computer power, predicts additional large-scale features of inter-domain motion that have been observed experimentally. We conclude that the methods offer complementary insight into the motion of this large protein and provide detailed structural models that characterise its functionally-significant conformational changes.

• The rapid method predicts experimentally-observed extreme domain re-orientations • Used in synergy, the methods can simulate fully the flexibility of large proteins Introduction Proteins show internal mobility over a wide range of time-and length-scales, from the rapid local motions that define the conformational entropy of a given protein conformation (<ns), to the functionally-significant, concerted, slower motions (> ms) of loops or whole domains that interconvert different conformations (Henzler-Wildman andKern 2007, Tzeng andKalodimos 2012). However, there is a real limitation in our ability to describe fully the dynamics and flexibility of proteins and to characterize their motions -both in isolation and in response to interactions with partner proteins and other molecules. The gold standard approaches in this field are protein structure determination to atomic resolution by X-ray crystallography, and computationally-intensive simulations of motion by all-atom molecular dynamics (MD) calculations. For small proteins, MD simulations routinely extend over timescales up to 100 ns and can effectively identify local motions. But the most functionallysignificant motions, involved in protein activity and protein:protein recognition, involve larger-scale structural units and motions -such as closure of a cleft between domains-which occur on timescales orders of magnitude longer than those accessible by typical MD simulations. There is a clear need for methods which can simulate protein motion that are less computationally-intensive but retain the full physical reality of the structure of the protein at the atomic level.
An attractive approach is to deconstruct and explore the protein structures in the form of networks of connections. Such networks at different levels (backbone, side chain and Hbonds based connectivity) have been investigated earlier (Thorpe, Lei et al. 2001, Greene and Higman 2003, Brinda and Vishveshwara 2005, Glembo, Farrell et al. 2012. We developed a consistent method for deconstructing a protein structure into a network of rigid and flexible units and then for exploring the natural modes of motion of this network, based solely on information contained within the high-resolution structure (Jimenez-Roldan, Freedman et al. 2012). The method can characterize plausible modes of motion in terms of the regions involved, the 'hinge' points, and can also explore the limits on the amplitude of motion associated with any given mode. We previously applied this method to 6 proteins from typical small 'model' proteins of <100 residues to large proteins with >1000 residues.
One of the large proteins investigated was protein disulphide-isomerase (PDI), an abundant catalyst of oxidative protein folding in the endoplasmic reticulum of secretory cells (cf. Figure 1). PDI is essential for the accurate and efficient co-and post-translational folding of secreted and cell-surface proteins including medically-important classes of protein such as antibodies, cytokines, digestive enzymes, blood-clotting factors etc.; indeed, manipulation of PDI levels is a successful strategy used in industry for increasing the yields of high-value recombinant disulphide-bonded proteins, see e.g. (Finnis, Payne et al. 2010). PDI comprises four domains, each of which has the conserved thioredoxin-fold conformation (trx-fold), arranged in sequence a-b-b'-x-a'-c, where a and a' are redox-active trx-fold domains, b and b' are structurally-similar domains lacking redox activity, x is an extended linker and c is a highly acidic C-terminal extension (Hatahet and Ruddock 2009) (refer to Figure 1a). The function of PDI within the cell involves successive interactions with a range of diverse partner proteins and protein substrates (Kozlov, Maattanen et al. 2010, Bulleid andEllgaard 2011) which implies that PDI is a dynamic molecule, a conclusion supported by the long delay between its discovery in the 1960s and the first determination of a crystal structure of a full-length multi-domain PDI (Tian, Xiang et al. 2006). There is now a substantial body of experimental evidence supporting this view of PDI as a conformationally dynamic protein whose flexibility underlies its function. Studies on mammalian PDI, and also on PDI from the fungus H.insolens, indicate that there is extensive relative movement of the a' and b' domains, facilitated by the x-linker (see Discussion for further detail and references). By contrast, crystallization of yeast (S.cerevisiae) PDI (yPDI) at different temperatures produced crystal structures which differ markedly in the relative orientation of the a and b domains (Tian, Xiang et al. 2006, Tian, Kober et al. 2008. In this paper we have three purposes: (i) to explore and describe in detail the motions of PDI, predicted by our novel approach, (ii) to compare this set of motions with those predicted by a gold-standard multi-ns all atom MD simulation and (iii) to compare the predictions of both methods to the experimental data on yPDI flexibility. As mentioned earlier, the rapid flexibility approach has accessed a large conformational space. In particular, starting from the crystal structure of yPDI determined at 4 o C, a conformation found in crystals grown at room temperature is accessed by the flexibility approach. This new conformation is not accessed by a relatively short (30 ns) MD simulation, highlighting the strength of the flexibility approach in rapidly exploring realistic regions of conformational space.

Characterisation of yPDI motion predicted from rigid cluster and normal mode analyses
The starting point for our approach to mobility simulation of full-length yPDI is the highresolution structure (pdb: 2B5E), which is used to generate a representation of the molecule as a set of rigid clusters and flexible linkers. Rigidity analysis includes bonds as constraints and then proceeds by successive exclusion of the weakest bonds in the structure (Thorpe, Lei et al. 2001) (see Supplementary material and Figure S1). Our method for simulating protein motion (Jimenez-Roldan, Freedman et al. 2012) then applies normal mode analyses (Suhre andSanejouand 2004, Dykeman andSankey 2010) to the high-resolution structure to derive vectors that describe the low-frequency modes of motion of the protein. It then models the motion through a series of steps along the vectors (Wells, Menor et al. 2005, Jolley, Wells et al. 2008, treating clusters as rigid and other regions as flexible while maintaining constraints of local bonding geometry and steric exclusion. In this analysis, we have focussed on normal modes 7-11 (m 7 -m 11 ) which are the five lowest-frequency non-trivial normal modes (m 1m 6 describe trivial rotations and translations). We find that the motions are primarily motions of a and (especially) a' domains relative to a bb' base.

MD study of yPDI highlights relative motion of domains
In parallel with this analysis, we have undertaken an MD simulation study of the potential intramolecular motion of yPDI at 300K, based on the same high resolution x-ray structure 2B5E. We generated a full molecular trajectory over 30 ns (consuming >36,000 CPU hours) and have interrogated the trajectory in a number of ways in order to compare it with the outputs from the computationally less demanding flexibility approach. Both the flexibility and MD simulations generate trajectories comprising a series of all-atom protein structures and hence both can be subjected to the same analyses.
Initially we asked whether we could confirm that the major features of motion were motions of whole domains. To test if the domains remained essentially intact as structural units, we obtained measures of β-sheet geometry for each domain and asked how constant these measures remained through the simulations of motion. For each domain, we selected a number of C α atoms to define the sheet and extracted angles between sets of three atoms as measures of sheet geometry. We then derived the mean and standard deviation of these angles through the trajectories of motion obtained both by MD simulation and flexibility analysis. Figure 1c plots these data for each domain and shows (i) that each angle measure shows a very narrow variation through the motion simulations (standard deviations are +/-2-3 o for the flexibility simulations and +/-5-6 o for the MD simulations), and (ii) that there is very close agreement between both types of simulation in the mean angles observed. These data confirm that the core β-sheets of each domain retain essentially constant geometry throughout both sets of simulation (see also Figure S2).

Analysis of inter-domain motion predicted by flexibility and MD simulations
In order to provide a quantitative account of the large-scale inter-domain motion of this multi-domain protein, we have represented each domain as a plane based on its core β-sheet and calculated the relative orientations of these planes (tilt (θ) and dihedral twist (δ)) through simulations of motion. Figure 2 presents a twist/tilt plot for each adjacent domain pair and allows a direct comparison of the predictions from flexibility analysis and MD simulations.
The upper panel represents the a-b domain pair and shows that the MD simulation explores a considerable range around the 2B5E starting structure, but that all orientations fall close to a single line in twist/tilt space. Flexibility analyses in modes m 7 and m 8 predict the same pattern of motion, but modes m 9 and m 10 begin to explore other regions of twist/tilt space. In particular, motion along the negative direction of m 10 , converts the relative orientation of the a and b domains into precisely that found in the alternative 'high-temperature' crystal structure of yeast PDI (pdb: 3B0A); this structure lies far away from the region explored in the MD simulation.  Figure   S3) To indicate the scale of the relative motion of domains in our simulations, we have used as a measure the distance between active site residues located in domains a (Cys61 in the fulllength sequence) and a' (Cys406), respectively. This distance is 27Å in the crystal structure, but Figure 3a demonstrates that movement along m 7 allows it to vary freely from 15 -80Å due to the co-ordinated hinge-like motion of domains a and a' relative to b-b'. Figure 3a also shows the less striking evolution of the inter-active-site distance through the other normal mode trajectories, reflecting the more complex motions in these modes. Figure 3b shows that during the 30 ns MD simulation, the inter-active-site distance increases from its initial value and then varies widely through the trajectory, ranging up to 68Å. (See also Figure S4 and Figure S5).

Analysis of intra-domain motion predicted by flexibility and MD simulations
To focus more on local, intra-domain motion, we have determined the pseudodihedral angle ξ (Warshel andLevitt 1976, DeWitte andShakhnovich 1994) at each residue for each structure computed in modes m 7 -m 10 and then derived the RMS variation of cos ξ, as a measure of the net flexibility at each residue. As shown in Figure 4a, this analysis highlights as regions of greatest flexibility the N-and C-termini and the x-linker, with local maxima also shown at the a-b and b-b' domain boundaries. This confirms that the major flexibility of the protein arises from the relative motion of domains but it also indicates that there is considerable intra-domain motion. Figure 4a also displays the variation of pseudodihedral angle ξ, as derived from the MD analysis. MD detects much more local and higher frequency motion and hence it is not surprising that the MD-derived plot shows much greater scatter.
Taking a running average over 5 residues smooths some of this and again shows that the most marked motion is at the termini and at the x-linker; motion at the other domain boundaries is not more marked than at several intra-domain sites. Interestingly, both methods indicate much greater intra-domain motion within the a' domain than within the other domains.
To obtain an alternative measure of the extent of intra-domain motion through our simulations, we monitored the evolution of the MD trajectory over 30 ns in terms of the RMSD between each of the domains and the structure of that domain in the initial full-length protein (Figure 4b). It is apparent that each domain initially deviates to some extent from its original structure in the crystal, but that for the period from 5-25 ns, these differences are approximately stable, with domain a' showing the greatest RMSD and domains a and b showing the smallest. What also emerges from Figure 4b is that the MD analysis over this period does not explore the full potential for motion. In the period from 25 nsec onwards, domain b (in contrast to the other domains) begins a further phase of motion, which considerably increases its RMSD from the starting structure. For comparison, we analysed the change in RMSD for each domain through the flexibility trajectories of modes m 7 -m 11 .

MD analysis of extreme 'closed' structure generated by flexibility analysis
It is noteworthy that the MD trajectory (Figure 3b) shows the distances between active-site Cys residues (values range from 22 to 68 Å) as typically much greater than the initial value (~26 Å) and does not explore more 'closed' structures. By contrast, our analysis of motion along normal mode m 7 suggested that the molecule was able both to 'open' and 'close' relative to the starting structure, generating values for the inter-active site distance in the range 15-80 Å (Figure 3a). To test whether such closed structures were artefacts of our flexibility approach, a short MD simulation (10 ns) was performed on a 'closed' structure to assess its physical plausibility. We took the atomic co-ordinates of a 'closed' structure representing the extreme of positive motion along m 7 and used it as the starting point for the MD simulation. Figure 5a shows the MD trajectory over 10 ns starting from this extreme 'closed' structure, expressed in terms of the inter-active-site distance. The molecule moves gradually to explore structures in which the distance between the active site residues 61 and 406 lies in the range from 8 -17 Å. The intra-domain distance between cysteines at residues 61 and 90 remains constant during the simulation as in Figure 3b. The inter-cysteine distance for the 90-406 pair is correctly coordinated with the 61-406 pair. Two conclusions are very clear from this simulation, (i) the closed structure is physically plausible as it is not immediately abandoned in the first few ns of the MD simulation, and an ensemble of 'closed' conformations are explored by MD simulations, (ii) the original 30 ns MD simulation did not fully explore the conformational space available to yPDI, since it never generated any structure comparable to those found in this 10 ns simulation (as judged by the value of the inter-site distance). These conclusions are confirmed by the data in Figure 5b, which represent the trajectories from the 10 ns MD simulation in terms of twist/tilt plots. Hence the flexibility analysis clearly finds energetically plausible inter-domain orientations, which do not simply regress from the 'closed' structure towards the orientations found in the crystal structure. It is known in the literature (Caves, Evanseck et al. 1998, Loccisano, Acevedo et al. 2004) that short multiple MD simulations, with different starting conformations, can explore the conformational space better than a single long equilibrium simulation. Our results from two MD simulations reiterate this point. More importantly, we have shown that the flexibility approach can complement MD simulations by quickly providing such a set of realistic, different starting points.

Discussion
The main feature of PDI molecular motion predicted by both flexibility and MD approaches is that the redox-active a and a' domains show considerable freedom of motion with respect to a relatively fixed base provided by the b and b' domains. This motion includes both rotations and hinge-like opening and closing about the a-b and b'-a' hinges. These findings are valuable in reconciling discrepancies in data from literature on the mobilities of PDIs from the yeast S.cerevisiae and other sources. A substantial body of work on human PDI and its fragments in solution, using NMR and intrinsic fluorescence, has identified motion of the x-linker region relative to the adjacent b' domain as a key feature; such motion would modulate ligand access to the major non-covalent ligand binding site on the b' domain and alter the orientation of this domain relative to the catalytic a' domain (Nguyen, Wallis et al. 2008, Byrne, Sidhu et al. 2009, Wang, Li et al. 2012. This is consistent with selective proteolysis studies on bovine and human PDI (Freedman, Gane et al. 1998, Wang, Chen et al. 2010), which showed preferential cleavage around the x linker indicative of substantial flexibility of the protein in the region between the b' and a' domains. A major change in relative orientation of these two domains was also inferred from NMR, x-ray and SAXS studies on the reduced and oxidised states of PDI from the fungus H.insolens (Nakasako, Maeno et al. 2010, Serve, Kamiya et al. 2010. A similar redox-driven re-orientation has recently been demonstrated in x-ray crystallography structures of truncated human PDI (Wang, Li et al. 2012). By contrast, work on yPDI has provided highly valuable insights, based on x-ray crystallographic structures, but little comparable work on mobility in solution.
The x-ray work, based on yPDI crystals grown at different temperatures, has provided two structures which differ very little in the relative orientation of the b' and a' domains, but show clear differences in the N-terminal half of the protein, in the relative orientation of the a and b domains (Tian, Xiang et al. 2006, Tian, Kober et al. 2008. The simulations reported here are based on the higher resolution yeast structure, derived from crystals grown at 4 o C.
They show that there is extensive motion of both a and a' domains, with the motion of the a' domain being more marked, but that the mode 10 trajectory involves a rotation of the a domain relative to the b domain which results in the orientation found in the higher temperature structure. It is likely that yPDI in solution shows extensive mobility both in the a-b and in the b'-x-a' regions, but that the non-statistical sampling of conformations provided by crystallization has highlighted only the a-b motion.
Our rapid flexibility analysis requires only a few CPU hours to generate many trajectories for this large (c. 500 residues) protein in full atomistic detail, providing the basis for a close analysis of molecular motion. Given the fundamental conceptual and operational differences between MD and the rapid flexibility approach -and the difference of several orders of magnitude (×10 4 ) in computer time that MD requires -their predictions are strikingly congruent in many respects. The methods provide very similar predictions of the range of inter-domain orientations explored (Figure 2), the variation of inter-active-site distance that is accessible (Figure 3), and the extent to which the individual domains show intra-domain flexibility (Figure 4). The MD approach provides a far more detailed picture of local motion (Figure 4a) but does not explore some very large amplitude domain motions, which are detected by the flexibility approach and are supported by experimental data. One example is the very large change in the a-b inter-domain twist angle (Figure 2a) predicted by motion along mode 10, which generates precisely the relative orientation of these domains found in an alternative crystal structure (Tian, Kober et al. 2008). A further example is given by flexible motion along mode 7, which can generate 'closed' structures with a much shorter inter-active-site distance (ca. 15Å) than that found in either crystal structure. Experimental data indicate that these sites can be cross-linked by bifunctional chemical reagents with maximum span 16Å (Hawkins, de Nardi et al. 1991) highlighting the extent to which the 'horseshoe' structure of PDI can close in solution. Furthermore, although such structures were not generated in the initial 30 ns MD simulation, such closed structures are rather stable ( Figure 5) over a 10 ns MD simulation, with limited amplitude fluctuations in domain orientation and inter-active-site distance.
In conclusion, the wide-ranging and disparate experimental data that indicate the extensive and functionally-significant flexible motion of PDI have now been supplemented with structurally-detailed simulations which confirm the character and extent of this flexibility.
Both MD and the rapid flexibility approach predict the extensive relative domain motions, especially the motions of the a' and a domains relative to the b-b' core. The two approaches are complementary in that MD provides data on high-frequency local motion while the flexibility approach allows a more rapid exploration of the full range of inter-domain motion.
Our comparison of the flexibility results with experimental data and with MD results gives us confidence to proceed to a more wide-ranging application of this approach to the many other PDI family structures that have been determined since 2006 when the yeast PDI (2B5E) structure was published (Kozlov, Maattanen et al. 2010).

Experimental Procedures
Flexibility simulation: We are using a recently developed rapid method (Jimenez-Roldan, Freedman et al. 2012), combining protein rigidity analysis , geometric modelling of flexible motion (Wells, Menor et al. 2005, Jolley, Wells et al. 2006 and elastic network modelling (Suhre and Sanejouand 2004). This method moves the structure while maintaining bonding and steric constraints. Coarse-grained elastic network modelling identifies low-frequency modes, which are possible directions for flexible motion.
Rigidity analysis rapidly identifies rigid clusters and flexible regions which can act as "hinges". These flexibility simulations allow us to explore large-amplitude motion along multiple normal modes in an all-atom protein structure at minimal computational expense (Jimenez-Roldan, Freedman et al. 2012) and provide valuable information on the flexibility of the protein and its functional sites. We have performed the analysis through multiple (2000-5000) steps in each direction from the starting structure, representing the full structure after every 100 steps; hence the analysis is always based on 10 trajectories (one in each direction for each of 5 normal modes) where each trajectory comprises 20-50 all-atom structures in the form of PDB files. We previously showed that each trajectory leads to a gradual shift of the protein from the starting structure, as measured by RMSD between original and derived structures, and that this shift may reach an asymptote, where no further motion is possible along the initial vector, as a result of steric constraints (Jimenez-Roldan, Freedman et al. 2012). See Supplemental Information for more details.

Molecular dynamics:
All-atom MD simulations on yPDI were performed at 300K using the AMBER9 suit of programs (Case, Darden et al. 2006) with ff03 force field (Duan, Wu et al. 2003) and parm99 parameters. Explicit solvent MD simulations are carried out on the 2B5E crystal structure of yPDI (for 30 ns) and a 'closed' structure of PDI (for 10 ns) generated from the geometric simulations as described above. The MD simulations are performed in aqueous medium using the TIP3P water model. The solvation box is 12Å from the farthest atom along any axis. Na + ions are added to neutralize the net charges on yPDI using the tleap module in AMBER9. The MD simulations are performed under NPT conditions using the Berendsen thermostat and periodic boundary conditions. Particle Mesh Ewald (PME) summation is used for long-range electrostatics and the van der Waals cut-off used is 10 Å.
The pressure and temperature relaxations are set to 0.5 ps -1 . SHAKE constraints are applied to all bonds involving H atoms. A time step of 2 fs is employed with the integration algorithm and the structures are stored every 1 ps. All the MD simulations are implemented and analyzed using a 264 core Intel Xeon HPC cluster.
Pseudodihedral RMS: To generate the pseudodihedral ξ i for residue i at Ca-atom position r i , we consider the inter-residue pseudo-bonds q i-1 , q i , q i+1 defined such that q i = r i+1 -r i for all residues i along the protein main chain (except for the first residue in the sequence and for the last two residues). The dihedral angle ξ i formed by q i-1 and q i+1 about the axis of q i then characterizes the orientation of the residues and their variation and flexibility. When the protein main chain is in a fully extended (β-strand) state, ξ i is near zero (cos(ξ i )»1). When the main chain is in a coiled or turned conformation (α-helix or β-hairpin), ξ i has magnitude above 90 degrees, and cos(ξ i ) is negative, typically around -0.7 for α-helices. We extract cos(ξ i ) for each residue i and each conformer generated during a motion. We then use the root-mean-square-deviation of all such cos(ξ i ) values as our measure of flexibility at residue i. : Each of the a, b, b', a' domains contains a central β-sheet, characteristic of the thioredoxin fold. The orientation of the domain can be represented by a vector normal to a central part of the β-sheet. This vector is generated by selecting the C α atoms of four 'central' residues, namely alternating central residues in each of the adjacent anti-parallel strands β 2 and β 4 (by PDI numbering; these are strands 1 and 3 of the thioredoxin fold). These 4 atoms define a quadrilateral and from its two diagonals we construct a normal vector n. When we now consider two adjacent domains in the yPDI structure -labelled as domains 1 and 2, with central positions r 1 , r 2 and plane normals n 1 , n 2 -we can define an inter-domain tilt angle θ in the range 0 to 180 degrees as cos(θ)=n 1 •n 2 . An inter-domain dihedral twist δ , with range from -180 to +180 degrees, is obtained by constructing an inter-plane vector r 12 =r 2 -r 1 and considering the dihedral δ between the plane containing n 1 , r 12 and the plane containing n 2 , r 12 (see Supplemental Information).

Rigid cluster analysis of yeast PDI
The starting point for our approach to mobility simulation of full-length yeast PDI is the high-resolution structure (pdb: 2B5E), which is used to generate a representation of the molecule as a set of rigid clusters and flexible linkers. In Figure 1 we show the structure schematically both as a ribbon diagram and as a linear representation (), highlighting its organization as 4 distinct domains with an extended linker between domains b' and a'. The rigid cluster analysis includes bonds as constraints but is dependent on setting a 'cut-off' energy parameter (E cut ) to include strong H-bonds and exclude weaker ones (as inferred from the high-resolution structure). Figure S1 illustrates the effect of varying the value of E cut on the analysis. Inclusion of all Hbonds leads to a representation of PDI as a single rigid cluster. Exclusion of the weakest bonds gives a model that shows the molecule as four distinct rigid clusters corresponding closely to the structurally-defined domains. Further stepwise exclusion of weaker bonds then leads either to fragmentation of each cluster into a series of smaller clusters (see e.g. domain a') or to maintenance of the domain as a single cluster, but with specific regions becoming flexible and hence excluded from the cluster (see e.g. domain a). We have performed the rigidity and mobility analyses at several values of E cut and found that, in practice, the nature of the results is not very sensitive to E cut (as shown in Figure S1. For the more detailed analysis in the paper, we have focussed on simulation at E cut = -2 kcal/mol; previous work on other proteins has shown that this value makes flexibility predictions that are borne out by unexpected experimental findings (Li, Wells et al. 2012, Amin, Wallis et al. 2013.
Also, additional modes could be explored, and we have analysed trajectories for modes m 12 -m 16 but we find that the lowest 5 modes capture the majority of the accessible motion and provide a good overall picture with considerable structural detail. In many cases the mode trajectory is eventually limited by steric constraints, providing an amplitude limit on the motion.

More details on the mobility analysis
The measures of β-sheet angles as shown in Figure 1 represent averages over the 30ns MD. Figure S2 shows

More on the use of the structural measures Pseudodihedral RMS:
The conformation of the protein main chain can be almost completely specified by giving the two Ramachandran angles (φ,ψ) for each residue. These define the geometry of the C-N-C α -C and N-C α -C-N variable dihedrals around the C α atom. The variation of the Ramachandran angles in the course of a simulation is then a measure of the flexibility of the protein main chain. However, much of the information present in the Ramachandran angles can be captured using one number per residue rather than two, by considering a pseudodihedral measure defined by the C α atoms of four successive residues (Levitt 1976, DeWitte andShakhnovich 1994). The variation of this backbone pseudodihedral is, in turn, a convenient measure of protein flexibility.
Obviously we are concerned less with the absolute value of ξ i , but more with the variation in ξ i during motion. Our approach to describing the flexibility of the protein is as follows. For each structure generated during simulations of motion, we extract cos(ξ i ) for each residue i. We then find the mean and variance of cos(ξ i ) over the course of the simulation for each residue. The root-mean-square-deviation of cos(ξ i ) is our measure of flexibility.

Tilt and dihedral twist:
Let us identify the atoms at the corners of a quadrilateral as a, b, c, d in cyclic order (cp. the schematic in Figure 2). They have position vectors r a , r b , r c , r d . We obtain a central position r for the plane as r=(r a +r b +r c +r d )/4. Vectors representing the diagonals of the quadrilateral are r ac =r c -r a and r bd =r d -r b and n= r ac ×r bd is the unit normal vector n parallel to the cross product of the two.
We now consider two adjacent domains in the yPDI structure, which we label as domains 1 and 2, with central positions r 1 , r 2 and plane normals n 1 , n 2 . An interdomain tilt angle θ in the range 0 to 180 degrees is obtained from cos(θ)=n 1 •n 2 .
An interdomain dihedral twist δ in the range -180 to +180 degrees is obtained by constructing an interplane vector r 12 =r 2 -r 1 and considering the dihedral δ between the plane containing n 1 , r 12 and the plane containing n 2 , r 12 . Two natural motions for adjacent domains are (i) a "towards-and-away" tilting motion, in which θ varies substantially with little variation in δ, and (ii) an "axial twist" motion of rotation about the interplane vector, generating a co-variation of θ and δ. Motions of this kind are visible in our tilt-twist plots as horizontal and diagonal trajectories.
Tilt and twist values were extracted for the input structures, for the structures generated in the simulations of flexible motion, and for conformers of the MD trajectory, to describe three relative domain orientations: a-b, b-b', and b' Note that we use next-nearest-neighbour rather than nearest-neighbour residues in each strand, in order to prevent the pleating of the β-sheet affecting our results. values of E cut correspond to greater limitations on flexible motion and, as expected, they lead to more restricted motion, as measured by this parameter.

Additional analysis of the inter-and intra-domain motion
The histogram of the MD inter-site distances ( Figure S5) is derived from the data in Figure 3b and confirms the wide spread of distances with the most common distance being close to 40 Å. Although this inter-domain distance varies widely through the MD simulation, intra-domain distances (exemplified by the distance between the active-site residue Cys61 and a buried residue within the same domain (Cys90)) remain essentially constant (Figure 3b) and shows a very tight distribution around 11 Å ( Figure S5). Figure S6 shows that in the majority of modes, the intra-domain RMSD is greatest for the a' domain and smallest for a and b domains. This trend in intra-domain motion is also evident in the analysis of variation of backbone pseudodihedral angles ( Figure   4a) where both MD simulation and flexibility analysis show greatest variation in the a' domain.

Figure S1
Rigid cluster decomposition graph. The horizontal axis represents the protein backbone and the vertical axis the energy E cut . The residues belonging to rigid clusters are colored as in Figure 1 whereas the flexible regions are shown as horizontal thin black lines.

Figure S2
The β-sheet angles remain stable along the 30 ns MD trajectory. The three colours correspond to the three angles defined on the indicated residues for each β-sheet (cf. Error! Reference source not found.c). See also Figure Figure 1c