Amino Acid Insertion Reveals a Necessary Three-Helical Intermediate in the Folding Pathway of the Colicin E7 Immunity Protein Im7

*Corresponding author. Astbury Ce Molecular Biology, University of Le UK. E-mail address: s.e.radford@lee Abbreviations used: Im7, inhibitor providing immunity to cells produc Im7H3M3, Im7 variant containing e KXY, the equilibrium constant betwe rate constant for the conversion of X denaturant-dependence of the free e Y; mXY, the denaturant-dependence logarithm of the rate constant kXY; U intermediate state; TS, transition sta Beta-Tanford values of species X; TF trifluoroethanol; CD, circular dichro


Introduction
Previous studies have shown that the small four-helical protein Im7 (an inhibitor protein for colicin E7 that provides immunity to cells producing colicin E7; Fig. 1a) folds via an unusually rugged folding landscape involving the population of an on-pathway hyperfluorescent intermediate. 1,2 This species contains a core of hydrophobic residues that is ∼20% expanded compared with the native state (Fig. 1b) and is stabilised by both native and non-native interactions. 1-4 The intermediate ensemble lacks a structured helix III, but contains helices I, II and IV, which are oriented in a non-native manner so as to minimise the exposed hydrophobic surface area that would result from a native-like helical organisation in the absence of helix III. The low ϕ-values for point mutations in the sequence that forms the native helix III in both the intermediate-state ensemble and the subsequent rate-limiting transition-state ensemble suggest that helix III only docks onto the developing protein core subsequent to crossing the rate-limiting transition-state barrier as the native structure develops (Fig. 1b). More specifically, residues Leu53 and Ile54, which are in helix III of the native structure and form an integral part of the hydrophobic core of the native protein, appear to play little or no role in stabilising the intermediate state. 1,5 By contrast, either or both of the natively solvent-exposed or partially exposed residues Tyr55 and Tyr56 in helix III, which are essential for the inhibitory action of nuclease E colicin immunity proteins, have been suggested to form non-native interactions during folding, helping to anchor the stretch of residues that will ultimately form helix III onto the three-helix intermediate. 2,6-8 Although much less stable than the intermediate in Im7 folding, an intermediate has been shown to form transiently during the folding of the Im7 homologue Im9.
9 These data suggest that formation of a three-helical intermediate is a ubiquitous feature of the folding mechanism of immunity proteins, with specific side-chain-side-chain interactions in different proteins stabilising the folding intermediates to different extents. This raises the intriguing question of whether the short nature and low helical propensity of the sequence comprising the native helix III, which is highly conserved (83%) across the family of immunity proteins, 10 are responsible for the development of the three-helical intermediate, or whether other features of the protein sequence dictate three-state folding. In order to address this question, we describe here a series of experiments in which the sequence of helix III was redesigned to increase its length and helical propensity through the insertion of an eight amino acid polyalanine-like helix without disruption of the native Im7 structure. The resulting newly extended helix III is predicted to have a length and a helical propensity that exceed those of helices I, II and IV. Here, we describe the design of this variant Im7 and the determination of its structure using NMR spectroscopy. In parallel, by analysis of the folding mechanism using ϕ-value analysis, we provide evidence that suggests an obligate requirement for folding via a three-helical intermediate, irrespective of the length and helical propensity of helix III.

Table 1 (footnote). Peptides corresponding to helices I, II, III and IV were constructed, and their α-helical content was determined using far-UV CD at 10°C in 50 mM sodium phosphate (pH 7.0). The measured percent helicity for each peptide is also shown.

Fig. 1. (a) Structure of wild-type Im7, drawn from the coordinates of 1AYI. 10 Helices are coloured as follows: helix I, blue; helix II, red; helix III, green; helix IV, yellow. The figure was drawn using Chimera. 34 (b) Cartoon representation of the folding mechanism of Im7. The on-pathway intermediate forms on the sub-millisecond timescale. How helices I, II and IV dock in the intermediate and transition state ensembles is shown schematically but is based on restrained molecular dynamics simulations of these species using ϕ-values or hydrogen exchange protection factors as restraints. 1,43 The helical structure in the unfolded state in the absence of denaturant is not known, although it is expected to be low, as predicted using AGADIR. 12

Results
Construction of a variant of Im7 with a highly helical helix III
On average, α-helices in native proteins contain 12 residues. 11 Helices I, II and IV of Im7, containing 13, 14 and 14 residues, respectively, conform to this view (Table 1). These three helices have average helical propensities of ∼8%, 2% and 3%, respectively, as predicted by AGADIR 12 (Table 1 and Fig. 2a). By contrast, helix III of Im7 has only six residues and no significant helical propensity (Fig. 2a). Of the six residues that comprise helix III (Thr51, Asp52, Leu53, Ile54, Tyr55 and Tyr56), Asp52 and Tyr56 are highly solvent exposed, while the remaining residues are either totally buried or partially buried in native Im7, leading to stabilisation of the helical structure of this sequence as the native helix III. In order to increase the helical propensity of helix III, we initially considered substituting both Asp52 and Tyr56 with Ala. According to predictions from AGADIR, 12 however, these substitutions do not significantly increase the predicted helical propensity of helix III. To increase the helical propensity of helix III further without perturbing the structure of the native protein, especially its native hydrophobic core, we found it necessary to extend the length of the helix. It has been shown that polyalanine sequences have a high helical propensity, even in the absence of tertiary contacts. 13,14 Concatenation of such a sequence with that of the native helix III was thus considered to be a possible route towards the redesign of helix III to a length and a helical propensity commensurate with those of helices I, II and IV.
Three variants of helix III were designed and tested. First, the natural sequence of helix III (TDLIYY) was altered to TALIYA (substituting the exposed Asp52 and Tyr56 with Ala) without an increase in predicted helicity. This sequence was then expanded to contain, in total, an additional eight-residue helical segment based on the polyalanine TALIYAAAAAAAA, which increased the average predicted helicity of the sequence to 7%. Incorporation of this sequence into Im7, creating variant Im7H3M1, did not significantly alter the stability of the native protein compared with that of wild-type Im7 (ΔG°UN = −24.1 ± 0.7 kJ mol⁻¹ and ΔG°UN = −25.6 ± 0.32 kJ mol⁻¹, respectively; data not shown). 15 The addition of residues Asn-Pro-Gly at the C-terminus of the newly incorporated sequence, which is known to form an efficient C-capping motif, 16 to give an insert sequence of TALIYAAAAAAAANP increased the average predicted helical propensity to 10%, again without significantly altering the stability of the resulting protein Im7H3M2, which had a ΔG°UN of −22.5 ± 0.67 kJ mol⁻¹ (data not shown). 15 In order to further stabilise the engineered helix, we introduced a potential salt bridge into the inserted region as another contributor to α-helix stability. Such side-chain-side-chain interactions potentially contribute between 0.4 and 2.0 kJ mol⁻¹ to the stability of a protein. 17-19 As noted above, Asp52 and Tyr56 of Im7 are solvent exposed, 10 and if the inserted region adopts the same helical structure, the residues at position I (originally Tyr56) and position V of the insert (Fig. 2b) are predicted to be solvent exposed. Therefore, the alanine residues at insert positions I and V were substituted with Glu and Arg, respectively, to create Im7H3M3. As summarised in Fig. 2b, the redesigned helix III in Im7H3M3 is expected to contain a C-terminal-capped polyalanine helix with a salt bridge located towards its centre. Together, these stabilising features are predicted to increase the average helical propensity of helix III (over all residues) to 14% (Fig. 2a and Table 1). The variant Im7H3M3 thus contains a sequence for helix III that has the largest predicted helical propensity of all four helices in the protein, while maintaining the key hydrophobic side chains of Leu53 and Ile54 in the native hydrophobic core. In addition to the inserted sequence, Im7H3M3 differs from Im7 in the sequence linking helices III and IV. The wild-type interhelix connection (PSDNRDDS) was substituted in Im7H3M3 with a linker (GGDGGGP) that is expected to be more flexible than the native interhelix linker, so that the new polyalanine-rich helix III can be accommodated into the structure without perturbing the docking of helix IV in the variant Im7 protein. In addition to the five Gly residues in the inserted linker, aspartic acid was incorporated to aid solubility.

Fig. 2. (a) Prediction of the helical propensities of the four helices in wild-type Im7 and the Im7H3M3 variant using the prediction algorithm AGADIR. 12 Residues in helices are highlighted in accordance with Fig. 1a. Residues in unstructured regions are shown in grey. The helical propensity of wild-type Im7 helix III is shown in green, while that of Im7H3M3 is shown in purple. (b) Primary sequence of residues in helices III and IV and their sequences in Im7 (top) and Im7H3M3 (bottom). Helix III (green) and helix IV (yellow) in wild-type Im7 are shown. Residues predicted to occupy helix III in Im7H3M3 are shown in purple.
Im7H3M3 folds into a helical structure, as judged by far-UV CD (data not shown), with a ΔG°UN of −23.3 ± 0.4 kJ mol⁻¹, determined by equilibrium denaturation and chevron analysis (see the text below), similar to the properties of wild-type Im7 (Table 2). This variant was therefore used to study the importance of the length and helical propensity of helix III in the folding mechanism of Im7.

CD studies of peptide fragments of Im7 and Im7H3M3
In order to ascertain whether the polyalanine-rich sequence of the helix III insert in Im7H3M3 has increased its helical content, as predicted by its enhanced helical propensity, we synthesised a peptide equivalent to this sequence and measured its helical content in aqueous solution and in the presence of 2,2,2-trifluoroethanol (TFE) using far-UV CD. 20 In parallel, peptides equivalent to the sequences of helices I, II and IV and of the natural helix III were also synthesised and studied under identical conditions (Fig. 3).
The far-UV CD spectrum of the Im7H3M3 helix III peptide in 0% (vol/vol) TFE shows a signal typical of an α-helix with minima at 208 and 222 nm (Fig. 3e), indicating that the sequence adopts a helical structure in solution, 21 consistent with its design. The addition of 40% (vol/vol) TFE stabilises the helix, as observed for other polyalanine helices, 20 although propagation of helicity throughout the peptide is presumably blocked by the presence of the proline residue towards its C-terminus. 22 In aqueous solution (0% vol/vol TFE), the ellipticity of the helix III peptide from Im7H3M3 corresponds to an average helical content of 16%, determined from the magnitude of the signal at 222 nm 21 (see Materials and Methods), which is close to the average predicted helical propensity of this sequence (∼14%) (Table 1). 12,23,24 As shown in Fig. 2a and Table 1, peptides with sequences equivalent to those of helices I, II and IV were predicted to have average helix propensities of ∼8%, 2% and 3%, respectively. Each peptide, as measured by TFE titration, underwent a two-state transition into helical structure, as indicated by the presence of an isodichroic point in each titration series at 203 nm (also see Materials and Methods) (Fig. 3a, b and d). These experiments gave average helicities for the peptides equivalent to helices I, II and IV, determined from the magnitude of the signal at 222 nm (see Materials and Methods), of 14%, 10% and 11%, respectively, values significantly greater than those predicted by AGADIR (Table 1). The peptide with the sequence of the natural helix III (Fig. 3c), by contrast, showed very little helical content in the absence of TFE and no change upon addition of TFE, indicating that this small 6-residue sequence requires tertiary structure formation for it to stabilise in helical form in native Im7.

Structure of Im7H3M3 and side-chain interactions
The solution structure of Im7H3M3 was determined by heteronuclear NMR spectroscopy (see Materials and Methods). The results revealed a structure that is remarkably similar to that of wild-type Im7, with helices I, II and IV adopting native-like conformations, as reflected by their lengths, orientations and side-chain arrangements (Fig. 4). As predicted, helix III has been extended by the polyalanine insert, entirely consistent with the redesign strategy. Also as predicted, the NMR structure indicates that both the C-cap motif and the Glu-Arg salt bridge in helix III are formed; the dihedral angles are consistent with a C-cap, while the average distance between the side-chain N atoms in Arg V and the carboxylate O atom in Glu I is ∼3 Å, consistent with the presence of a salt bridge. While a full description of the structure and dynamics of the Im7H3M3 variant will be given elsewhere (manuscript in preparation), it is noteworthy at this point to highlight the slightly different orientation of Trp75 in the two structures (Fig. 4), since this residue forms the fluorophore that is used as a probe for folding and stability in the studies described below. The indole ring remains stacked against the side-chain imidazole of His47 in Im7H3M3, a characteristic feature of immunity proteins that gives rise to the low fluorescence intensity of Trp75 in the native structure. 25

Fig. 4. Representation of the structure of Im7H3M3, drawn from the coordinates of 2K0D. The structure of wild-type Im7 (1AYI) 10 is shown in light grey for comparison. Helices are coloured as follows: helix I, blue; helix II, red; helix III, purple; helix IV, yellow. The positions of His47 (black) and Trp75 (yellow) in Im7H3M3 are shown, with their equivalents in Im7 shown in grey. This figure was drawn using Chimera. 42

Kinetic analysis of Im7H3M3 folding
To determine whether the redesign of the helix III insert has affected the folding mechanism of Im7, we analysed the folding and unfolding kinetics of Im7H3M3 using stopped-flow fluorescence (10°C, pH 7.0 and 0.4 M Na₂SO₄) (see Materials and Methods). The resulting chevron plot (Fig. 5) shows that the sequence differences between Im7H3M3 and wild-type Im7 destabilise the native state slightly, with a corresponding increase (4-fold) in the rate constant of unfolding (kNI) relative to that of wild-type Im7 (Table 2). Remarkably, the folding branch of the chevron plot of Im7H3M3 is unperturbed compared with that of wild-type Im7, indicating that Im7H3M3 folds via a three-state mechanism involving a hyperfluorescent intermediate (Fig. 5). Quantitative analysis of the data revealed that the stability of the intermediate in Im7H3M3 is increased relative to the stability of the intermediate that is transiently populated in wild-type Im7 (ΔΔG°UI ∼2.4 kJ mol⁻¹) (Table 2). The Im7H3M3 variant exhibits an m-value for the formation of the intermediate (MUI; MXY represents the denaturant dependence of the free energy between X and Y) that is approximately 15% larger than that of wild-type Im7 (Table 2), consistent with the increased length of the polypeptide chain and near-identical native structures of the two proteins. Accordingly, the m-value for folding from the unfolded protein to the native state (MUN) for Im7H3M3 (5.8 ± 0.2 kJ mol⁻¹ M⁻¹) is increased by 10% relative to that observed for wild-type Im7 (5.3 ± 0.1 kJ mol⁻¹ M⁻¹). Interestingly, although remaining hyperfluorescent, the fluorescence intensity of the intermediate formed during the folding of Im7H3M3 is reduced compared with the intermediate formed in the folding of wild-type Im7.
Further experiments showed that this results, in part, from the substitution of Tyr56 with Ala, which must subtly alter the environment of Trp75 in the intermediate ensemble such that its fluorescence quantum yield is reduced. 15 The small differences observed in the folding behaviour of Im7H3M3 compared with that of wild-type Im7 are surprising given that 9 native residues have been removed and 16 non-native residues have been inserted, including a region predicted by peptide studies to be significantly helical even in the denatured state (Fig. 3e). It has previously been shown that the insertion of only 4 alanine residues into a loop region after helix VIII of T4 lysozyme destabilises its native state by ∼25 kJ mol⁻¹, with a similar result being observed for the insertion of 3 alanine residues into a loop region after helix III in the same protein. 26 T4 lysozyme is approximately twice the size of Im7. It is striking, therefore, that the smaller Im7 tolerates the insertion without substantial perturbation of native-state stability (ΔΔG°UN = 2.3 kJ mol⁻¹) (Table 2). One possible explanation for the tolerance of Im7 for the polyalanine insertion is that the glycine linker allows the native structure to accommodate the extended helix III without causing strain on helix IV. Of particular interest is the stabilisation of the intermediate ensemble in Im7H3M3 compared with the intermediate of wild-type Im7 (ΔΔG°UI = 2.5 kJ mol⁻¹). Work performed on an intermediate ensemble of Im7 trapped at equilibrium by the mutation of Leu53 and Ile54 to alanine 5 shows that this variant displays an increased exposed hydrophobic surface area relative to the native state. Therefore, it is plausible that the polyalanine helix of Im7H3M3 helps protect this hydrophobic surface.

ϕ-Value analysis
In order to determine in more detail the effect of the sequence differences between Im7 and Im7H3M3 on the folding mechanism of Im7, we created a number of variants of Im7H3M3 (Table 3). Hydrophobic substitutions were made at residues in the core of the protein that have previously been shown to play a key role in the folding mechanism of Im7, 1,2 so that the formation of tertiary structure contacts during folding could be investigated. In addition, to probe the formation of secondary structure, we made a series of alanine-to-glycine substitutions at solvent-exposed sites that included residues in all four helices. The folding and unfolding kinetics of all variants were then assessed using stopped-flow fluorescence, and the results were collated as a series of ϕ-values. Altogether, 24 variants that span the sequence of Im7H3M3 were studied. Hydrophobic substitutions made in the centre of helix I (V16A and L19A) show that these residues, which are buried in the native state, stabilise the intermediate-state and transition-state ensembles in both wild-type Im7 and Im7H3M3, resulting in ϕI and ϕTS values of >0.6 for both Im7H3M3 and wild-type Im7 (Fig. 6a and b and Table 4). These side chains, therefore, are buried early in the folding of both wild-type Im7 and Im7H3M3.
When considered as a whole, the variants of Im7H3M3 in helix II that were examined by mutational analysis also showed properties similar to those obtained for Im7 (Fig. 6c and d). L37A, F41L and V42A show high ϕI values in Im7H3M3 (1.02 ± 0.0, 1.08 ± 0.12 and 1.66 ± 0.2, respectively) that are reduced in the transition-state ensemble (0.69 ± 0.00, 0.16 ± 0.12 and 0.69 ± 0.01, respectively). Despite the similarity in ϕ-values for residues in helices I and II, some subtle structural reorganisation of the intermediate ensemble in Im7H3M3 may have occurred relative to that of Im7, however, as reflected in the increased ϕ-values of V16A, L19A (Fig. 6a and b), L37A and V42A in Im7H3M3 relative to those in Im7 (Fig. 6c and d and Table 4) and consistent with the altered fluorescence properties of this species (Fig. 5).
Three residues were mutated in helix IV of Im7H3M3. Two residues (A77G and A78G) exhibit ϕI and ϕTS values similar to those of the corresponding residues of Im7 (Fig. 6g and h). Data obtained for the variant V69A at the N-terminus of helix IV indicate that this region of the protein does not make substantial stabilising contacts in the intermediate and transition states of wild-type Im7, as shown by the low ϕ-values of this variant. Destabilisation of native Im7H3M3 by this substitution, however, results in a ΔΔG°UN of <2.5 kJ mol⁻¹, a value too small to allow an accurate ϕ-value to be calculated for this site in the redesigned protein (Fig. 6g and h). 27 Taken together, the data suggest that helix IV is at least partly formed in the intermediate ensemble and that the insertion of the

[Table fragment: helix IV variants]
Helix IV (exposed) 576.9 ± 36.6 (4.7 ± 0.2) 148.9 ± 4.6 (0.2 ± 0.05) 6.7 ± 0.3 (−0.6 ± 0.0) −14.9 ± 0.2 −22.3 ± 0.2
A70G Helix IV (exposed) 87.5 ± 12.7 (4.7 ± 0.2) 240.3 ± 11.9 (0.7 ± 0.1) 6.0 ± 0.18 (−0.8 ± 0.0) −10.5 ± 0.3 −19.2 ± 0.4
A77G Helix IV (exposed) 140.8 ± 23.1 (4.7 ± 0.2) 132. ± 7.2 (0.9 ± 0.1) 8.8 ± 0.28 (−0.6 ± 0.0) −11.6 ± 0.4 −18.0 ± 0.4
A78G Helix IV (exposed) 91.7 ± 14.6 (4.7 ± 0.

Mutations of residues in helix III in Im7H3M3 exhibit the largest differences in stability relative to substitutions in this region in wild-type Im7. The ϕ-values for residues mutated in helix III for Im7 suggest that the side chains of Thr51 and Ile54 make few stabilising contacts in the intermediate-state and transition-state ensembles (ϕI and ϕTS values for all residues are <0.15; Table 4), despite destabilising the native state by ∼11 kJ mol⁻¹ for I54V (Table 3 and Fig. 6e and f).
However, the Im7H3M3 variant I54V shows a result very different from that of Im7 I54V, insofar as the native state is only marginally destabilised by this substitution (ΔΔG°UN ∼1.5 kJ mol⁻¹ for I54V in Im7H3M3 compared with 11 kJ mol⁻¹ for this substitution in wild-type Im7), 1 a value too small to allow accurate determination of a ϕ-value, 27 but the different effect of this mutation on the native-state stability of Im7H3M3 and Im7 is clear. Thus, the sequence differences between Im7 and Im7H3M3 appear to have negated the need for the side chain of Ile54 to be present in order to stabilise the native state.
In contrast with the Im7H3M3 I54V variant, the L53A variant of Im7H3M3 responds to mutation in the same way as the L53A variant of Im7; 1 in both cases, the truncation of residue 53 destabilises the native state to such an extent that the intermediate ensemble becomes the predominantly populated species at equilibrium. The observed rate constant of folding for the Im7H3M3 variant L53A is significantly faster (450 s⁻¹) than that for Im7H3M3 (158 s⁻¹), suggesting that the observed rate constant for the folding of the Im7H3M3 L53A variant monitors folding to the intermediate ensemble. In agreement with this, the folded state of the Im7H3M3 L53A variant in the absence of denaturant is more fluorescent than the denatured state and unfolds with an equilibrium free energy of 17.4 ± 2.0 kJ mol⁻¹, consistent with the ΔG°UI for Im7H3M3 (data not shown). Despite the fact that ϕI and ϕTS values could not be determined for the substitutions made in helix III of Im7H3M3, the similarity in the responses of the intermediate ensemble to mutations at residues 53 and 54 suggests that the presence of an extended helix III has not substantially altered the folding mechanism of Im7 in that the newly designed protein still folds to a stable native structure via a three-state transition, involving a hyperfluorescent intermediate with a three-helix core, as has been described for the wild-type protein. 1,2
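The statement that the intermediate becomes the predominant species once the native state is sufficiently destabilised can be illustrated with a Boltzmann-weighted population estimate. The sketch below is not part of the original analysis. It assumes a simple three-state equilibrium (U, I, N), free energies expressed relative to U with negative values denoting stabilisation (the sign convention used for ΔG°UN in the text) and a temperature of 10°C. The function name `populations` and the destabilised ΔG°UN value in the usage note are hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def populations(dg_ui, dg_un, temp_k=283.15):
    """Equilibrium fractions of U, I and N for a three-state system.

    dg_ui, dg_un: free energies of I and N relative to U in kJ mol^-1
    (negative means more stable than U; an assumed convention).
    """
    rt = R_KJ * temp_k
    weights = {
        "U": 1.0,                     # reference state
        "I": math.exp(-dg_ui / rt),   # Boltzmann weight of the intermediate
        "N": math.exp(-dg_un / rt),   # Boltzmann weight of the native state
    }
    total = sum(weights.values())
    return {state: w / total for state, w in weights.items()}


# Parent-like case: N is the major species.
print(populations(-17.4, -23.3))
# Hypothetical strongly destabilised native state: I becomes the major species.
print(populations(-17.4, -10.0))
```

Here −17.4 kJ mol⁻¹ is the L53A equilibrium value quoted above, used as a stand-in for ΔG°UI, and −23.3 kJ mol⁻¹ is the ΔG°UN reported for Im7H3M3; the value of −10.0 kJ mol⁻¹ is purely illustrative.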

Discussion
The data presented here indicate that the insertion of a polyalanine helix C-terminal to helix III in Im7 has a negligible effect on the folding mechanism of the protein. The variants designed to probe differences between the folding mechanisms of Im7 and Im7H3M3 exhibit similar ϕ-values for all of the residues examined (Fig. 6), consistent with the insertion of a large and highly helical sequence in place of helix III having little effect on the folding mechanism. Despite the wholesale similarities in ϕ-values, several residues in the natural helix III sequence displayed extensive differences in their response to mutation in the native state of Im7H3M3 compared with Im7. The most striking difference observed was for the mutation I54V, which destabilises the native state of Im7 by ∼11 kJ mol⁻¹ yet destabilises Im7H3M3 by only 1.5 kJ mol⁻¹. Although surprising, this result is similar to that observed for the T4 lysozyme variant L133A, where the substitution had a large destabilising effect on the native state (ΔΔG°UN ∼20 kJ mol⁻¹) but, when included as part of an 8-residue alanine helix, had a reduced destabilising effect on the native state (ΔΔG°UN ∼10 kJ mol⁻¹). The results observed for the Im7H3M3 variant of Im7 suggest, therefore, that insertion of the polyalanine sequence allows compensation for the removal of the branched side chain of Ile54 such that the native protein is substantially less sensitive to mutation at this site, possibly by structural reorganisation of the core of the native structure upon truncation of the isoleucine side chain. As a consequence, in the Im7H3M3 variant, the native state can still be preferentially populated despite truncation of this key residue in the native hydrophobic core.
The goal of this work was to determine whether helix III was the last helix to form during the folding of Im7 because it is the shortest of the four helices and has no significant helical propensity, or whether other features of the protein sequence dictate the formation of the three-helix intermediate en route to the native state. The data presented here indicate that the formation of a three-helix intermediate stabilised by both native and non-native interactions is an integral feature of the folding mechanism of Im7 and that folding via a three-state mechanism does not result from the short length and low helical propensity of the sequence of the native helix III. The results reveal that the helical propensity of helix III is not a key driver for the folding mechanism of Im7 and that formation of the full helix III requires docking onto the appropriate surface of the non-native three-helix bundle, where side-chain-side-chain interactions stabilise the sequence and allow the formation of helical structure. Thus, the three-helix bundle core can be considered as a folding template for helix III. The results add weight to the growing body of information on the folding mechanism of Im7, showing that folding via a three-helical intermediate is an obligate feature of the folding mechanism of this family of proteins. Indeed, a recent ϕ-value analysis of the folding of Im7, combined with molecular dynamics simulations, has provided insights into why a three-helical intermediate may be a ubiquitous feature of the rugged energy landscape of this family of proteins, in that functional residues located within helices II and III, particularly involving Tyr55, appear to form stabilising non-native contacts in the intermediate ensemble that occlude the binding site for helix III. 2 The data presented here show that even when this region contains a preformed helical structure in the denatured state, helix III cannot compete with the rapid non-native docking of aromatic and hydrophobic residues early in Im7 folding such that a mechanism involving the canonical three-helical intermediate is preserved.

Mutagenesis and protein purification
Mutagenesis was carried out with the QuikChange site-directed mutagenesis kit (Stratagene) using the Im7 gene in pTrc99A as template. 1 All proteins were expressed as His-tagged versions and were purified to homogeneity, as described previously. 1 Expression and purification of isotopically labelled Im7H3M3 were carried out as described previously for Im7. 6

Im7H3M3 preparation
In order to construct an Im7H3M3 megaprimer, we performed PCR. 28 Megaprimer PCR requires the use of three primers: two are non-mutagenic and contain either the NcoI restriction site or the HindIII restriction site that is found at the 5′ and 3′ ends of the Im7 gene, respectively, and the third is an internal mutagenic primer. Due to large-scale changes, a second extension primer was used in addition to the third internal mutagenic primer. In the first round of megaprimer PCR, the Im7 template DNA was used to produce the necessary extension product for the next round of PCR. The megaprimer was separated from the rest of the products of PCR by agarose gel electrophoresis and purified from the gel using the QIAquick Gel Extraction Kit (Qiagen, UK), in accordance with the manufacturer's instructions. In the second round, the newly extended Im7 template DNA was used in a second PCR to produce a further extended gene. A final, third round of PCR was performed, producing the required mutant gene, with short flanking sequences that contain relevant restriction sites for ligation into the expression vector pTrc99a. The full-length mutant gene was isolated by agarose gel electrophoresis and purified using a QIAquick Gel Extraction Kit. Pure mutated DNA was digested with appropriate restriction endonucleases and ligated into the plasmid pTrc99a. The DNA encoding the mutant gene was sequenced to ensure that it contained the desired change and no other.

Analysis of peptide fragments of Im7 in TFE
Peptides were purchased from Peptide Protein Research (Fareham, UK) as 50% pure samples. Crude lyophilised peptide was purified to >95% by reverse-phase HPLC using a C18 column (Dionex, USA). A stock peptide solution was prepared by dissolving 1.5 mg ml⁻¹ of each peptide in buffer (50 mM sodium phosphate, pH 7.0), and its concentration was determined using the extinction coefficient. 29 The sequence used to measure the helical propensity of helix II lacked a chromophore; therefore, Gly-Tyr was added to the C-terminus of the sequence to allow its concentration to be determined. 22 Each peptide stock solution was then diluted 10-fold into a solution containing trifluoroethanol (0-80% volume TFE/total volume). All peptide concentrations were within the range 20-80 μM. 13 In order to assess the effect of concentration on the peptide conformation, we made a second stock solution independently (concentration >200 μM) and also measured it (data not shown). The molar ellipticity of all peptides was shown to be concentration independent over the range 20-200 μM. All peptides were soluble in the buffer used at all TFE concentrations. The samples were incubated overnight at 10°C before analysis by CD (using a 1-mm pathlength cuvette; Hellma, UK). Sixteen scans were collected to improve the signal-to-noise ratio. All CD spectra were corrected for the signal of the buffer alone. Peptides equivalent to helices I, II and IV undergo a two-state transition between less helical forms and fully helical forms, as shown by the presence of an isodichroic point at 203 nm (Fig. 3). Consistent with this, analysis of the intensity of the signals at 190-195, 195-210 and 222 nm as a function of TFE concentration 30 also indicated a two-state transition between coil and helix for these peptides (data not shown).
The observed ellipticity at 222 nm (θ222) in 0% TFE was used to determine the fraction helicity (fH) of each peptide using Eq. (1). 31,32 In this analysis, the ellipticity at 222 nm is assumed to be linearly related to the mean helix content of a peptide:

\[ f_H = \frac{\theta_{222} - \theta_C}{\theta_H - \theta_C} \qquad (1) \]

where θC is the ellipticity of a random coil, and θH is the ellipticity of a fully formed helix. Both parameters are temperature dependent and are derived from Eqs. (2) and (3), respectively, in which the values 2220 and −44,000 cm² deg dmol⁻¹ peptide bond⁻¹ are the ellipticities of the random coil and helix at 0°C, respectively. The value −44,000 cm² deg dmol⁻¹ peptide bond⁻¹ is the ellipticity of a helix of infinite length and is therefore multiplied by a length correction factor, (1 − 3/Nr), where Nr is the number of residues in the peptide chain.
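The helicity calculation described above can be sketched in a few lines. This is an illustration rather than the authors' script: the 0°C values (2220 and −44,000) and the length correction (1 − 3/Nr) come from the text, whereas the temperature coefficients (−53 and +250 per °C) are an assumption taken from a commonly used parameterisation of these baselines and are not stated in this section. The function name `fraction_helix` and its argument names are hypothetical.

```python
def fraction_helix(theta_222, n_residues, temp_c=10.0,
                   coil_slope=-53.0, helix_slope=250.0):
    """Fraction helicity from the mean residue ellipticity at 222 nm.

    theta_222  : observed ellipticity (cm^2 deg dmol^-1 per peptide bond)
    n_residues : number of residues in the peptide (N_r)
    temp_c     : temperature in degrees Celsius
    coil_slope, helix_slope : ASSUMED temperature coefficients per deg C
    """
    if n_residues <= 3:
        raise ValueError("length correction (1 - 3/N_r) requires N_r > 3")
    # Random-coil baseline: 2220 at 0 C, shifted linearly with temperature.
    theta_coil = 2220.0 + coil_slope * temp_c
    # Full-helix baseline: -44,000 at 0 C for an infinite helix,
    # scaled by the length correction factor (1 - 3/N_r).
    theta_helix = (-44000.0 + helix_slope * temp_c) * (1.0 - 3.0 / n_residues)
    # Linear interpolation between the coil and helix baselines (Eq. 1).
    return (theta_222 - theta_coil) / (theta_helix - theta_coil)
```

Called as `fraction_helix(theta, n_residues=14, temp_c=10.0)`, the function returns 0 when the signal equals the coil baseline and 1 when it equals the length-corrected helix baseline; multiplying by 100 gives a percent helicity of the kind quoted for the peptides.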

Kinetic data collection and analysis
All kinetic experiments were performed using an Applied Photophysics SX18.MV stopped-flow fluorimeter, as previously described. 1 The temperature was maintained at 10 ± 0.1 °C using a Neslab-RTE300 circulating water bath. Fluorescence was excited at 280 nm with a 10-nm bandwidth, and a longpass filter was used to monitor the emission fluorescence at >320 nm. Kinetic data were analysed using a three-state model with an on-pathway intermediate (Scheme 1):

$$\mathrm{U} \;\underset{k_{IU}}{\overset{k_{UI}}{\rightleftharpoons}}\; \mathrm{I} \;\underset{k_{NI}}{\overset{k_{IN}}{\rightleftharpoons}}\; \mathrm{N} \qquad \text{(Scheme 1)}$$

where U, I and N are the unfolded, intermediate and native states, respectively, and $k_{XY}$ is the rate constant for the conversion of X to Y. Previous analysis of the folding of Im7 using ultrarapid mixing has shown the folding of the wild-type protein to be entirely consistent with such a model. 2,3 The data for each protein measured were fitted using Igor 5.0 (Wavemetrics). 2 When fitting data to Scheme 1, the formation of the intermediate was assumed to occur rapidly as a pre-equilibrium step. $k_{UI}$ was fixed at 3000 s⁻¹ for Im7 and all Im7H3M3 variants. The stability of the intermediate was then determined by allowing $k_{IU}$ to vary. The equilibrium m-value $M_{UI}$ was fixed at 4.7 kJ mol⁻¹ M⁻¹ (the value determined for the wild-type Im7H3M3) and assumed to be identical for each variant. The kinetic m-values $m_{IN}$ and $m_{NI}$ ($m_{XY}$ represents the denaturant dependence of the natural logarithm of the rate constant $k_{XY}$) were allowed to vary freely.
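The pre-equilibrium treatment of Scheme 1 can be illustrated with a short sketch of the observed rate constant as a function of urea concentration. Only $k_{UI}$ = 3000 s⁻¹, $M_{UI}$ = 4.7 kJ mol⁻¹ M⁻¹ and the temperature (10 °C) are taken from the text; the remaining rate constants, the kinetic m-values, the sign conventions and the function name are assumptions made for illustration, and the fitting performed in Igor is not reproduced here.

```python
import math

R_KJ = 8.314e-3          # gas constant, kJ mol^-1 K^-1
T_K = 283.15             # 10 degrees C
RT = R_KJ * T_K


def k_obs(urea, k_ui0=3000.0, k_iu0=300.0, k_in0=300.0, k_ni0=1.0,
          M_ui=4.7, m_in=0.5, m_ni=1.0):
    """Observed rate constant (s^-1) for U <=> I <=> N with a rapid U/I
    pre-equilibrium.

    Assumed conventions (not from the text): m-values are positive, in
    kJ mol^-1 M^-1; urea destabilises I relative to U, slows I -> N and
    accelerates N -> I. All defaults except k_ui0 and M_ui are hypothetical.
    """
    K_ui = (k_ui0 / k_iu0) * math.exp(-M_ui * urea / RT)   # [I]/[U]
    k_in = k_in0 * math.exp(-m_in * urea / RT)
    k_ni = k_ni0 * math.exp(m_ni * urea / RT)
    # Folding flux is scaled by the fraction of non-native molecules in I.
    return k_in * K_ui / (1.0 + K_ui) + k_ni


# A chevron plot is ln(k_obs) against urea concentration.
chevron = [(d, math.log(k_obs(d))) for d in (0.0, 1.0, 2.0, 4.0, 6.0, 8.0)]
```

At low urea the factor $K_{UI}/(1+K_{UI})$ approaches 1, so the folding arm flattens towards $\ln k_{IN}$; this is the kinetic rollover that signals a populated intermediate.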

ϕ-Value analysis
$\phi_I$ and $\phi_{TS}$ for the 24 variants of Im7H3M3 at pH 7.0 and 10 °C were determined by an analysis of folding and unfolding kinetics as a function of urea concentration. With only a single exception (L53A), folding was three state, as demonstrated by a kinetic rollover in the logarithm of the folding rate constant versus denaturant concentration at low denaturant concentration and by the presence of an increase in fluorescence signal in the dead time of the stopped-flow experiment (∼2.5 ms). Previous studies have shown that the only mechanism that can describe all the observed folding and unfolding data for Im7 is a three-state mechanism with an on-pathway intermediate. 2,3 The chevron plots for Im7 and each variant, together with their initial and final fluorescence signals, were fitted to Scheme 1. It was assumed that the mutation altered the equilibrium stability of the intermediate and/or the folding/unfolding rate constants, while $M_{UI}$ was assumed to be constant. Analysis of the kinetics of folding/unfolding showed no dependence on protein concentration (0.5-50 μM).
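$\phi_{TS}$-values were calculated according to:

$$\phi_{TS} = \frac{\Delta\Delta G_{U\text{-}TS}^{WT\text{-}VAR}}{\Delta\Delta G_{U\text{-}N}^{WT\text{-}VAR}}$$

where $\Delta\Delta G_{U\text{-}TS}^{WT\text{-}VAR}$ is the change in free energy of the transition state, referenced to the unfolded state, and $\Delta\Delta G_{U\text{-}N}^{WT\text{-}VAR}$ is the change in the free-energy difference between the native state and the unfolded state, each taken between the wild-type (WT) and variant (VAR) proteins.
Similarly, $\phi_I$ was calculated according to:

$$\phi_{I} = \frac{\Delta\Delta G_{U\text{-}I}^{WT\text{-}VAR}}{\Delta\Delta G_{U\text{-}N}^{WT\text{-}VAR}}$$

Errors on all parameters were propagated mathematically.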
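As an illustration of the ratio and its error propagation, the sketch below computes a ϕ-value from two free-energy changes and their uncertainties. It assumes that the two errors are independent and uses standard first-order propagation for a quotient; the text does not specify the propagation formula, and the function name and the numerical inputs are hypothetical.

```python
import math


def phi_value(ddg_partial, err_partial, ddg_un, err_un):
    """Return (phi, sigma_phi) for phi = ddG(U-X) / ddG(U-N).

    X is the intermediate or the transition state. First-order error
    propagation for a ratio, assuming uncorrelated errors (an assumption
    made for this sketch).
    """
    if ddg_un == 0.0:
        raise ValueError("ddG(U-N) is zero; the phi-value is undefined")
    phi = ddg_partial / ddg_un
    sigma = math.sqrt((err_partial / ddg_un) ** 2 +
                      (ddg_partial * err_un / ddg_un ** 2) ** 2)
    return phi, sigma


# Hypothetical variant: ddG(U-I) = 2.0 +/- 0.3 and ddG(U-N) = 5.0 +/- 0.4 kJ/mol.
phi_i, phi_i_err = phi_value(2.0, 0.3, 5.0, 0.4)
```

Because the uncertainty scales inversely with the change in native-state stability, variants with a small $\Delta\Delta G_{U\text{-}N}$ give ϕ-values with large errors, which is one reason such variants are usually interpreted with caution.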

NMR spectroscopy
NMR spectra for the structure determination of Im7H3M3 were collected at 25 °C on Varian INOVA 500- and 600-MHz spectrometers at the University of East Anglia and on a Varian 900-MHz spectrometer at the Henry Wellcome Building NMR facility (University of Birmingham). A series of three-dimensional spectra [CBCA(CO)NH, HNCACB, HNCO, HCCH total correlated spectroscopy, H(CCO)NH, C(CO)NH, ¹H-¹H-¹⁵N nuclear Overhauser enhancement spectroscopy (NOESY) heteronuclear single quantum coherence (HSQC) and ¹H-¹H-¹³C NOESY-HSQC] was collected for complete assignments and nuclear Overhauser enhancement measurement. All NMR data were processed using NMRPipe 33 and analysed using NMRView. 34 The structure was determined with the software package ATNOS-CANDID-CYANA [35][36][37] and refined in explicit solvent with Xplor-NIH 38,39 from nuclear Overhauser enhancements obtained from ¹H-¹H-¹³C and ¹H-¹H-¹⁵N NOESY-HSQC spectra. None of the final structures contain violations of dihedral angle restraints larger than 10° or violations of distance restraints greater than 0.5 Å. For the final ensemble of 30 structures, the RMSD calculated over residues 2-94 is 0.68 ± 0.13 and 0.94 ± 0.09 Å for backbone and all heavy atoms, respectively. The Ramachandran plot for the ensemble of 30 structures determined with PROCHECK 40 had 91.2% of residues in core regions, 8.4% of residues in allowed regions and 0.4% of residues in disallowed regions. Summary WHAT IF statistics 41 are shown below. These statistics provide an overall summary of the quality of the structure as compared with current reliable structures. Structure Z-scores show a number of constraint-independent quality indicators. RMS Z-scores mostly give an impression of how well the model conforms to common refinement constraint values. The standard deviation shows variation over models in the ensemble, where appropriate.

Accession numbers
Coordinates for the structure of Im7H3M3 have been deposited in the Protein Data Bank with accession number 2K0D.