Transition state heterogeneity in GCN4 coiled coil folding studied by using multisite mutations and crosslinking

We have investigated the folding behavior of dimeric and covalently crosslinked versions of the 33-residue α-helical GCN4-p1 coiled coil derived from the leucine zipper region of the transcriptional activator GCN4. The effects of multisite substitutions indicate that folding occurs along multiple routes with nucleation sites located throughout the protein. The similarity in activation energies of the different routes together with an analysis of intrinsic helical propensities indicates that minimal helix is present before a productive collision of the two chains. However, approximately one-third to one-half of the total helical structure is formed in the postcollision transition state ensemble. For the crosslinked, monomeric version, folding occurs along a single robust pathway. Here, the region nearest the crosslink, with the least helical propensity, is structured in the transition state whereas the region farthest from the tether, with the most propensity, is completely unstructured. Hence, the existence of transition state heterogeneity and the selection of folding routes critically depend on chain topology.

The folding of many small globular proteins is kinetically two-state without the accumulation of intermediates. We and others have proposed that such folding reactions are nucleation processes (1–5), and the issue of uniqueness of the transition state (TS) nucleus has become a subject of much debate (6–9). The major method for characterizing TSs is protein engineering or mutational Φ analysis (10, 11). For some mutations, intermediate effects on refolding rates have been observed (7, 12–17). A crucial question is whether these fractional effects represent a single TS with partially formed structure or a heterogeneous population of TSs, some with the structure fully formed and others with it completely absent. Most (7, 18), but not all (19), folding experiments have been interpreted in the context of a homogeneous TS ensemble and a single dominant folding pathway. The small, but fractional, Φ values that we measured previously for the GCN4-p1 coiled coil (CC) allowed for the possibility that folding occurred along multiple pathways with nucleation sites located throughout the protein (14).
The importance of secondary structure relative to tertiary structure and chain topology in the determination of folding pathways and rates is another unsettled issue. The general insensitivity of refolding rates to helix-destabilizing substitutions in the CC indicated that a large fraction of the helical structure is not formed in the rate-limiting step (14). Based on these and other results with cytochrome c, we proposed that the critical element of the TS is the formation of the overall chain topology, established by pinning the chain by the interaction of a number of apolar side chains, rather than secondary structure formation (3, 5, 14). A strong correlation recently was noted for nearly a dozen proteins between the folding speed and the average sequence distance between residues, called the contact order (20). Although helical structure reduces the contact order of a protein and its stabilization can accelerate folding in some circumstances (15, 21), this correlation can be taken to suggest the importance of topology in the structure of the folding TS.
A third central question in protein folding is whether helical structure forms before hydrophobic collapse as envisioned by the diffusion-collision model (19, 22, 23). The likelihood of such a folding mechanism is increased if residual helical structure is present in the denatured state. In fact, isolated regions of GCN4-p1 sequence have been observed to be more than 50% helical (23). Also, the presence of this amount of residual structure in the denatured state could account for the insensitivity of folding rates to Ala-to-Gly substitution, because the free energy gap between the denatured state and the TS would be unchanged if helix is present in both states (24, 25).
To investigate the issues of TS heterogeneity, the role of topology, residual structure, and precollision helical structure, we have studied the folding of single-site and multisite mutants as well as a crosslinked derivative of the CC where the two helices have been covalently linked with an unstructured tether. We find that the dimeric version folds via multiple routes whereas the crosslinked version folds along a single robust pathway. The mutational data further indicate that minimal helix is formed before a productive collision but up to half of the molecule is helical in the folding TS. Although this helical region is predominantly near the carboxyl terminus in the dimeric version, the presence of the amino tether in the crosslinked version induces nucleation to occur exclusively at the amino terminus in spite of this region's very low helical propensity. The switch in locality of the nucleus and loss of pathway heterogeneity emphasize the importance of topology in the determination of folding behavior even in the context of this elemental helical protein.

MATERIALS AND METHODS
Peptide Synthesis. Peptides were prepared and characterized as described (26). GCN4-p1′ (Ac-RMKQLEDKVEELLSKNWHLENEVARLKKLVGER-NH2) included a tryptophan in place of the tyrosine at position 17. GCN4-p2′ contained an additional Cys-Gly-Gly tripeptide at the amino terminus, but residue count begins at the Arg for direct comparison to GCN4-p1′. Crosslinked GCN4-p2′ was formed by bubbling oxygen under native conditions for 2 hr at neutral pH. Dimeric GCN4-p2′ was formed by reduction with a 10-fold molar excess of Tris-(2-carboxyethyl)-phosphine hydrochloride. Peptide concentrations were determined by using an extinction coefficient of 5,700 M⁻¹·cm⁻¹ at 280 nm.
Equilibrium Measurements. Equilibrium free energies were determined from guanidinium chloride (GdmCl) denaturation profiles monitoring tryptophan fluorescence and CD at 222 nm (27). CD measurements were conducted at 1- to 2-nm resolution with a pathlength of 0.1 cm. Peptide concentrations ranged from 2 to 42 μM, and experiments were carried out in 20 mM sodium acetate, 150–200 mM sodium chloride at pH 5.5, 10°C.

RESULTS AND DISCUSSION
Two-State Folding Behavior. To investigate the role of topology and the generality of our previous work with the dimeric GCN4-p1′ CC, we studied a modified version that folds either as a crosslinked monomer or as a dimer. This modified species contains an amino-terminal Cys-Gly-Gly linker that forms a disulfide bond between the Cys residues under oxidizing conditions. Under reducing conditions, the disulfide crosslink is not formed and the folding behavior is the same as for the tetherless version (27).
Stopped-flow fluorescence spectroscopy was used to measure the folding kinetics of the tryptophan-containing GCN4-p2′ molecule (Fig. 1). Crosslinked GCN4-p2′ exhibits first-order behavior consistent with the unfolded ⇌ native unimolecular nature of this folding reaction. Dimeric GCN4-p2′ exhibited second-order folding and first-order unfolding behavior consistent with the 2(monomer) ⇌ dimer bimolecular nature of this reaction. The folding rates of the dimeric and crosslinked versions are 2 × 10⁶ M⁻¹·s⁻¹ and 7.5 × 10³ s⁻¹, respectively (extrapolated to zero denaturant). These rates equate to an effective chain concentration of 4 mM for the crosslinked version.
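The effective chain concentration quoted above follows from dividing the unimolecular rate by the bimolecular rate constant. A minimal sketch of that arithmetic, using only the two rates given in the text (the function and variable names are illustrative):

```python
# Folding rates quoted in the text, extrapolated to zero denaturant.
K_FOLD_DIMER = 2e6           # M^-1 s^-1, bimolecular folding of the dimeric coiled coil
K_FOLD_CROSSLINKED = 7.5e3   # s^-1, unimolecular folding of the crosslinked version


def effective_concentration(k_unimolecular, k_bimolecular):
    """Chain concentration (M) at which the bimolecular reaction would
    proceed at the same rate as the tethered, unimolecular one."""
    return k_unimolecular / k_bimolecular


c_eff = effective_concentration(K_FOLD_CROSSLINKED, K_FOLD_DIMER)
print(f"effective chain concentration = {c_eff * 1e3:.2f} mM")
```

The ratio comes to 3.75 mM, which rounds to the 4 mM stated in the text.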
Current and previous kinetic studies demonstrate that the folding of both versions obeys a thermodynamically and kinetically two-state transition between an unfolded state and a fully helical native state (14, 27, 28). This demonstration is accomplished by using the chevron analysis with a linear dependence of the equilibrium and activation free energies for folding (f) and unfolding (u) on GdmCl concentration (10) where Eqs. 1c and 1d apply to the unfolding of the dimeric and crosslinked versions, respectively. When folding is effectively two-state, the equilibrium values for the change in free energy and surface burial can be calculated from kinetic measurements according to: The equivalence of thermodynamically and kinetically determined values for ΔG⁰_H₂O and m⁰ demonstrates the applicability of a two-state model for GCN4-p2′ folding (see supplemental Tables 2 and 3, which are available on the PNAS web site, www.pnas.org).
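The chevron analysis described above can be illustrated for the unimolecular (crosslinked) case with a generic two-state model in which ln k varies linearly with denaturant. The sketch below is not a reconstruction of the paper's Eqs. 1a–1d; the sign conventions, the unfolding rate in water, and both m values are hypothetical, and only the folding rate in water and the temperature are taken from the text.

```python
import math

R = 1.987e-3   # gas constant, kcal mol^-1 K^-1
T = 283.15     # 10 °C, the temperature quoted in the text
RT = R * T


def rate_constant(k_water, m, gdmcl):
    """Rate at a given [GdmCl] (M), assuming ln k is linear in denaturant.

    m (kcal mol^-1 M^-1) is the slope of the activation free energy; a
    positive m means the barrier rises, and the rate falls, with denaturant.
    """
    return k_water * math.exp(-m * gdmcl / RT)


def chevron_unimolecular(gdmcl, kf_water, ku_water, m_f, m_u):
    """Observed relaxation rate for U <-> N: k_obs = k_f + k_u."""
    k_f = rate_constant(kf_water, m_f, gdmcl)    # folding slows with denaturant
    k_u = rate_constant(ku_water, -m_u, gdmcl)   # unfolding speeds up
    return k_f + k_u


def folding_free_energy(kf_water, ku_water):
    """Two-state folding free energy in water from kinetics (negative = stable)."""
    return -RT * math.log(kf_water / ku_water)


KF_WATER = 7.5e3      # s^-1, crosslinked folding rate quoted in the text
KU_WATER = 1e-2       # s^-1, hypothetical value for illustration
M_F, M_U = 1.0, 0.5   # kcal mol^-1 M^-1, hypothetical values for illustration

for conc in (0.0, 2.0, 4.0, 6.0, 8.0):
    k_obs = chevron_unimolecular(conc, KF_WATER, KU_WATER, M_F, M_U)
    print(f"{conc:.1f} M GdmCl: ln k_obs = {math.log(k_obs):.2f}")
```

In this convention the kinetically derived stability is −RT ln(k_f/k_u) and the total denaturant dependence is m_f + m_u; agreement of such values with equilibrium measurements is the two-state test the text describes. The bimolecular (dimeric) case needs a concentration-dependent folding term and is not covered here.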
FIG. 1. Chevron plot of folding kinetics of dimeric and crosslinked GCN4-p2′ in 20–100 mM sodium acetate, 150 mM sodium chloride, pH 5.5, 10°C. Symbols are the same in the left and right panels. The measured bimolecular folding rates for the dimeric CC have been scaled to 5.5 μM protein concentration. The solid line in lower left represents the predicted folding rates for the GGG mutant determined from the difference in equilibrium stability and activation energy for unfolding according to ΔG‡f = ΔG‡u + ΔG⁰. To measure the unfolding rates of the marginally stable dimeric GGG version, 5% (vol/vol) of 2,2,2-trifluoroethanol was added. Previous work demonstrated that 2,2,2-trifluoroethanol does not affect unfolding rates (34).

Proc. Natl. Acad. Sci. USA 96 (1999)

Single-Site Mutations. The two-state folding behavior provides a simple framework in which to interpret the kinetic effects of mutations and characterize the TS of the folding reactions. In our previous studies of the dimeric version, destabilizing Gly substitutions were made at external surface positions (14). The relatively small change in folding rates compared with the unfolding rates was taken as evidence that no large fraction of helix is present in the rate-limiting TS. The effect of each mutation was quantified by the parameter Φf, given by the change in folding activation free energy, ΔΔG‡f, divided by the change in global stability, ΔΔG⁰. A Φf value is the degree to which the total energetic effect of the substitution is realized in the TS. Generally, a value of zero is considered to mean that the mutated residue is unstructured in the TS whereas a value of one is consistent with this residue sensing a native-like environment.
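As a concrete illustration of the Φf definition, the sketch below computes Φf either from ΔΔG‡f and ΔΔG⁰ directly or from kinetic data alone, using the two-state relation ΔG‡f = ΔG‡u + ΔG⁰ quoted in the Fig. 1 legend (so that ΔΔG⁰ = ΔΔG‡f − ΔΔG‡u). The function names and the numerical inputs are hypothetical and are not measurements from this study.

```python
def phi_from_stability(ddG_f_act, ddG_eq):
    """Phi_f = change in folding activation free energy / change in stability."""
    return ddG_f_act / ddG_eq


def phi_from_kinetics(ddG_f_act, ddG_u_act):
    """Phi_f from kinetic data only, assuming two-state folding:
    ddG_eq = ddG_f_act - ddG_u_act."""
    return ddG_f_act / (ddG_f_act - ddG_u_act)


# Hypothetical mutation: the folding barrier rises by 0.3 kcal/mol and the
# unfolding barrier drops by 0.9 kcal/mol, so stability changes by 1.2 kcal/mol.
phi_a = phi_from_stability(0.3, 1.2)
phi_b = phi_from_kinetics(0.3, -0.9)
print(f"Phi_f (stability form) = {phi_a:.2f}")
print(f"Phi_f (kinetic form)   = {phi_b:.2f}")
```

Both forms give the same value for a two-state system; a result near zero would indicate an unstructured site in the TS and a result near one a native-like site, as described above.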
Mutations were designed to compare Ala-to-Gly substitutions because these residues have a large difference in helix propensity. In the dimeric CC, Ala-to-Gly substitutions at the seventh, 14th, and 24th positions had only a small effect on folding rates, with measured Φf values of 0.07 ± 0.03, 0.16 ± 0.04, and 0.25 ± 0.01, respectively (14). The corresponding single-site substitutions in the crosslinked version are quite different (Fig. 2A, Table 1) with Φf values of 0.74 ± 0.02, 0.54 ± 0.05, and 0.00 ± 0.02, respectively. The trend in the Φf values has shifted from a slight bias toward the carboxyl terminus in the dimeric CC to a strong bias and high Φf values at the amino terminus, the location of the tether, in the crosslinked version.
TS Heterogeneity. The high Φf values and the spatially localized nucleus in the crosslinked species led us to reexamine the results for the dimeric molecule. Although the Φf values are small in the dimeric system, they are not zero. These marginal effects indicate that the mutated residues are either partially constrained in the TS, or fully constrained in a fraction of a heterogeneous population of TSs. A subcategory of the latter possibility pictures helix nucleation at multiple alternative positions along the chain (Fig. 2B) (14). If a destabilizing mutation is present at a given position, another region then might serve to nucleate helix formation and folding rates would be only marginally retarded.
We examined the possibility of TS heterogeneity by simultaneously probing the helicity at multiple sites by using variants with triple-site substitutions (Fig. 1). The first variant, AAA, contains alanines in the seventh, 14th, and 24th positions (six positions total). The second variant, GGG, contains glycines at these positions. The third variant, TTT, contains threonines.
The AAA-to-GGG substitution results in a change in equilibrium stability of 5–6 kcal/mol and nearly identical Φf^AAA/GGG values of 0.46 ± 0.02 and 0.40 ± 0.02 for the dimeric and crosslinked versions, respectively. The AAA-to-TTT substitution results in a change in stability of about 2–3 kcal/mol and Φf^AAA/TTT values of 0.51 ± 0.03 and 0.57 ± 0.02, respectively.
For the Ala-to-Gly substitutions, where the decrease in helical propensity results from an increase in backbone entropy in the unfolded state, Φf values are sensitive to the decrease in chain entropy in the TS rather than to helix formation per se. For the Ala-to-Thr substitutions, where the decrease in propensity results from the loss of side-chain entropy in the helical conformation (29), Φf values more specifically reflect helix formation in the TS. The sizable triple-site Φf^AAA/TTT values indicate that the TS has helical structure rather than merely a partially constrained backbone.
Homogeneous and Heterogeneous Nuclei. For the crosslinked CC, the change in the kinetic behavior caused by the triple Ala-to-Gly substitution reflects the independent and additive effects caused by each of the single-site substitutions. A composite triple-site Φf^AAA/GGG value can be predicted from the effects of the single-site mutations according to: The predicted composite Φf^AAA/GGG value of 0.34 is very close to the observed value of 0.40 (using the values for the measured S14A substitution in place of the unmeasured A14G substitution). This additivity, and also the lack of significant change in the kinetic mf value, indicate that a single robust folding pathway exists for the crosslinked CC.
A different result is found for the dimeric system. The observed Φf^AAA/GGG value of 0.46 is much larger than 0.18, the composite value predicted from the single-site Φf values assuming their effects are independent and additive. In fact, the Φf values for both the triple-site substitutions are larger than any single-site value. When a single-site mutation is made that disrupts folding through one region, the bulk of the folding events proceeds through other regions and only a minimal decrease in folding rates is observed. In the case of the triple-site substitutions, however, a large decrease in the folding rates is observed because folding must proceed through at least one of the destabilized regions. These results point to heterogeneous nucleation in the dimeric CC, with multiple, alternative nucleation sites.

Calculated according to Φf = −ΔΔG‡f/(ΔΔG‡u − ΔΔG‡f). The ΔΔG‡f and ΔΔG‡u values are calculated at 0.9 and 3.5 M GdmCl, respectively, for the dimeric version and at the concentration noted in parentheses for the crosslinked version, to reduce extrapolation errors and sensitivity to slight differences in m values.

The multisite hypothesis was explicitly tested with an additional triple-site variant, D7A/S14A/A24G. This AAG variant has a destabilizing glycine mutation near the polypeptide's carboxyl terminus in the region most likely to be nucleated (highest single-site Φf value). We hypothesized that the destabilizing substitution would block nucleation at this site in the dimeric CC and shift nucleation toward the amino regions, which then should exhibit heightened sensitivity to destabilization, just as in the crosslinked CC. The triple-site variant contained alanines at both the seventh and 14th positions to ensure that these two positions are probed.
In the background of the A24G substitution in the dimeric CC, the double-site Ala-to-Gly comparison at the seventh and 14th positions yields a high Φf^AAG/GGG value, 0.72 ± 0.02, indicating that these two regions become structured in a large fraction of the TSs. The same comparison in the crosslinked system, where nucleation occurs only at the amino tethered end, yields the same double-site Φf^AAG/GGG value, 0.73 ± 0.02. Hence, the A24G substitution in the dimeric CC shifts the nucleation site from one end of the molecule to the other, just as the tether does in the crosslinked CC.
These results demonstrate that multiple nucleation sites do exist and that the relative importance of each one is subject to manipulation. The existence of multiple nucleation sites in the dimeric CC is presumably a consequence of the system's translational symmetry and the length of the helices. Nucleation can occur at essentially any position within the 10 turns of helix. Upon introduction of an unstructured crosslink, the translational symmetry of the CC is broken. A large difference in effective local concentration then results in a relatively homogeneous TS ensemble with a strong bias toward nucleation near the tethered end.
Our results are quantitatively consistent with this view (Fig. 2B). In the simple scenario where a particular region is structured in the TS for 50% of the folding routes, a 10-fold destabilization of this region will increase the activation energy of these routes by 1.3 kcal/mol. This will largely block folding along these pathways and decrease the net folding rate by nearly a factor of two (ΔΔG‡f = 0.33 kcal/mol), leading to a Φf value of 0.26. This reasoning suggests that folding of the dimeric CC can be approximated by three independent routes with nucleation occurring at the amino terminus, the center, and the carboxyl terminus with relative probabilities of 1/6, 1/3, and 1/2, respectively. These values for the relative fluxes equate to Φf values of 0.08, 0.12, and 0.16, for the seventh, 14th, and 24th positions, respectively, similar to the experimental values. Recently, Matthews and coworkers (30) also concluded that the TS is approximately 30% native-like with the two carboxyl-terminal heptads being the likely nucleation site.
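The 50% scenario above can be checked with a short calculation. The sketch assumes one reading of the argument: routes whose TS contains the mutated region are slowed by the full destabilization factor, the remaining routes are unaffected, and the change in global stability equals the 1.3 kcal/mol destabilization. The function name and parameters are illustrative.

```python
import math

R = 1.987e-3   # gas constant, kcal mol^-1 K^-1
T = 283.15     # 10 °C, the temperature quoted in the text
RT = R * T


def parallel_route_phi(fraction, fold_destabilization):
    """Apparent Phi_f when only a fraction of the folding flux passes through
    a TS in which the mutated region is structured.

    fraction: share of folding routes whose TS contains the mutated region.
    fold_destabilization: factor by which the mutation weakens that region.
    Returns (ddG_eq, relative_rate, ddG_f_act, phi), energies in kcal/mol.
    """
    ddG_eq = RT * math.log(fold_destabilization)
    # Affected routes are slowed by the full factor; the rest are unchanged.
    relative_rate = (1.0 - fraction) + fraction / fold_destabilization
    ddG_f_act = -RT * math.log(relative_rate)
    return ddG_eq, relative_rate, ddG_f_act, ddG_f_act / ddG_eq


ddG_eq, rel_rate, ddG_f_act, phi = parallel_route_phi(0.5, 10.0)
print(f"destabilization  = {ddG_eq:.2f} kcal/mol")
print(f"rate decrease    = {1.0 / rel_rate:.2f}-fold")
print(f"ddG_f (apparent) = {ddG_f_act:.2f} kcal/mol")
print(f"Phi_f (apparent) = {phi:.2f}")
```

Under these assumptions the calculation reproduces the figures in the text to within rounding: about 1.3 kcal/mol, a rate decrease just under twofold, and a Φf near 0.26.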
Residual Structure and Marginal Φf Values. If residual helical structure is present in both the denatured state and the TS, then a different explanation is possible for the marginal Φf values observed in the dimeric CC (25). In this case, the activation energy for folding would be unchanged for any given surface substitution and only a negligible change in folding rates would be observed. Another consequence of residual structure would be a negligible change in equilibrium stability because helix is present in both the unfolded and native states. For helix-altering mutations in the protein CheY, both of these unusual behaviors have been observed (24). Also, there is an accompanying "rollover" region in the chevron plot at low denaturant concentrations, consistent with decreased surface burial in the folding TS.
For a peptide containing residues 16–32 of GCN4-p1, a considerable amount of ellipticity, Θ222 ≈ −15,000 deg·cm²/dmol, was observed under extremely low ionic conditions at pH 7, 3°C (23). Further, the AGADIR program (31) predicts that in isolated monomers, the region encompassing residues 22–30 should have helical structure 60–70% of the time (Fig. 3A). Hence, residual structure may be present in the denatured state and could be responsible for the marginal Φf values that we observe.
A variety of other results also indicate that under the conditions used in our study, only minor amounts of helical structure exist in the denatured state. The GCN4-p2′ chevron plots do not exhibit rollover behavior. All of the single-site Ala-to-Gly substitutions (two total per CC) result in destabilization of at least 1.2 kcal/mol, the appropriate ΔΔG⁰ for such mutations. Millisecond (burst phase) stopped-flow CD folding studies of the entire CC indicate that the ellipticity of the denatured state is smaller than −3,000 deg·cm²/dmol at 0.5 M GdmCl (28). These results demonstrate that little residual helical structure exists in the denatured state under our refolding conditions and that such structure cannot provide an explanation for the fractional single-site Φf values.

FIG. 3. Residual structure in the denatured state. (A) Predicted helicity for monomers of GCN4-p1′ and the A24G variant at pH 5.5, 10°C, 0.2 M ionic strength, and for the related leucine zipper with a strengthened hydrophobic core (32) at pH 4.8, 25°C, 0.1 M ionic strength, calculated by using AGADIR (31).

Precollision Helix Formation. Although minimal residual structure exists in the denatured state, transient helix formation still may be required for a productive collision in the folding of the dimeric CC (25). In diffusion-collision models (22), the folding rate is the probability of two monomers being helical multiplied by the success frequency and the diffusion-limited encounter rate, ≈10⁹ M⁻¹·s⁻¹. The AGADIR program (31) predicts 60–70% helix for residues 22–30 in the absence of denaturant. However, all other residues are predicted to be much less helical (Fig. 3A), and the probability of a helical stretch with 11 or more residues is only 4%. This analysis sets an upper limit of 11 residues as the maximum length of precollision helical structure consistent with the dimeric folding rate, 2 × 10⁶ M⁻¹·s⁻¹ (extrapolated to 0 M denaturant and assuming all collisions are productive).
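The upper-limit argument can be made explicit with a minimal diffusion-collision sketch: if both chains must carry the helical stretch at the moment of collision, the rate scales as the square of the single-chain helix probability. The function names are illustrative; the rates and the 4% probability are the values given in the text, and the success frequency is set to one, as the text assumes.

```python
import math

DIFFUSION_LIMIT = 1e9   # M^-1 s^-1, encounter rate quoted in the text
K_FOLD_DIMER = 2e6      # M^-1 s^-1, dimeric folding rate quoted in the text


def diffusion_collision_rate(p_helix, success=1.0, k_diff=DIFFUSION_LIMIT):
    """Folding rate if both colliding monomers must already be helical."""
    return p_helix ** 2 * success * k_diff


def required_helix_probability(k_fold, success=1.0, k_diff=DIFFUSION_LIMIT):
    """Single-chain helix probability needed to reach an observed rate."""
    return math.sqrt(k_fold / (success * k_diff))


p_needed = required_helix_probability(K_FOLD_DIMER)
k_at_4_percent = diffusion_collision_rate(0.04)
print(f"helix probability needed for observed rate: {p_needed:.3f}")
print(f"rate for a 4% helical stretch: {k_at_4_percent:.2e} M^-1 s^-1")
```

A single-chain probability of roughly 4 to 5% is needed to match the observed rate, which is why helical stretches longer than 11 residues, being less probable than 4%, fall outside what this model allows.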
However, the mutational results are inconsistent with this amount of precollision helical structure. According to diffusion-collision models, the majority of productive folding events in the CC should involve the most helical region, residues 22–30, and the effect of a mutation in this region on folding rates should be determined by the change in the region's helicity. The A24G substitution reduces helicity in this region from 69% to 29% (Fig. 3A). The folding rate then should decrease by a factor of 5.7 (ΔΔG‡f = 0.97 kcal/mol). Likewise, the glycine substitution reduces the stability of the dimeric CC by 1.9 kcal/mol (2RT ln[Keq^A24/Keq^A24G]). This analysis predicts a Φf^A24G value of 0.51, double the observed value of 0.25. Hence, diffusion-collision models using the high AGADIR-calculated levels of helix incorrectly predict the observed Φf^A24G value.
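The predicted Φf for A24G can be retraced numerically. The sketch assumes that the folding rate scales with the square of the regional helicity (one factor per chain) and that Keq is the two-state helix/coil constant h/(1 − h); the second point is an interpretation that reproduces the quoted 1.9 kcal/mol, not a definition stated in the text.

```python
import math

R = 1.987e-3   # gas constant, kcal mol^-1 K^-1
T = 283.15     # 10 °C, the temperature quoted in the text
RT = R * T

H_WT = 0.69    # AGADIR helicity of residues 22-30, from the text
H_MUT = 0.29   # same region with the A24G substitution, from the text


def helix_coil_constant(helicity):
    """Two-state helix/coil equilibrium constant (assumed reading of K_eq)."""
    return helicity / (1.0 - helicity)


# Both chains must be helical at collision, so the rate goes as helicity squared.
rate_factor = (H_WT / H_MUT) ** 2
ddG_f_act = RT * math.log(rate_factor)

# Stability change of the two-chain coiled coil: 2RT ln(K_wt / K_mut).
ddG_eq = 2.0 * RT * math.log(helix_coil_constant(H_WT) / helix_coil_constant(H_MUT))

phi_predicted = ddG_f_act / ddG_eq
print(f"rate decrease = {rate_factor:.1f}-fold")
print(f"ddG_f = {ddG_f_act:.2f} kcal/mol, ddG_eq = {ddG_eq:.2f} kcal/mol")
print(f"predicted Phi_f = {phi_predicted:.2f}")
```

With these inputs the predicted Φf comes out near 0.51, about twice the observed 0.25, which is the discrepancy the text uses against a precollision-helix requirement.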
Under our experimental conditions where the amount of residual structure is minimal, an even stronger argument can be made against the requirement of precollision helical structure. When Keq << 1, the occurrence of precollision helical structure for a destabilizing mutation decreases as the ratio Keq^mutant/Keq^WT. The folding rate and the stability both decrease as the square of this ratio. Therefore, the Φf value for the A24G substitution should be near unity according to diffusion-collision models. Because the observed value is much less, either the precollision helix formation occurs in regions having much less intrinsic helicity or, more probably, the helical structure present in the TS forms after the initial collision.
The present analysis differs from the recent analysis by Myers and Oas (25), who proposed that precollision helix formation is consistent with the observed rate and single-site mutational data for the dimeric CC. Although they also used AGADIR to calculate the average helicity for the GCN4-p1 sequence, their subsequent analysis assumed a uniform helical propensity. This analysis spreads the helicity more uniformly than that predicted by AGADIR (Fig. 3A), which results in an overestimation of the probability of encountering long helical stretches in isolated monomers. More importantly, it also underestimates the effects of mutation in the region having the most intrinsic helicity (e.g., A24G), particularly under the present conditions where there is minimal residual structure.
Finally, a modified version of the CC with a strengthened hydrophobic core (32) has a maximum helicity 10-fold lower than GCN4-p1′ (Fig. 3A). This variant folds about 20-fold faster than GCN4-p1′, rather than 10²-fold slower as would be predicted by a diffusion-collision model. Further, the near diffusion-limited folding rate found for this CC (3 × 10⁸ M⁻¹·s⁻¹) and for a hydrophobic variant of Arc repressor (2 × 10⁸ M⁻¹·s⁻¹) (33) indicate that the major fraction of chain encounters are productive so that folding is too fast to be contingent on transient preformed structure.
Secondary Structure, Folding Rates, and Topology. Intrinsic helical propensity does not dominate the choice of folding routes for either CC. For the crosslinked molecule, the region with the highest intrinsic helicity is completely unstructured in the TS. Effective chain concentration, determined by chain connectivity, governs which pathway is selected and folding begins from the least helical region. For the dimeric CC, the region between residues 22 and 30 has the highest intrinsic helicity even with the A24G substitution (Fig. 3A), but less helical regions can comprise the majority of the nucleation sites (Φf^AAG/GGG = 0.72).
Whether the stabilization of local interactions increases folding rates depends on whether a particular element is structured in the TS. For example, the 24th position is unstructured in the TS of the crosslinked CC, and the ≈3 kcal/mol destabilization for the glycine substitution has no effect on folding rates. Furthermore, the version of the dimeric CC with a strengthened hydrophobic core but having only 20% of the intrinsic helicity of GCN4-p1 (Fig. 3A) folds about 20 times faster (32). Evidently, hydrophobicity plays a more important role than intrinsic helicity in the determination of folding rates for this version of the CC.
Even with its simple topology, the folding rate of the crosslinked CC and its contact order of 10% agree well with the correlation between folding speed and the average sequence distance between contacts noted for other proteins (20). Although the topology of the CC should be reasonably well defined once the tether bends and residues in the amino region contact each other, the TS has additional requirements. More than a single turn of helix must be formed and about 50% of the denaturant-sensitive surface must be buried (mf/m⁰) before additional folding steps can proceed in a thermodynamically downhill manner. The helix-stabilizing cosolvent 2,2,2-trifluoroethanol interacts to nearly the same degree with the TS as with the folded state in both species (Φf^solvent ≈ 1; ref. 34 and unpublished data), indicating that a high degree of backbone desolvation occurs in the TS. A quantitative connection between folding rates, topology, secondary structure formation, and surface burial remains to be determined.
Recently, β-turns have been postulated to be folding initiation sites because they have some intrinsic stability and are the only structures completely formed in the TS of three proteins (7, 35, 36). Although the tether is unstructured in the crosslinked CC, it increases the local chain concentration and serves the same purpose as a β-turn. Hence, initiation sites need not have any intrinsic stability, and folding can begin from the least helical region of the molecule. These considerations bear on the considerable effort being directed at identifying relatively stable regions of proteins as possible folding initiation sites.
Conclusions. A heterogeneous TS ensemble with multiple nucleation sites located throughout the molecule can explain the minimal effect of helix-destabilizing substitutions on folding rates of the dimeric CC. However, this pathway heterogeneity critically depends on connectivity and is lost on the introduction of an unstructured tether between the two helices.
We have seen that helix formation is unlikely to occur before a productive collision. Yet, one-third to one-half of the molecule becomes helical in the postcollision TS. Although secondary structure stability can influence the selection of pathways in the dimeric CC, the major fraction of nucleation sites need not occur in the most helical region of the molecule. For the crosslinked CC, in fact, the region having the highest helicity is the last to fold. We find it remarkable that such unexpected and complex behavior can be generated from such a simple system.

Biophysics: Moran et al.

FIG. 2. (A) Single-site Φf values for the dimeric (hatched bars) and the crosslinked CC (open bars). The slight bias in Φf values toward the carboxyl terminus in the dimeric version is shifted to a strong bias toward the amino, tethered end of the crosslinked version. (B) Simplified folding models. The dimeric CC folds with a heterogeneous TS ensemble with multiple nucleation sites whereas the crosslinked version folds along a single route. The width of arrows represents the approximate flux down each route and reflects the magnitude of the single-site Φf values.