Equilibrium folding dynamics of meACP in water, heavy water, and low concentration of urea

Many proteins fold in apparent two-state behavior, as partially folded intermediates only transiently accumulate and easily escape detection. Besides a native form and a mainly unfolded form, we captured a partially unfolded form of an acyl carrier protein from Micromonospora echinospora (meACP) in the folding/unfolding equilibrium using chemical exchange saturation transfer NMR experiments. The C-terminal region of the partially unfolded form is mainly folded and the N-terminal is unfolded. Furthermore, to understand how the folding process of meACP is influenced by solvent environments, we compared the folding dynamics of meACP in D2O, H2O and low concentration of urea. As the environment becomes more denaturing from D2O to H2O and then to urea, the unfolded state becomes increasingly populated, and the folding rate decreases. Adding a small amount of urea, which does not change solvent viscosity, has little effects on the unfolding rates, while changing H2O to D2O reduces the unfolding rates possibly due to the increase of solvent viscosity. The quantified solvent effects on the protein folding Gibbs energy and activation energy suggest that the transition state of folding may have a similar structure to the native state of the protein.

rapidly 26 in D 2 O than in H 2 O, because of stronger D-bond than H-bond in neutral water 27 . Urea is a widely used denaturant although its mechanism of denaturing remains not clearly known. Studies suggest that urea denatures proteins by a "direct interaction" mechanism, as urea has preferential binding to all regions of proteins and can form H-bond with protein backbones 28,29 .
Recently we have found that the acyl carrier protein from Micromonospora echinospora (meACP, PDB code: 2l9f) exists in one major native folded state (N), one minor largely unfolded state (U), and one minor intermediate state (I) under native conditions 2,30 . The three states are in relatively slow conformational exchange on the sub-second timescale, following the pathway N ↔ U ↔ I or U ↔ N ↔ I. Beyond this, the overall structural features of this state I were mostly uncertain. In this study, we deepened our study of meACP folding to explore structural features of state I using chemical exchange saturation transfer (CEST) NMR. We extracted the backbone 13 Ca chemical shifts of meACP in the state I, and demonstrated that this state is a partially unfolded form.
In an effort to understand the folding dynamics thoroughly, the folding equilibria were compared under varied conditions: D 2 13 Ca CEST profiles, showing those residues existed in at least two forms. As demonstrated previously on the basis of chemical shifts, the minor form was mainly unfolded (U) with α-helical propensity 2,31 . Moreover, a third state (I) was clearly seen in their CEST profiles for several residues (Figs. 1 and S1).
For the residues displaying two-dip CEST profiles, we first performed individual fitting using the two-state model. The population values of state U (p u ) extracted from the 13 Ca CEST are shown in Fig. 2. The results from the 15 N CEST are given in Fig. S4. Interestingly, the average p u values obtained by individual fitting with the two-state model for the residues in the C-terminal half region (C-terminal region of helix 2 and helix 3: from V974 to A1010) were smaller than those for the residues in the N-terminal half region (helix 1, loop 1 and N-terminal region of helix 2: from A938 to I972). The t-tests (p-value < 1.4% for all data sets) confirmed that the differences were significant. The population differences between the two regions showed that the folding of meACP was not globally all or none. Is there a partially folded state where the C-terminal region remains folded while the N-terminal region is unfolded?
When the dip for state I overlaps with that for state U (i.e., Ω ≈ Ω U I ) or overlaps with that for state N (Ω ≈ Ω N I ), the CEST profiles of a three-state system look like two-state profiles, but cannot be fitted well to the global two-state model. The bad global fitting of our data by the two-state model indicates the global presence of state I (Fig. S2). In addition, 18 (out of 93) residues displayed three-dip 15 N CEST profiles with clear evidence of a third state I 2 . Moreover, the dip depth for state I in CEST profiles increased with urea concentration (Figs. 1 and S1), and the difference of p u values between the N-and C-terminal regions increased as well (Fig. 2). Taken together, we suggest that the difference of p u between the two regions shown in Fig. 2 comes from the global presence of state I.
For a system with three conformational states, there are four possible three-state exchange models (M2~M5 in Fig. 3). As discussed in our previous paper 2 , the exchange rates between N and U and between N and I were around 250 s −1 and 100 s −1 , respectively, and model M2 was excluded based on the 15 N data. In the on-pathway model (M2 in Fig. 3), the exchange between N and I must be faster than the exchange between N and U, which is contradictory to the result derived from the 15 N data. It is noteworthy that models M3 and M4 are two special cases of M5, where the exchange rate between I and N or the rate between I and U is zero.
For each data set, global fittings based on the two-state model and three-state models were conducted to compare the fitting goodness, and to extract the global exchange rates and populations of different states (shared by all residues) as well as the per-residue chemical shifts of minor states (Ω U only in two-state model; Ω U and Ω I in three-state models), transverse relaxation rates (R 2N ) and longitudinal relaxation rates (R 1N ) of the major state. The reduced x 2 values for global fittings by three possible three-state models M3~M5 were significantly smaller than those for the two-state model M1 and the on-pathway three-state model M2 (see Table S2). The F-tests showed that the global three-state models M3~M5 described the CEST data significantly better than models M1 and M2 (p-value < 10 −4 for all data sets). Indeed, the two-state model failed obviously for a few residues that displayed three dips. Considering the possibility that global minima might not be reached in the global fittings by the three-state models (due to large search space) and the fact that the fitting residuals by M3~M5 were similar, we think that the models M3~M5 matched equally well with the data (Table S2). So we could not determine which model is true for meACP. Fortunately, the 13 Ca chemical shifts of state I (Ω I ) obtained with these three three-state models were almost identical.
As the Ω I values were extracted by minimizing χ 2 , it is likely that Ω I could be positioned incorrectly at Ω U in a few cases when Ω I and Ω U were very similar and χ 2 values were not sensitive to Ω I positions. Nevertheless, Ω I could be determined at high confidence level from the χ 2 distribution for most residues.
Structural features of the two minor states. The secondary chemical shifts (CSs) of each residue in three states were calculated by subtracting the random coil chemical shift (CS) values, which were experimentally measured in the fully denatured sample (meACP in 4 M urea), from the CS values of the corresponding residues which were obtained in the CEST data. The analysis of 13 Ca secondary CSs is a simple method to identify the secondary structures of a protein in different states 32 (Fig. 4).
For state U, the 13 Ca secondary CSs in three helical regions had an average value of around +1 ppm, as compared with state N whose 13 Ca secondary CSs in the helical region were in the range of +2~+3 ppm (Fig. 4), indicating the presence of residual helical structures. The residual helical pattern in state U was the same as that in the native state, implying that the three native helices were not fully unfolded in state U. The 13 Ca secondary CS amplitudes for the helical regions in state U were about one-third of those in state N, suggesting that the three helices in state U are populated for about 30% of the time.
For state I, the 13 Ca secondary CSs of its N-terminal region (helix 1, loop 1, and N-terminal region of helix 2) were close to zero, indicating this region exists in a nearly fully unfolded form. On the other hand, the 13 Ca secondary CSs of the C-terminal region (helix 3 and C-terminal region of helix 2) in state I were mostly similar to those in state N, showing that the C-terminal region of state I is native-like ( Fig. 4) (or the helices 2 and 3 remain mainly folded in state I).
Obviously, the state I is overall a partially unfolded form (PUF). This is consistent with the results shown in Fig. 2. Based on the two-state model, a residue can stay as either a folded or an unfolded form. As the N-terminal  region has a higher unfolded population than the C-terminal region (Fig. 2), some protein molecules must exist as a PUF. Based on 15 N CEST data alone, we previously suggested that state I adopted a locally altered conformation in which the N-terminal region of helix 2 differs from both helical and random coil structures while other helices are native-like 2 . Based on 13 Ca and 15 N data, it is now clear that the state I resembles state N in the C-terminal half, while it resembles state U in the N-terminal half (Fig. 4). It is noteworthy that state I still differs significantly from state N in the N-terminal region of helix 2.
The partially unfolded form (PUF or state I). In one scenario, the folding follows the triangle model (M5 in Fig. 3). It might suggest parallel folding pathways. The terms p I * k IN and p u * k UN can represent the contributions to folding from apparent folding pathways U-I-N and U-N, respectively. ⁎  (Table S3), showing that the pathway U-I-N contributes significantly less than the U-N pathway in the protein folding. Although the triangle model fits the experimental data, it is not necessary for the folding to must be parallel, because a four-state model with state I as an off-pathway product (M6 in Fig. 3) still fits the data. Therefore, the state I might be an "on-pathway" folding intermediate state and it is also possible to be an "off-pathway" intermediate state.
The triangle model can be approximated as the single-pathway model N-U-I (M3) when k IN + k NI is much smaller than k IU + k UI . It can also be approximated as the model U-N-I (M4) when k IN + k NI is much larger than k IU + k UI . For some of our data sets k IN + k NI ≫ k IU + k UI , and for other data sets k IN + k NI ≪ k IU + k UI (Table S3). This is caused by the fact that each set of data could be fitted equally well by M3 and M4 (Table S2). Because the folding can follow the two possible single-pathway models, state I might be an "off-pathway" state of the folding from state U to N. We previously suggested that there might be a dynamic equilibrium between monomeric and oligomeric forms, and the oligomers might result from this state I 2 . The N-terminal region of helix 2 (around S970-L977) in state I is very non-native (Fig. 4), which might be prone to aggregate. The misfolding of this region might cause the failure of the further folding of state I.
If state I 0 is on the folding path from U to I 0 to N, but local misfolding happens for a fraction of this state and blocks its folding to state N due to "the occurrence of an on-pathway optional error" 33 , the misfolded fraction (state I) appears as an off-pathway product ("dead-end") (M6 in Fig. 3). Therefore, the "off-pathway" state I may represent the misfolded population, which is only a fraction of the constructive state I 0 with optional local misfolding errors. This putative "on-pathway" intermediate state I 0 would be similar to the observed state I in overall structure, but it would interconvert with state U so fast that it could not be detected by CEST and relaxation dispersion.
Comparison of folding dynamics in D2O, H2O, and urea. The effects of co-solvent D 2 O and urea on shifting the folding equilibria were examined. The 13 Ca CEST profiles recorded in H 2 O, D 2 O and urea varied mainly in the depths of the minor dips (Fig. 1). The dips for both states U and I were larger in urea and smaller in D 2 O, compared with those in H 2 O (see Fig. 1 and Fig. S1).  Tables S3 and S4). As buffer condition became more denaturing (from D 2 O to H 2 O, to higher concentration of urea), the population of unfolded state (p u ) increased, reflecting the decrease of Gibbs energy of unfolding (ΔG N-U ). We do expect unfolded states (U) to be more populated in urea, less in D 2 O, as D 2 O and urea have been widely known for their stabilizing and destabilizing effects on proteins, respectively. meACP was ~0.5 kcal/mol more stable in D 2 O (Table 1). This is consistent with literature studies (summarized in Table S1). Previous data suggest that the larger the proteins are, the more prominent the stabilizing effect is (Fig. S3). The stability enhanced by D 2 O agrees well with the size of meACP.
The exchange between states N and U can be viewed as the major folding/unfolding process, therefore the conversion rates from U to N (k UN ) and from N to U (k NU ) were denoted as the folding (k f ) and unfolding (k u ) rates, respectively. According to the results in Table 1, Tables S3 and S4, k f (k UN ) decreased as solvent became more denaturing; k u (k NU ) increased when solvent was changed from D 2 O to H 2 O, and there were no significant effects on k u by a low concentration of urea.
There were large uncertainties in the population (p I ) and exchange rates of state I (k X-I and k I-X , which are the exchange rates between I and N and between I and U) because the fitting x 2 values changed little when p I , k X-I , and k I-X were within certain ranges. Therefore, we only estimated p I and the sum of k X-I and k I-X . Despite the large uncertainty in the estimated p I , we observed in the raw profiles that the dips for state I were larger in urea, and

Effects of urea on meACP folding and unfolding.
High concentration of urea is denaturing, while the effects of low concentration of urea on protein stability seem to vary for different proteins and solvent environments. It was shown that low concentration urea stabilizes protein ferrocytochrome c 34 . Here we found that meACP was destabilized, even at the concentration of urea as low as 0.25 M. The populations of both state U and state I increased with urea concentration. According to Eq. 4 (see Materials and Methods section), the effect of urea on k f mainly originates from the change of energy barrier (activation energy ‡ G ), because the changes of viscosity and internal friction at low concentration of urea buffer (0.25 M) are negligible (≈1%) 35,36 . Therefore, the change of activation energy is given by, Using the results in Table 1 The φ f factor, or so-called Tanford's β value, is viewed as an indicator of the average degree of changes in accessible surface areas (ΔASA) of the transition state (TS) relative to that of the unfolded state during the protein folding 37 . φ f , generally ranging from 0 to 1, represents the ratio of TS stability change to native state (N) stability change. A φ f value of 1, meaning that the relative stability change of TS equals to that of N, suggests that the overall ΔASA of TS and that of N are similar during unfolding. Thus, φ ≈ 1 f indicates that the TS may have a native-like structure. According to the results obtained in D 2 O, we found that φ f > 1/2, further supporting that the TS is native-like (see discussion in the supplementary information).
Overall, the results here suggest that the TS could be placed close to state N on the path N ↔ U in the possible models (M3~M6) shown in Fig. 3. In M3, the pathway would be N ↔ TS ↔ U ↔ I when considering the TS; in M4 it would be U ↔ TS ↔ N ↔ I; in M5 it would be N ( ↔ I) ↔ TS ↔ U ( ↔ I); in M6 it would be U ↔ I 0 ( ↔ I) ↔ TS ↔ N, according to their structural features (or overall folded percentages). The TS is suggested to be a folded transition state that is close to the native state, sharing a similar structure with the native state.

Conclusion
We found that there exists a partially unfolded form of meACP, where the N-terminal half is unfolded while the C-end half remains mainly folded. The existence of PUF was confirmed by 15 N and 13 Ca CEST results in different solvents. The detected PUF could be either an "on-pathway" intermediate state in the triangle model or an "off-pathway" state in the other three-state models and four-state model. This PUF might represent a fraction of the constructive folding intermediate, and occurred to be misfolded. The structure features of this PUF might suggest the sequential folding order in which the C-terminal region folds before N-terminal region does. Furthermore, the H/D isotope effect and urea effect on meACP folding dynamics were investigated by comparing CEST results in different solvent conditions. Consistent with literature studies on H/D isotope effects, the native  Table 1).
state of meACP was more stable in D 2 O, folding was accelerated, and unfolding was decelerated compared to that in H 2 O. Low concentration of urea destabilized the native state of the protein and seemed to stabilize state PUF. It also slowed down the folding rates while had no effects on the unfolding rates, suggesting the structural similarity between the transition state and native state of meACP.

Materials and Methods
Sample preparation and NMR spectroscopy. Different acyl carrier protein samples ( Table 2) were prepared by following previous protocols 38 . All NMR samples contained 0.6~1 mM protein, 50 mM NaCl, 5 mM EDTA, and 50 mM phosphate at pH 6.9. 13 Ca CEST and 15 N CEST experiments were performed at 25 °C on a Bruker 800 MHz NMR spectrometer equipped with a cryoprobe. The experimental parameters used were the same as those described previously 2,31 .
CEST data analysis. Individual and global fitting using the two-state model. The residues displaying two dips separated by more than 1 ppm were chosen to extract kinetics parameters. Their CEST profiles were first fitted individually using the two-state model (M1 in Fig. 3) to obtain parameters for each residue, i.e., folding and unfolding rates (k f , k u ), population of the minor state (p u ), chemical shift in the minor state (Ω U ), longitudinal relaxation rates of major (R 1N ) and minor states (R 1U ), transverse relaxation rates of the major (R 2N ) and minor states (R 2U ). In the fitting, the J coupling effect was taken into account as described previously 39 . We assumed that 1 J COCα = 55 Hz and 1 J CαCβ = 35 Hz for all residues in 13 C labelled protein samples; and set R 1U = R 1N for each residue. When R 2U was used as an independent fitting parameter, the extracted R 2U for most residues deviated significantly from the expected values for our system due to the presence of a third state that overlaps with state N or U as well as deviations of 1 J COCα and 1 J CαCβ from the assumed values. To simplify the fitting, we assumed all the residues have the same R 2U values in the unfolded state and set R 2U = 6.5 and 8.1 s −1 for 13 Ca in H 2 O and D 2 O respectively and 3.0 s −1 for 15 N in H 2 O at 25 °C. These values were estimated based on the 15 N R 2 values of intrinsically disordered protein α-synuclein (~3.5 s −1 ), which were measured at 15 °C on an 800 MHz spectrometer 40 . The same assumption was made for three-state fittings. In global fitting, all residues shared a common exchange rate (k ex ) and a common population of the minor state (p u ), but they each had unique R 1N , R 2N , and Ω U values.
Individual fitting and global fitting using three-state models. The procedures for fitting the CEST data of individual residues to three-state models (M2 ~ M5) were the same as that described above. The residues displaying two and three dips were also fitted globally to the three-state models. In the fitting, we set R 2I = R 2N and R 1I = R 1U = R 1N for each residue, and assumed 1 J COCα = 55 Hz and 1 J CαCβ = 35 Hz for all residues. For residues displaying three well separated dips, Ω U and Ω I values were already certainly known. For residues displaying two-dip profiles, Ω U values were quite certain and Ω I should be close to either Ω N or Ω U . Optimization was done by extensive grid-search of Ω I . To extract the Ω I value as accurate as possible, we used the following procedure: a. The Ω I of each residue was first estimated by individual fitting via grid search of Ω I ; b. Ω I values obtained in step a were used as the input values in the global fitting to obtain global exchange rates between states N and U (k 1 ) and between I and U or I and N (k 2 ), populations of states N (p N ), and U (p u ); c. Ω I values were re-calculated by individual fitting with fixed global k 1 , k 2 , p N , and p u values (obtained in step b); d. Repeat steps b and c, until χ 2 of the global fitting decreased to a stable value.
Error estimation. For each CEST profile, the uncertainty (δ) in intensity ratio (I/I 0 ) was estimated by calculating the standard deviation of data points in the 'non-saturation area' . The following Monte Carlo simulations were used to estimate fitting errors of extracted CEST parameters for each residue: a. Generate 120 sets of profiles using extracted parameters, add random noise with a standard deviation of δ and mean of 0; b. The 120 set of profiles were fitted to extract 120 sets of fitting parameters. The standard deviation of each parameter was considered as the fitting error.  To determine global fitting errors, 80% of the residues used in the global fitting were randomly taken to extract global parameters and repeat 120 times to obtain standard deviations.
Energy calculations. According to the population of each state, the Gibbs energy change (ΔG) of process A to B was calculated by where p A and p B are the populations of states A and B, respectively. The subscript AB is short for A → B. The folding rate (k f ) is related to the activation energy ( ‡ G ) by 41,42 where C is the frequency factor for the folding process, σ refers to the "internal friction", and η is the viscosity of solvent. σ reflects the contribution of the energy landscape ruggedness to the reaction rate and is dominated by the structure of a protein when denaturant concentrations are low 36 .
Data Availability. The datasets used in the current study are available in the supplementary information file, and the Matlab scripts for data fitting are available from the corresponding author upon request.