Sequence-Dependent Correlated Segments in the Intrinsically Disordered Region of ChiZ

How sequences of intrinsically disordered proteins (IDPs) code for their conformational dynamics is poorly understood. Here, we combined NMR spectroscopy, small-angle X-ray scattering (SAXS), and molecular dynamics (MD) simulations to characterize the conformations and dynamics of ChiZ1-64. MD simulations, first validated by SAXS and secondary chemical shift data, found scant α-helices or β-strands but a considerable propensity for polyproline II (PPII) torsion angles. Importantly, several blocks of residues (e.g., 11–29) emerge as “correlated segments”, identified by their frequent formation of PPII stretches, salt bridges, cation-π interactions, and sidechain-backbone hydrogen bonds. NMR relaxation experiments showed non-uniform transverse relaxation rates (R2s) and nuclear Overhauser enhancements (NOEs) along the sequence (e.g., high R2s and NOEs for residues 11–14 and 23–28). MD simulations further revealed that the extent of segmental correlation is sequence-dependent; segments where internal interactions are more prevalent manifest elevated “collective” motions on the 5–10 ns timescale and suppressed local motions on the sub-ns timescale. Amide proton exchange rates provide corroboration, with residues in the most correlated segment exhibiting the highest protection factors. We propose the correlated segment as a defining feature for the conformations and dynamics of IDPs.


Introduction
Intrinsically disordered proteins (IDPs) and proteins containing intrinsically disordered regions (IDRs) comprise up to 40% of the proteomes in all life forms [1]. They are involved in numerous cellular functions, including regulation and signaling [2,3]. As such, the dysregulation, misfolding, and aggregation of IDPs can lead to many diseases [4,5]. While lacking defined tertiary structures, IDPs can exhibit conformational preferences, such as transient secondary structures and recurrent residue-residue contacts (e.g., salt bridges and cation-π interactions) [6,7]. When binding to partners, transient secondary structures may become stable [8][9][10], and residue-residue contacts may switch from intramolecular to intermolecular [11]. Conformational dynamics may also play a particularly important role in the competition of IDPs for binding to the same partner [12] and in the binding for denatured lysozyme by Schwalbe and co-workers. Although these authors also reported increases in R2 by tertiary interactions, leading to the apparent sequence dependence of R2, the latter have not received much attention in studies on IDPs. A notable exception is a recent study of the low-complexity domain of heterogeneous nuclear ribonucleoprotein A1 (A1-LCD), where the regions of increased R2s were attributed to π-π interactions between aromatic residues [56].
The transmembrane protein ChiZ is one of a dozen or so proteins that comprise the Mycobacterium tuberculosis (Mtb) divisome, the machinery responsible for cell division. Mtb is the causative agent of tuberculosis; its cell division has strong implications for both pathogenesis and drug resistance [57]. The structural determination of divisome membrane proteins and their complexes has begun [58], but sequence analysis suggests that many of these proteins, including ChiZ, CrgA, FtsQ, FtsI, and CwsA, have disordered extramembranous regions of various lengths. ChiZ consists of 165 residues (Figure 1a); the cytoplasmic N-terminal 64 residues (ChiZ1-64) are predicted to be disordered (Figure 1b,c), and the next 22 residues form a transmembrane helix; on the periplasmic side, a disordered linker connects the transmembrane helix to a 53-residue LysM domain that binds to peptidoglycans [59]. The exact role of ChiZ in cell division is still an open question. Its full name, cell wall hydrolase interfering with FtsZ ring assembly (gene Rv2719c), may have been a misnomer, as a recent study showed that zymogram assays suggesting cell wall hydrolase activity by Chauhan et al. [59] likely yielded a false positive [60]. On the other hand, the interference with FtsZ ring assembly remains intact. The polymerization of FtsZ (a bacterial homolog of tubulin), forming the FtsZ ring, initiates the septation step of cell division; thus, the correct localization of the FtsZ ring is crucial for proper division [57]. With the increased expression of ChiZ, Mtb cells grown in macrophages were filamentous; promotion of filamentation by ChiZ overexpression in M. smegmatis (a non-pathogenic mycobacterium) affected the mid-cell location of FtsZ rings [59,61]. Importantly, the disordered N-terminal region and transmembrane helix sufficed for cell filamentation and FtsZ ring mislocalization [62].
Bacterial adenylate cyclase-based two-hybrid (BATCH) assays indicated that ChiZ interacts with FtsQ and FtsI but not FtsZ, implicating an indirect mechanism for FtsZ-ring mislocalization [62]. Here, we combined SAXS, NMR, and MD simulations to thoroughly investigate the conformational and dynamic properties of ChiZ1-64. Based on benchmarking against the SAXS profile and secondary chemical shifts, we selected the AMBER14SB/TIP4PD force field among the five tested. Experimental R2 rates were non-uniform along the sequence, which was recapitulated by MD simulations. The sequence-dependent dynamics can be attributed to the formation of correlated segments, stabilized by polyproline II (PPII) conformation and intra-segmental interactions. In particular, the residues with the largest amplitudes for motions on the slowest timescale (approximately 10 ns), including Asp11, Trp24, Arg25, Arg26, and Tyr47, frequently engaged, with different partners, in salt bridges and cation-π interactions. The linkage of conformation and dynamics to sequence, captured by the formation of correlated segments, will be useful for understanding IDPs and their interactions with partners.

Protein Expression and Purification
The expression of ChiZ1-64 containing an N-terminal His-tag with a TEV protease cleavage site was performed in E. coli BL21 cells. Cells were grown at 37 °C until the O.D. at 600 nm was 0.7, and then 0.4 mM of IPTG was added to induce expression for 5 h at 37 °C. For 13C-15N uniformly labeled samples, protein expression was performed in M9 media containing 1 g of 15N ammonium chloride and 2 g of 13C-labeled glucose.
The ChiZ1-64 was initially purified using nickel affinity chromatography. The cells were resuspended in a lysis buffer (20 mM of tris-HCl pH 8.0 containing 500 mM of NaCl and 6 M of urea) and lysed by a French Press. The lysate was centrifuged at 12,000× g for 40 min to remove insoluble material. After that, the lysate was loaded onto a Ni-NTA resin column (Qiagen). The column was washed with a washing buffer (20 mM of tris-HCl pH 8.0 containing 500 mM of NaCl and 60 mM of imidazole) and then eluted with 400 mM of imidazole.
After nickel affinity chromatography, the fractions containing the protein were pooled and treated with TEV protease. The His-tag and TEV protease were removed by passing the protein sample through a Ni-NTA column. Further purification proceeded by cation exchange chromatography. ChiZ1-64 was dialyzed against the NMR buffer (20 mM of sodium phosphate at pH 7.0 plus 25 mM of NaCl) and then loaded into an SP column (GE Healthcare). The column was washed with the NMR buffer containing 200 mM of NaCl and eluted with 500 mM of NaCl. The fractions containing the protein were concentrated and dialyzed against the NMR buffer for further experiments.

Small Angle X-ray Scattering
SAXS experiments were performed on the DND-CAT 5ID-D beam line at the Advanced Photon Source of the Argonne National Laboratory. The X-ray wavelength was 1.2398 Å. The X-ray scattering intensities were collected using Rayonix LX170HS CCD detectors positioned at 200.92 mm (0.0014 < q < 0.08 Å⁻¹) and 1014.2 mm (0.077 < q < 0.485 Å⁻¹). ChiZ1-64 was at 9.1 mg/mL in the NMR buffer. The X-ray exposure time was limited to 5 s to minimize protein degradation. Data processing was performed using the ATSAS software package [63].

NMR Spectroscopy
Samples for the solution NMR experiments were prepared in the NMR buffer containing 10% D2O and 50 µM of DSS (2,2-dimethyl-2-silapentane-5-sulfonic acid, for referencing). All the experiments were performed at 25 °C in an 800 MHz magnet equipped with a cryoprobe. A sequential backbone assignment was performed using standard HNCO, HN(Cα)CO, HNCαCβ, and CαCβ(CO)NH experiments. The data were processed using Topspin 2.1 (Bruker), and analyzed using the CCPNmr software.
The backbone dynamics were characterized by measuring the amide 15N R1 and R2 relaxation rates. The R1 measurements were performed using the Bruker pulse sequence hsqct1etf3gpsi with the following time delays: 10, 62.5, 125, 250, 500, 750, 1000, and 1500 ms. The R2 relaxation rates were measured using the Carr-Purcell-Meiboom-Gill experiment (hsqct2etf3gpsi) with the following time delays: 5, 30, 62.5, 125, 187.5, 250, 312.5, and 375 ms. Relaxation delays between scans were 6 s for R1 and 4 s for R2. The signal intensities for each residue were fit to an exponential to extract R1 and R2; the fitting errors were reported as errors in these parameters. In addition, using the Bruker pulse sequence hsqcnoef3gpsi, the 1H-15N heteronuclear NOE value for each residue was obtained as the ratio of signal intensities collected with and without proton saturation, with a 10 s relaxation delay between scans. The same settings were used for relaxation measurements at pH 4.0. The NOE values were the average of two independent measurements, with the errors corresponding to the standard deviations of those measurements.
The measurement of amide proton exchange rates was carried out using the CLEANEX-PM pulse sequence (fhsqccxf3gpph). The CLEANEX-PM spin lock times were 10, 15, 20, 25, 30, 40, 50, 75, and 100 ms. A fast HSQC reference spectrum was collected using the same pulse sequence parameters. All the experiments were run with a relaxation delay of 3 s. The amide proton exchange rates were calculated by fitting equation 1 in Hwang et al. [64] to the signal intensities for each residue at different spin lock times. The intrinsic exchange rates were calculated using the SPHERE server (https://protocol.fccc.edu/research/labs/roder/sphere/sphere.html) [65]. The protection factors were calculated as k_intrinsic/k_ex [66].
The significance of the differences in the means of R1, R2, NOE, and the protection factor between the two halves of ChiZ1-64 was analyzed by the independent samples t-test assuming unequal variances (Welch's t-test) using the scipy stats module in Python.
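As a minimal sketch of the comparison described above, the following computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom using only the standard library. The function name `welch_t` and the numerical values are hypothetical and are not data from this study; in scipy, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the same statistic together with a p-value.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    # Squared standard errors of the two sample means (unequal variances allowed).
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Made-up per-residue values for the N-half and C-half (illustrative only).
n_half = [4.1, 4.6, 5.0, 4.8, 4.4]
c_half = [3.2, 3.5, 3.1, 3.6, 3.3, 3.4]
t_stat, dof = welch_t(n_half, c_half)
```

The p-value then follows from the t distribution with `dof` degrees of freedom, which is the part that the scipy call handles.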
To start, energy minimization was carried out using sander for 2000 steepest descent steps, followed by 3000 conjugate gradient steps. Subsequently, temperature equilibration, pressure equilibration, and production run were performed using pmemd.cuda on GPUs [75]. Under a constant volume, the temperature was ramped from 0 K to 300 K in 40 ps and then maintained at 300 K for 60 ps, using the Langevin thermostat with a friction coefficient of 3 ps⁻¹ at 1 fs timesteps. The simulations then switched to constant pressure (Berendsen barostat at 1 atm with a pressure relaxation time of 2 ps) and the timestep increased to 2 fs. The first 3 ns was nominally pressure equilibration, and the remaining simulations counted as the production run. The non-bonded cutoff was 10 Å in all simulations, except in C36m, where a force switch was imposed between 10 and 12 Å. All the bonds connected to hydrogens were constrained using the SHAKE algorithm [76].
For each of the force fields tested, 4 replicate simulations started with different random seeds were run for 500 ns each. Simulations in two of the force fields, FF14D and FF03WS, were expanded to 12 replicates, each lasting 3 µs. Snapshots were saved every 20 ps for analysis. In only 1.7% of snapshots ChiZ1-64 came within the non-bonded cutoff (10 Å) from its periodic images.

Calculation of SAXS Profiles
From the MD conformations, SAXS profiles were calculated using the FOXS code [77]. The optimal parameter for the hydration shell scattering density was selected for each water model according to Henriques et al. [78]. For each trajectory, a SAXS profile was calculated on every 10th saved conformation, and then an average was taken over these conformations. The resulting SAXS profile was linearly scaled to best match the experimental counterpart. The average of the scaled SAXS profiles over the replicate simulations was taken as the MD prediction. In cases where the number of replicates is 12, we also report the standard deviation among the replicates at each q as a measure of the calculation error.
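The averaging and scaling steps above can be sketched as follows. This assumes an unweighted least-squares criterion with a single multiplicative factor; the source does not state whether experimental errors or a constant offset entered the fit, and the function names and toy intensities are hypothetical.

```python
def average_profile(profiles):
    """Average I(q) over conformations; each profile is a list on the same q grid."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]

def scale_factor(i_calc, i_exp):
    """Factor c minimizing sum_q (I_exp(q) - c * I_calc(q))**2 (unweighted)."""
    num = sum(c * e for c, e in zip(i_calc, i_exp))
    den = sum(c * c for c in i_calc)
    return num / den

# Toy intensities on a three-point q grid (illustrative only).
per_conformation = [[1.0, 0.5, 0.2], [3.0, 1.5, 0.6]]
i_exp = [4.0, 2.0, 0.8]

i_avg = average_profile(per_conformation)
c = scale_factor(i_avg, i_exp)
i_scaled = [c * value for value in i_avg]
```

The scaled profiles from the replicate trajectories would then be averaged, and their spread at each q taken as the error estimate.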
All graphs were plotted using matplotlib and seaborn in python3.

Calculation of Chemical Shifts
Chemical shifts were calculated using the SHIFTX2 code [79] (at 300 K and pH 7). For each residue, the corresponding random-coil chemical shifts generated from the ncIDP database [80] were subtracted to yield Cα and Cβ secondary chemical shifts (without any scaling). Details for the averages and standard deviations largely followed the protocol for SAXS profiles.

Radius of Gyration, Secondary Structures, and Hydrogen Bonds
Cpptraj [81] was used for determining the radius of gyration, secondary structures (by implementing DSSP [82]), and hydrogen bonds. DSSP was modified to include PPII, following Mansiaux et al. [83]. Specifically, three or more consecutive residues that were classified as coil and fell into the PPII region of the Ramachandran map were reclassified as PPII.
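The reclassification rule can be illustrated with a short sketch. The rectangular (ϕ, ψ) boundaries in `in_ppii_region`, the single-letter labels ("C" for coil, "P" for PPII), and the function names are assumptions made for illustration; the actual PPII region used here is the one shown in Figure S4.

```python
def in_ppii_region(phi, psi):
    # Assumed rectangular PPII region in degrees (illustrative boundaries only).
    return -110.0 <= phi <= -40.0 and 110.0 <= psi <= 180.0

def reclassify_ppii(dssp, phi, psi, min_len=3):
    """Relabel runs of >= min_len consecutive coil residues in the PPII region as 'P'."""
    labels = list(dssp)
    candidate = [s == "C" and in_ppii_region(p, q)
                 for s, p, q in zip(dssp, phi, psi)]
    start = None
    # Iterate one step past the end so that a run reaching the last residue is closed.
    for i in range(len(labels) + 1):
        if i < len(labels) and candidate[i]:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                labels[start:i] = ["P"] * (i - start)
            start = None
    return "".join(labels)

# Seven residues, all with PPII-like torsion angles; residue 5 is helical by DSSP.
example = reclassify_ppii("CCCCHCC", [-75.0] * 7, [145.0] * 7)
```

In this toy input the first four coil residues form a run long enough to be relabeled, whereas the trailing two-residue run stays as coil.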

Dihedral Principal Component Analysis
Dihedral principal component analysis (dPCA) [84,85] was performed through cpptraj [81], yielding 248 eigenmodes for the backbone ϕ and ψ angles of the 62 non-terminal residues of ChiZ1-64 (ϕ and ψ each were represented by their sine and cosine). To display the energy landscape in conformational space, the histogram of the projections of the saved snapshots along the two dPCA eigenmodes with the largest eigenvalues was calculated and then converted to a free energy surface according to the Boltzmann relation. These two projected coordinates were also used to group the snapshots into 16 clusters using the hierarchical Ward agglomerative algorithm. The snapshot that had the highest similarity score to all the members in a cluster was selected as the representative. The similarity score was defined as [86]: where $r^i_{n,m}$ denotes the distance between atoms n and m in snapshot i, N is the total number of atoms in ChiZ1-64, and the average over j ran over all the snapshots in the given cluster.
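The conversion from a two-dimensional histogram to a free energy surface via the Boltzmann relation, F = −kBT ln P, can be sketched as below. The function name, the bin count, and the choice to set the most populated bin to zero are assumptions of this sketch, not details taken from the analysis scripts.

```python
import math

def free_energy_surface(pc1, pc2, bins=50):
    """Return F/(kB*T) on a bins x bins grid; unvisited bins are set to +inf."""
    lo1, hi1 = min(pc1), max(pc1)
    lo2, hi2 = min(pc2), max(pc2)
    counts = [[0] * bins for _ in range(bins)]
    for x, y in zip(pc1, pc2):
        # Map each projection to a bin index; the maximum value goes in the last bin.
        i = min(int((x - lo1) / (hi1 - lo1) * bins), bins - 1)
        j = min(int((y - lo2) / (hi2 - lo2) * bins), bins - 1)
        counts[i][j] += 1
    cmax = max(max(row) for row in counts)
    # F/(kB*T) = -ln(P / P_max), so the most populated bin is the zero of free energy.
    return [[math.log(cmax / c) if c else math.inf for c in row]
            for row in counts]

# Toy projections: four snapshots in one bin, two in another (illustrative only).
fes = free_energy_surface([0, 0, 0, 0, 1, 1], [0, 0, 0, 0, 1, 1], bins=2)
```

With this convention, a barrier of 2 kBT corresponds to a bin populated about e² ≈ 7.4 times less than the most populated one.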
The contribution from the fluctuations of torsion angle n to eigenmode k was determined by the amplitude of this eigenmode's components, $v^k_{2n-1}$ and $v^k_{2n}$, for the sine and cosine of the torsion angle (denoted by indices 2n − 1 and 2n). Specifically, this contribution was [85]: $\Delta^k_n = (v^k_{2n-1})^2 + (v^k_{2n})^2$. Note that, because each eigenmode is normalized, the sum of $\Delta^k_n$ over all the torsion angles is 1.

Contact Maps
Mdtraj [87] was used to load the trajectories, select atoms, and calculate distances between sidechain and sidechain or sidechain and backbone heavy atoms, excluding pairs from the same residue. Two heavy atoms were considered to be in contact if they were within 3.5 Å of each other. For each pair, the fraction of snapshots in which contacts formed was recorded.
The contacts formed by the two aromatic residues, Trp24 and Tyr47, with arginines were further examined to see whether they were cation-π interactions. The distance between the centers of mass of the indole or phenol ring and of the cationic group (including Nε, Cζ, Nη1, and Nη2), and the angle between the line connecting these two points and the normal of the ring were calculated. The overwhelming majority of Trp24 contacts with Arg16, Arg25, and Arg26 had the above distance < 5 Å and the above angle < 60°, and hence were deemed cation-π interactions. The same was true of Tyr47 contacts with Arg46 and Arg49.
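The geometric test described above can be sketched as follows. The helper names are hypothetical, equal atomic masses are used by default as a simplification (true masses can be passed in), and the ring normal is taken from the cross product of two centroid-to-atom vectors, which assumes a planar ring.

```python
import math

def _sub(a, b):
    return [a[k] - b[k] for k in range(3)]

def _dot(a, b):
    return sum(a[k] * b[k] for k in range(3))

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def center_of_mass(coords, masses=None):
    masses = masses or [1.0] * len(coords)  # equal masses unless provided
    total = sum(masses)
    return [sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)]

def cation_pi_geometry(ring, cation, ring_masses=None, cation_masses=None):
    """Return (distance in Å, angle in degrees) between ring and cationic group."""
    rc = center_of_mass(ring, ring_masses)
    cc = center_of_mass(cation, cation_masses)
    d = _sub(cc, rc)
    normal = _cross(_sub(ring[0], rc), _sub(ring[1], rc))
    dist = math.sqrt(_dot(d, d))
    # A line has no direction, so use the acute angle to the ring normal.
    cosang = abs(_dot(d, normal)) / (dist * math.sqrt(_dot(normal, normal)))
    return dist, math.degrees(math.acos(min(1.0, cosang)))

def is_cation_pi(dist, angle, max_dist=5.0, max_angle=60.0):
    return dist < max_dist and angle < max_angle

# Idealized hexagonal ring in the xy-plane with a cationic group stacked above it.
ring = [(1.4 * math.cos(k * math.pi / 3), 1.4 * math.sin(k * math.pi / 3), 0.0)
        for k in range(6)]
stacked = [(0.5, 0.0, 3.5), (-0.5, 0.0, 3.5), (0.0, 0.5, 3.5), (0.0, -0.5, 3.5)]
dist, angle = cation_pi_geometry(ring, stacked)
```

A group stacked over the ring face gives a small angle and passes both thresholds, whereas a group lying in the ring plane gives an angle near 90° and is rejected.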

NMR Relaxation Properties
From each trajectory, the NH bond vector time-correlation function for each non-proline residue was calculated as a time average: $C(\tau) = \langle P_2[\mathbf{n}(t) \cdot \mathbf{n}(t+\tau)] \rangle_t$, where $P_2(x) = (3x^2 - 1)/2$ is the second-order Legendre polynomial and $\mathbf{n}(t)$ is the NH bond unit vector at time t. Each correlation function, with τ ranging from 20 ps to 25 ns, was then least-squares fit to a sum of exponentials, using the Levenberg-Marquardt algorithm via the curve_fit function of scipy.optimize in Python. Note that the sum of the amplitudes was not restricted to 1, in contrast to most other studies, under the assumption that an ultrafast decay was completed by τ = 20 ps (the time interval at which we saved the snapshots in the MD simulations), thereby accounting for some missing amplitudes (see Section 3.7). To determine the optimal number of exponentials for modeling the simulation data, we compared the chi-squares (χ²) of the fits with increasing n, starting at n = 2. Any fit with fitting errors higher than 10% of any fitted parameter was rejected. An increase from the n exponentials to n + 1 exponentials was accepted specifically if: This procedure led to n = 3 as the optimum for all residues. The three time constants were ordered as τ1 > τ2 > τ3.
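A minimal sketch of the time-averaged correlation function is given below, using plain Python lists of NH unit vectors. The function names are hypothetical; lag 0 is included here for completeness, whereas the fits described above start at the first saved interval (20 ps).

```python
def p2(x):
    """Second-order Legendre polynomial."""
    return 1.5 * x * x - 0.5

def nh_correlation(vectors, max_lag):
    """C(lag) = < P2( n(t) . n(t + lag) ) >_t for lag = 0..max_lag (in frames)."""
    n_frames = len(vectors)
    corr = []
    for lag in range(max_lag + 1):
        total = 0.0
        # Average over all time origins t for which t + lag is still in range.
        for t in range(n_frames - lag):
            a, b = vectors[t], vectors[t + lag]
            total += p2(a[0] * b[0] + a[1] * b[1] + a[2] * b[2])
        corr.append(total / (n_frames - lag))
    return corr

# Toy trajectory: the bond vector flips between x and y every frame (illustrative only).
toy = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)] * 4
c_toy = nh_correlation(toy, max_lag=2)
```

The resulting values at lags of 20 ps and beyond would then be passed, together with a user-defined sum-of-exponentials model, to `scipy.optimize.curve_fit`, which uses the Levenberg-Marquardt method by default when no bounds are given.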
After the tri-exponential fit, the spectral density was obtained as: Finally, the longitudinal and transverse relaxation rates and the NOE were obtained as [35]: 15N spin relaxation by NH dipole-dipole interactions and the nitrogen chemical shift anisotropy, respectively. The meanings of the other symbols are: µ0, permeability of free space; ħ, reduced Planck constant; γN and γH, gyromagnetic ratios of nitrogen and hydrogen; ωH = γH B0, Larmor frequency of hydrogen (800 MHz in our case); ωN, counterpart of nitrogen; rNH, NH bond length (set to 1.02 Å); and ΔCSA (= −170 ppm), chemical shift anisotropy of nitrogen. For each relaxation property, we report the discrepancies between the measured and predicted values as the root mean squared error (RMSE), calculated over the entire ChiZ1-64 sequence except for the first and last residues. A bootstrapped 95% confidence interval was obtained to determine the error in the calculated R1, R2, and NOE.
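As a rough sketch of how a fitted multi-exponential correlation function maps to R1, R2, and NOE, the following uses a widely used textbook form of the dipolar plus CSA expressions. The normalization J(ω) = (2/5) Σ A_i τ_i / (1 + ω²τ_i²), the use of Larmor frequency magnitudes, the physical constant values, and the example amplitudes and time constants are assumptions of this sketch and may differ from the convention of [35]; only the field strength, bond length, and CSA value are taken from the text.

```python
import math

MU0_OVER_4PI = 1.0e-7            # T m / A
HBAR = 1.054571817e-34           # J s
GAMMA_H = 2.6752218744e8         # rad s^-1 T^-1
GAMMA_N = -2.7126e7              # rad s^-1 T^-1 (negative for 15N)
R_NH = 1.02e-10                  # m (value stated in the text)
DELTA_CSA = -170.0e-6            # dimensionless (value stated in the text)
OMEGA_H = 2.0 * math.pi * 800.0e6              # rad/s at 800 MHz
OMEGA_N = OMEGA_H * abs(GAMMA_N) / GAMMA_H     # magnitude of the 15N frequency

def spectral_density(omega, amps, taus):
    """J(omega) for C(tau) = sum_i A_i exp(-tau / tau_i); taus in seconds."""
    return 0.4 * sum(a * t / (1.0 + (omega * t) ** 2) for a, t in zip(amps, taus))

def relaxation(amps, taus):
    """Return (R1 in 1/s, R2 in 1/s, NOE) from fitted amplitudes and time constants."""
    d2 = (MU0_OVER_4PI * HBAR * GAMMA_H * GAMMA_N / R_NH ** 3) ** 2  # dipolar
    c2 = (OMEGA_N * DELTA_CSA) ** 2 / 3.0                            # CSA
    j0 = spectral_density(0.0, amps, taus)
    jn = spectral_density(OMEGA_N, amps, taus)
    jh = spectral_density(OMEGA_H, amps, taus)
    jdiff = spectral_density(OMEGA_H - OMEGA_N, amps, taus)
    jsum = spectral_density(OMEGA_H + OMEGA_N, amps, taus)
    r1 = 0.25 * d2 * (jdiff + 3 * jn + 6 * jsum) + c2 * jn
    r2 = (0.125 * d2 * (4 * j0 + jdiff + 3 * jn + 6 * jh + 6 * jsum)
          + c2 / 6.0 * (4 * j0 + 3 * jn))
    noe = 1.0 + 0.25 * d2 * (GAMMA_H / GAMMA_N) * (6 * jsum - jdiff) / r1
    return r1, r2, noe

# Hypothetical tri-exponential parameters (amplitudes; time constants in seconds).
r1, r2, noe = relaxation([0.3, 0.3, 0.3], [8.0e-9, 1.0e-9, 1.0e-10])
```

In this form, a larger amplitude on the slowest time constant raises J(0) and hence R2, which is the qualitative link between slow collective motions and elevated R2 discussed in the text.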
We also considered two modifications to the tri-exponential NH bond vector time-correlation functions. The first was to include an ultrafast decay component, with time constant τ_f (< 20 ps) and an amplitude of 1 − A_sum. The second was to account for the possibility that the longest timescale was exaggerated by the AMBER14SB/TIP4PD force field selected here. We hence tested scaling down the three time constants from the tri-exponential fits by a factor 1 + τ_i/τ_s, with τ_s of the order of 10 ns. This scaling had little effect on τ2 and τ3, but reduced the longest time constant τ1 by roughly half, from 7-17 ns to 5-8 ns.

Data Availability
The chemical shifts of ChiZ1-64 have been deposited in BMRB (accession # 50115). Python scripts written for the NMR relaxation analysis are available on GitHub at https://github.com/achicks15/CorrFunction_NMRRelaxation.

Sequence Characteristics and Disorder of ChiZ1-64
ChiZ1-64 has disparate amino acid compositions between the first 32 residues (N-half) and the last 32 residues (C-half), in particular concerning prolines, glycines, and charged residues (Figure 1b). Of the 12 prolines (19% of the sequence), two thirds are in the N-half. In contrast, of the nine glycines (14%), seven, or nearly 80%, are in the C-half. Prolines are known to break α-helices and β-strands but promote PPII helices, whereas glycines break all secondary structures. There is a significant net charge, +10e, coming from 13 arginines, two aspartates, and one glutamate. All three anionic residues are in the N-half, whereas eight, or 62%, of the cationic residues are in the C-half, resulting in the contrast between a near balance of opposite charges in the N-half and total imbalance in the C-half. Lastly, we note that each half contains an aromatic residue, Trp24 in the N-half and Tyr47 in the C-half.
Given the abundance of prolines and glycines and the high net charge, it is not surprising that the entire sequence of ChiZ1-64 is predicted to be disordered with high confidence [88][89][90] (Figure 1c). The 1H-15N HSQC spectrum confirms the disorder, with proton chemical shifts confined to the narrow range of 7.7 to 8.6 ppm (Figure 1d).

SAXS Profile and Secondary Chemical Shifts
The SAXS profile (Figure 2a), i.e., the scattering intensity I(q) as a function of q, the magnitude of the scattering vector, shows ChiZ1-64 as a typical IDP, especially when presented as a Kratky plot (Figure 2b). The radius of gyration Rg obtained from a fit to the Debye approximation (Figure S1a) is 24.17 ± 0.05 Å. This value is slightly larger than the 22.3 Å predicted by a scaling relation deduced from a set of IDPs [91]. A modest degree of expansion is also indicated by an upward tilt of the Kratky plot at high q. Secondary chemical shifts for Cα and Cβ can indicate the presence of α-helices and β-strands (corresponding to ∆δCα − ∆δCβ > 2 ppm and < −2 ppm, respectively) [30]. For ChiZ1-64, only a single residue had |∆δCα − ∆δCβ| > 1 ppm (Figure 2c), indicating a lack of α-helices and β-strands for the entire sequence.
To check the convergence of the 36 μs simulations, we calculated the Rg histograms from the 12 replicate trajectories (Figure S3). The histograms all showed broad distributions, with significant frequencies for Rg between 15 and 40 Å and mean Rg values ranging from 22.44 to 26.44 Å. Combining the 12 replicate simulations, the overall mean Rg was 24.4 Å, with a standard deviation of 1.4 Å among the replicates. The mean Rg agrees well with the experimental value of 24.17 Å. Overall, the selected force field reproduced the experimental data well for both residue-specific properties and global conformational properties.

High Poly-Proline II Propensities
Consistent with the lack of α-helices and β-strands indicated by secondary chemical shifts, the contents of these secondary structures were minimal in the MD simulations (Figure 3). Two stretches of residues in the N-half, 13-15 and 30-32, formed 3₁₀ helices with a moderate frequency (~7%). Note that 3₁₀ helices have a much lower intrinsic stability than α-helices. In addition, 3₁₀ helices and anti-parallel β-sheets were formed infrequently by C-half residues (45 to 61). On the other hand, ChiZ1-64 exhibited high PPII propensities, which only the MD simulations were able to reveal. Here, PPII was counted when contiguous residues (minimum of three) fell in the PPII region on the Ramachandran map (Figure S4). Three stretches of residues sampled PPII over 50% of the time. All of them are in the N-half: residues 4-6, 10-12, and 27-29. In comparison, the highest PPII frequency in the C-half was only 36% for residue 44. Prolines are the most direct reason for the high PPII propensities, as the high-PPII stretches in the N-half contain or border prolines at positions 3, 6, 7, 10, 12, and 29; in the C-half, residue 44 is also a proline.
Proline strongly prefers the PPII region on the Ramachandran map (Figure S4). This preference extends to the preceding residue, unless it is a glycine. However, PPII helices are only marginally stable. Unlike α-helices and β-sheets, PPII helices are not stabilized by backbone hydrogen bonds. Although prolines provide some impetus, PPII stretches may not form unless stabilized by other interactions (see below).

Flat Energy Landscape in Conformational Space
The lack of stable secondary structures portended a high degree of diversity in the conformations sampled by ChiZ1-64. To quantify this aspect, we performed backbone dihedral principal component analysis (dPCA) [84,85] on conformations saved from the MD simulations. Each conformation was projected onto the first two eigenmodes with the largest eigenvalues, and the distribution of the conformations in this two-dimensional subspace was obtained. The resulting free energy surface shows a broad, shallow basin, with local barriers all less than 2 kBT (kB: Boltzmann constant; T: absolute temperature) (Figure S5a).
Another indication of the conformational diversity is provided by the closely spaced eigenvalues (Figure S6a; in a contrasting scenario where a few large eigenvalues are separated from many small eigenvalues, the former would correspond to modes involving the concerted motions of a large portion of the protein, whereas the latter would correspond to localized motions). When normalized by the sum of all eigenvalues, the four largest eigenvalues were 0.029, 0.025, 0.023, and 0.021; the eigenvalues decreased smoothly with an increasing mode number. The first four eigenmodes, represented by the fluctuation amplitudes of individual torsion angles, are displayed in Figure S6b-e. The amplitudes of the ϕ angles, with the exception of those of a few glycines, were low, reflecting the fact that ϕ was mostly confined to the range of −50° to −150° (Figure S4). The ψ values spanned a wide range, covering different secondary structures (−100° to 0° for α- and 3₁₀ helices, and 100° to 180° for PPII helices and β-strands). Residues with high ψ amplitudes in the first three modes mostly were found in the two N-half stretches, 13-15 and 30-32, with a moderate 3₁₀ propensity. The fourth mode mostly involved C-half residues (45 to 61) that formed 3₁₀ helices and anti-parallel β-sheets infrequently.
To find a minimal set of conformations that still conveyed the overall sense of conformational diversity, we used the projections of the MD conformations in the subspace of the first two eigenmodes to group them into 16 clusters (Figure S5b) and selected one conformation from each cluster. The selection was based on a similarity score, which measured the extent of similarity of a given conformation to all the other conformations in the same cluster. The highest similarity score for any conformation with all the other cluster members ranged from 0.15 to 0.19, about the same as that between two randomly chosen conformations, again highlighting the conformational diversity.
The set of 16 conformations, one from each cluster with the highest similarity score, illustrates the conformational diversity in the MD simulations (Figure 4). All these conformations contained at least one PPII stretch; five of them contained a 3₁₀ helix; two contained a hybrid 3₁₀-α helix (featuring both i to i + 3 and i to i + 4 hydrogen bonds); one contained an antiparallel β-sheet. Visual inspection also revealed that arginines frequently formed salt bridges with the aspartates and glutamates as well as cation-π interactions with the tryptophan and tyrosine. Furthermore, the cationic and anionic side chains frequently formed hydrogen bonds with backbone carbonyls and amides, respectively. Sometimes these interactions grew into a network. Thus, while the backbone conformations were diverse, the salt bridges, cation-π interactions, and side chain-backbone hydrogen bonds were pervasive, albeit formed by different partners at different times.
Figure 4. Representative conformations, one from each of the 16 clusters. Secondary structures are colored (PPII, 3₁₀, and hybrid 3₁₀-α helices in purple, yellow, and green, respectively; β-sheet, orange). Cationic, anionic, and aromatic side chains involved in salt bridges, cation-π interactions, and SC-BB hydrogen bonds are shown in blue, red, and orange, respectively. Boxed regions, after enlargement, are shown in Figure 5.

Correlated Segments Revealed by Contact Maps
To quantify these prevailing interactions formed in the MD simulations, we calculated the contact frequencies between heavy atoms on any two side chains (SC-SC; Figure 5a) or between a heavy atom on any side chain and a heavy atom on the backbone of any other residue (SC-BB; Figure 5b). A contact was considered formed when two heavy atoms were less than 3.5 Å apart.
Overall, the N-half formed nonlocal SC-SC contacts much more frequently than the C-half. To quantify this difference, we took the highest contact frequency among the SC heavy atoms of two residues to represent that residue pair and, for each residue, defined its nonlocal contact number as the average of the contact frequencies over all partner residues except the three nearest neighbors in either direction. The mean nonlocal contact number of the N-half residues was 0.00112, about 1.6 times that of the C-half residues, 0.00069. Residues forming SC-SC contacts with significant frequencies could roughly be grouped into five segments along the sequence (indicated by red boxes in Figure 5a). The N-half broke into three segments: Thr2 to Pro6, Arg5 to Pro12, and Asp11 to Pro29. The fourth segment, Glu28 to Pro40, straddled the two halves. The rest of the C-half contained one more segment, Pro44 to Pro63. For several residues, including Arg5, Asp11, and Glu28 (Glu28 is illustrated in Figure 5a inset #4), the contacts extended beyond a single segment, explaining why every two adjacent segments in the N-half had a two-residue overlap. Contacts made by the three anionic residues, Asp11, Asp20, and Glu28, traversed the N-half (illustrated by an Asp20-Arg5 salt bridge in Figure 5a inset #16) and even extended into the entire C-half.
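The definition of the nonlocal contact number can be made concrete with a minimal sketch. It assumes that the residue-pair contact frequencies (the highest SC heavy-atom contact frequency for each pair) are already available as a symmetric matrix; the function names, the split of the chain at its midpoint, and the toy values are illustrative assumptions, not the analysis code or data used in this study.

```python
def nonlocal_contact_numbers(pair_freq, exclude=3):
    """Per-residue average contact frequency over nonlocal partners.

    pair_freq[i][j] is the contact frequency representing residue pair
    (i, j); the matrix is assumed square and symmetric. Partners within
    `exclude` positions along the sequence (and the residue itself) are
    left out of the average.
    """
    n = len(pair_freq)
    numbers = []
    for i in range(n):
        partners = [pair_freq[i][j] for j in range(n) if abs(i - j) > exclude]
        numbers.append(sum(partners) / len(partners) if partners else 0.0)
    return numbers


def half_means(numbers):
    """Mean nonlocal contact number of the first and second half of the chain."""
    mid = len(numbers) // 2  # assumption: halves are split at the midpoint
    first, second = numbers[:mid], numbers[mid:]
    return sum(first) / len(first), sum(second) / len(second)


# Toy 10-residue map (made-up values, not ChiZ data): a single nonlocal
# contact between residues 0 and 5, formed in 20% of the frames.
toy = [[0.0] * 10 for _ in range(10)]
toy[0][5] = toy[5][0] = 0.2
toy_numbers = nonlocal_contact_numbers(toy)
toy_first_half, toy_second_half = half_means(toy_numbers)
```

In this toy map, residue 0 has six nonlocal partners and residue 5 has only three, so the same contact gives residue 5 the larger contact number; residues near the middle of a chain have fewer excluded-neighbor-free partners on either side only when the chain is short, which is not a concern for a 64-residue sequence.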
The most extensive interaction network was formed with Trp24, Arg25, Arg26, and Glu28 at the core (Figure 5a blue solid box and inset #4; see also Figure 5b inset #15; Figure S7a). Trp24 formed cation-π interactions with Arg25 and other arginines, whereas Glu28 formed multiple salt bridges with Arg25, Arg26, and other arginines. In the C-half, the most extensive interaction network (Figure 5a blue dashed box; Figure S7b) had cation-π interactions of Tyr47 with Arg46 and Arg49 at the core (e.g., Tyr47-Arg49 in Figure 4 #15). As will be seen when presenting Figures 6 and 7, these salt bridges and cation-π interactions align with the regions of slow backbone dynamics.
[Figure 6 caption, beginning lost in extraction: … Shaded cyan regions highlight residues (11-14, 23-28, and 46-50) with higher-than-average R2s and NOEs. Gaps in the plots are due to prolines (and unresolved residues in the case of experimental data).]
The five correlated segments each contain one or more transiently formed PPII stretches (Figure S8). The three most prevalent PPII stretches (residues 4-6, 10-12, and 27-29) identified above fall right into Boxes 1, 2, and 3. It is thus evident that SC-SC contacts contribute to the prevalence of the PPII stretches. This point is clearly illustrated by the contrast between Pro29 and Pro63. These prolines are both free from the direct influence of neighboring prolines or glycines and yet differ significantly in PPII frequencies (52% for Pro29 vs. 15% for Pro63). The most likely reason for the much higher PPII frequency of Pro29 is that it is next to a stretch of residues (Trp24 to Glu28) that form extensive interactions. Pro44 is close to a stretch of residues (Arg46 to Arg48) that form less extensive interactions and has an intermediate PPII frequency (36%).
The patterns of SC-BB contacts largely mirrored those of the SC-SC contacts. The SC-BB contacts segregated into the same five segments. Most frequent were contacts between adjacent residues, in particular hydrogen bonds between arginines and backbone carbonyls (e.g., Arg25 with the carbonyls of residues 24 and 25, as shown in Figure 5b inset #15) and between anionic residues and backbone amides. Still, nonlocal SC-BB hydrogen bonds occurred with significant frequencies in the N-half segments. For instance, Arg16 hydrogen bonded with the carbonyl of residue 25, Arg23 with those of residues 16, 17, and 18, and Arg25 with that of residue 13 (Figure 5b inset #12); likewise, Arg16 hydrogen bonded with the carbonyl of residue 11 and Asp20 with the backbone amide of residue 16 (Figure 5b inset #15). There were relatively fewer nonlocal SC-BB hydrogen bonds in the C-half. All in all, the SC-BB hydrogen bonds contribute to the stability of the correlated segments identified by the SC-SC contacts and, at the same time, also directly influence the backbone 15N relaxation and amide proton exchange rates.

Sequence-Specific Backbone Dynamics
In Figure 6 (black solid curves), we display the longitudinal and transverse relaxation rates (R1 and R2) and nuclear Overhauser enhancements (NOEs) of individual backbone 15N sites at pH 7.0. At first glance, the relaxation properties are relatively uniform across the sequence, except for the four residues at each terminus, which have reduced R1, R2, and NOE. The resulting "bell" shape for R2 has been attributed to the residue-residue connectivity of a (denatured or disordered) polypeptide chain [54,55]. The average R1 for residues 5-60 was 1.74 s⁻¹; the only pronounced deviation was a local minimum at residues Gly42 and Ala43.
Closer inspection revealed a small but systematic difference in R2 between the N- and C-halves, with mean values for residues 5-32 and 33-60 at 4.76 and 4.17 s⁻¹, respectively (black dashed lines in Figure 6b). There was also a distinction in NOE between the two halves, with mean values at 0.34 for residues 5-32 and 0.25 for residues 33-60 (black dashed lines in Figure 6c). A t-test treating the N-half and C-half as two independent samples found the p-values for the differences in mean R2 and in mean NOE between the two halves to be both below 0.05 (Table S1), indicating statistical significance. The overall low NOE values once again corroborate the lack of stable backbone structures. Still, the R2 and NOE data together suggest that the N-half overall has larger amplitudes for motions on the slower (e.g., 10-ns) timescale but smaller amplitudes on the faster (sub-ns) timescale than the C-half. Also worth noting are three stretches of residues, 11-14 and 23-28 in the N-half and 45-50 in the C-half (blue shading in Figure 6b,c), that had higher-than-average R2s and NOEs.
The relaxation properties at pH 4.0 showed an even stronger disparity between the N- and C-halves (Figure S9). The mean R2s in the two halves were 3.10 and 2.26 s⁻¹, and the mean NOEs had a wide gap, with values of 0.24 and 0.03 for the two halves. A distinction in the mean R1 also became apparent between the N- and C-halves. The p-values for the differences in mean values were below 0.001 for all three relaxation parameters, indicating strong statistical significance. A likely consequence of the decrease in pH to 4.0 is the protonation of the three histidines (at positions 8, 48, and 59), which would amplify the charge imbalance in the C-half and thereby increase its disorder.
The MD simulations afforded the opportunity for a detailed interpretation of the NMR relaxation data. We evaluated the NH bond vector time-correlation functions, CNH(τ), from the MD trajectories (at 20 ps time intervals) and fit them to a sum of three exponentials; the resulting spectral densities were used, without any modification, to calculate the relaxation properties. The results were close to the experimental counterparts but with systematic underestimates in R1 and overestimates in R2 (colored solid curves in Figure 6). The root mean square errors (RMSEs) relative to the experimental data (excluding the extreme residue at each terminus) were 0.38 s⁻¹ for R1, 1.8 s⁻¹ for R2, and 0.09 for NOE. Importantly, the MD simulations recapitulated the sequence-dependent features of the experimental data, including: (1) the overall differences in R2 and NOE between the N- and C-halves (as indicated by disparate mean values in the two halves, shown as colored dashed lines in Figure 6b,c); (2) the three stretches of residues showing the local maxima in R2 and NOE (blue shading in Figure 6b,c); and (3) the local minimum in R1 at residues 42 and 43 (Figure 6a).

Amplitudes of Backbone Dynamics on Different Timescales
Given the above qualitative agreement with the experimental data, we now report on the MD results for CNH(τ), specifically their tri-exponential fits (with time constants τ1, τ2, and τ3, ordered from large to small, and amplitudes A1, A2, and A3; see Figure S10 for representative fits). It is important to note that we did not restrain the sum of the amplitudes, Asum = A1 + A2 + A3, to be 1. Implicitly, we assumed that the missing amplitude, 1 − Asum, represented an ultrafast decay that occurred before the first time point, 20 ps, at which we evaluated CNH(τ). Indeed, adding an ultrafast decay component with an amplitude of 1 − Asum and a time constant of τf = 10 ps largely made up for some underestimates of the tri-exponential fits at short times (Figure S10 insets). Data between 0 and 20 ps would be required for a precise fit of the ultrafast component for each residue, but a global value of 10 ps for τf worked well for most residues. The mean ± standard deviation of Asum for non-terminal residues was 0.80 ± 0.04 (black dashed line in Figure 7a). In comparison, the order parameters for NH libration calculated after superimposing the peptide plane were 0.933 ± 0.001, implying additional contributions (e.g., rapid fluctuations in the ϕ and ψ angles adjoining the peptide plane [53]) to the ultrafast decay. Interestingly, the three local maxima in Asum, at Asp11, Arg25, and Arg49, coincided with the residues showing higher-than-average R2s and NOEs (triangles filled in blue in Figure 7a), whereas the minimum in Asum at Gly42 (triangles filled in red in Figure 7a) coincided with the residues showing lower-than-average R1s.
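The model just described, three exponentials plus an ultrafast term carrying the missing amplitude 1 − Asum, can be written out as a short sketch. The function name is hypothetical, and the parameter values are the N-half average amplitudes and the mean time constants reported in this section, used here only for illustration rather than as the fit for any particular residue.

```python
import math


def c_nh(tau_ns, amplitudes, time_constants_ns, tau_f_ns=0.01):
    """Tri-exponential model of C_NH(tau) plus an ultrafast decay term.

    The ultrafast term has amplitude 1 - A_sum and time constant tau_f
    (10 ps = 0.01 ns by default), so the model equals 1 at tau = 0.
    """
    a_sum = sum(amplitudes)
    slow_part = sum(a * math.exp(-tau_ns / t)
                    for a, t in zip(amplitudes, time_constants_ns))
    return slow_part + (1.0 - a_sum) * math.exp(-tau_ns / tau_f_ns)


# Illustrative parameters: N-half average amplitudes (A1, A2, A3) and
# mean time constants (tau1, tau2, tau3, in ns) quoted in the text.
AMPLITUDES = (0.22, 0.36, 0.23)
TIME_CONSTANTS_NS = (11.5, 2.4, 0.34)

value_at_zero = c_nh(0.0, AMPLITUDES, TIME_CONSTANTS_NS)
value_at_20ps = c_nh(0.02, AMPLITUDES, TIME_CONSTANTS_NS)
```

By the first sampled time point (20 ps), the ultrafast term has decayed to exp(−2), about 14% of its initial amplitude, which is why the tri-exponential fit alone accounts for nearly all of the sampled correlation function.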
The means ± standard deviations of the three time constants were 11.5 ± 2.4, 2.4 ± 0.5, and 0.34 ± 0.06 ns for the non-terminal residues. The three exponentials with these time constants each contribute most to a different relaxation property; specifically, the slow, intermediate, and fast timescales control R2, R1, and NOE, respectively. The amplitudes (A2) associated with the intermediate time constant were nearly uniform along the sequence (at 0.36 ± 0.04), except for two very low values at Gly42 and Ala43 (Figure 7a). These results for A2 largely explain the corresponding behavior of R1 presented above, i.e., near constancy except for lower-than-average values for Gly42 and Ala43. On the other hand, A1 and A3 showed disparities between the N- and C-halves. In the N-half, the A1 and A3 averages were nearly the same, at 0.22 and 0.23, respectively. In the C-half, the A1 average moved down to 0.15, while the A3 average moved up to 0.27. Given the near constancy of A2 and Asum, the opposite movements of A1 and A3 were inevitable. The disparity in A1 between the two halves explains the corresponding disparity in R2, with lower A1 values in the C-half accounting for the lower R2s (i.e., weaker transverse relaxation) in that half. Likewise, the disparity in A3 between the two halves explains the corresponding disparity in NOE, with higher A3 values in the C-half accounting for the lower NOEs (i.e., higher flexibility) in that half.
The area under the CNH(τ) curve (AUC) equals the spectral density, J(0), at zero frequency, to which R2 is particularly sensitive. The AUC values (and their contributions from the three exponentials) are displayed in Figure 7b. Two patterns are apparent (which are also true of the A1 component). First, the N-half overall had higher AUCs than the C-half (with averages at 3.8 and 2.4 ns, respectively). Second, there were three local maxima at residues 11-13, 24-27, and 47-48. These were the same maxima as identified based on Asum (Figure 7a), but were now much more conspicuous. They explain the higher-than-average R2s of the involved residues (Figure 6b).
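For a sum of exponentials, the AUC is simply the sum of each amplitude times its time constant, which makes the dominance of the slow component easy to check by hand. The sketch below assumes a spectral density without any normalization prefactor (conventions differ on this point) and plugs in the half-averaged amplitudes and mean time constants quoted above; because it uses averaged parameters rather than per-residue fits, it is only a rough consistency check against the reported AUC averages of 3.8 and 2.4 ns. Function and variable names are hypothetical.

```python
def auc_ns(amplitudes, time_constants_ns):
    """Area under sum_i A_i * exp(-tau / tau_i), integrated from 0 to infinity."""
    return sum(a * t for a, t in zip(amplitudes, time_constants_ns))


def spectral_density(omega_rad_per_ns, amplitudes, time_constants_ns):
    """J(omega) = sum_i A_i * tau_i / (1 + (omega * tau_i)^2), no prefactor."""
    return sum(a * t / (1.0 + (omega_rad_per_ns * t) ** 2)
               for a, t in zip(amplitudes, time_constants_ns))


# Mean time constants (ns) and half-averaged amplitudes (A1, A2, A3)
# quoted in the text; A2 is taken as 0.36 in both halves.
TAUS_NS = (11.5, 2.4, 0.34)
N_HALF_AMPLITUDES = (0.22, 0.36, 0.23)
C_HALF_AMPLITUDES = (0.15, 0.36, 0.27)

auc_n_half = auc_ns(N_HALF_AMPLITUDES, TAUS_NS)  # about 3.5 ns
auc_c_half = auc_ns(C_HALF_AMPLITUDES, TAUS_NS)  # about 2.7 ns

# Share of the N-half AUC carried by the slow (tau1) component.
slow_share_n_half = N_HALF_AMPLITUDES[0] * TAUS_NS[0] / auc_n_half
```

With these averaged inputs the slow component contributes roughly 70% of the N-half AUC, consistent with the statement that the patterns in AUC are also true of the A1 component.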
Ultimately, the higher amplitudes (A1) for the slow timescale (and higher AUCs) of the N-half came from the more frequent SC-SC and SC-BB contacts in this half, in particular salt bridges, cation-π interactions, and SC-BB hydrogen bonds mediated by charged residues, resulting in correlated segmental motions. Indeed, the two most extensive interaction networks, centered around residues 24-27 and 46-49, were directly responsible for the local maxima in AUC and the corresponding local maxima in R2 at these residues. In contrast, for Gly41 and Gly42, the absence of a sidechain not only allows them to access the left-handed side of the Ramachandran map (Figure S4), but also precludes them from forming any SC-SC or SC-BB contacts (Figure 5), resulting in much faster backbone dynamics.

Non-Uniform Amide Proton Exchange Rates along the Sequence
Amide proton exchange rates (kex; Figure 8a) further corroborated the presence of correlated segments suggested by the NMR relaxation experiments and MD simulations. The average kex of the N-half, 3.9 s⁻¹, was less than one third of that of the C-half, 12.4 s⁻¹. kex is strongly dependent on the amino acid sequence [66]. We calculated the intrinsic exchange rates (kintrinsic) from the sequence using the SPHERE server (Figure 8b) [65]. By taking the ratio kintrinsic/kex, we obtained the protection factors (Figure 8c). Interestingly, for the C-half residues the kex values were all close to the kintrinsic values, indicating very little influence beyond the immediate amino acid sequence (average protection factor at 1.14). In contrast, for most of the N-half residues the protection factors were higher than 1, averaging 2.57. A t-test showed that the difference in mean protection factor between the N- and C-halves is statistically significant (p-value at 0.015; Table S1). This difference is well explained by the disparities in the SC-SC and SC-BB contacts between the two halves. In particular, the two residues with the highest protection factors (residue 14 at 5.4 and residue 21 at 10.8) are located in the most correlated segment (residues 11-29, identified by extensive interactions and high R2s and NOEs).

Discussion
By combining NMR, SAXS, and MD simulations, we have characterized the conformations and dynamics of ChiZ1-64 and delineated their linkage to the amino acid sequence. The conformations of ChiZ1-64 were diverse, with the only notable feature being high propensities of PPII stretches, especially in the N-half. Backbone 15N relaxation experiments revealed non-uniform R2s and NOEs along the sequence, with high values for residues 11-14 and 23-28. These or neighboring residues also have high protection factors for amide proton exchange. MD simulations recapitulated these observations and suggest that the reason for the non-uniform dynamics is the formation of correlated segments, which are stabilized by PPII stretches, salt bridges, cation-π interactions, and sidechain-backbone hydrogen bonds. Moreover, the extent of segmental correlation is sequence-dependent: segments where internal interactions are more prevalent manifest elevated "collective" motions and suppressed local motions.
Similar to ChiZ1-64, sequence-specific backbone dynamics have been reported on a number of other IDPs using NMR, some in combination with MD simulations [38,48-50,53,55,92,93]. Stable secondary structures such as α-helices and β-hairpins can certainly lead to slow backbone dynamics [38,48,50,53]. As demonstrated here for ChiZ1-64, interaction networks, in particular those mediated by charged and aromatic residues, can also lead to the formation of correlated segments, which can have slow dynamics even when the backbone remains disordered. We propose the correlated segment as a defining feature for the conformation and dynamics of IDPs. Contact maps provide a way to identify correlated segments and characterize their stabilizing interactions. For example, it would be interesting to investigate whether cation-π or other types of interactions contribute to the slow dynamics of two tryptophans in the C-terminal domain of the nucleoprotein of Sendai virus [38,48], or the precise interactions that may be responsible for the slow dynamics of two stretches of residues in HOX transcription factors [93]. π-π interactions between aromatic residues have been proposed to produce elevated R2s in A1-LCD [56], though it remains to be seen whether explicit-solvent MD simulations can quantitatively explain the NMR data. The accumulation of this type of knowledge over a large number of IDPs will advance our understanding of how amino acid sequences of IDPs, through the formation of correlated segments, code for dynamics.
There is a pressing need for the continued development of IDP force fields. The AMBER14SB/TIP4PD force field selected here based on benchmarking against SAXS profiles and chemical shifts also performed reasonably well for dynamic properties. Coincidentally, the same force field was also selected by Kämpf et al. [53] from comparison with backbone 15N relaxation data. Still, for ChiZ1-64, the MD results had an apparent systematic underestimation of R1 and an overestimation of R2. The opposite deviations suggest an exaggeration of the longest timescale in the NH bond vector time-correlation functions. To test this idea, we scaled down the three time constants from the tri-exponential fits by a factor of 1 + τi/τs, with τs on the order of 10 ns, along with the addition of an ultrafast decay component with the time constant τf = 10 ps and amplitude 1 − Asum noted above. With τs = 16.75 ns, the systematic errors were reduced for R1 and almost eliminated for R2; the NOE calculations maintained a good agreement with the experimental data (Figure S11). Whether TIP4PD indeed makes ns dynamics too slow and, if so, how to improve this promising water model warrant further studies. It is also possible that the Langevin thermostat affects the dynamics of ChiZ1-64. Dynamic properties of IDPs have much to contribute to force field validation and improvement.
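The effect of this scaling on each timescale can be seen with a short calculation. The sketch below reads "scaled down by a factor of 1 + τi/τs" as dividing each time constant by that factor, which is an interpretation of the text rather than a confirmed implementation detail; the function name is hypothetical, and the inputs are the mean time constants reported in the Results rather than per-residue fits.

```python
def temper_time_constants(time_constants_ns, tau_s_ns=16.75):
    """Divide each time constant tau_i by the factor (1 + tau_i / tau_s)."""
    return [t / (1.0 + t / tau_s_ns) for t in time_constants_ns]


# Mean time constants (ns) of the slow, intermediate, and fast components.
MEAN_TAUS_NS = (11.5, 2.4, 0.34)

tempered_ns = temper_time_constants(MEAN_TAUS_NS)
# Fractional reduction of each time constant after scaling.
reductions = [1.0 - new / old for new, old in zip(tempered_ns, MEAN_TAUS_NS)]
```

Under this reading, the scaling is equivalent to 1/τi′ = 1/τi + 1/τs, so it shortens the slow time constant substantially (by about 40% for 11.5 ns) while leaving the fast one nearly unchanged (about 2% for 0.34 ns), in line with the stated aim of tempering only the longest timescale.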
Although the functional role of ChiZ in Mtb cell division remains open, it may involve interactions with other divisome proteins, including FtsQ and FtsI [62]. Like ChiZ, both of the latter proteins contain disordered cytoplasmic regions high in charged amino acids. The interactions between all these IDRs may lead to fuzzy complexes. Moreover, these highly charged IDRs are also very likely to associate with the highly anionic Mtb membranes. The conformational and dynamic characterization of the ChiZ IDR in isolation done here will set the stage for studying these more complex systems. Given the disparity between the two halves of the ChiZ IDR, we expect the N-half to be more recalcitrant and the C-half more adaptive in interacting with the various partners. In the full-length protein, the C-terminus of the IDR would be tethered to the membrane via its linkage to the transmembrane helix. The very C-terminal residues of the IDR would thus be restricted, though the rest of the C-half could still be free to sample its conformational space.

Conclusions
It is becoming evident that conformational dynamics play crucial roles in the functionality of IDPs. A number of experimental techniques can characterize IDP dynamics on different timescales, but in many cases the interpretations of such data are not straightforward. Computational methods such as MD simulations can help with the interpretations and with elucidating the link between sequence, dynamics, and function. In this study, we combined small-angle X-ray scattering, NMR spectroscopy, and MD simulations to characterize a newly identified disordered region of ChiZ, ChiZ1-64. Overcoming the traditional limitations of MD simulations of IDPs with regard to force fields and sampling, we determined that the backbone dynamics of ChiZ1-64 are sequence-dependent, with several segments, mostly in the first 32 residues, showing high amplitudes in correlated motions. These correlated segmental dynamics are promoted by PPII formation and side chain-side chain and side chain-backbone interactions. The overexpression of ChiZ has been shown to halt Mtb cell division, potentially through interactions with FtsI and FtsQ, two other Mtb divisome proteins with disordered regions. Although we cannot yet determine the mechanistic role of ChiZ in Mtb cell division, we hypothesize that sequence-dependent dynamics will be critical for this understanding. Potentially, the intrinsic fast dynamics of the C-half would allow it to readily adapt to binding partners, including the Mtb membrane and other divisome proteins, while the N-half, rich in correlated segments, may adopt nascent conformations that become stabilized by binding with partners. The characterization and methods illustrated here will also provide a framework for future studies to investigate the roles of dynamics in IDP functions.

Supplementary Materials:
The following are available online at http://www.mdpi.com/2218-273X/10/6/946/s1, Figure S1: Measured and predicted SAXS profiles; Figure S2: Measured and predicted chemical shifts; Figure S3: Radii of gyration (Rg) from 12 replicate simulations; Figure S4: Ramachandran maps for the 62 non-terminal residues in ChiZ1-64; Figure S5: Dihedral principal component analysis (dPCA) and clustering; Figure S6: Eigenvalues and eigenmodes from dPCA; Figure S7: Enlarged view of Box 3 and Box 5 from contact maps in Figure 5; Figure S8: PPII stretches in the 16 conformations in Figure 4; Figure S9: NMR relaxation parameters at pH 4.0; Figure S10: Representative correlation functions and tri-exponential fits; Figure S11: Backbone 15N relaxation parameters from MD simulations, after modifications to include ultrafast decay and tempered slow dynamics; Table S1: p-values and t-statistics, comparing the means of the two halves of ChiZ1-64.