Investigating the Conformational Ensembles of Intrinsically-Disordered Proteins With a Simple Physics-Based Model.

Intrinsically disordered proteins (IDPs) play an important role in an array of biological processes but present a number of fundamental challenges for computational modeling. Recently, simple polymer models have regained popularity for interpreting the experimental characterization of IDPs. Homopolymer theory provides a strong foundation for understanding generic features of phenomena ranging from single-chain conformational dynamics to the properties of entangled polymer melts, but is difficult to extend to the copolymer context. This challenge is magnified for proteins due to the variety of competing interactions and large deviations in side-chain properties. In this work, we apply a simple physics-based coarse-grained model for describing largely disordered conformational ensembles of peptides, based on the premise that sampling sterically-forbidden conformations can compromise the faithful description of both static and dynamical properties. The Hamiltonian of the employed model can be easily adjusted to investigate the impact of distinct interactions and sequence specificity on the randomness of the resulting conformational ensemble. In particular, starting with a bead-spring-like model and then adding more detailed interactions one by one, we construct a hierarchical set of models and perform a detailed comparison of their properties. Our analysis clarifies the role of generic attractions, electrostatics and side-chain sterics, while providing a foundation for developing efficient models for IDPs that retain an accurate description of the hierarchy of conformational dynamics, which is nontrivially influenced by interactions with surrounding proteins and solvent molecules.


Model Details Bonded interactions given by the Amber99 SB-ILDN force field
The potential for two covalently bonded atoms i and j with distance r ij is represented by where b ij is the equilibrium distance and k b ij is the harmonic force constant. The angle potential between three continuous atoms i, j and k is represented by a harmonic potential V a (θ ijk ) = 1 2 k θ ijk (θ ijk − θ 0 ijk ) 2 , where θ 0 ijk is the equilibrium angle and k θ ijk is the force constant. The potential for dihedral angles φ between two planes ijk and jkl composed by four continuous atoms i, j, k and l is defined where φ s is the equilibrium dihedral angle and k φ is the force constant. The parameters k b ij , b ij , k θ ijk , θ 0 ijk , k φ , multiplicity n and φ s are directly taking from the Amber99SB-ILDN force field without modification. Example plots of V b (r ij ), V a (θ ijk ) and V d (φ ijkl ) can be found in Fig. S1.

Excluded volume effects and 1-4 interactions
In the physics-based CG model considered in this work, the excluded volume interactions of (heavy) atoms along the peptide backbone are represented with Weeks-Chandler-Andersen (WCA) potentials ( Fig. S1(d)), constructed directly from the Lennard-Jones interactions of the AMBER99SB-ILDN force field. Exclusions were treated per the AMBER99SB-ILDN force field, i.e., i/i + 1 and i/i + 2 pairs were excluded, while the interaction for i/i + 3 (i.e., 1-4) pairs were reduced by a factor of 0.5 (i.e., fudgeLJ = 0.5 in the terminology of the GROMACS simulation suite).

Hydrophobicity scale and IDP classification of ACTR
The models employing hetero-hydrophobic interactions scale the attraction between C β atoms according to the residue identities. The hydrophobicity scale, based on the Miyazawa-Jernigan interaction matrix, 2 is shown in Fig. S2(a) and in Table S1. IDPs that tend to sample heterogeneous conformational ensembles, including many of the known proteins which undergo couple folding and binding.  interactions. The top and bottom triangles correspond to the "full" coarse-grained resolution and a coarser resolution obtained by partitioning the peptide chain into segments of four resides and averaging over pairs belonging to these segments, respectively. We test the convergence of the trajectories by calculating the autocorrelation function

Pairwise hydrophobic and electrostatic interactions
, where t is the lag time and A is an observable. Fig. S4 presents the autocorrelation functions of the mean square gyration radius c(R 2 g , t) and mean square end-to-end distance c(R 2 e , t) for ACTR and NCBD.

Sub-trajectories
We also randomly divided each trajectory into two groups and calculated R g and h(i) for each subtrajectory (Fig. S5).  Figure S6: Percentage of helical segments per residue, h(i), for ACTR determined from simulations of (a) models 1 (black) and 2 (red), and (b) models 3a (black), 3b (red) and 4 (green).
The fraction of helical segments per residue, h(i), for ACTR determined from simulations of models 1, 2, 3a, 3b and 4 is nearly zero (Fig. S6), since hydrogen-bonding-like interactions are not included in these models.

Scaling of root mean square distances
In the main text we have presented the root mean square normalized distance between C α atoms for two residues separated by | j − i | residues along the chain, d 0 Cα (| j − i |) , for each of the considered models. We define d 0 In previous work, 1 these root mean square distance curves have been fit to an effective scaling law, in correspondence with the ν values presented based on the fitting of S(q) in the main text. Although the results from the simulation models that behave as self-avoiding walks can be approximately fit to a single effective ν which matches that determined from S(q), the other models demonstrate more complex behavior. In particular, as discussed in the main text, d 0 Cα (| j − i |) for the more detailed models shows three distinct regions along with three distinct effective ν values. This is an additional benefit of the normalized root mean square distance, which better highlights these regions. Figure S8 presents a plot of

T -dependence of bead-spring models
We considered two bead-spring (BS) models with WCA (BS) or Lennard-Jones (BS-LJ) potentials employed to represent interactions between monomers. The distribution of R g for these models is presented in Fig. S9. While the distribution is temperature independent for the BS model (panel (a)), R g decreases with increasing temperature for the BS-LJ model.  Free-energy landscapes along R g and R e Fig. S11 presents free-energy surfaces as a function of R g and R e for ACTR determined from simulations of models 3a, 3b and 4. Clearly, the shape parameters R g and R e do not clearly resolve the differences between the generated ensembles.   Fig. 2b of. 5 The experimental profile is also presented in both forms in Figs. 1 and S1 of. 6

Knots in polyA
In contrast to ACTR and polyG, we observe knotted structures in the simulations of polyA ( Fig. S13). It seems that the higher rigidity (larger h(i)) and tight compactness (small R g ) of polyA allows the formation of large stable loops during the simulations, enabling threading of the termini through the loop. Knots have also been found in other polypeptides, such as in polyglutamine (polyQ). 7 In our work, the knotting probability of polyA is roughly 3%, and these knots persist for tens to thousands of τ . These short lifetimes result in a small effect on the average fraction of helical segments (Fig. S14).