Complex Conformational Space of the RNA Polymerase II C-Terminal Domain upon Phosphorylation

Intrinsically disordered proteins (IDPs) have been closely studied during the past decade because of their importance in many biological processes. The disordered nature of these proteins makes it difficult to observe the full span of their conformational space with either experimental or computational approaches. In this article, we explored the conformational space of the C-terminal domain (CTD) of RNA polymerase II (Pol II), which is an intrinsically disordered low-complexity domain, using enhanced sampling methods. We provided a detailed conformational analysis of CTD model systems of two lengths: first the last 44 residues of the human CTD sequence, and then a CTD model with two heptapeptide repeat units. We then investigated the effects of phosphorylation on CTD conformations by performing simulations of different phosphorylation states. The nonphosphorylated CTD models sampled broad conformational spaces, and phosphorylation had complex effects on the CTD conformations. These effects depend on the length of the CTD, the spacing between multiple phosphorylation sites, ion coordination, and interactions with nearby residues.


Figure S1: The minimum distance between periodic images as a function of simulation time for the most expanded sequences from the 2CTDs and exp-CTDs: (a) 2CTD-2P-5P and (b) exp-CTD-5P-22P-40P.

Figure S2: Correlation between experimental chemical shifts and chemical shifts calculated with the C36m and C36mw FFs for (a) Cα and (b) Cβ of the exp-CTD-non-phos sequence, with linear regression analysis (trendlines are shown as red solid lines, and the trendline equations and R² values are displayed on the plots).
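The trendline equation and R² value reported on such a correlation plot can be obtained from an ordinary least-squares fit. The following is a minimal sketch, not the analysis script used for the figure; the function name `linear_fit_r2` and the shift values are hypothetical and serve only as an illustration.

```python
import numpy as np

def linear_fit_r2(experimental, predicted):
    """Least-squares line predicted = slope * experimental + intercept, plus R^2."""
    x = np.asarray(experimental, dtype=float)
    y = np.asarray(predicted, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)          # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)     # total variation
    return slope, intercept, 1.0 - ss_res / ss_tot

# Hypothetical chemical shifts in ppm (not data from this work).
exp_shifts = [55.2, 58.1, 62.4, 56.0, 60.3]
sim_shifts = [55.0, 58.5, 62.0, 56.4, 60.1]
slope, intercept, r2 = linear_fit_r2(exp_shifts, sim_shifts)
print(f"y = {slope:.3f}x + {intercept:.3f}, R^2 = {r2:.3f}")
```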

Figure S3: Free energy landscapes using the radius of gyration and end-to-end distance as reaction coordinates for exp-CTD-non-phos (44 residues) from simulations with the (a) C36m and (b) C36mw force fields. Low-energy conformations are shown in cartoon representation.
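A free energy landscape over two reaction coordinates is commonly estimated from the sampled probability density as F = -k_B T ln P. The sketch below illustrates that general approach under stated assumptions: the temperature of 300 K, the bin count, and the function name `free_energy_2d` are illustrative choices and are not taken from this work.

```python
import numpy as np

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def free_energy_2d(rg, end_to_end, bins=50, temperature=300.0):
    """Estimate F(Rg, d_ee) = -k_B T ln P from per-frame values.

    Returns the free energy grid (kcal/mol, minimum shifted to zero)
    and the bin edges. Unsampled bins are set to +inf.
    """
    hist, rg_edges, dee_edges = np.histogram2d(rg, end_to_end, bins=bins)
    prob = hist / hist.sum()
    with np.errstate(divide="ignore"):       # log(0) -> -inf for empty bins
        fe = -K_B * temperature * np.log(prob)
    fe -= fe[np.isfinite(fe)].min()          # lowest basin defines F = 0
    return fe, rg_edges, dee_edges

# Synthetic data standing in for per-frame trajectory values (in Å).
rng = np.random.default_rng(0)
rg = rng.normal(15.0, 2.0, 10000)
dee = rng.normal(35.0, 8.0, 10000)
fe, rg_edges, dee_edges = free_energy_2d(rg, dee)
```

Shifting the minimum to zero makes landscapes from different force fields directly comparable on a common color scale.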

Figure S4: The free energy landscapes from PCA analysis for the 2CTD-2P-5P-9P-12P system using (a) the full 500 ns of the simulation, (b) the last 400 ns, and (c) the first 100 ns of the 500 ns simulation. The low-energy conformations are also shown in each panel in cartoon representation.

Figure S5: The free energy landscapes from PCA analysis for the 2CTD-non-phos system using the extended 400 ns simulation: (a) the full 400 ns, (b) the first 200 ns, and (c) the last 200 ns of the 400 ns simulation. One of the common low-energy conformations is also shown in cartoon representation.

Figure S6: Comparison of the secondary structure predictions from NMR chemical shifts (experimental with δ2d software) and for the central structure from the simulations using C36mw FF with the DSSP, STRIDE and KAKSI programs.

Figure S7: Visual comparison of the secondary structure predictions for the central structure from the simulations using C36mw FF with DSSP, STRIDE and KAKSI programs.

Figure S9: Distribution of radius of gyration (Rg) calculated over the MD simulation trajectories for CTD sequences with 44 residues. The black line shows the Rg distribution for the nonphosphorylated system for comparison. Standard errors are calculated by splitting the full 200 ns trajectory into 40 ns segments.
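One way to obtain per-bin error bars on such a distribution is to histogram each trajectory segment separately and take the standard error of the mean across segments. The following is a minimal sketch of that idea, assuming five equal segments (200 ns / 40 ns); the function name, bin settings, and synthetic data are illustrative and may differ from the procedure actually used.

```python
import numpy as np

def block_histogram_error(values, n_blocks, bins, value_range):
    """Mean normalized histogram and its standard error across blocks.

    values      : per-frame observable (for example Rg) in time order
    n_blocks    : number of consecutive trajectory segments
    bins        : number of histogram bins
    value_range : (min, max) shared by all blocks so bins line up
    """
    blocks = np.array_split(np.asarray(values, dtype=float), n_blocks)
    hists = np.array([
        np.histogram(b, bins=bins, range=value_range, density=True)[0]
        for b in blocks
    ])
    mean = hists.mean(axis=0)
    # Standard error of the mean over the block histograms.
    sem = hists.std(axis=0, ddof=1) / np.sqrt(n_blocks)
    return mean, sem

# Synthetic Rg values (Å) standing in for a 200 ns trajectory.
rng = np.random.default_rng(0)
rg = rng.normal(15.0, 2.0, 20000)
mean, sem = block_histogram_error(rg, n_blocks=5, bins=30, value_range=(5.0, 25.0))
```

Using a shared `value_range` is what allows the block histograms to be averaged bin by bin.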

Figure S12: Distribution of the total number of intrapeptide H-bonds calculated over the MD simulation trajectories for CTD sequences with 44 residues. The black line shows the H-bond distribution for the non-phosphorylated system for comparison. Error bars are calculated by splitting the full 200 ns trajectory into 40 ns segments.

Figure S15: Distribution of radius of gyration (Rg) calculated over the MD simulation trajectories for the 2CTD models. The black line shows the Rg distribution for the nonphosphorylated system for comparison. Error bars are calculated by splitting the full 200 ns trajectory into 40 ns segments, except for 2CTD-2P-5P-9P-12P, for which the 400 ns trajectory was split into 80 ns segments.

Figure S18: Distribution of the total number of intrapeptide H-bonds calculated over the MD simulation trajectories for the 2CTD models. The black line shows the H-bond distribution for the non-phosphorylated system for comparison. Error bars are calculated by splitting the full 200 ns trajectory into 40 ns segments, except for 2CTD-2P-5P-9P-12P, for which the 400 ns trajectory was split into 80 ns segments.

Figure S21: Distribution of rotational entropies calculated over the MD simulation trajectories for 2CTD systems.

Table S1: Simulation system details for each CTD model. For each simulation, the cutoff distance from the box edge was set to 10 Å.

Table S2: Total acceptance ratios between neighboring replicas for REMD simulations of CTD models with 14 residues (8 replicas) and 44 residues (16 replicas).

Table S3: P-values from t-tests comparing the distributions of radius of gyration and number of H-bonds for the phosphorylated states of both the 2CTDs and exp-CTDs with those of their non-phosphorylated states. P-values smaller than 10⁻¹⁰ are reported as 0.0 in the table.

Table S4: Average number of intrapeptide H-bonds for CTDs with 44 residues and 14 residues.

Table S5: Densities of phosphorylated residues for each sequence. Density_1 is the number of phosphorylated residues divided by the total number of residues in the sequence, while Density_2 is the number of phosphorylated residues divided by the total number of serine residues in the sequence.
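The two density definitions can be written out as a short calculation. This is a minimal sketch: the function name `phospho_densities` is hypothetical, and the example assumes the 14-residue model consists of two consensus heptad repeats (YSPTSPS), which should be checked against the sequences listed in Table S1.

```python
def phospho_densities(sequence, phospho_positions):
    """Return (Density_1, Density_2) for a one-letter amino acid sequence.

    Density_1 = phosphorylated residues / total residues
    Density_2 = phosphorylated residues / total serine residues
    phospho_positions uses 1-based residue numbering, as in the system names.
    """
    n_phos = len(phospho_positions)
    n_res = len(sequence)
    n_ser = sequence.count("S")
    return n_phos / n_res, n_phos / n_ser

# Assumed sequence: two consensus heptad repeats (14 residues, 6 serines).
two_ctd = "YSPTSPS" * 2
d1, d2 = phospho_densities(two_ctd, [2, 5])   # corresponds to 2CTD-2P-5P
print(f"Density_1 = {d1:.3f}, Density_2 = {d2:.3f}")
```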