Computational Investigation of RNA CUG Repeats Responsible for Myotonic Dystrophy 1

Myotonic Dystrophy 1 (DM1) is a genetic disease caused by expansion of CTG repeats in DNA. Once transcribed, these repeats form RNA hairpins with repeating 1×1 nucleotide UU internal loop motifs, r(CUG)n, which attract muscleblind-like 1 (MBNL1) protein leading to the disease. In DM1 CUG can be repeated thousands of times, so these structures are intractable to characterization using structural biology. However, inhibition of MBNL1-r(CUG)n binding requires a detailed analysis of the 1×1 UU internal loops. In this contribution we employ regular and umbrella sampling molecular dynamics (MD) simulations to describe the structural and thermodynamic properties of 1×1 UU internal loops. Calculations were run on a reported crystal structure and a designed system, which mimics an infinitely long RNA molecule with continuous CUG repeats. Two-dimensional (2D) potential of mean force (PMF) surfaces were created by umbrella sampling, and the discrete path sampling (DPS) method was utilized to investigate the energy landscape of 1×1 UU RNA internal loops, revealing that 1×1 UU base pairs are dynamic and strongly prefer the anti–anti conformation. Two 2D PMF surfaces were calculated for the 1×1 UU base pairs, revealing several local minima and three syn–anti ↔ anti–anti transformation pathways. Although at room temperature the syn–anti ↔ anti–anti transformation is not observed on the MD time scale, one of these pathways dominates the dynamics of the 1×1 UU base pairs in temperature jump MD simulations. This mechanism has now been treated successfully using the DPS approach. Our results suggest that local minima predicted by umbrella sampling calculations could be stabilized by small molecules, which is of great interest for future drug design. Furthermore, distorted GC/CG conformations may be important in understanding how MBNL1 binds to RNA CUG repeats. Hence we provide new insight into the dynamic roles of RNA loops and their contributions to presently incurable diseases.

Equilibration and production input files used in the study of r(3×CUG inf ). p. S7 Table S2. RNA CUG motifs with 1×1 UU loops extracted from CoSSMos database.
p. S28 Figure S15. RMSD analysis of dangling uridine ends in 3×CUG 3SYW . p. S29 Figure S16A and S16B Basepair step analysis of two neighboring duplexes in 3×CUG inf /anti-anti and 3×CUG inf /syn-anti. This trick allows us to design an infinitely long linear RNA duplex.
Movie S4. Trajectory of U 6 U 17 extracted from MD simulation of 3×CUG inf /syn-anti ( Figure 1C) between 349 and 379 ns ( Figure 8). The Watson-Crick G 7 C 16 base pair with 3 hydrogen-bonds is also shown in the movie in order to emphasize the Na + pocket ( Figure 9). Freely moving Na + ions are displayed with blue spheres. Whenever a Na + ion is within 3 Å of O 2 and O 2 P of U 6 and N 7 of G 7 it is displayed in a bigger and transparent blue sphere. In the MD trajectory of U 6 U 17 extracted from MD simulation of 3×CUG inf /syn-anti no syn-anti→anti-anti transformation was observed. Yet, we saw that there were at least four times where U 6 of U 6 U 17 unstacked from the helical axis ( Figures 3G and 8).
Transformation of U 6 from stacked state to unstacked state only happened when the Na + binding pocket was not occupied by a Na + ion. Yet, this state has a short lifetime because Na + ions occupy the binding pocket rapidly and stabilize the syn-anti UU conformation by bringing back U 6 to the stacked conformation.
Preparation of (θ 1 ,χ) states. The initial conformation of the RNA in the model system 1×CUG was designed to have the published NMR coordinates (PDB accession code 2L8U). 1 Following the minimization and equilibration steps described in the main text (Section "Molecular dynamics simulations"), an MD simulation for 7.2 ns was run where the pseudo-dihedral angle (θ 1 ) was incremented by 5° intervals for each 100 ps time interval ( Figure 2B). During this process, all the heavy atoms of RNA except U5 and U14 ( Figure 1A) were kept frozen using positional restraints.
Furthermore, torsional restraints were imposed on U14 to keep it around the initial conformation. A χ torsional restraint was imposed on U5 to keep it in the initial anti conformation while a θ 1 torsional restraint was used to keep U5 in the new pseudo-dihedral state. At the end of the simulation, only 36 of the θ 1 states (0, 10, 20, …, 350) were used in the next step. The 5° increment on θ 1 was done in order to smoothly transform the conformation of U5 to its new conformational state. Once the initial θ 1 torsional states were created each state was simulated for another 3.6 ns where now χ was incremented by 10° intervals for each 100 ps time interval (Figure 2A). Again, positional restraints were imposed on the heavy atoms of RNA except U5 and U14. Similar to the first step, torsional restraints were imposed on U14 to keep it near the initial conformation. A χ torsional restraint was imposed on U5 to keep it in the new χ state while a θ 1 torsional restraint was imposed on U5 to keep it in the initial conformation. At the end of the MD simulations, 36×36=1296 states with different (θ 1 ,χ) combinations were created. We further did 52 extra simulations to create a better overlap of the distributions shown in Figure S5.
Preparation of (θ 1 ,θ 2 ) states. The methodology described above to create the different (θ 1 ,χ) states was followed here, too. Initial structures created by rotating around θ 1 described above were simulated for another 7.2 ns where now θ 2 was incremented by 5° intervals for each 100 ps time interval ( Figure   2B). Similar positional restraints were imposed on the heavy atoms of RNA except U5 and U14. The χ torsions of U5 and U14 were restrained to stay in anti conformations. A θ 1 torsional restraint was imposed on U5 to keep it in the initial conformation while a θ 2 torsional restraint was imposed on U14 to keep it in the new conformational state. At the end of each MD simulation, only 36 of the θ 2 states (0, 10, 20, …, 350) were taken to be used in the umbrella sampling MD simulations, which include 36×36=1296 states with different (θ 1 ,θ 2 ) combinations.
Equilibration. Once the initial conformational states were created as described above, pressure was equilibrated in two steps in which the heavy atoms of RNA except U5 and U14 ( Figure 1A) were held fixed with restraint forces of 0.1 and 0.01 kcal/mol-Å 2 , respectively, at each step. Similar torsional restraints described above as well as torsional restraints on χ and θ were imposed on RNA during the equilibration steps. Constant pressure dynamics with isotropic positional scaling was turned on in each Yildirim et al. S5 step. The reference pressure was 1 atm with a pressure relaxation time of 2 ps. The temperature was kept at 300 K. For each step 50 ps of MD was run with a 1 fs time step. Particle Mesh Ewald (PME) 2,3 was always turned on.
Production runs. A similar procedure to that described above in the second equilibration step was followed in each production run. No restraints were used in the production runs except the ones imposed on χ and θ torsions. To restrain the χ and θ torsions in the umbrella sampling MD simulations, a square bottom well with parabolic sides was used with a 50 kcal/mol-rad 2 force constant. For each simulation, a total of 2 ns of MD were run with a 2 fs time step; (θ 1 ,χ) and (θ 1 ,θ 2 ) data were written at intervals of 10 fs. MD simulations were carried out with the sander.MPI module of AMBER12. 4 For umbrella sampling calculations, over 150K CPU hours were used.

Section S2. Discrete path sampling (DPS).
As the detailed formalism of DPS has been presented elsewhere, 5,6 we will only discuss the key steps involved in building stationary point databases (which constitute a kinetic transition network) 7,8 using DPS in this section.
In the DPS approach, transitions between different conformational states are characterized by discrete paths. A discrete path consists of a connected sequence of minima with intervening transition states. A minimum on the PES is a stationary point having non-zero normal mode frequencies, whereas a transition state has one unique imaginary normal mode frequency. Small displacements directed parallel and antiparallel to the eigenvector corresponding to the imaginary frequency of a transition state lead to the adjoining minima. 9 The initial guesses for transition states connecting local minima were obtained using the doubly-nudged 10 elastic band method, 11,12 which were then further refined using the hybrid eigenvector-following method. 11,13 For the DPS simulations, the AMBER topology file was properly symmetrized, following previous work. 14 The solvent and salt effects were included implicitly via the Generalized Born (GB) solvent model. 15,16 The geometry optimizations and transition state searches were carried out using the OPTIM 17 code via an AMBER9 18 interface.
The overall steady-state rate constant for the conformational transitions between the syn-anti and anti-anti can be effectively expressed as a weighted sum over all the discrete paths, if the dynamics between adjacent local minima is assumed to be Markovian. 5,6 The kinetically relevant set of pathways which make the largest contributions to the rate constant was extracted from the stationary point database using the recursive enumeration analysis method, 19,20 and visualized in order to draw mechanistic insights.

Yildirim et al. S6
Briefly, a disconnectivity graph segregates the energy landscape into disjoint sets of minima, which are mutually accessible through transition states lying below a chosen energy threshold. 21 Transitions between different sets require surmounting of higher potential/free energy barriers and hence are much slower. The disconnectivity graphs depicted in this work were constructed from the stationary point databases using the disconnectionDPS code. 22 Yildirim et al. S7 Table S1. AMBER equilibration and production input files used in the study of r(3×CUG inf ). Note that in the first step of equilibration, positional restrains were imposed on the RNA and constant volume dynamics (NVT) were used. In the second step, however, no restraints were imposed on the system, and constant pressure dynamics (NPT) were used. Reference pressure was 1 atm with a pressure relaxation time of 2 ps. Production run is a continuation of the second step of equilibration.  ns production run (explicit solvent) &cntrl imin=0, ntx=5,irest=1, ntpr=5000,ntwr=5000,ntwx=5000, ntc=2,ntf=2,ntb=2,cut=8, igb=0,ntr=0, nstlim=10000000,dt=0.001,nscm=5000,nrespa=1, ntt=3,gamma_ln=1,tempi=300,temp0=300, ntp=1,taup=2.   CoSSMos database 23 developed by Znosko and co-workers was used to find RNA CUG motifs with 1×1 UU loops (Table S1). In house code was used to extract the data from each PDB file. A and B include 192 and 96 yellow/red data points, respectively, extracted from the structures described in Table S1.
Note that all the 1×1 UU base pairs in RNA CUG seen in the literature are in anti-anti conformations and are stacked within RNA helical axis.

Yildirim et al. S29
Figure S15. RMS deviation of U 1 U 2 (A) and U 20 U 21 (B) dangling uridine ends with respect to stable conformations shown in Figure 6. Black, red, and green represent the conformations of A, B, and C, respectively, shown in Figure 6. Note that in A the conformations are less stable after 500 ns compared to B that is due to fraying of terminal base pairs.