Sampling globally and locally correct RNA 3D structures using Ernwin, SPQR and experimental SAXS data

Abstract The determination of the three-dimensional structure of large RNA macromolecules in solution is a challenging task that often requires the use of several experimental and computational techniques. Small-angle X-ray scattering (SAXS) can provide insight into some geometrical properties of the probed molecule, but these data must be properly interpreted in order to generate a three-dimensional model. Here, we propose a multiscale pipeline that introduces SAXS data into the modelling of the global shape of RNA in solution, which can then be hierarchically refined until reaching atomistic precision in explicit solvent. The low-resolution helix model (Ernwin) explores the huge conformational space making use of the SAXS data, while a nucleotide-level model (SPQR) removes clashes and disentangles the proposed structures, leading to an all-atom representation in explicit water. We apply the procedure to four different known PDB structures of up to 159 nucleotides with promising results. Additionally, we predict an all-atom structure for the Plasmodium falciparum signal recognition particle ALU RNA based on SAXS data deposited in the SASBDB. This structure has an alternate conformation and a better fit to the SAXS data than the previously published structure, which was based on the same data but other modelling methods.


Ernwin energy function: Loop-loop interaction energy
In the original Ernwin publication, we used the reference ratio method to fit the distance between hairpins, like the loop-helix orientations, to a target distribution. While this worked well for RNA molecules up to 200 nucleotides, it caused problems with longer molecules, where it disallowed isolated hairpins (with a distance above 100 Å to the next hairpin). In the new versions of Ernwin an alternative energy function is available, which is based on a chemically intuitive binary classification of pairs of loops as interacting or non-interacting.

Handling of multiloops
A multiloop with n single-stranded segments is fully determined after sampling n − 1 segments, so the nth segment does not have to be sampled. Here we introduce a way to still assign a fragment to this broken multiloop segment. We measure the fit of the fragment assigned to the broken multiloop segment by the deviation between the stem that would be placed by it (the virtual stem) and the way this stem is actually placed. This deviation is measured by the distance between the stems' starts, the angle between them and the difference in twist angle.
Ernwin supports two ways of assigning a fragment to the broken multiloop segment:
1. Searching-Strategy (SFJ energy): Whenever a junction is evaluated, Ernwin searches through all possible parameters for the broken multiloop segment and accepts the one that minimizes the deviation between the true and the virtual stem.
2. Sampling-Strategy (FJC energy): Ernwin treats the broken multiloop segment like any other segment and samples a single fragment for it.
In both cases the constraints on the maximal deviation between the stem and the virtual stem are then evaluated (implemented as a junction constraint energy) and structures with too high a deviation are rejected. The Sampling-Strategy makes rejections far more likely than the Searching-Strategy.
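The deviation measure and the Searching-Strategy described above can be illustrated with a minimal Python sketch. It is not Ernwin's implementation: the stem representation (a start point, an axis vector and a twist vector), the function names and the scalar score that weights 1 Å like 4 degrees are all assumptions made for illustration. The default cutoffs of 12 Å and 48 degrees are the values quoted elsewhere in this text.

```python
import math

def angle_deg(u, v):
    """Angle between two 3D vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def stem_deviation(virtual, placed):
    """Return (start distance in Å, axis angle in deg, twist angle in deg).

    Each stem is a dict with the hypothetical keys 'start', 'axis', 'twist'.
    """
    return (
        math.dist(virtual["start"], placed["start"]),
        angle_deg(virtual["axis"], placed["axis"]),
        angle_deg(virtual["twist"], placed["twist"]),
    )

def fulfils_constraint(deviation, max_dist=12.0, max_angle=48.0):
    """Junction constraint: every deviation component must be below its cutoff."""
    dist, axis_angle, twist_angle = deviation
    return dist <= max_dist and axis_angle <= max_angle and twist_angle <= max_angle

def searching_strategy(candidates, placed):
    """Pick the candidate virtual stem with the smallest combined deviation.

    Assumption: the three components are merged into one score by
    weighting 1 Å like 4 degrees.
    """
    def score(virtual):
        dist, axis_angle, twist_angle = stem_deviation(virtual, placed)
        return dist + (axis_angle + twist_angle) / 4.0
    return min(candidates, key=score)
```

In the Sampling-Strategy, only `fulfils_constraint` would be evaluated for the single sampled fragment, which is why rejections become more frequent.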
We benchmark the different approaches by sampling a 3-way junction with 3 equally long multiloop segments and equally long stems. Ideally, all 3 multiloop segments should be indistinguishable and have the same distribution of conformations. The only thing that breaks this symmetry in our model is the fact that we have one broken multiloop segment, which introduces a bias into such loops. We measure this asymmetry using the distribution of angles between adjacent stems as a 1-dimensional proxy for the distribution of multiloop segment conformations. Since, in our model and for the given scenario, the artificial bias introduced by the broken multiloop segment is the only thing that makes these distributions distinguishable, more similar distributions for all 3 segments indicate better constraint energies.
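One way to quantify how similar the three angle distributions are is the two-sample Kolmogorov-Smirnov statistic, which is the test named in the caption of Figure S5. The following pure-Python sketch is illustrative only: the function names are hypothetical, and the thinning to every 25th sampling step follows that caption.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximal distance
    between the empirical cumulative distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both ECDFs past all values equal to x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def max_pairwise_ks(angle_samples, thin=25):
    """Largest pairwise KS statistic between the angle series of the
    multiloop segments, using every `thin`-th sampling step."""
    thinned = [series[::thin] for series in angle_samples]
    return max(
        ks_statistic(thinned[i], thinned[j])
        for i in range(len(thinned))
        for j in range(i + 1, len(thinned))
    )
```

A value close to 0 means the segments are statistically indistinguishable, while a value close to 1 indicates a strong bias from the broken multiloop segment.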

Move sets
For Ernwin, we have implemented and benchmarked the following movers in addition to the Simple-Mover, which just changes a random fragment. The NElementMover and the OneOrMoreElementMover change two or more random elements at once before evaluating the constraints and energy functions. The ConnectedElement-Mover is a version thereof that changes n elements that are connected to each other. The WholeML-Mover is very similar, but instead of changing a fixed number of fragments, it moves all fragments of the multiloop at once.
The EnergeticJunctionMover changes 2 or more adjacent single-stranded regions of a multiloop. In a single Monte Carlo step it tries all combinations of fragments for the selected single-stranded regions until the multiloop fulfills the constraint. This allows for computational optimizations compared to doing a full Monte Carlo step for every combination of fragments. With this move type, the Sampling-Strategy of the new multiloop constraint (assigning a single fragment to the broken multiloop segment) becomes somewhat feasible.
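The exhaustive search inside a single EnergeticJunctionMover step can be sketched as follows. This is a hedged illustration, not Ernwin's code: `fragments_per_segment` and `closes_junction` are hypothetical names, and the enumeration order is simply the one produced by `itertools.product`.

```python
import itertools

def find_closing_combination(fragments_per_segment, closes_junction):
    """Try all fragment combinations for the selected single-stranded
    regions and return the first one that fulfills the junction constraint.

    fragments_per_segment: one list of candidate fragments per region.
    closes_junction: callable taking a tuple of fragments, returning bool.
    Returns None if no combination fulfills the constraint.
    """
    for combination in itertools.product(*fragments_per_segment):
        if closes_junction(combination):
            return combination
    return None
```

Because only the junction constraint is evaluated inside the loop, each combination is much cheaper to test than a full Monte Carlo step.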
The MoveAndRelaxer addresses not only the difficulty of fulfilling the junction constraints but also the challenge of fulfilling the clash constraints, which mostly arises for larger structures. After sampling a single fragment, we try to change the remaining fragments as needed to remove clashes and fix junction constraints. This is done in a way inspired by the (full) cyclic coordinate descent algorithm [1,2] and can be seen as a kind of gradient walk. We note that after removal of the broken multiloop segments, the Bulge Graph [3] is a rooted tree, where the root node is the element that is placed first (typically the stem "s0", placed at the origin). This tree is a spanning tree of the Bulge Graph, and it is a minimum spanning tree if we use the lengths (in nts) of the elements as edge weights. For each loop that no longer fulfills the constraints due to the change of the first fragment (in the case of pseudoknots this can be more than one loop), starting at the smallest loop, we begin relaxation at the segment closest to the minimum spanning tree's root, because changing this segment has the greatest impact on the broken multiloop segment. For this segment we try fragments until the constraints are fulfilled or we have tried N (currently 75) fragments. In the latter case we pick the best fragment and continue by relaxing the segment next closest to the minimum spanning tree's root. When choosing the best fragment, the removal of clashes between stems within the loop we try to close is given priority over minimizing the gap at the broken multiloop segments, because the number of changeable segments between the clashing stems (the degrees of freedom) is smaller than or equal to the number of segments in the loop that can be relaxed to achieve loop closure. After relaxing the junctions, we try to move additional interior loop elements to remove the remaining clashes.
The gradient walk in this order sometimes gets stuck in a local optimum that does not fulfill all constraints, but even for structures with complex pseudoknotted loops it finds a valid solution often enough. With this move type, the second type of fragment-based constraint (Sampling-Strategy) is mostly identical to the first type (Searching-Strategy), because the move itself searches through the fragments. By choosing a maximum deviation of 12 Å for the distance between the stems' starts and 48 degrees for the stem angle and twist angle, we typically find a matching combination of fragments in reasonable time, since we can evaluate this property of the junction independently of the rest of the structure.
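The relaxation of one loop can be sketched as below. This is a simplified illustration under stated assumptions rather than the actual MoveAndRelaxer: `relax_loop`, `library` and `evaluate` are hypothetical names, and the priority of clash removal over gap minimization is modelled by comparing `(n_clashes, gap)` tuples lexicographically.

```python
def relax_loop(assignment, segments, library, evaluate, max_tries=75):
    """Greedy relaxation of one loop, segment by segment.

    assignment: dict mapping segment name -> currently assigned fragment.
    segments:   segments of the loop, ordered closest-to-root first.
    library:    dict mapping segment name -> list of candidate fragments.
    evaluate:   callable(assignment) -> (n_clashes, gap, constraints_ok).
    Returns (new_assignment, success).
    """
    current = dict(assignment)
    for segment in segments:
        best_fragment, best_key = current[segment], None
        for fragment in library[segment][:max_tries]:
            trial = dict(current)
            trial[segment] = fragment
            n_clashes, gap, ok = evaluate(trial)
            if ok:
                return trial, True
            # Fewer clashes always wins; the gap only breaks ties.
            key = (n_clashes, gap)
            if best_key is None or key < best_key:
                best_fragment, best_key = fragment, key
        # No fragment fulfilled the constraints: keep the best one and
        # continue with the segment next closest to the root.
        current[segment] = best_fragment
    return current, False
```

A `False` result corresponds to the walk getting stuck in a local optimum that does not fulfill all constraints.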
To benchmark the different move types we used 3 test structures of different complexity: a 3-way junction (chain X of 2GDI); a tRNA (1FIR) as a simple structure with a single 4-way junction; and a 3-way junction plus a kissing hairpin pseudoknot (2YGH). We used no energy (and accepted all structures that fulfilled the constraints). As a starting conformation we used the structure with the highest RMSD to the native structure out of 25 random structures. We then let the simulations run for 10 hours and counted the number of unique multiloop conformations in the sampled ensemble after 1 minute, 10 minutes, 1 hour and 10 hours. A conformation is considered unique based on the identity of the sampled fragments for all non-broken multiloop segments.
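The uniqueness criterion can be expressed compactly in code. In this sketch each sampled structure is represented as a dict from multiloop segment name to fragment identifier. Both the representation and the identifiers in the usage example are hypothetical.

```python
def count_unique_conformations(samples, broken_segments):
    """Count unique multiloop conformations in a sampled ensemble.

    samples: iterable of dicts mapping segment name -> fragment id.
    broken_segments: set of segment names to ignore (broken segments).
    """
    seen = set()
    for fragments in samples:
        key = tuple(sorted(
            (segment, fragment)
            for segment, fragment in fragments.items()
            if segment not in broken_segments
        ))
        seen.add(key)
    return len(seen)
```

For example, two samples that differ only in the fragment assigned to the broken segment count as the same conformation.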

Fitting the ensemble to the SAXS derived PDD
As Ernwin is a coarse-grained method and individual nucleotide positions will be refined by SPQR anyway to remove clashes, a rough estimate of the pair distance distribution function (PDD) is sufficient for our simulations. Thus we neglect the effect of the hydration shell and use fewer points than the number of atoms for calculating the PDD. This improves the simulation speed considerably, primarily because Ernwin can always hold the full coarse-grained fragment library in RAM, whereas the all-atom coordinates of loops would have to be read from disk when a fragment is first used. Additionally, the computation time needed to evaluate the PDD in each sampling step depends on the number of points used. Currently Ernwin uses a sub-optimal naive $O(n^2)$ algorithm for calculating the PDD from a set of points in 3D space, but this could be improved to $O(n \log n)$ by using Fast Fourier Transforms.
Using only the start and end coordinates of stems is insufficient for getting a good approximation of the all-atom PDD, so we need more points. For longer RNA molecules it is enough to use 1 point per nucleotide, in our case the coordinates of the C1' atoms, as we have shown for the example of Braveheart [4]. However, we noticed that for shorter RNA molecules, the errors introduced by using only 1 point per nucleotide do not average out as well as they do for longer molecules. Therefore we extended our fragment library to store 3 points per nucleotide in the local coordinate system of the fragment. While the use of 3 points per nucleotide is common in coarse-grained RNA structure prediction, we would like to point out that Ernwin still uses rigid fragments without inner degrees of freedom and only uses these points for the energy evaluation (calculation of the PDD).
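The naive pairwise computation of a coarse-grained PDD can be sketched in a few lines. This is an illustrative version only: the function name is hypothetical, the histogram is normalized to sum to 1, and a bin width of 1 Å is used as the default.

```python
import math

def pair_distance_distribution(points, bin_width=1.0):
    """Naive O(n^2) pair distance distribution of a set of 3D points.

    Returns a normalized histogram as a list, where entry k is the
    fraction of point pairs with distance in [k, k + 1) * bin_width.
    """
    counts = {}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            index = int(math.dist(points[i], points[j]) // bin_width)
            counts[index] = counts.get(index, 0) + 1
    total = sum(counts.values())
    if total == 0:
        return []
    return [counts.get(k, 0) / total for k in range(max(counts) + 1)]
```

The cost grows quadratically with the number of points, so going from 1 to 3 points per nucleotide makes each evaluation roughly 9 times more expensive.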
We provide 2 energy functions based on the pair distance distribution. The simpler one (called "PDD-energy") applies an exponential potential to the deviation between the experimental PDD and the sampled, coarse-grained PDD. We calculate this deviation as the area between the PDD curves, and since we model the PDD as a histogram of distances, this is simply a sum of absolute differences. Assuming 1 Å bins, the deviation is $\sum_d |p_{\mathrm{sampled}}(d) - p_{\mathrm{experimental}}(d)|$, where $p_{\mathrm{sampled}}(d)$ is the value of the pair distance distribution function of the sampled model at distance $d$ and $p_{\mathrm{experimental}}(d)$ is the value of the experimental SAXS PDD at distance $d$.
The second version, the "EPD-energy" (ensemble pair distance distribution), tries to optimize the PDD of the sampled ensemble and not of an individual structure. As our tool especially shines when dealing with larger RNAs, which we expect to be intrinsically flexible to a certain degree, optimizing the pair distance distribution function of the ensemble instead of the PDD of an individual structure better captures reality. We achieve this in the framework of the reference ratio method (as described in the main text) by optimizing the distribution of distributions towards an artificial distribution around the experimental PDD. We do this point-wise by viewing the PDD as a normalized histogram and using a Gaussian distribution around the experimental frequency of each bin as the target distribution, as illustrated in Figure S2. The energy depends on the probability density of the reference distribution $R(d)$ and of the target distribution, where the target distribution is a normal distribution around the observed value of the (normalized) PDD at this distance. As standard deviation we used $e_{\mathrm{experimental}}(d)$, the error from the SAXS data at this distance $d$, normalized by the same normalization factor as the PDD. This error is scaled by a configurable factor $c$, which is the same for all bins. We chose a fairly arbitrary value of 50 for $c$ by looking at plots similar to Figure S2, with the goal of giving the RNA enough flexibility (a considerable distance between the red and the orange line) without taking a very large value (the lower orange line should be significantly above zero on most of the plot).
The reference distribution is estimated using a kernel density estimate based on the PDD of the previously sampled structures at the distance $d$.
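The two SAXS-based quantities can be sketched as follows. The deviation is the sum of absolute differences described above. For the ensemble energy, the sketch assumes the common reference-ratio form $-\sum_d \ln(T_d(x)/R_d(x))$, a Gaussian kernel with a fixed, hypothetical bandwidth for the kernel density estimate, and a numerical floor to avoid taking the logarithm of zero. None of these details are confirmed by the text above, so this should be read as an illustration of the idea rather than as Ernwin's energy function.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Probability density of a normal distribution."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def pdd_deviation(sampled, experimental):
    """Area between two PDD histograms (sum of absolute differences)."""
    n = max(len(sampled), len(experimental))
    s = list(sampled) + [0.0] * (n - len(sampled))
    e = list(experimental) + [0.0] * (n - len(experimental))
    return sum(abs(a - b) for a, b in zip(s, e))

def kde(x, observations, bandwidth):
    """Gaussian kernel density estimate at x (fixed bandwidth, an assumption)."""
    return sum(gaussian_pdf(x, o, bandwidth) for o in observations) / len(observations)

def epd_energy(new_pdd, previous_pdds, experimental, errors, c=50.0, bandwidth=0.02):
    """Ensemble PDD energy of a proposed structure, summed over all bins.

    Target:    Gaussian around the experimental value with sd = c * error.
    Reference: KDE of the values observed in previously sampled structures.
    Assumed form: E = -sum_d ln(target / reference).
    """
    tiny = 1e-300  # floor that keeps the logarithm finite
    energy = 0.0
    for d, x in enumerate(new_pdd):
        target = gaussian_pdf(x, experimental[d], c * errors[d])
        reference = kde(x, [p[d] for p in previous_pdds], bandwidth)
        energy -= math.log(max(target, tiny) / max(reference, tiny))
    return energy
```

With this form, a proposed structure is favored in a bin where earlier structures overshoot the experimental frequency if it has a lower frequency there, which matches the behaviour described for distance d 1 in the caption of Figure S2.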

2 Supplementary methods - SPQR

2.1 Removal of clashes and reconstruction of broken bonds in SPQR
The detection of clashes and broken bonds is straightforward in the energy evaluation of SPQR. Clashes are first detected by evaluating the total energy with SPQR_ENERG. If any clash or broken bond exists, an energy relaxation is performed with the binary SPQR_wSA, which uses a particular energy function that tolerates the clashes and helps to remove them. The excluded volume interaction acts on pairs of particles, which can be phosphates, sugars or bases. The sugar particle is in practice a virtual site defined by the position and orientation of the base particle, which is a triangle from which a reference frame can be defined. Therefore, the interaction energy between particles I and J is expressed in terms of $U_{IJ}(\mathbf{R}_{IJ,I})$, which depends on the species of I and J and on the vector joining their positions, $\mathbf{R}_I - \mathbf{R}_J$, expressed in the reference frame given by the orientation of particle I. The functional form of $U_{IJ}$ contains parameters $a_{ij}$ that depend on the species of I and J. Note that, for unstructured particles such as the phosphate beads, this expression reduces to a hard-sphere potential. In a similar fashion, the potential becomes infinite when a bond between sugar and phosphate beads is broken. A modified version of SPQR, included in the simulation package, can deal with these situations by imposing an energy term which does not diverge but pushes particles away from each other when a clash is detected, or favors the formation of a bond when it is broken. In that case, the energy $U_{IJ}$ is a non-diverging expression with the constant $C_w = 10^5\,\epsilon_0/\text{Å}^2$, in terms of the SPQR energy unit $\epsilon_0$. For bonds and angles, the interaction potentials are replaced by expressions in which $\mathbf{R}^0_{IJ}$ is a reference vector for each type of sugar-phosphate bond (intra- or inter-nucleotide), which depends on the sugar pucker and glycosidic bond angle state. The angle interaction makes use of a reference value $\theta_0$ and a constant $C_a = 10^5\,\epsilon_0$.

Link detection and removal in SPQR.
Among the SPQR tools, SPQR_DLINK is a Python script which can detect the links between loops in a given SPQR conformation. The format can be easily extended to atomistic PDB structures by means of the script pdb2spqr.py. The script requires the secondary structure in Vienna format from a FASTA sequence, from which the loops are defined in terms of the residue indices that compose them. These can be hairpin loops (HL), internal loops (IL) and stems (ST). Hairpin loops also include their closing pair, while internal loops are either not closed and treated as a single segment in the piercing method, or closed by considering the stack bounded by the complementary pairs of the first and last nucleotide of the loop.
For the evaluation of the Gauss integral and of the links in the piercing method, the loops are identified and discretized into the segments joining consecutive sugar and phosphate beads. In the piercing method, each triangle is composed of two consecutive beads and the geometrical center of the loop. The links that have at least one piercing, or a Gauss integral different from zero, are written to an output file which contains the type of each loop and the residue numbers of the nucleotides that constitute it.
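The piercing test can be illustrated with a small, self-contained sketch. It is not the SPQR_DLINK code: it assumes that both loops are given as closed polygons of bead coordinates, builds the triangle fan from consecutive beads and the geometrical center, and uses the standard Möller-Trumbore test to check whether a segment of the other loop crosses a triangle. All function names are hypothetical.

```python
def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def geometric_center(beads):
    """Mean position of the beads of a loop."""
    n = len(beads)
    return tuple(sum(bead[k] for bead in beads) / n for k in range(3))

def segment_pierces_triangle(p, q, a, b, c, eps=1e-12):
    """Moller-Trumbore test: does segment p-q cross triangle (a, b, c)?"""
    d = _sub(q, p)
    e1, e2 = _sub(b, a), _sub(c, a)
    h = _cross(d, e2)
    det = _dot(e1, h)
    if abs(det) < eps:  # segment is parallel to the triangle plane
        return False
    inv = 1.0 / det
    s = _sub(p, a)
    u = _dot(s, h) * inv
    if u < 0.0 or u > 1.0:
        return False
    qv = _cross(s, e1)
    v = _dot(d, qv) * inv
    if v < 0.0 or u + v > 1.0:
        return False
    t = _dot(e2, qv) * inv  # position of the crossing along the segment
    return 0.0 <= t <= 1.0

def count_piercings(loop, other):
    """Number of times the closed polygon `other` pierces the triangle
    fan spanned by `loop` and its geometrical center."""
    center = geometric_center(loop)
    count = 0
    for i in range(len(loop)):
        a, b = loop[i], loop[(i + 1) % len(loop)]
        for j in range(len(other)):
            p, q = other[j], other[(j + 1) % len(other)]
            if segment_pierces_triangle(p, q, a, b, center):
                count += 1
    return count
```

For two square loops arranged like a Hopf link the function reports one piercing, and zero once the second loop is moved away.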
The removal is done by a simulation with the SPQR_wSA version, which in this case receives the link input file created by SPQR_DLINK. For disentangling the loops, a repulsive interaction is defined between specific virtual sites defined on each loop as follows:
• The geometrical center of loop i, denoted by $R_i$.
• The edge site(s) of loop i, denoted by $r^j_i$. For a hairpin loop, there is only one edge site, which is defined as the midpoint between the sugar beads of the closing pair. Stems and internal loops have two edge sites, defined as the midpoints of the closing pairs or by the positions of the sugar beads of the first and last nucleotides, respectively.
The usual energy function is modified by adding a repulsive energy function between each pair of linked loops defined in the list, imposed as energy terms between virtual sites. More specifically, the repulsive energy term is exerted between the geometrical center of each loop and the edge sites of its counterpart. If a stem is involved, an additional repulsive term between the geometrical centers is also added. In addition, the excluded volume interaction between beads of different loops is omitted. The energy terms are illustrated in Figure S3 for loops found in the samples of the PDB:4PQV. The values of $K_{ll}$ and $K_{ss}$ were 500 reduced units, except when an edge site of a duplex was involved, where a value of 5 reduced units was employed.
For linked hairpins, it is clear that a repulsive interaction between the center of mass and closing loops will eventually separate and unlink the loops. The stems are typically more elongated, so they must be pushed away from their center.
The repulsion between the geometrical centers and the closing pairs ensures that the loops drift away in the right direction. If the edge sites are not included, it is likely that the loops will disentangle towards their side, so that the removal of the link succeeds but creates another link involving a consecutive loop.
The whole procedure is implemented in the script SPQR_REFINE. Using an atomistic structure as input, it minimizes the energy and removes clashes and links if they are found, in order to reconstruct the structure as an atomistic PDB file.

ERMSD restrained simulations in SPQR.
SPQR can apply restraints on the ERMSD, a metric which depends on the relative positions and orientations of the nucleobases and which has been shown to be particularly useful for dealing with nucleic acids [5]. In the current version, the restraints are applied by means of a harmonic potential which depends on this quantity with respect to a reference structure. The reference structure can be composed of a set of fragments, and therefore the structure can be enforced locally, as in the case of the secondary structure restraints or of tertiary contacts composed of three or more nucleotides.

Tertiary contact search in SPQR.
The search was performed on the structures with the same parameters, while making the pucker and glycosidic bond angle state flexible for the nucleotides which are not involved in stems. A simulated annealing run was started from a temperature of $12\,\epsilon_S$, where $\epsilon_S$ is the SPQR energy unit, as described in [6], with 200000 Monte Carlo sweeps for each annealing step and a temperature prefactor of $\alpha = 0.75$. The configurations were saved every 10000 sweeps, and all of them were analyzed in search of possible contacts. All occurrences of nucleotides corresponding to motifs which interact according to the Ernwin annotations are saved and used as ERMSD restraints in a later simulation, which enforces these pairs and pulls the whole structure back to its original shape. This simulation is performed at a temperature of $9\,\epsilon_S$ for 50000 MC sweeps, while the constant $\kappa_E$ has the value of $5000\,\epsilon_S$ for the tertiary contacts and $5\,\epsilon_S$ for the global arrangement. As a final step, the energy is minimized by a short Monte Carlo simulation at zero temperature for 10000 sweeps. 10 different random seeds were used in this approach to recover as many contacts as possible, since in some cases small perturbations could cause contacts to be missed by the criteria of the SPQR annotation tool.
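The annealing parameters above can be turned into an explicit schedule. The sketch assumes that the temperature prefactor acts geometrically, so that each annealing step multiplies the temperature by α. The number of annealing steps is not stated in the text, so `n_steps` is a purely hypothetical parameter, as are the function names.

```python
def annealing_schedule(t_start=12.0, alpha=0.75, n_steps=10, sweeps_per_step=200_000):
    """Geometric cooling schedule: list of (temperature, sweeps) tuples.

    Temperatures are in units of the SPQR energy unit. `n_steps` is a
    hypothetical choice, not a value taken from the text.
    """
    return [(t_start * alpha ** k, sweeps_per_step) for k in range(n_steps)]

def snapshots_per_step(sweeps_per_step=200_000, save_every=10_000):
    """Number of configurations saved during one annealing step."""
    return sweeps_per_step // save_every
```

Under these assumptions, each annealing step contributes 20 saved configurations to the contact search.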

Backmapping from SPQR structures.
A brute-force backmapping is applied to the final, refined structure, which is the one with the largest number of tertiary contacts predicted by Ernwin and without links or clashes in the SPQR representation. In this case, a template atomistic nucleotide is placed on top of each SPQR nucleotide, taking into consideration its glycosidic bond angle conformation and sugar pucker. The template is taken from the PDB as a representative nucleotide for each of the states represented in SPQR. As an alternative, it can also be taken from the initial structure proposed by Ernwin. The backmapping procedure selects the template that has the lowest RMSD after optimal superposition of the position and orientation of the nucleobase in an SPQR representation onto the SPQR refined structure. An additional step can be added by relaxing the positions of the phosphorus atoms using the script SPQR_BMAP_RELAX.py.
Finally, a short steered-MD simulation is performed in Gromacs 2021.2 on the backmapped structures after a short energy minimization using steepest descent. The simulations employed the TIP3P water model [7] and the Amber99 force field [8] with χOL3 [9] and parmbsc0 [10] corrections in a truncated dodecahedral box with Na+ counterions [11]. The temperature was kept constant at 300 K by means of the V-rescale thermostat using a time constant of τ = 0.1 ps. The simulations employed a harmonic moving restraint on the ERMSD with respect to the refined SPQR structure, using Plumed 2.7.1 [12,13]. For the PDB structures, the simulations had a steering constant of 10000 kJ/mol, which was applied over 240 ps, and the ERMSD cutoff was 10. The lowest-ERMSD structure was selected from the whole simulation.
In addition, for the SASDK34 structure we applied RMSD steering with respect to the reconstructed Ernwin structure. In general, this produced a slightly better $\chi^2$ deviation compared with the Ernwin structure results and a slightly lower clashscore. The coupling constant was 10000 kJ/(mol Å$^2$) and the simulation length was 200 ps.

Figure S4: Effect of different types of junction constraints. This figure shows an example of a sampled structure of a 3-way junction which, for our model, exhibits 3-fold symmetry on the level of the secondary structure. The angular histograms describe the sampled distribution of angles between the adjacent stems using two sampling approaches. "New" refers to a fragment-based constraint, while "old" uses only distance-based constraints. It can be seen that the fragment-based approach creates the bi-modal distribution of angles (corresponding to a stacking pair of stems) on all 3 junction elements and thus captures the symmetry of the example much better than the old approach.

The MoveAndRelaxer and combinations of movers using it are most useful for complex multiloops and pseudoknots, whereas the computational time spent on the relaxation step does not pay off that much for simple 3- or 4-way junctions.

Benchmark against solved PDB structures
See Figures S6, S7 and S8.
4 Supplementary Results - SPQR

4.1 Special cases
SPQR can remove links between stems, hairpins and internal loops, which can be composed of one or two strands. A junction, which by definition has three or more stems connected by unstructured segments, can be defined as a closed loop and incorporated in the Gauss integral, but an analogous definition of the virtual sites might not be optimal for the removal of the link. In this case, we opted for identifying the single strand which pierces the other loop and treating it as a link between a single strand and a closed internal loop. This decomposition can be generalized to any type of junction, which will be automated in future versions of the code. The structure before and after refinement is

Clashscore after refinement
See Table S3.

Detailed benchmark of results against real SAXS data
Table S4 includes the adjustable parameters of the Crysol 3.0 fit. Crysol 3.0 (now crysol --shell=water) minimizes the $\chi^2$ by fitting the contrast of dummy water beads. There are 3 kinds of beads: for outer (convex) surfaces, for concave surfaces and for (small) inner cavities [15]. The values of these parameters are fitted in a range from -10 to 2 [15]. In Table S4 we mark parameter values at the edge of the allowed interval in red, to indicate that these fits have to be taken with a grain of salt.

Effect of errors in the pair distance distribution function
The pair distance distribution function is the Fourier transform of the scattering intensity profile. In practice, this conversion depends on an initial estimate of the maximal interatomic distance ($D_{max}$) in the molecule, and wrong estimates can distort the resulting PDD function. To benchmark the effect of this on Ernwin's sampling, we re-ran GNOM on the scattering curve of the Plasmodium falciparum SRP RNA (SASBDB id SASDK34) with the same parameters as used by Soni et al. [16], but with varying maximal distances. We then ran 4 Ernwin simulations with 3000 steps for each generated PDD, and analyzed every 25th structure from the last 1000 steps with CRYSOL. Figure S14 shows the best $\chi^2$ of each trajectory against the $D_{max}$ used as input to GNOM. At least in the case of this RNA, the Ernwin method seems to be fairly robust against errors in the initial estimate of the maximal distance. As can be seen in Figure S15, different values for $D_{max}$ do not change the overall shape of the PDD much. Additionally, by using rigid elements based on a fixed secondary structure, the maximal distance of the Ernwin model is limited by what is chemically possible. Even if the maximal distance is considerably wrong, providing the corresponding PDD to Ernwin is still better than providing no experimental data at all, as most simulations without a PDD energy never reach a $\chi^2$ below 100.

Table S3: Clash score for the sets of generated structures.

Figure S1: Fragment based multiloop constraint: The native conformation of the 3-way junction is shown in bright colors (green for stems, red for multiloop segments). Additionally, possible assigned multiloop segments and their corresponding virtual stems are shown in dimmer colors: green if they deviate by at most 8 Å and 32 degrees, yellow if the deviation is at most 12 Å and 48 degrees, and red otherwise. The inset shows the same model from a different angle.

Figure S2: Ensemble based SAXS energy: The experimental PDD is shown as a red line in the central plot. The errors, exaggerated by a configurable factor, are shown as orange lines. The PDDs of the structures that were sampled so far are shown as dots, to indicate that we calculate the PDD in discrete bins. Each color of the dots corresponds to a single structure (i.e. a single sampling step). At every sampling step, the energy of the proposed new structure is evaluated in a pointwise fashion for each of the PDD's bins individually. At the left and right side of the central plot, this is indicated for two of those bins. By comparing the previously sampled frequencies of a given distance to the desired distribution given by the experimental PDD at this distance and the corresponding errors, we can derive an energy function. In the example illustrated here we have already finished n sampling steps. At the distance bin d 1, each of the n structures so far shows a higher percentage of nucleotide pairs falling into this bin than expected from the experiment. Thus, in the left plot, the reference distribution has a peak above the peak of the target distribution. So if structure n + 1 has fewer nucleotide pairs at distance d 1, it would be favored by the energy. However, the energy is a sum over all distance bins. This plot shows a second distance, d 2, as an example on the right hand side, but in fact all distance bins contribute equally. In some of the previously sampled structures fewer nucleotide pairs were at d 2 than expected from the experiment, a couple of structures had more nucleotide pairs than expected at d 2, and some structures had almost the expected percentage of nucleotide pairs at this distance. Overall, the sampled and the desired frequencies are quite similar. Thus it does not matter much how many nucleotide pairs structure n + 1 has at distance d 2; the energy will always be similar.

Figure S3: Depictions of a structure for link removal: Hairpin-hairpin link in A) SPQR representation and B) ring representation; stem-internal loop link in C) SPQR representation and D) ring representation. Ring representations include geometrical centers and edge virtual sites. The repulsive interactions between the geometrical center of one loop and the edge sites of the other, and vice versa, are represented by double arrows in green and cyan. The interaction between the geometrical centers is colored in orange.

Figure S5: Comparison of distributions of angles. For a symmetric 3-way junction, we measure the angle between the stems adjacent to each multiloop segment to produce 3 distributions. Then we compare them in a pairwise manner using the two-sample Kolmogorov-Smirnov test. We used different movers (rows in the figure) and different junction constraint energies (x axis of each panel) for sampling, and only took every 25th sampling step to achieve independent data points. The left column is the normal KS test. For the right column the angle distributions were blurred before the KS test was performed. The junction constraint energies used (= x-labels) were: JDIST: distance based constraints; 8FJC: Fragment based junction closure energy, corresponds to Sampling-Strategy at cutoff 8 Å and 4*8=32 degrees; 8SFJ: Searching Fragment based junction closure energy, corresponds to Searching-Strategy

Figure S6: Top: Native structure for 2R8S (yellow) and predicted structure with the best PDD energy (blue). Bottom right: A predicted structure with reasonable PDD energy but bad fit (RMSD 32) to the native structure (bright colors) and the native conformation (dim colors).

Figure S7: Native structure for 3R4F (yellow) and predicted structure (blue). Top: The predicted structure with the best PDD energy. Bottom left: The lowest-RMSD structure among the set of structures with the best PDD energy of the respective trajectory. In total it had the 3rd best PDD energy value. Bottom right: The structure with the best RMSD (but with a bad PDD energy).

Figure S8: Native structure for 4PQR (yellow) and predicted structure (blue) with the best PDD energy.

Figure S13: Contact between h0 and s9 in 1L9A a) after Ernwin reconstruction and b) after SPQR refinement.

Figure S14: Benchmark using different values for D max .

Figure S15: The PDD calculated with GNOM for different values of D max

Table S2: Benchmark of different move types, full data. A summary of this data can be found in Table S1.

Table S4: Description of the structure with the lowest $\chi^2$ of the best trajectories. The conformation was assigned by manual inspection. The values of the adjustable parameters that were fitted by Crysol 3.0 are reported and marked in red if they are at the border of the allowed interval.
Columns: $\chi^2$ (Ernwin) | Convex | Concave | Inner | $\chi^2$ (refined) | Convex | Concave | Inner | Conformation