Single-particle Cryo-EM and molecular dynamics simulations: A perfect match

Knowledge of the structure and dynamics of biomolecules is key to understanding the mechanisms underlying their biological functions. Single-particle cryo-electron microscopy (cryo-EM) is a powerful structural biology technique to characterize complex biomolecular systems. Here, we review recent advances in how molecular dynamics (MD) simulations are being used to increase and enhance the information extracted from cryo-EM experiments. We focus in particular on the physics underlying these experiments, on how MD facilitates structure refinement, especially for maps with heterogeneous and non-isotropic resolution, and on how thermodynamic and kinetic information can be extracted from cryo-EM data.


Introduction
In recent years, single-particle cryo-electron microscopy (cryo-EM) has become a major structural biology tool to study biomolecules [1,2]. Using cryo-EM, it is possible to resolve protein structures at atomic resolution [3,4], observe solvent-mediated interactions contributing to drug binding [5], and uncover the dynamics of biomolecules by resolving many distinct conformational states [6].
Preparation of samples for cryo-EM involves rapid freezing to embed them in vitreous ice [7]. The frozen samples are then imaged using an electron microscope and the resulting two-dimensional projections are combined to reconstruct a three-dimensional cryo-EM density. Finally, computational methods are used to refine atomic models against this density [8].
Molecular Dynamics (MD) simulations provide insights into the dynamics of biomolecules at the atomistic level by solving Newton's equations of motion for every atom in the biomolecular system. The interactions between the atoms are described by a potential function (force field), which is defined via a set of parameters derived from the fundamental laws of quantum mechanics and/or experimental data [9]. Recent substantial advances in hardware, software, and methods [10,11] have facilitated routine application of atomistic MD methods to study biomolecular complexes at timescales exceeding microseconds [9] and comprising millions of atoms [12]. Machine learning approaches to generate structural ensembles are very promising [13] but, owing to limited space, are not discussed in this review.
The advances in both MD and cryo-EM have opened up new routes to previously inaccessible information (i) to understand the physical processes underlying cryo-EM experiments, (ii) to accurately extract structural information from cryo-EM densities, and (iii) to infer the thermodynamics and kinetics of biomolecules and biomolecular complexes from cryo-EM data.

Physics underlying cryo-EM experiments
The high vacuum required in cryo-EM prohibits the direct investigation of liquid samples at physiological temperatures. Instead, the sample is shock-frozen to cryogenic temperatures of about 90 K, embedding the sample biomolecules in ice. To preserve the biomolecules of interest in a hydrated state, the sample solution is first applied to a cryo-EM grid in a thin film (Figure 1a), which is then rapidly cooled by plunging it into a cryo coolant, e.g., liquid ethane (Figure 1b) [14]. The rapid cooling is crucial because it embeds the biomolecules in vitreous (amorphous) ice, resulting in a state presumably similar to the hydrated state at physiological temperatures (Figure 1c). In contrast, slow cooling would result in crystalline ice, which would damage the biomolecules.
At lower temperatures, biomolecules can generally access fewer conformations and the transition rates between states are exponentially decreased [15]. Temperature effects on protein dynamics have been studied with MD simulations for more than 30 years [16], often comparing to the results of X-ray crystallography experiments performed at different temperatures [15]. In crystals, the highly concentrated biomolecules act as cryo-protectants [17] and, therefore, rapid cooling is not required to achieve vitreous ice.
If the temperature decrease during plunge-freezing is markedly faster than the rates of conformational changes of the studied biomolecule, the biomolecules are expected to be trapped within the conformational states that were accessible before cooling (Figure 1d, red). In contrast, if the temperature decrease is slower than the conformational changes, one would expect the biomolecules to equilibrate into lower free-energy minima, thereby perturbing the room-temperature structural ensemble (Figure 1d, blue). Evidence that a larger part of the physiological structural ensemble is preserved during the cooling process comes from cryo-EM experiments of the ribosome where the samples were kept at different temperatures before cooling [18]. The largest conformational change of the ribosome is a rotation between its two subunits and was shown by combining MD simulations with cryo-EM data to occur on microsecond time scales [19]. The width of the rotation angle distributions in the cryo-EM reconstructions was shown to be larger for higher initial temperatures, suggesting that the rotation did not reach an equilibrium distribution during cooling [18]. Precisely how much of the room-temperature structural ensemble of biomolecules is lost during plunge-freezing has been unknown until recently.
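To give a feeling for the size of this exponential slowdown, the following minimal sketch evaluates the ratio of transition rates at two temperatures. It assumes a simple Arrhenius law with a temperature-independent prefactor, which is an illustrative simplification and not a model taken from the cited studies; the function name and the example barrier of 10 kJ/mol are likewise chosen here for illustration.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol K)


def arrhenius_rate_ratio(barrier, t_cold, t_hot):
    """Return k(t_cold) / k(t_hot) for an Arrhenius rate k = A * exp(-barrier / (R T)).

    Assumption: the prefactor A does not depend on temperature, so it cancels.
    barrier is given in kJ/mol, temperatures in K.
    """
    return math.exp(-barrier / R * (1.0 / t_cold - 1.0 / t_hot))


# Example: a 10 kJ/mol barrier, room temperature versus cryogenic temperature.
ratio = arrhenius_rate_ratio(10.0, t_cold=90.0, t_hot=300.0)
print(f"k(90 K) / k(300 K) = {ratio:.1e}")
```

Under these assumptions, even a modest barrier slows the transition by roughly four orders of magnitude between 300 K and 90 K, and the slowdown grows exponentially with barrier height.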
The effect of cryogenic temperature on the structure and dynamics of a protein embedded in a membrane was explored with MD simulations by Mehra et al. [20]. Three sets of simulations were carried out: 'hot', 'cold' and 'cooled'. The 'hot' and 'cold' simulations both started from a cryo-EM structure and ran at the temperature before and after cooling, respectively. The 'cooled' simulations started from conformations observed in the 'hot' simulations and were run at the 'cold' temperature. This approach corresponds to instantaneous cooling but does not simulate the actual cooling process. The ensemble of 'cooled' structures best resembled the cryo-EM structure, supporting the notion that many conformations are trapped during cooling in the cryo-EM experiment.
To quantify how much of the structural ensemble is lost during rapid cooling, Bock et al. [21] recently combined continuum model calculations, MD simulations, and kinetic modeling. The cooling rate during plunge-freezing was estimated by solving the heat equation for a solvent layer 'sandwiched' between two liquid ethane layers, which suggested that temperatures below 150 K are reached within several hundred nanoseconds, depending on the thickness of the water film. First, an ensemble of 41 structures of a bacterial ribosome was generated using equilibrated MD simulations started from a cryo-EM structure [22]. To simulate the actual cooling process, MD simulations were then started from each of these structures, during which the temperature was decreased to cryogenic temperatures within 100 ps to 128 ns, approaching realistic cooling rates. Using a Bayesian approach, a kinetic model was identified which best reproduced and predicted the structural ensemble obtained from the 41 cooling MD simulations. The results suggested significant narrowing of the ensemble despite relatively fast cooling, and identified three main causes: thermal contraction, reduced thermal motion within local free-energy minima, and equilibration into lower free-energy minima by overcoming the barriers separating these minima (Figure 1d). This kinetic model suggested that free-energy barriers below 10 kJ/mol are overcome during plunge-freezing while larger barriers are not. This threshold would increase with slower cooling and decrease with faster cooling. Conformational changes subject to barriers lower than the threshold are expected to equilibrate into lower free-energy minima, showing that cryo-EM data contains kinetic information. This finding suggests how the combination of cryo-EM experiments at different cooling rates and MD simulations of the cooling process can be used to extract kinetic information. It also suggests how MD simulations can be used to 'recalibrate' cryo-EM ensembles to physiological temperatures. In addition to energetic barriers, the ability of biomolecules to reach conformational states is affected by the temperature-dependent diffusion coefficient, which can be extracted from MD simulations [23]. Especially for large-scale motions with a flat free-energy surface, the diffusion coefficient is expected to affect the ensemble observed by cryo-EM.
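The interplay between cooling rate and barrier height can be illustrated with a toy two-state kinetic model. The sketch below is not the Bayesian kinetic model of Bock et al. [21]; it assumes Arrhenius rates, a linear temperature ramp instead of the heat-equation solution, and hypothetical values for the prefactor, the free-energy difference, and the cooling time. It only shows qualitatively why low barriers allow equilibration during cooling while high barriers trap the room-temperature populations.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol K)


def equilibrium_population(dg, temperature):
    """Boltzmann population of the higher-lying state A, with dg = G_A - G_B > 0 in kJ/mol."""
    boltzmann_factor = math.exp(-dg / (R * temperature))
    return boltzmann_factor / (1.0 + boltzmann_factor)


def cool_two_state(barrier, dg=3.0, t_start=300.0, t_end=90.0,
                   cooling_time=200e-9, prefactor=1e10, n_steps=20000):
    """Population of state A after a linear temperature ramp (all defaults are hypothetical).

    barrier: free-energy barrier for the A -> B transition in kJ/mol.
    The system starts in equilibrium at t_start. Within each small time step the
    temperature is held constant and the exact two-state relaxation is applied.
    """
    p_a = equilibrium_population(dg, t_start)
    dt = cooling_time / n_steps
    for step in range(n_steps):
        temperature = t_start + (t_end - t_start) * (step + 0.5) / n_steps
        k_ab = prefactor * math.exp(-barrier / (R * temperature))
        k_ba = prefactor * math.exp(-(barrier + dg) / (R * temperature))
        p_eq = k_ba / (k_ab + k_ba)  # instantaneous equilibrium population of A
        p_a = p_eq + (p_a - p_eq) * math.exp(-(k_ab + k_ba) * dt)
    return p_a


initial = equilibrium_population(3.0, 300.0)
for barrier in (5.0, 30.0):
    final = cool_two_state(barrier)
    print(f"barrier {barrier:4.1f} kJ/mol: p_A {initial:.3f} -> {final:.3f}")
```

With these illustrative parameters, the population of the higher-lying state drops markedly for the low barrier and stays essentially unchanged for the high barrier. The barrier height separating the two regimes depends on the assumed prefactor and cooling time, so the numbers should not be compared directly with the 10 kJ/mol threshold reported in [21].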
A different approach was recently used to probe the thermodynamics of solvent molecules within or bound to biological complexes. In particular, the recent dramatic resolution enhancements achieved by cryo-EM [1,2], reaching resolutions below 3 Å, allowed researchers to visualize individual structural water molecules [22,24]. This level of detail is particularly interesting for studying the interactions between drugs and their target biomolecules, which are often governed by adjacent hydrogen-bonded water molecules. In high-resolution cryo-EM structures, in addition to direct (drug-target) interactions, a plethora of such water-mediated (drug-water-target) interactions were observed [5].
The fact that these structures were obtained at cryogenic temperatures raises the question to what extent the conformations of drug molecules and the positions of water molecules are relevant at physiological temperatures. To address this question, MD simulations at temperatures ranging between 90 K and 37 °C were carried out [5]. The ensembles of water positions, together with a neural network that links the hydrogen-bond occupancies with conformational deviations of the antibiotic, suggested that the waters that mediate antibiotic-ribosome interactions are stable at physiological temperatures and influence the conformation of the antibiotic.

Refining atomistic models into cryo-EM maps
MD simulations are used to describe the dynamics of a biomolecule as a set of conformational states, the relative probabilities of these states, and the rates of interconversion among them. Yet, MD simulations heavily rely on high-quality starting structures, i.e., estimated atom positions and types. In contrast, cryo-EM provides a snapshot of the electrostatic potential of a biomolecule, distorted by noise during the process of image collection. Although modern cryo-EM densities routinely reach sub-nanometer resolution, deriving atomistic structures from them is far from trivial. It is hence not surprising that cryo-EM and MD simulations naturally synergize: cryo-EM yields crucial structural data to inspire and guide MD simulations, while MD simulations profoundly enhance the interpretability of cryo-EM data by augmenting it with information about stereochemistry and non-covalent interactions.
The idea that an MD simulation can be exploited to flexibly 'steer' a biomolecule to align it with a cryo-EM density emerged long before near-atomistic reconstructions became available [25–28,19]. However, with the ever-increasing resolution and the spatial and orientational heterogeneity of the resolution, these pioneering methods faced the problem of getting trapped in numerous local minima or of structural distortions caused by the strong bias needed to reach the desired structure-map agreement. This limitation gave rise to new requirements in computational structural biology that MD-based refinement would need to fulfill: resolution-independent density fitting, verifiable stereochemical accuracy, and automation. As a result, several methods have been developed [29–34], each targeting a specific combination of the three requirements or all of them. For example, Igaev et al. [31] have combined well-established techniques such as real-space correlation-based biasing potentials [28] and simulated annealing [25] with a novel adaptive resolution and half-map-based validation scheme, thereby overcoming the previous shortcomings in an automated fashion (Figure 2). Notably, their method has a larger radius of convergence (i.e., how different the starting structure can be from the target state defined by the density) than conventional non-MD techniques, while it does not rely on additional restraints due to the use of a chemically accurate force field and efficient thermodynamic sampling. More recently, Blau et al. [34] have further improved this approach by (i) using a new type of refinement potential based on relative entropy, which is smooth and almost parameter-free; and (ii) introducing an adaptive bias scheme to balance between force-field and density-based forces, potentially reducing the need for cross-validation. Testing other, more physics-informed similarity measures can further improve the quality of the flexible fitting approach [35].
At the same time, regular community-wide competitions objectively assess the effectiveness of individual methods. Recently, the results of the 2019 Cryo-EM Model Challenge have been presented [36], which specifically targeted the quality of optimized atomistic models, the reproducibility of results, and the performance of various similarity measures. In a follow-up competition, the reliability and reproducibility of modeling ligands bound to protein and protein/nucleic-acid complexes have been tested [37]. Such events not only offer individual teams opportunities to test their methods in an unbiased fashion; more importantly, they provide specific recommendations about refining near-atomic cryo-EM structures, which are currently becoming gold standards for the whole community [36,37].
Figure 2: Using correlation-based refinement in MD simulation to steer the atomic positions of a biomolecule such that they optimally fit a cryo-EM map. The molecule is subjected to a global biasing potential in addition to the MD force field. The forces resulting from the biasing potential act on every atom to enhance the real-space correlation coefficient between the cryo-EM density (green) and the density calculated from the current atomic positions (blue).
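The quantity driving correlation-based refinement can be sketched in a few lines. The example below works on a one-dimensional grid for brevity, models each atom as a Gaussian of fixed width, uses a normalized cross-correlation without mean subtraction, and assumes a biasing energy of the form U = k (1 - cc). All of these are illustrative assumptions; the exact density model, correlation measure, and potential differ between the methods cited above, and the function names are hypothetical.

```python
import math


def simulated_density(atom_positions, grid, sigma):
    """Density on a 1D grid from Gaussian 'atoms' of width sigma (toy model)."""
    return [sum(math.exp(-0.5 * ((x - a) / sigma) ** 2) for a in atom_positions)
            for x in grid]


def correlation_coefficient(rho_a, rho_b):
    """Normalized real-space cross-correlation between two densities on the same grid."""
    numerator = sum(a * b for a, b in zip(rho_a, rho_b))
    denominator = math.sqrt(sum(a * a for a in rho_a) * sum(b * b for b in rho_b))
    return numerator / denominator


def bias_energy(rho_exp, rho_sim, force_constant):
    """Assumed biasing energy U = k * (1 - cc); it vanishes for a perfect fit."""
    return force_constant * (1.0 - correlation_coefficient(rho_exp, rho_sim))


grid = [0.1 * i for i in range(200)]
target_map = simulated_density([5.0, 8.0], grid, sigma=1.0)     # stand-in for the cryo-EM map
displaced = simulated_density([7.0, 12.0], grid, sigma=1.0)     # poorly fitted model
print(bias_energy(target_map, displaced, force_constant=1.0))
print(bias_energy(target_map, target_map, force_constant=1.0))
```

In an actual refinement, the negative gradient of such an energy with respect to the atomic positions is added to the force-field forces, so that the atoms are pulled toward higher correlation with the map while the force field maintains stereochemistry.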

Kinetics and thermodynamics from cryo-EM data
The MD-based refinement approaches discussed above aim to minimize the deviation between an experimental cryo-EM map and a cryo-EM map predicted from a single structural model. For 3D reconstructions, cryo-EM images are usually first sorted into different conformational states (classes), which are then refined individually. This refinement against cryo-EM maps of different conformational states, along with the number of particles in the cryo-EM images that are assigned to the states, provides a first description of the thermodynamics of a biomolecular system [18,19]. In this section, we discuss MD-based approaches to gain additional information on the structural ensembles either from 3D cryo-EM maps or, even better, directly from the 2D images. Towards this end, Bonomi et al. developed an integrative modeling approach which uses multiple-replica MD simulations to represent the ensemble of structures [38,39]. In this Bayesian approach, the MD force field is used as the structural prior information. The likelihood function is the probability of obtaining the 3D cryo-EM map given the ensemble of structures from all replicas. It is a particular advantage of the Bayesian approach that experimental uncertainty and noise, e.g., in the cryo-EM maps, can be included in a methodically straightforward manner. By applying this approach, structure and dynamics can be determined simultaneously and, importantly, the effects originating from noise and structural heterogeneity can be disentangled.
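Schematically, and only as a generic statement of Bayes' theorem rather than the specific functional forms used in [38,39], the posterior for an ensemble of N replica structures can be written as follows. The symbols are introduced here for illustration: X = (x_1, ..., x_N) denotes the replica structures, D the 3D cryo-EM map, σ the parameters describing noise and uncertainty, and U the MD force field.

```latex
p(\mathbf{X}, \sigma \mid D) \;\propto\;
  \underbrace{p(D \mid \mathbf{X}, \sigma)}_{\text{likelihood of the map}}
  \; p(\sigma)\;
  \underbrace{\prod_{r=1}^{N} \exp\!\left[-\frac{U(x_r)}{k_\mathrm{B} T}\right]}_{\text{force-field prior}}
```

Taking the negative logarithm of this posterior and multiplying by k_B T yields an energy function in which the force field and a data-derived term are summed, which is why such approaches can be sampled with standard MD machinery.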
By combining MD simulations and cryo-EM, Ode et al. estimated the free-energy landscape of glutamate dehydrogenase [40]. To this end, 3D reconstructed maps were obtained from the focused classification of the mobile nucleotide binding domain. A principal component analysis (PCA) of a 200-ns MD simulation was used to produce a set of 28 conformations. Weights for these conformations were chosen to minimize the deviation between a linear combination of the maps generated from these conformations and each cryo-EM map. These weights were then combined with the relative number of particles used to reconstruct each cryo-EM map to estimate the free-energy landscape under the assumption that the plunge-freezing does not affect the ensemble.
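The last step of such a procedure, turning per-map weights and particle counts into free energies, can be sketched as follows. This is a minimal illustration under stated assumptions, not the published protocol of [40]: the weights are assumed to be already fitted and non-negative, each map contributes in proportion to its particle count, populations are converted by Boltzmann inversion at an assumed pre-freezing temperature, and all function and parameter names are hypothetical.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol K)


def conformation_populations(particle_counts, weights):
    """Combine per-map conformation weights with the particle count of each map.

    particle_counts[m]: number of particles used to reconstruct map m.
    weights[m][j]: fitted, non-negative weight of conformation j in map m.
    """
    total = sum(particle_counts)
    populations = [0.0] * len(weights[0])
    for count, row in zip(particle_counts, weights):
        row_sum = sum(row)
        for j, weight in enumerate(row):
            populations[j] += (count / total) * (weight / row_sum)
    return populations


def free_energies(populations, temperature=300.0):
    """Boltzmann inversion in kJ/mol, relative to the most populated conformation."""
    p_max = max(populations)
    return [-R * temperature * math.log(p / p_max) if p > 0.0 else math.inf
            for p in populations]


# Two maps, three conformations; the numbers are made up for illustration.
counts = [6000, 2000]
fitted_weights = [[0.7, 0.3, 0.0],
                  [0.0, 0.5, 0.5]]
print(free_energies(conformation_populations(counts, fitted_weights)))
```

The temperature used in the inversion matters: the assumption stated in the text, that plunge-freezing does not affect the ensemble, corresponds to using the temperature of the sample before cooling.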
Włodarski et al. extended the Bayesian Inference Of Ensembles (BioEn) method [41] to reweight ensembles generated by structure-based MD simulations utilizing cryo-EM maps [42]. The authors applied their method to a cryo-EM map of the ribosome-bound trigger factor (TF), which showed that part of the density cannot be explained by TF dynamics and instead can be linked to the presence of a previously unaccounted protein.
The EMMIVox method [43] uses Bayesian inference to generate a hybrid energy function combining MD force fields with restraints to fit cryo-EM maps. To balance the stereochemical quality of the refined structures with the fit to the data, the method entails pre-filtering of the voxels to reduce correlations and thereby prevent overfitting to the data. Further, prior models for the uncertainty in the data are obtained from independent reconstructions (half maps). In addition to the inclusion of B-factors, which represent thermal fluctuations and static scattering of atoms around their mean positions, considering multiple structures (ensemble) increases the correlation with the cryo-EM maps. Further analysis showed that the dynamics of 48% of the residues cannot be described by the unimodal Gaussian distribution underlying B-factors, further underscoring the need for an ensemble description.
The most promising, and most challenging, approach to structure ensemble refinement is to directly use the 2D cryo-EM images. Its key advantage is that it avoids the need to assign each 2D image to a particular conformational state, which is error-prone. In fact, typically only a subset of images can actually be assigned sufficiently reliably, and the information contained within the many unused images is lost. The challenge of this approach is the high computational cost of calculating likelihoods for large numbers of structural models with unknown orientation and large numbers of 2D images.
To obtain free-energy landscapes of biomolecules, Dashti et al. first sorted 2D images according to their projection direction and then projected them onto a low-dimensional manifold identified using machine learning [44,45]. After combining the projections from all directions into a universal description, free-energy landscapes were determined from the densities via Boltzmann inversion. For any point in the free-energy landscape, a 3D map can be compiled from the corresponding 2D images. In this way, the authors obtained 3D movies of conformational changes along low free-energy paths of a non-translating ribosome [44]. More recently, Dashti et al. extended this method and studied the ryanodine receptor type 1 (RyR1) [45]. After extracting free-energy surfaces of RyR1 with and without bound ligands, the authors used a master equation approach and MD simulations to obtain the probability of transitioning between bound and unbound states.
Shifting the focus from structural ensembles to transition paths between members of such ensembles, Cryo-BIFE, a method to estimate the free energy along a path in conformation space from the 2D cryo-EM images, was presented by Giraldo-Barreto et al. [46]. Besides providing additional mechanistic insights into protein function, transition paths are a crucial input to the method and can be obtained from MD simulations connecting minimum free-energy conformations. A Bayesian approach, where the likelihood is the probability of obtaining the set of 2D images for a given free-energy profile along the path, is then used to obtain the posterior probability of free-energy profiles.
Knowledge of transition states and the heights of the associated energy barriers, which is needed to gain kinetic information of a biomolecular system, is difficult to obtain, because these states are only rarely visited and thus imaged. However, the posterior probability quantifies the degree of certainty with which free-energy profiles can be extracted from the experimental data, allowing one to decide whether more data is required to achieve the required accuracy. Building on this cryo-BIFE approach, Tang et al. proposed a Bayesian approach to reweight a prior conformational ensemble with 2D cryo-EM images [47]. For a multistate peptide, the authors showed that their method is able to retrieve an ensemble density from the synthetic cryo-EM images generated from conformations of MD simulations.
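The structure of such a likelihood can be sketched as follows. The example assumes that the path is discretized into nodes, that the likelihood of each image given each node structure has already been computed (in practice the expensive step, involving a search over orientations and imaging parameters), and that free energies are expressed in units of k_B T. The function name and the input matrix are hypothetical; this is an illustration of the idea and not the cryo-BIFE implementation.

```python
import math


def log_likelihood(free_energy_profile, image_node_likelihoods):
    """Log-probability of a set of images given a free-energy profile along a path.

    free_energy_profile[m]: free energy of path node m in units of k_B T.
    image_node_likelihoods[i][m]: precomputed likelihood of image i given node m
    (hypothetical input, assumed to be strictly positive).
    Each image is treated as an independent draw from the Boltzmann-weighted nodes.
    """
    boltzmann = [math.exp(-g) for g in free_energy_profile]
    norm = sum(boltzmann)
    node_populations = [b / norm for b in boltzmann]
    total = 0.0
    for row in image_node_likelihoods:
        total += math.log(sum(p * l for p, l in zip(node_populations, row)))
    return total


# Toy data: eight images resemble node 0, two images resemble node 1.
images = [[0.9, 0.1]] * 8 + [[0.1, 0.9]] * 2
print(log_likelihood([0.0, math.log(4.0)], images))  # node 0 four times more populated
print(log_likelihood([math.log(4.0), 0.0], images))  # node 1 four times more populated
```

With a flat prior, the posterior over profiles is proportional to the exponential of this log-likelihood, so profiles can be sampled, e.g., by Markov chain Monte Carlo, and the spread of the sampled profiles reflects how well the images constrain the free energy at each node.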

Conclusion
Many biomolecules are highly dynamic and their functions entail or are realized by conformational changes. Understanding in atomistic detail how the motions of biomolecules give rise to the function requires the full structural ensemble at physiological temperatures, which is much larger than the number of conformational states typically resolved by cryo-EM. MD simulations have the potential to produce these ensembles, but are limited by the length and time scales that can be reached with the available computational resources and by the approximations in the description of the underlying physics, e.g., the force fields. In principle, cryo-EM data contains information on the full structural ensemble. Practically, access to the ensemble is limited by the perturbation of the ensemble during the experiment, e.g., by plunge-freezing. Another limitation is caused by the limited number of images (particles) that can be obtained. In particular, high free-energy conformations, such as transition states, are crucial to understand the kinetics. However, because of their low occupancy they are only imaged sparsely, thereby impeding 3D reconstruction.
In this review, we provided examples of how these limitations can be overcome to some extent by combining cryo-EM data with MD simulations. At the same time, we see many possibilities for further developing this combination. During cryo-EM sample preparation, the biomolecules tend to accumulate at the air-water interface, causing problems such as preferential orientation of the particles and even denaturation of the biomolecules [48]. Studying these processes with MD simulations might help to find approaches that mitigate or correct for these unwanted effects. Using nonequilibrium time-resolved cryo-EM, short-lived intermediate states can be resolved by sufficiently reducing the reaction time prior to plunge-freezing [18,49]. Cryo-EM data from different time points after the start of a reaction could be integrated with ensemble refinement and kinetic models that describe the population of states as a function of reaction time. Such an approach has the intriguing potential to directly extract kinetic information in addition to the structural information. Most time-resolved cryo-EM approaches are limited by the timescale of the plunge-freezing after the reaction is initiated, currently reaching 6 ms [50]. However, a recently introduced approach involves rapid melting of the vitrified sample with a laser, reaching room temperature. A reaction can be triggered by a second laser, e.g., by releasing a caged ATP molecule. After the heating laser is turned off, the sample rapidly revitrifies and is subsequently imaged [51,52]. The reachable time resolution of 5 µs matches the time scale of many conformational changes [51]. MD simulations following the experimental temperature profiles could be directly compared to the cryo-EM results and thereby provide atomistic 'movies' of the conformational changes, fundamentally advancing our understanding of biomolecular dynamics.