Determination of the Structure and Dynamics of the Fuzzy Coat of an Amyloid Fibril of IAPP Using Cryo-Electron Microscopy

In recent years, major advances in cryo-electron microscopy (cryo-EM) have enabled the routine determination of complex biomolecular structures at atomistic resolution. An open challenge for this approach, however, concerns large systems that exhibit continuous dynamics. To address this problem, we developed the metadynamic electron microscopy metainference (MEMMI) method, which incorporates metadynamics, an enhanced conformational sampling approach, into the metainference method of integrative structural biology. MEMMI enables the simultaneous determination of the structure and dynamics of large heterogeneous systems by combining cryo-EM density maps with prior information through molecular dynamics, while at the same time modeling the different sources of error. To illustrate the method, we apply it to elucidate the dynamics of an amyloid fibril of the islet amyloid polypeptide (IAPP). The resulting conformational ensemble provides an accurate description of the structural variability of the disordered region of the amyloid fibril, known as the fuzzy coat. The conformational ensemble also reveals that in nearly half of the structural core of this amyloid fibril, the side chains exhibit liquid-like dynamics despite the presence of the highly ordered network of backbone hydrogen bonds characteristic of the cross-β structure of amyloid fibrils.


■ INTRODUCTION
In the last several years, cryo-electron microscopy (cryo-EM) has been pushing the boundaries of structural biology in terms of structural resolution, system complexity, and macromolecular size. 1,2 Imaging single particles by rapid cryo-cooling and vitrification enables structural studies capturing near-native conformations, while offering sample protection from beam radiation. 3−5 Technical advances in electron detectors, computational algorithms accounting for beam-induced motion, and automation of data collection and image analysis have paved the way for a spectacular increase in the resolution of cryo-EM density maps. 6,7 The Electron Microscopy Data Bank (EMDB) currently holds 15,800 single-particle cryo-EM density maps, which offer exquisitely detailed structural information about macromolecular systems of central importance in cell biology. 8
In standard cryo-EM structure determination, two-dimensional (2D) images of single particles are first classified into conformationally homogeneous classes and then averaged in a computational image processing step, thereby leading to a substantial increase in the signal-to-noise ratio, and thus in structure resolution. 9 However, the continuous dynamics of flexible regions are difficult to detect, which complicates the generation of homogeneous classes of structures. The resulting low densities cannot be readily used to determine atomistic structures, and are thus often excluded from the final structural model. While methods such as the manifold embedding approach 10 can determine multiple structures from cryo-EM density maps, to account for the conformational dynamics, one should quantitatively and atomistically interpret the cryo-EM density maps as an envelope that corresponds to an averaged conformational ensemble of states with certain populations that interconvert with a characteristic timescale. 10 Such a viewpoint moves away from a single-structure interpretation of the data and links the data to the statistical mechanics concept of free energy landscapes of conformational ensembles. 11,12
Integrative structural ensemble-modeling methods incorporate experimental information into molecular simulations and enable the determination of structural ensembles that maximally conform to the experimental data with atomistic resolution. 13−24 This technique has been applied using nuclear magnetic resonance (NMR) spectroscopy, 18,20,21,25−29 fluorescence resonance energy transfer (FRET) microscopy, 30 small-angle scattering techniques (SAXS/SANS), 31−34 transition rate constants, 23,24 cryo-EM data, 35 and AF distance map data. 36 One such method, cryo-EM metainference (EMMI), 35 can accurately model a thermodynamic ensemble by combining prior information on the system, such as physicochemical knowledge (e.g., a force field), with noisy (i.e., subject to systematic and random errors) and heterogeneous (i.e., encoding a conformational ensemble) experimental data, using cryo-EM density maps. EMMI has already been applied to a series of complex macromolecular systems, including a CLP protease, 37 microtubules, 38 microtubule-tau complexes, 39 the ASCT2 transporter, 40 and the SARS-CoV-2 spike protein, 41,42 allowing access to the continuous dynamics of biomolecules with atomistic resolution.
The quality of the EMMI structural ensembles, however, is closely related to the exhaustiveness of the conformational sampling, which requires a computational time that scales exponentially with the barriers delimiting individual structural states. 10 To tackle this rare event problem, several enhanced sampling methods have been developed. Enhanced sampling molecular simulation methods can be classified as trajectory-based 43−47 and collective-variable (CV)-based. 48−52 For detailed reviews, the reader can refer to the recent literature. 53,54 A particularly powerful CV-based enhanced sampling method, which is very efficient once appropriate CVs are chosen, is metadynamics. 52,55 Metadynamics adds a history-dependent bias to the system as a function of microscopic degrees of freedom of the system known as collective variables. With this bias, the simulations can escape deep free energy minima and sample transitions between different states. The choice of the CVs is critical to achieve the desired speed-up of convergence. 53 Recent developments in identifying and automating the search for appropriate CVs have increased the efficiency of this method, thus providing a remedy to the conformational sampling problem, 56−60 which typically affects standard molecular dynamics simulations (Figure 1A). Metainference has been successfully combined with metadynamics 61 and has shown significant improvements over unrestrained force fields for many different biophysical systems. 27,29,62,63 However, metadynamics has not yet been combined with EMMI (Figure 1A). Combining EMMI with enhanced sampling methods can lead to accurate and efficient determination of structural ensembles using the large number of datasets in cryo-EM databases, which can in turn provide atomistic insight into a range of biomolecular systems and processes.
In this work, we present the MEMMI method, which incorporates metadynamics into EMMI to accelerate the sampling of structural ensembles that include slowly interconverting states. We illustrate the application of this approach to determine the structural ensemble of an amyloid fibril formed by the full-length (residues 1−37) islet amyloid polypeptide (IAPP), an aberrant assembly associated with the degeneration of pancreatic β-cells in type-2 diabetes (T2D). When functioning correctly, IAPP, together with insulin, contributes to glycaemic control. IAPP and insulin are synthesized and stored together in pancreatic β-cells, but when IAPP aggregates in the extracellular space of the islets of Langerhans, amyloid-induced apoptosis of β-cells may occur. 64 In 95% of T2D patients, IAPP is found as extracellular amyloid deposits, 65−67 which form through surface-catalyzed secondary nucleation. 68 IAPP fibrils represent a challenging system for structural biology studies since no unique structure can be readily resolved in the low-density regions of the 12-residue-long N-terminal tails, due to conformational heterogeneity and associated errors in the measurement. For this reason, although recent cryo-EM experiments determined various amyloid fibril structures of IAPP, 69−71 the structural heterogeneity in the disordered flanking regions, known as the fuzzy coat, has so far proved impossible to resolve accurately.
It would be desirable to acquire a better understanding of the conformational properties of the fuzzy coat since this region is thought to play a central role in the interactions of amyloid fibrils with other cellular components such as RNA molecules and molecular chaperones. 72,73 Moreover, the fuzzy coat is likely to be involved in cell membrane binding, potentially promoting the catalysis of aggregation and capturing amyloid precursors. 72,73 Recent studies on the tau protein, which is implicated in a family of neurodegenerative diseases known as tauopathies, show that depending on pH conditions, the thick fuzzy coat can change the fibril properties, including mechanical stiffness, and repulsive and adhesive behaviors. 72,73 Here, we detail the dynamics of the fuzzy coat of an IAPP fibril and utilize a thermodynamic theory of melting to characterize the different regions of the fibril to gain insight into its mechanical properties.

MEMMI Method.
Cryo-EM Forward Model. A cryo-EM density map resulting from class-averaging and three-dimensional (3D) reconstruction is typically encoded as voxels on a grid, and the map is generally distributed in this form. For computational efficiency, and to enable differentiability and analysis of correlations between data points, the map can be converted to a Gaussian mixture model (GMM) $\phi_D(\mathbf{x})$ consisting of $N_D$ Gaussian components
$$\phi_D(\mathbf{x}) = \sum_{i=1}^{N_D} \omega_{D,i}\, G(\mathbf{x} \mid \mathbf{x}_{D,i}, \Sigma_{D,i}) \tag{1}$$
where $\mathbf{x}$ is a vector in Cartesian space, $\omega_{D,i}$ is the scaling factor of the $i$-th component of the data GMM, and $G$ is a normalized Gaussian function centered at $\mathbf{x}_{D,i}$ with covariance matrix $\Sigma_{D,i}$. The agreement between models generated by molecular dynamics (MD) and the data GMM is calculated by the following overlap function $\mathrm{ov}_{MD,i}$
$$\mathrm{ov}_{MD,i} = \int d\mathbf{x}\, \phi_M(\mathbf{x})\, \omega_{D,i}\, G(\mathbf{x} \mid \mathbf{x}_{D,i}, \Sigma_{D,i}) \tag{2}$$
where $\phi_M(\mathbf{x})$ corresponds to the model GMM obtained from molecular dynamics. To deal with the heterogeneity of the system, EMMI simulates many replicas, $r$, of the system. The overlap between model GMM and data GMM is estimated over the ensemble of replicas as an average overlap per GMM component, $\langle \mathrm{ov}_{MD,i} \rangle$. This forward model overlap can then be compared to the data GMM self-overlap $\mathrm{ov}_{DD,i}$.
Metainference. Metainference is a Bayesian approach for modeling statistical ensembles by combining prior information on a system with experimental data subject to noise or systematic errors. 17 This framework is particularly well suited to structural ensemble determination through molecular dynamics simulations, in which the prior (i.e., the force field) is updated with information from experimental methods, such as NMR spectroscopy, SAXS, or cryo-EM data. Metainference is designed to handle systematic errors (such as biases in the force field or forward model), random errors (due to noise in experimental data), and errors due to the limited sample size of the ensemble. 18 The model generation is governed by the metainference energy function, defined as $E_{\mathrm{MI}} = -k_B T \log p_{\mathrm{MI}}$, in which $k_B$ is the Boltzmann constant, $T$ is the temperature, and $p_{\mathrm{MI}}$ is the metainference posterior probability
$$p_{\mathrm{MI}}(X, \sigma \mid D) \propto \prod_{r} \Big[ p(X_r) \prod_{i} p(\mathrm{ov}_{DD,i} \mid X, \sigma_{r,i})\, p(\sigma_{r,i}) \Big] \tag{3}$$
Here, $X$ is a vector representing the atomic coordinates of the full ensemble, consisting of individual replicas $X_r$; $\sigma^{\mathrm{SEM}}$ is the error incurred by the limited number of replicas in the ensemble; $\sigma^{\mathrm{B}}$ encodes the random and systematic errors in the prior, forward model, and experiment; and $\sigma_{r,i} = \sqrt{(\sigma_i^{\mathrm{SEM}})^2 + (\sigma_{r,i}^{\mathrm{B}})^2}$ is the total error. The error $\sigma^{\mathrm{SEM}}$ is calculated per data point $i$, while $\sigma^{\mathrm{B}}$ is computed per data point $i$ and replica $r$. The likelihood of each data point is taken to be Gaussian
$$p(\mathrm{ov}_{DD,i} \mid X, \sigma_{r,i}) = \frac{1}{\sqrt{2\pi}\, \sigma_{r,i}} \exp\left[ -\frac{\left( \mathrm{ov}_{DD,i} - \langle \mathrm{ov}_{MD,i} \rangle \right)^2}{2 \sigma_{r,i}^2} \right] \tag{4}$$
where $\langle \mathrm{ov}_{MD,i} \rangle$ is the ensemble average of the overlap. The metainference energy function for multiple replicas then becomes, up to an additive constant,
$$E_{\mathrm{MI}}(X, \sigma) = E_{\mathrm{MD}}(X) + \frac{k_B T}{2} \sum_{r,i} \frac{\left( \mathrm{ov}_{DD,i} - \langle \mathrm{ov}_{MD,i} \rangle \right)^2}{\sigma_{r,i}^2} + E_{\sigma} \tag{5}$$
where $E_{\sigma}$ represents the energy associated with the error $\sigma = (\sigma^{\mathrm{B}}, \sigma^{\mathrm{SEM}})$
$$E_{\sigma} = k_B T \sum_{r,i} \left[ \tfrac{1}{2} \log \sigma_{r,i}^2 - \log p(\sigma_{r,i}) \right] \tag{6}$$
and $E_{\mathrm{MD}}$ represents the molecular dynamics force field. While the space of conformations $X_r$ is sampled by multi-replica molecular dynamics simulations, the error parameters for each data point $\sigma_{r,i}^{\mathrm{B}}$ are sampled by a Monte Carlo sampling scheme at each time step. The error parameter related to the limited number of replicas used to estimate the forward model ($\sigma^{\mathrm{SEM}}$) can be chosen as a constant or estimated on the fly by using a windowed average. 32
Biochemistry pubs.acs.org/biochemistry Article
Metadynamic Cryo-EM Metainference (MEMMI). To accelerate the sampling of the metainference ensemble, one can utilize an enhanced sampling scheme such as metadynamics. 52,61 In this case, we use parallel-bias metadynamics (PBMetaD) 56 with the multiple walkers scheme. 74 Here, $V_{\mathrm{PB}}$ is a time-dependent biasing potential acting on a set of $N_{\mathrm{CV}}$ collective variables $\mathbf{s}(X)$, which in turn are functions of the system coordinates
$$V_{\mathrm{PB}}(\mathbf{s}(X), t) = -k_B T \log \sum_{j=1}^{N_{\mathrm{CV}}} \exp\left[ -\frac{V_G(s_j(X), t)}{k_B T} \right] \tag{7}$$
In contrast to conventional metadynamics, in PBMetaD, multiple one-dimensional bias potentials $V_G$ are deposited rather than a single high-dimensional one. This alleviates the curse of dimensionality while still allowing an efficient exploration of phase space. 56 Additionally, the use of multiple replicas through the multiple walkers scheme 74 allows the sharing of the bias potential to drastically improve the sampling performance, while at the same time being a natural fit for the replica averaging approach of EMMI. Analogously to well-tempered metadynamics, these bias potentials $V_G(s_j(X), t)$ eventually converge to the free energy $F(s_j(X))$. The MEMMI energy function then becomes
$$E_{\mathrm{MEMMI}}(X, \sigma, t) = E_{\mathrm{MI}}(X, \sigma) + \sum_{r} V_{\mathrm{PB}}(\mathbf{s}(X_r), t) \tag{8}$$
While the PBMetaD bias potential is shared among replicas, each replica may still experience a varying potential depending on its location in phase space. Thus, the arithmetic average over the forward models (i.e., the overlap) no longer presents an unbiased estimate of the ensemble average. It therefore needs to be replaced with a weighted average, utilizing the bias potential of each replica $r$ to unbias the ensemble at time $t$
$$w(X_r, t) \propto \exp\left[ \frac{V_{\mathrm{PB}}(\mathbf{s}(X_r), t)}{k_B T} \right] \tag{9}$$
The unbiasing procedure used here is analogous to the standard umbrella-sampling technique. 48 Now the ensemble average $\langle \mathrm{ov}_{MD,i}(X) \rangle$ is given by
$$\langle \mathrm{ov}_{MD,i}(X) \rangle = \frac{\sum_{r} w(X_r, t)\, \mathrm{ov}_{MD,i}(X_r)}{\sum_{r} w(X_r, t)} \tag{10}$$
The metainference term of the MEMMI energy $E_{\mathrm{MEMMI}}$ is thus given by eq 5, with this weighted ensemble average in place of the arithmetic one.
Initial Fibril Structure. We build the initial structure of the full fibril by starting from a deposited fibril structure (PDB: 6Y1A), which only contains the core of the fibril, and extending each of the 16 polypeptide chains by adding the missing 12-residue N-terminal sequence with guidance from the cryo-EM density map EMD-10669. We use the macromolecular model-building program Coot. 75 We note that since all of the simulations that we present in this work reached convergence, the results that we report are independent of the initial configuration. We also note that the electron density map EMD-10669 used here for the MEMMI approach was generated using a helical reconstruction method, which in principle could lead to a periodicity not only for the fibril core but also for the disordered N-terminal regions.
There are indeed periodic densities visible outside of the fibril core in the map, and it is unclear whether these regions should be seen as artifacts of the reconstruction approach or valid densities. The original reconstruction procedure does not mention the use of a mask to discard information on the fuzzy coat, 71 as is often done for amyloid reconstructions. 76 On the other hand, well-defined densities beyond the fibril core often have important structural context, 77 and we thus decided to use the density map as is.
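To make the forward model of the MEMMI Method section more concrete, the following Python sketch computes the overlap of a model GMM with each component of a data GMM, and the arithmetic average of these overlaps over replicas. It is a minimal illustration under stated assumptions: every Gaussian is taken to be isotropic (one variance per component), for which the overlap integral has a closed form, and the function names and toy numbers are hypothetical. It is not the PLUMED implementation used in this work.

```python
import math

def gaussian_overlap(center_a, var_a, center_b, var_b):
    """Integral over 3D space of the product of two normalized isotropic Gaussians."""
    var = var_a + var_b  # variances add in the product integral
    d2 = sum((a - b) ** 2 for a, b in zip(center_a, center_b))
    return math.exp(-0.5 * d2 / var) / (2.0 * math.pi * var) ** 1.5

def component_overlaps(model, data):
    """Overlap of the full model GMM with each data component.

    Each GMM is a list of (weight, center, variance) tuples.
    """
    return [
        w_d * sum(w_m * gaussian_overlap(c_m, v_m, c_d, v_d) for w_m, c_m, v_m in model)
        for w_d, c_d, v_d in data
    ]

def replica_average(per_replica_overlaps):
    """Arithmetic mean over replicas, one value per data component."""
    n = len(per_replica_overlaps)
    return [sum(column) / n for column in zip(*per_replica_overlaps)]

# Toy data GMM with two components and two single-Gaussian replica models (made-up values).
data = [(1.0, (0.0, 0.0, 0.0), 0.5), (0.5, (1.0, 0.0, 0.0), 0.5)]
replicas = [
    [(1.0, (0.1, 0.0, 0.0), 0.5)],
    [(1.0, (0.9, 0.0, 0.0), 0.5)],
]
ov_md = replica_average([component_overlaps(m, data) for m in replicas])
ov_dd = component_overlaps(data, data)  # data GMM self-overlap per component
```

Comparing `ov_md` with `ov_dd` component by component is the quantity that the cryo-EM restraint acts on. With full covariance matrices the same closed form holds with the summed covariance matrix in place of the summed variance.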
Molecular Dynamics Setup and Equilibration. We continue by creating a 12.34 nm × 12.34 nm × 12.34 nm cubic simulation box, solvating with 58,291 water molecules and neutralizing the net charge by adding 48 Cl− ions. We use the CHARMM22* 78 force field and the TIP3P 79 water model. We continue with an energy minimization followed by a 500 ps NPT equilibration at a temperature of 310 K and a pressure of 1 atm, followed by an additional 2 ns NVT equilibration at 310 K. The molecular dynamics parameters are the same as those used previously. 41
MEMMI Simulations. We first express the experimental voxel map data as a data GMM containing 10,000 Gaussians in total, resulting in a 0.975 correlation to the original voxel map EMD-10669, 64 using the gmmconvert utility. 80 We continue by extracting 32 configurations from the previous NVT equilibration and initiate a MEMMI simulation consisting of 32 replicas, resulting in an aggregate runtime of 5.49 μs, using PLUMED 2.6.0-dev 81 and GROMACS 2020.6. 82 The simulation is performed in the NVT ensemble at 310 K using the same MD parameters as in the equilibration step. Configurations are saved every 10 ps for post-processing. The cryo-EM restraint is updated every 2 MD steps, using neighbor lists to compute the overlaps between model and data GMMs, with a neighbor list cutoff of 0.01 and a neighbor list update stride of 100 steps. The biasing collective variables s = [s_i] in the simulation are shown in Figure S1A, and the biasing scheme is PBMetaD 56 with the well-tempered 83 and multiple walkers 74 protocols. The hill height is set to 0.3 kJ/mol, with hills deposited every 200 steps using an adaptive Gaussian diffusion scheme. 84 The biasing collective variables correspond to degrees of freedom of the left-hand-side N-terminal tails. The corresponding degrees of freedom of the right-hand-side N-terminal tails do not feel a metadynamics potential and are therefore referred to as EMMI degrees of freedom in the remaining text; they are also listed in Figure S1.
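The simulation settings above imply some simple bookkeeping, which the following Python sketch works through. It assumes that the aggregate runtime is split equally among the replicas, which the text does not state explicitly; the variable names are illustrative.

```python
AGGREGATE_US = 5.49      # aggregate runtime over all replicas, in microseconds
N_REPLICAS = 32
SAVE_EVERY_PS = 10.0     # configurations are saved every 10 ps
HILL_STRIDE = 200        # MD steps between hill depositions
RESTRAINT_STRIDE = 2     # MD steps between cryo-EM restraint updates

# Assumption: every replica runs for the same length of time.
per_replica_ns = AGGREGATE_US * 1000.0 / N_REPLICAS
frames_per_replica = int(per_replica_ns * 1000.0 / SAVE_EVERY_PS)
total_frames = round(AGGREGATE_US * 1.0e6 / SAVE_EVERY_PS)
restraint_updates_per_hill = HILL_STRIDE // RESTRAINT_STRIDE

print(f"{per_replica_ns:.1f} ns per replica")
print(f"{frames_per_replica} saved frames per replica, {total_frames} in total")
print(f"{restraint_updates_per_hill} restraint updates between hill depositions")
```

Under the equal-split assumption, each replica covers roughly 172 ns, so the 7 ns discarded as equilibration in post-processing is a small fraction of each trajectory.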
As a post-processing step, we generate the final structural ensemble by resampling the generated configurations based on the converged unbiasing weights for each structure after an equilibration of 7 ns shown in Figure 1A. To establish convergence, we perform a clustering analysis on the structural ensemble based separately on the first and second half of each replica (taking into account the weights) using the GROMOS method, 85 with the root-mean-square deviation (RMSD) calculated on the Cα (CA) atoms as the metric (Figure S1B). Time traces of CVs as well as their time-dependent free energy profiles are shown in Figures S2−S4. For molecular visualizations and calculating the local correlation of the final structural-ensemble-generated cryo-EM map with the experimental cryo-EM map, we use Chimera and gmconvert. 86 Unless otherwise mentioned, all of the structural analysis is performed on the degrees of freedom of the N-tails on the left-hand side, which is the side with MEMMI restraints, and on the two central pairs of polypeptide chains in the overall stack of eight pairs, to avoid finite-size effects.
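The unbiasing and resampling steps described above can be sketched in a few lines of Python. The sketch assumes that each saved configuration comes with the value of the bias potential it experienced (in kJ/mol) and that its weight is proportional to the exponential of that bias divided by k_B T, as in standard metadynamics reweighting. The function names and the example numbers are hypothetical, and frames from the equilibration period are assumed to have been discarded beforehand.

```python
import math
import random

KB = 0.0083144626  # Boltzmann constant in kJ/(mol K)

def unbiasing_weights(bias_kj_mol, temperature=310.0):
    """Normalized weights proportional to exp(V / k_B T) for each configuration."""
    kt = KB * temperature
    vmax = max(bias_kj_mol)  # subtract the maximum for numerical stability
    raw = [math.exp((v - vmax) / kt) for v in bias_kj_mol]
    total = sum(raw)
    return [w / total for w in raw]

def weighted_average(values, weights):
    """Weighted ensemble average of an observable, e.g. a forward-model overlap."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def resample(frames, weights, n_samples, seed=0):
    """Draw configurations with probability proportional to their weights."""
    rng = random.Random(seed)
    return rng.choices(frames, weights=weights, k=n_samples)

# Made-up bias values for four configurations and a made-up observable.
bias = [0.0, 2.0, 5.0, 1.0]
observable = [0.80, 0.85, 0.95, 0.82]
w = unbiasing_weights(bias)
average = weighted_average(observable, w)
ensemble = resample(list(range(len(bias))), w, n_samples=1000)
```

Configurations that sat under a larger bias are up-weighted, because the bias made them more likely to be visited than in the unbiased ensemble. The resampled list of frame indices can then be analyzed with uniform weights.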

Structure and Dynamics of IAPP Amyloid Fibrils.
Acceleration of the Conformational Sampling. MEMMI accelerates the conformational sampling by biasing a set of microscopic degrees of freedom of the system, also known as collective variables (CVs, Figure S1). In addition, MEMMI corrects possible inaccuracies in the force field used in the simulations. To illustrate the acceleration, we consider the CV $\chi_3^4$, which corresponds to the disulfide bond dihedral of the 4th polypeptide in the eight-layer stack of the fibril. This polypeptide is thus representative of a buried monomer with little interaction with the fibril ends. Note that, due to C2 helical symmetry, there are two of these dihedrals, one corresponding to the right-hand-side N-terminal tail and one to the left-hand-side N-terminal tail (Figure 1B). The sampling of one side is accelerated by the biasing potential of eq 7 (MEMMI), while the other is not (EMMI). Compared to the unbiased dihedral (EMMI), the biased disulfide dihedral $\chi_3^4$ (MEMMI) shows an increased transition rate, and thus more efficient conformational sampling (Figure 1C). While both methods characterize the wells of the stable states (L, R), MEMMI is able to provide access to higher free energy transition state (TS) regions (Figure 1D). Monitoring a nonperiodic CV, such as the number of contacts between N-tails three and four, shows diffusion between high- and low-contact regions in both the MEMMI and EMMI cases, with somewhat more frequent transitions in the MEMMI case. The time traces of the CVs and the free energy profiles as a function of simulation time, for all collective variables in both the EMMI and MEMMI cases, are shown in Figures S2−S4 and indicate that our simulations are well converged. Taken together, these results show that both the MEMMI and EMMI simulations are converged in the low free energy regions, while MEMMI enables visiting high free energy regions.
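The effect of a history-dependent bias on the transition rate of a dihedral-like CV can be reproduced with a toy model. The Python sketch below runs overdamped Langevin dynamics on a periodic double-well potential, with and without one-dimensional well-tempered metadynamics, and counts transitions between the two wells. Everything in it is an illustrative assumption: the potential, the reduced units (energies in k_B T), and the parameters are not those of the IAPP simulations, and a single one-dimensional bias is used rather than PBMetaD with multiple walkers.

```python
import math
import random

KT = 1.0        # reduced units: energies in k_B T
BARRIER = 8.0   # assumed barrier height between the two wells
N_BINS = 360    # grid on which the bias potential is stored
DX = 2.0 * math.pi / N_BINS

def wrap(x):
    """Map an angle to the interval [-pi, pi)."""
    return (x + math.pi) % (2.0 * math.pi) - math.pi

def force(x):
    """Force from U(x) = (BARRIER / 2) * (1 + cos 2x); minima at -pi/2 (L) and +pi/2 (R)."""
    return BARRIER * math.sin(2.0 * x)

def run(n_steps, biased, seed=1, dt=1e-3, height=0.2, sigma=0.3, pace=200, gamma=10.0):
    """Return the number of L/R transitions and the final bias grid."""
    rng = random.Random(seed)
    bias = [0.0] * N_BINS
    x, state, transitions = math.pi / 2.0, "R", 0
    noise = math.sqrt(2.0 * KT * dt)  # mobility set to 1, so D = k_B T
    for step in range(1, n_steps + 1):
        k = int((x + math.pi) / DX) % N_BINS
        # Bias force from a centered finite difference on the periodic grid.
        f_bias = -(bias[(k + 1) % N_BINS] - bias[k - 1]) / (2.0 * DX)
        x = wrap(x + (force(x) + f_bias) * dt + noise * rng.gauss(0.0, 1.0))
        if biased and step % pace == 0:
            k = int((x + math.pi) / DX) % N_BINS
            # Well-tempered rule: hills shrink where bias has already accumulated.
            h = height * math.exp(-bias[k] / ((gamma - 1.0) * KT))
            for j in range(N_BINS):
                d = wrap(-math.pi + j * DX - x)
                bias[j] += h * math.exp(-0.5 * (d / sigma) ** 2)
        for name, center in (("L", -math.pi / 2.0), ("R", math.pi / 2.0)):
            if abs(x - center) < 0.5 and name != state:
                state, transitions = name, transitions + 1
    return transitions, bias

plain, _ = run(200_000, biased=False)
meta, bias = run(200_000, biased=True)
print("transitions without bias:", plain, "| with metadynamics:", meta)
```

In this toy setting the unbiased run is expected to remain mostly in its starting well, while the biased run fills the wells and crosses the barrier repeatedly. The accumulated bias grid mirrors the underlying potential, which is how metadynamics yields free energy estimates along the biased CV.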

Conformational Heterogeneity of the Fuzzy Coat.
A structure of the IAPP fibril core (residues 13−37) has been previously published (PDB: 6Y1A) using the cryo-EM data also used in the present work (EMD-10669). Here, we determine a structural ensemble of the whole IAPP fibril (residues 1−37). We model the system as a stack of eight polypeptides per side (Figure 2A). While the core of the fibril maintains a parallel β-sheet structure, the flanking region (residues 1−12) exhibits a large conformational heterogeneity (Figure 2B). While we find the core residues 13−37 to be largely in a β-sheet conformation, we also note significant heterogeneity for residues 23−24, 32−34, and 37. As shown in Figure 2B, residues 23−24 interact with Tyr37 and maintain mostly a coil structure, while residues 32−34 interact with the fuzzy coat and also maintain mostly a coil structure. We also detected a small population of α-helical conformations in the region of residues 5−9.
Correlation between Experimental and Calculated Cryo-EM Maps. We estimate the correlation of the MEMMI structural ensemble with the experimental cryo-EM map (Figure 3). We find that a structural ensemble of IAPP (residues 1−37) correlates better with the experimental cryo-EM map than a single structure (PDB: 6Y1A) (Figure 3A,B). The coefficient of correlation of the structural ensemble to the experimental electron density map is on average 0.92. Furthermore, an important feature of MEMMI is its ability to estimate the error in the experimental electron density map (Figures 3C and S5). We find that the relative error per Gaussian data point is on average 0.09, where the relative error is the error of each Gaussian data point with respect to the total overlap between the full data GMM and the ith component of the data GMM. 35 The low-density, high-error volume around residues S34, N35, and T36 can likely be attributed to the lack of MES/NaOH buffer in our MEMMI simulations, which is present in the experimental setup. Both MEMMI and EMMI exhibit good correlation (about 0.83) in the respective N-tail region with the cryo-EM map (Figure 3D).
Comparison of the Dynamical Properties of the Fibril Core and Fuzzy Coat. The conformational properties of the fuzzy coat and core region have been shown to be relevant in modulating the properties of amyloid fibrils, including their ability to interact with various cellular components. 72,73 To investigate this phenomenon in the case of the present IAPP amyloid fibril, we take inspiration from a thermodynamic theory of melting and characterize the residue-dependent Lindemann parameter $\Delta_L$ (Figure 4), which encodes information on solid-like and liquid-like behavior. 87 At the backbone level, we find that the fibril core (23−30) is solid-like ($\Delta_L < 0.15$), while the flanking region (1−12) is on the verge of a liquid-like behavior ($\Delta_L \geq 0.15$).
The Lindemann parameters of the side chains indicate more mobility and are liquid-like outside the region of residues 20−32. These results reveal that about half of the structural core of this amyloid fibril remains rather disordered at the side-chain level, a phenomenon also observed for folded native states, even in otherwise rigid systems such as ubiquitin. 87 The fuzzy coat, in turn, exhibits a degree of conformational heterogeneity that is much lower than that of monomeric disordered proteins such as amyloid-β, which generally exhibit Lindemann parameters $\Delta_L \geq 1.0$ (Figure S7).
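For readers who wish to reproduce this type of analysis, the Python sketch below computes a Lindemann parameter for a group of atoms from a set of aligned configurations. The text does not spell out the formula, so the sketch assumes a commonly used definition for proteins: the root-mean-square atomic fluctuation of the group divided by a characteristic nonbonded neighbor distance a′. The value a′ = 4.5 Å and the function names are assumptions; only the 0.15 threshold is taken from the text.

```python
import math

SOLID_LIQUID_THRESHOLD = 0.15  # threshold quoted in the text

def lindemann(frames, a_prime=4.5):
    """Lindemann parameter of a group of atoms.

    frames: list of configurations, each a list of (x, y, z) positions in angstrom,
            assumed to be already aligned to a common reference.
    a_prime: assumed characteristic nonbonded neighbor distance in angstrom.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    total_msf = 0.0
    for i in range(n_atoms):
        mean = [sum(f[i][d] for f in frames) / n_frames for d in range(3)]
        # Mean-square fluctuation of atom i around its average position.
        total_msf += sum(
            sum((f[i][d] - mean[d]) ** 2 for d in range(3)) for f in frames
        ) / n_frames
    return math.sqrt(total_msf / n_atoms) / a_prime

def is_liquid_like(delta_l):
    return delta_l >= SOLID_LIQUID_THRESHOLD

# Toy example: one atom oscillating by +/- 1 angstrom, one atom essentially fixed.
mobile = [[(-1.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)]]
rigid = [[(0.00, 0.0, 0.0)], [(0.02, 0.0, 0.0)]]
print(is_liquid_like(lindemann(mobile)), is_liquid_like(lindemann(rigid)))
```

Applying such a function separately to backbone and side-chain atoms of each residue gives a residue-dependent profile of the kind discussed above. For a weighted ensemble, the averages would need to include the unbiasing weights, or the analysis would be run on the resampled ensemble.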
Residue-Specific Solubility of the Amyloid Fibril Surface. To investigate whether or not the surface of the amyloid fibril is soluble, we calculated the solubility per residue using the structure-corrected CamSol solubility score 88 (Figure 5). We found three insoluble regions: (i) residues 5−7 (ATC) in the fuzzy coat, (ii) residues 26−29 (ILSS) at the end of the fibril fragment, and (iii) residues 14−18 (NFLVH) on the fibril surface, stretched along the helical axis. The latter result is consistent with experimental evidence of residues N14, H18, and S20 being implicated in IAPP aggregation 66,89 via secondary nucleation. We note that the solvent-exposed and aggregation-prone residues 5−7 and 18−20 might present attractive targets for structure-based drug discovery. This strategy for choosing targets was recently demonstrated experimentally in the case of α-synuclein fibrils. 90

■ CONCLUSIONS
We have presented the MEMMI method for the simultaneous determination of the structure and dynamics of large and conformationally heterogeneous biomolecular structures from