Bayesian analysis of individual electron microscopy images: Towards structures of dynamic and heterogeneous biomolecular assemblies
Introduction
The structural characterization of large and dynamic biomolecular assemblies is rapidly advancing, providing important insight into the function of the molecular machines and supramolecular assemblies involved in transcription and translation of genetic information, signal transduction, protein trafficking, cellular adhesion, and many other cellular processes. Electron microscopy (EM) occupies a central role in this endeavor by reporting on molecular structures with single-particle resolution, unhampered by the need to obtain crystals, and without the system size limits faced in nuclear magnetic resonance (NMR) studies (Frank, 2006). However, structural disorder in dynamic systems greatly limits the use of traditional EM methods that rely on sophisticated image pre-processing, such as class-averaging, to obtain 3D reconstructions (Saibil, 2000a, Leschziner and Nogales, 2007, Patwardhan et al., 2012). Here, we develop a method that aims to extract the maximum information by analyzing the raw EM data image-by-image within a Bayesian framework.
EM reconstructions achieve near-atomic resolution (Lerch et al., 2012, Beck et al., 2012, Ludtke et al., 2008, Zhang et al., 2013, Wang et al., 2006, Nogales et al., 1995) and reveal detailed dynamic information (Heymann et al., 2003, Ramrath et al., 2012, Cianfrocco et al., 2013). Elaborate algorithms have been developed on the modeling and simulation side to extract structural details from flexible fitting into three-dimensional (3D) electron density maps (Trabuco et al., 2008, Tama et al., 2004, Topf et al., 2008, Lindert et al., 2009, Mears et al., 2007, Schröder et al., 2007, Heymann et al., 2004, Delarue and Dumas, 2004, Loquet et al., 2012, Jaitly et al., 2010). Complementary to 3D reconstruction methods, recent integrative multi-scale protocols refine macromolecules against 2D class-averages and physico-chemical constraints. In particular, a maximum-likelihood cross-correlation metric that matches 3D models against class-averaged 2D projection images has been used, via simulated annealing, to obtain accurate models for several multi-domain complexes (Velazquez-Muriel et al., 2012), and a Natural Moves Monte Carlo method has been successfully used to refine the group II chaperonin Mm-cpn against heterogeneous projection averages (Zhang et al., 2012). Obtaining high-resolution models typically requires a large number of EM images, even for molecules exhibiting distinct features in projection that enable sophisticated clustering and reconstruction techniques. In the case of highly dynamic assemblies, traditional EM approaches face additional challenges. In particular, it is difficult to separate molecular motions from differences in the projection view if the number of relevant structural states is large (e.g., in a multidomain protein with flexible linkers, such as the ESCRT-I–II supercomplex (Boura et al., 2012)).
This problem is compounded by the presence of alternative or possibly incomplete assemblies, reflecting the often weak pairwise interactions holding the assemblies together. One thus faces challenges not only in identifying the orientations of the molecules imaged, but also in assigning proper conformations and assembly states.
To classify images of heterogeneous particles, standard techniques use iterative optimization algorithms to produce the 3D density map most consistent with the 2D averaged projection views of each model (Elmlund et al., 2008, Chiu et al., 2005, Orlova and Saibil, 2010, Saibil, 2000b). Such analyses work best for images that present common features or discernible symmetries aiding in the cluster analysis (Elmlund and Elmlund, 2012, Elmlund and Elmlund, 2009). Maximum-likelihood methods that do not require the standard class-averaging techniques have also been developed to classify conformational states (Scheres et al., 2007a, Elad et al., 2008) and to provide 3D density maps of macromolecules (Wang et al., 2013, Scheres et al., 2007b). Such reconstruction methods require large numbers of particles, which makes applications to dynamic systems challenging.
Here, we develop the Bayesian inference of EM (BioEM) approach, geared primarily towards the analysis of EM images of dynamic biomolecular assemblies but applicable more broadly. Importantly, we analyze the EM data image-by-image from the start, without filtering or averaging the images. As is commonly done in the refinement of protein and nucleic acid structures from X-ray crystal diffraction data or from NMR spectra (Brunger et al., 1998), we use structural models or, if needed, an entire ensemble of structures. We quantify how well any one of the models reproduces each of the observed images (in crystallography, this would correspond to calculating R factors). In the spirit of the em2D score developed by Velazquez-Muriel et al. (2012), we determine the probability that each EM single-particle image was created by projection of any one of the models in our ensemble. Building on earlier Bayesian inference approaches for NMR (Rieping et al., 2005), we use the Bayesian framework to provide a quantitative measure for comparing and analyzing structural models with respect to individual raw EM images. This contrasts with earlier maximum-likelihood or Bayesian approaches for EM (Sigworth, 1998, Scheres et al., 2005, Doerschuk and Johnson, 2000, Scheres et al., 2007a, Sigworth et al., 2010, Scheres, 2012a, Scheres, 2012b, Kucukelbir et al., 2012), which focused primarily on either image classification or the reconstruction of 3D maps. Our likelihood function accounts for uncertainties in the particle orientations and positions, variations in the relative image intensities, statistical noise, and the possible presence of broken particles other than the system of interest. We then feed the calculated likelihoods into a Bayesian framework to consistently and quantitatively assess how well different structures explain the data, and which structures (or structural ensembles) explain the data best.
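To make the idea of a likelihood that accounts for unknown particle orientation and position more concrete, the following minimal Python sketch averages a per-image likelihood over a set of candidate projections of one model. It is an illustration under stated assumptions, not the BioEM likelihood itself: the independent Gaussian pixel noise with a fixed width `sigma`, the uniform prior over candidates, the tiny one-dimensional "images", and the helper names `gaussian_log_likelihood`, `marginal_log_likelihood`, and `cyclic_shifts` are all hypothetical choices made for this example.

```python
import math


def gaussian_log_likelihood(observed, predicted, sigma):
    """Log-likelihood of an image given one predicted projection.

    Assumption for this sketch: independent Gaussian noise of width sigma
    on every pixel.
    """
    n = len(observed)
    squared_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return (-0.5 * squared_error / sigma ** 2
            - n * math.log(sigma * math.sqrt(2.0 * math.pi)))


def marginal_log_likelihood(observed, candidate_projections, sigma):
    """Average the likelihood over candidate projections (uniform prior).

    Uses the log-sum-exp trick so that very small likelihoods do not
    underflow to zero.
    """
    logs = [gaussian_log_likelihood(observed, proj, sigma)
            for proj in candidate_projections]
    largest = max(logs)
    total = sum(math.exp(v - largest) for v in logs)
    return largest + math.log(total) - math.log(len(logs))


def cyclic_shifts(row):
    """All cyclic shifts of a 1D profile, standing in for unknown position."""
    return [row[-k:] + row[:-k] for k in range(len(row))]


# Hypothetical toy data: one observed 1D "image" and two candidate models.
observed = [0.0, 1.0, 0.0, 0.0]
model_a = cyclic_shifts([0.0, 1.0, 0.0, 0.0])  # contains an exact match
model_b = cyclic_shifts([1.0, 1.0, 0.0, 0.0])  # never matches exactly

log_l_a = marginal_log_likelihood(observed, model_a, sigma=0.5)
log_l_b = marginal_log_likelihood(observed, model_b, sigma=0.5)
print(log_l_a > log_l_b)
```

Averaging over candidates, rather than keeping only the best-matching one, is what lets the score reflect the uncertainty in orientation and position; in a realistic setting the candidate set would cover orientations, offsets, and intensity parameters.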
The EM structural models can be constructed in multiple ways, for instance on the basis of our recently developed coarse-grained energy function, as implemented in the ensemble refinement of SAXS (EROS) approach (Rozycki et al., 2011, Boura et al., 2011). Our models should provide realistic, energetically meaningful structures that can account for molecular binding interactions and conformational changes. In addition, we want our mapping between structures and EM images to be computationally efficient. While our Bayesian approach is general and would work, at one end of the spectrum, with atomistic models or, at the other end, with highly coarse-grained models that treat entire domains as featureless blobs, we here concentrate on an intermediate level of residue-based coarse graining. We find that representing proteins with one site per amino acid strikes an appropriate balance between model detail, structural and energetic accuracy, flexibility and computational efficiency.
The paper is organized as follows. We first introduce the likelihood function connecting structural models and images. We then specify our Bayesian framework, including prior distributions for the model parameters. The resulting posterior provides a probabilistic measure of the degree of consistency between a structural model and an EM image. We test our method on raw experimental EM images of the unliganded chaperonin GroEL. This test demonstrates discriminatory power within a candidate pool consisting of X-ray crystal structures in different functional states, of EM structures obtained previously, and of coarse-grained models. We then test our ability to conduct an EM ensemble refinement, using synthetic images of the ESCRT-I–II supercomplex. In an ensemble of 18 model structures that jointly span the structural ensemble of ESCRT-I–II in solution (Boura et al., 2012), our probability measure correctly identifies the model from which images were generated. Moreover, we demonstrate that we can correctly identify the size of the structural ensemble. Overall, the BioEM approach should provide a useful tool to extract structural information from electron micrographs of dynamic systems, even in cases where standard EM reconstruction techniques fail.
Relating individual EM images to structures through Bayesian inference
Our first goal is to construct a quantitative measure of how well a particular set of structural models represents the observed EM single-particle images. Bayesian inference establishes such a measure by assigning a posterior probability P(model∣data) to a particular model set given the image data. This probability is a product of the likelihood L given by the probability of observing the data for the model set, and of the prior probability of the model set and its parameters, P(model∣data) ∝ L
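The proportionality between posterior, likelihood, and prior described above can be illustrated with a short Python sketch that ranks a discrete set of candidate models. It is a simplified illustration, not the authors' implementation: it assumes that images are statistically independent, so per-image log-likelihoods add, and that each candidate is a single model rather than a weighted ensemble. The function names and the numerical values are hypothetical.

```python
import math


def log_sum_exp(values):
    """Numerically stable log of a sum of exponentials."""
    largest = max(values)
    if largest == float("-inf"):
        return largest
    return largest + math.log(sum(math.exp(v - largest) for v in values))


def model_posterior(log_likelihoods, log_priors):
    """Normalized posterior probabilities over a discrete set of models.

    log_likelihoods[m][i] is the log-likelihood of image i under model m.
    Assumption for this sketch: images are independent, so the total
    log-likelihood of a model is the sum over its images.
    """
    log_post = [sum(per_image) + log_prior
                for per_image, log_prior in zip(log_likelihoods, log_priors)]
    norm = log_sum_exp(log_post)
    return [math.exp(v - norm) for v in log_post]


# Hypothetical log-likelihoods for two models and three images.
log_l = [
    [-10.0, -12.0, -11.0],  # model 0
    [-10.5, -12.5, -11.5],  # model 1
]
log_prior = [math.log(0.5), math.log(0.5)]  # equal prior weight

posterior = model_posterior(log_l, log_prior)
print(posterior)
```

Working in log space matters in practice because the likelihood of many images under one model is a product of many small numbers, which would underflow if multiplied directly.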
Analysis of experimental EM images of GroEL
To validate our method against experiment, we test its ability to distinguish different structures of the chaperonin GroEL in individual experimental EM images. The challenge is to identify the correct structure, as determined by the functional state of the protein, within a pool of 8 experimental candidate structures of GroEL in a variety of conformational states (Suppl. Table 1). These structures differ by Cα-backbone root-mean-square distances (RMSDs) of 0–13 Å (Suppl. Tables 2–4). GroEL is a
Discussion
We developed a Bayesian framework to extract structural information from EM images of dynamic biomolecular assemblies, where traditional approaches face major challenges (Patwardhan et al., 2012). The central idea is to analyze each EM particle individually from the start, without filtering, clustering or averaging the images. By analyzing single raw particles, we avoid information loss when lumping different structures into the same class (see Suppl. Fig. 6). The key quantity in the BioEM
Acknowledgments
To obtain the source code please contact the corresponding author. The authors thank Dr. J. Bernard Heymann for help with the Bsoft program, and Drs. James Hurley and Alasdair Steven for stimulating discussions. We also thank Dr. Jürgen Köfinger for useful discussions concerning the Bayesian methodology. This work was supported by the Intramural Research Program of the National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, and by the Max Planck Society,
References (72)
- et al. Protein secondary structure determination by constrained single-particle cryo-electron tomography. Structure (2012)
- et al. Crystal structure of wild-type chaperonin GroEL. J. Mol. Biol. (2005)
- et al. Solution Structure of the ESCRT-I and -II Supercomplex: implications for membrane budding and scission. Structure (2012)
- et al. Crystal structure of the temperature-sensitive and allosteric-defective chaperonin GroEL(E461K). J. Struct. Biol. (2006)
- et al. Exploring the structural dynamics of the E-coli chaperonin GroEL using translation–libration–screw crystallographic refinement of intermediate states. J. Mol. Biol. (2004)
- et al. Electron cryomicroscopy of biological machines at subnanometer resolution. Structure (2005)
- et al. Human TFIID binds to core promoter DNA in a reorganized structural state. Cell (2013)
- et al. Detection and separation of heterogeneity in molecular complexes by statistical analysis of their two-dimensional projections. J. Struct. Biol. (2008)
- et al. High-resolution single-particle orientation refinement based on spectrally self-adapting common lines. J. Struct. Biol. (2009)
- et al. SIMPLE: software for ab initio reconstruction of heterogeneous single-particles. J. Struct. Biol. (2012)
- A new cryo-EM single-particle ab initio reconstruction method visualizes secondary structure elements in an ATP-fueled AAA+ motor. J. Mol. Biol.
- The 13 angstrom structure of a chaperonin GroEL-protein substrate complex by cryo-electron microscopy. J. Mol. Biol.
- Outcome of the first electron microscopy validation task force meeting. Structure
- Molecular dynamics of protein complexes from four-dimensional cryo-electron microscopy. J. Struct. Biol.
- Bsoft: image processing and molecular modeling for electron microscopy. J. Struct. Biol.
- Coarse-grained models for simulations of multiprotein complexes: application to ubiquitin binding. J. Mol. Biol.
- A Bayesian adaptive basis algorithm for single particle reconstruction. J. Struct. Biol.
- Structure of AAV-DJ, a retargeted gene therapy vector: cryo-electron microscopy at 4.5 Ångstrom resolution. Structure
- GPU-enabled FREALIGN: accelerating single particle 3D reconstruction and refinement in Fourier space on graphics processors. J. Struct. Biol.
- EM-fold: de novo folding of alpha-helical proteins guided by intermediate-resolution electron microscopy density maps. Structure
- De novo backbone trace of GroEL from single particle electron cryomicroscopy. Structure
- A corkscrew model for dynamin constriction. Structure
- Methods for three-dimensional reconstruction of heterogeneous assemblies. Methods Enzymol.
- A method of focused classification, based on the bootstrap 3D variance analysis, and its application to EF-G-dependent translocation. J. Struct. Biol.
- Image restoration in cryo-electron microscopy. Methods Enzymol.
- ATP-bound states of GroEL captured by cryo-electron microscopy. Cell
- Optimal determination of particle orientation, absolute hand, and contrast loss in single-particle electron cryomicroscopy. J. Mol. Biol.
- SAXS ensemble refinement of ESCRT-III CHMP3 conformational transitions. Structure
- Maximum-likelihood multi-reference refinement for electron microscopy images. J. Mol. Biol.
- A Bayesian view on cryo-EM structure determination. J. Mol. Biol.
- RELION: implementation of a Bayesian approach to cryo-EM structure determination. J. Struct. Biol.
- Modeling experimental image formation for likelihood-based classification of electron microscopy data. Structure
- Combining efficient conformational sampling with a deformable elastic network model facilitates structure refinement at low resolution. Structure
- A maximum-likelihood approach to single-particle image refinement. J. Struct. Biol.
- An introduction to maximum-likelihood methods in cryo-EM. Methods Enzymol.
- An adaptive expectation–maximization algorithm with GPU implementation for electron cryomicroscopy. J. Struct. Biol.