Bayesian analysis of individual electron microscopy images: Towards structures of dynamic and heterogeneous biomolecular assemblies

https://doi.org/10.1016/j.jsb.2013.10.006

Abstract

We develop a method to extract structural information from electron microscopy (EM) images of dynamic and heterogeneous molecular assemblies. To overcome the challenge of disorder in the imaged structures, we analyze each image individually, avoiding information loss through clustering or averaging. The Bayesian inference of EM (BioEM) method uses a likelihood-based probabilistic measure to quantify the consistency between each EM image and given structural models. The likelihood function accounts for uncertainties in the molecular position and orientation, variations in the relative intensities and noise in the experimental images. The BioEM formalism is physically intuitive and mathematically simple. We show that for experimental GroEL images, BioEM correctly identifies structures according to the functional state. The top-ranked structure is the corresponding X-ray crystal structure, followed by an EM structure generated previously from a superset of the EM images used here. To analyze EM images of highly flexible molecules, we propose an ensemble refinement procedure, and validate it with synthetic EM maps of the ESCRT-I–II supercomplex. Both the size of the ensemble and its structural members are identified correctly. BioEM offers an alternative to 3D-reconstruction methods, extracting accurate population distributions for highly flexible structures and their assemblies. We discuss limitations of the method, and possible applications beyond ensemble refinement, including the cross-validation and unbiased post-assessment of model structures, and the structural characterization of systems where traditional approaches fail. Overall, our results suggest that the BioEM framework can be used to analyze EM images of both ordered and disordered molecular systems.

Introduction

The structural characterization of large and dynamic biomolecular assemblies is rapidly advancing, providing important insight into the function of the molecular machines and supramolecular assemblies involved in transcription and translation of genetic information, signal transduction, protein trafficking, cellular adhesion, and many other cellular processes. Electron microscopy (EM) occupies a central role in this endeavor by reporting on molecular structures with single-particle resolution, unhampered by the need to obtain crystals, and without the system size limits faced in nuclear magnetic resonance (NMR) studies (Frank, 2006). However, structural disorder in dynamic systems greatly limits the use of traditional EM methods that rely on sophisticated image pre-processing, such as class-averaging, to obtain 3D reconstructions (Saibil, 2000a, Leschziner and Nogales, 2007, Patwardhan et al., 2012). Here, we develop a method that aims to extract the maximum information by analyzing the raw EM data image-by-image within a Bayesian framework.

EM reconstructions achieve near-atomic resolution (Lerch et al., 2012, Beck et al., 2012, Ludtke et al., 2008, Zhang et al., 2013, Wang et al., 2006, Nogales et al., 1995) and reveal detailed dynamic information (Heymann et al., 2003, Ramrath et al., 2012, Cianfrocco et al., 2013). Elaborate algorithms have been developed on the modeling and simulation side to extract structural details from flexible fitting into three-dimensional (3D) electron density maps (Trabuco et al., 2008, Tama et al., 2004, Topf et al., 2008, Lindert et al., 2009, Mears et al., 2007, Schröder et al., 2007, Heymann et al., 2004, Delarue and Dumas, 2004, Loquet et al., 2012, Jaitly et al., 2010). Complementary to 3D reconstruction methods, recent integrative multi-scale protocols refine macromolecules against 2D class-averages and physico-chemical constraints. In particular, a maximum-likelihood cross-correlation metric that matches 3D models against class-averaged 2D projection images has been used, via simulated annealing, to obtain accurate models for several multi-domain complexes (Velazquez-Muriel et al., 2012), and a Natural Moves Monte Carlo method has been successfully used to refine the group II chaperonin Mm-cpn against heterogeneous projection averages (Zhang et al., 2012). Obtaining high-resolution models typically requires a large number of EM images, even for molecules exhibiting distinct features in projection that enable sophisticated clustering and reconstruction techniques. In the case of highly dynamic assemblies, the traditional EM approaches face additional challenges. In particular, it is difficult to separate molecular motions from differences in the projection view if the number of relevant structural states is large, as in a multidomain protein with flexible linkers such as the ESCRT-I–II supercomplex (Boura et al., 2012). This problem is compounded by the presence of alternative or possibly incomplete assemblies, reflecting the often weak pairwise interactions holding the assemblies together. One thus faces challenges not only in identifying the orientations of the molecules imaged, but also in assigning proper conformations and assembly states.

To classify images of heterogeneous particles, standard techniques use iterative optimization algorithms to produce the 3D density map most consistent with the 2D averaged projection views of each model (Elmlund et al., 2008, Chiu et al., 2005, Orlova and Saibil, 2010, Saibil, 2000b). Such analyses work best for images that present common features or discernible symmetries aiding in the cluster analysis (Elmlund and Elmlund, 2012, Elmlund and Elmlund, 2009). Maximum-likelihood methods that do not require the standard class-averaging techniques have also been developed to classify conformational states (Scheres et al., 2007a, Elad et al., 2008) and to provide 3D density maps of macromolecules (Wang et al., 2013, Scheres et al., 2007b). Such reconstruction methods require large numbers of particles, which makes applications to dynamic systems challenging.

Here, we develop the Bayesian inference of EM (BioEM) approach, geared primarily towards the analysis of EM images of dynamic biomolecular assemblies but applicable more broadly. Importantly, we analyze the EM data image-by-image from the start, without filtering or averaging the images. As is commonly done in the refinement of protein and nucleic acid structures from X-ray crystal diffraction data or from NMR spectra (Brunger et al., 1998), we use structural models or, if needed, an entire ensemble of structures. We quantify how well any one of the models reproduces each of the observed images (in crystallography, this would correspond to calculating R factors). In the spirit of the em2D score developed by Velazquez-Muriel et al. (2012), we determine the probability for each of the EM single-particle images to be created by projection of any one of the models in our ensemble. Building on earlier Bayesian inference approaches for NMR (Rieping et al., 2005), we use the Bayesian framework to provide a quantitative measure for comparing and analyzing structural models with respect to individual raw EM images. By contrast, earlier maximum-likelihood or Bayesian approaches for EM (Sigworth, 1998, Scheres et al., 2005, Doerschuk and Johnson, 2000, Scheres et al., 2007a, Sigworth et al., 2010, Scheres, 2012a, Scheres, 2012b, Kucukelbir et al., 2012) focused primarily on either image classification or the reconstruction of 3D maps. Our likelihood function accounts for uncertainties in the particle orientations and positions, variations in the relative image intensities, statistical noise, and the possible presence of broken particles other than the system of interest. We then feed the calculated likelihoods into a Bayesian framework to consistently and quantitatively assess how well different structures explain the data, and which structures (or structural ensembles) explain the data best.
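To make the idea of a likelihood that accounts for uncertain orientation and position more concrete, the following is a minimal sketch, not the BioEM implementation. It assumes that candidate projections of a model have already been computed on a discrete grid of poses, that the noise is independent Gaussian with a known standard deviation `sigma`, and that all poses are equally probable a priori. The intensity scaling and the treatment of broken particles mentioned above are left out for brevity, and the function names are hypothetical.

```python
import math


def gaussian_log_likelihood(image, projection, sigma):
    """Log-probability of the observed pixels given one calculated projection,
    assuming independent Gaussian noise of width sigma (an assumption)."""
    n = len(image)
    squared_error = sum((o - c) ** 2 for o, c in zip(image, projection))
    return -0.5 * squared_error / sigma ** 2 - n * math.log(sigma * math.sqrt(2.0 * math.pi))


def marginal_log_likelihood(image, projections, sigma):
    """Average the likelihood over a discrete set of candidate poses
    (orientations and shifts) with a uniform prior, using log-sum-exp
    so that very small likelihoods do not underflow."""
    logs = [gaussian_log_likelihood(image, p, sigma) for p in projections]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs) / len(logs))
```

In this sketch an image is a flat list of pixel values, and a model whose pose grid contains a projection close to the image receives a higher marginal log-likelihood than a model without one.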

The EM structural models can be constructed in multiple ways, for instance on the basis of our recently developed coarse-grained energy function, as implemented in the ensemble refinement of SAXS (EROS) approach (Rozycki et al., 2011, Boura et al., 2011). Our models should provide realistic, energetically meaningful structures that can account for molecular binding interactions and conformational changes. In addition, we want our mapping between structures and EM images to be computationally efficient. While our Bayesian approach is general and would work, at one end of the spectrum, with atomistic models or, at the other end, with highly coarse-grained models that treat entire domains as featureless blobs, we here concentrate on an intermediate level of residue-based coarse graining. We find that representing proteins with one site per amino acid strikes an appropriate balance between model detail, structural and energetic accuracy, flexibility and computational efficiency.
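As an illustration of how a residue-level coarse-grained model can be mapped to an image cheaply, the sketch below projects one site per amino acid along the z axis onto a square pixel grid. This is a simplified stand-in chosen for clarity, not the mapping used in the paper: it ignores rotations, the finite size of each residue and the microscope's contrast transfer, and the names `project_beads`, `n_pixels` and `pixel_size` are hypothetical.

```python
def project_beads(coords, n_pixels, pixel_size):
    """Project bead coordinates (x, y, z), one per residue, along z onto an
    n_pixels x n_pixels grid centered at the origin. Each bead adds one unit
    of density to the pixel it falls into; beads outside the grid are skipped."""
    image = [[0.0] * n_pixels for _ in range(n_pixels)]
    half_width = n_pixels * pixel_size / 2.0
    for x, y, _z in coords:
        col = int((x + half_width) // pixel_size)
        row = int((y + half_width) // pixel_size)
        if 0 <= row < n_pixels and 0 <= col < n_pixels:
            image[row][col] += 1.0
    return image
```

The cost grows linearly with the number of residues, which is one reason a one-site-per-residue representation keeps the structure-to-image mapping inexpensive.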

The paper is organized as follows. We first introduce the likelihood function connecting structural models and images. We then specify our Bayesian framework, including prior distributions for the model parameters. The resulting posterior provides a probabilistic measure of the degree of consistency between a structural model and an EM image. We test our method on raw experimental EM images of the unliganded chaperonin GroEL. This test demonstrates discriminatory power within a candidate pool consisting of X-ray crystal structures in different functional states, of EM structures obtained previously, and of coarse-grained models. We then test our ability to conduct an EM ensemble refinement, using synthetic images of the ESCRT-I–II supercomplex. In an ensemble of 18 model structures that jointly span the structural ensemble of ESCRT-I–II in solution (Boura et al., 2012), our probability measure correctly identifies the model from which images were generated. Moreover, we demonstrate that we can correctly identify the size of the structural ensemble. Overall, the BioEM approach should provide a useful tool to extract structural information from electron micrographs of dynamic systems, even in cases where standard EM reconstruction techniques fail.

Relating individual EM images to structures through Bayesian inference

Our first goal is to construct a quantitative measure of how well a particular set of structural models represents the observed EM single-particle images. Bayesian inference establishes such a measure by assigning a posterior probability P(model∣data) to a particular model set given the image data. This probability is proportional to the product of the likelihood L, given by the probability of observing the data for the model set, and of the prior probability of the model set and its parameters, P(model∣data) ∝ L · P(model)
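The relation between posterior, likelihood and prior can be sketched numerically as follows. This is a minimal illustration under stated assumptions rather than the authors' code: it assumes that the images are statistically independent, so that per-image log-likelihoods add, and that the candidate models form a closed set over which the posterior is normalized. The function name and the input layout are hypothetical.

```python
import math


def log_posterior_over_models(log_likelihoods, log_priors):
    """Return the normalized log-posterior of each candidate model.

    log_likelihoods[m][i] is log p(image i | model m); images are assumed
    independent, so the per-image terms are summed for each model.
    log_priors[m] is the log prior probability of model m."""
    unnormalized = [lp + sum(row) for row, lp in zip(log_likelihoods, log_priors)]
    top = max(unnormalized)
    log_norm = top + math.log(sum(math.exp(u - top) for u in unnormalized))
    return [u - log_norm for u in unnormalized]
```

Working in log space avoids numerical underflow when many images are combined, since the product of many small per-image probabilities quickly falls below floating-point range.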

Analysis of experimental EM images of GroEL

To validate our method against experiment, we test its ability to distinguish different structures of the chaperonin GroEL in individual experimental EM images. The challenge is to identify the correct structure, as determined by the functional state of the protein, within a pool of 8 experimental candidate structures of GroEL in a variety of conformational states (Suppl. Table 1). These structures differ by Cα-backbone root-mean-square distances (RMSDs) of 0–13 Å (Suppl. Tables 2–4). GroEL is a

Discussion

We developed a Bayesian framework to extract structural information from EM images of dynamic biomolecular assemblies, where traditional approaches face major challenges (Patwardhan et al., 2012). The central idea is to analyze each EM particle individually from the start, without filtering, clustering or averaging the images. By analyzing single raw particles, we avoid information loss when lumping different structures into the same class (see Suppl. Fig. 6). The key quantity in the BioEM

Acknowledgments

To obtain the source code please contact the corresponding author. The authors thank Dr. J. Bernard Heymann for help with the Bsoft program, and Drs. James Hurley and Alasdair Steven for stimulating discussions. We also thank Dr. Jürgen Köfinger for useful discussions concerning the Bayesian methodology. This work was supported by the Intramural Research Program of the National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, and by the Max Planck Society,

References (72)

  • H. Elmlund et al. A new cryo-EM single-particle ab initio reconstruction method visualizes secondary structure elements in an ATP-fueled AAA+ motor. J. Mol. Biol. (2008)
  • S. Falke et al. The 13 angstrom structure of a chaperonin GroEL-protein substrate complex by cryo-electron microscopy. J. Mol. Biol. (2005)
  • R. Henderson et al. Outcome of the first electron microscopy validation task force meeting. Structure (2012)
  • J. Heymann et al. Molecular dynamics of protein complexes from four-dimensional cryo-electron microscopy. J. Struct. Biol. (2004)
  • J.B. Heymann et al. Bsoft: image processing and molecular modeling for electron microscopy. J. Struct. Biol. (2007)
  • Y.C. Kim et al. Coarse-grained models for simulations of multiprotein complexes: application to ubiquitin binding. J. Mol. Biol. (2008)
  • A. Kucukelbir et al. A Bayesian adaptive basis algorithm for single particle reconstruction. J. Struct. Biol. (2012)
  • T.F. Lerch et al. Structure of AAV-DJ, a retargeted gene therapy vector: cryo-electron microscopy at 4.5 Ångstrom resolution. Structure (2012)
  • X. Li et al. GPU-enabled FREALIGN: accelerating single particle 3D reconstruction and refinement in Fourier space on graphics processors. J. Struct. Biol. (2010)
  • S. Lindert et al. EM-fold: de novo folding of alpha-helical proteins guided by intermediate-resolution electron microscopy density maps. Structure (2009)
  • S.J. Ludtke et al. De novo backbone trace of GroEL from single particle electron cryomicroscopy. Structure (2008)
  • J.A. Mears et al. A corkscrew model for dynamin constriction. Structure (2007)
  • E.V. Orlova et al. Methods for three-dimensional reconstruction of heterogeneous assemblies. Methods Enzymol. (2010)
  • P. Penczek et al. A method of focused classification, based on the bootstrap 3D variance analysis, and its application to EF-G-dependent translocation. J. Struct. Biol. (2006)
  • P.A. Penczek. Image restoration in cryo-electron microscopy. Methods Enzymol. (2010)
  • N. Ranson et al. ATP-bound states of GroEL captured by cryo-electron microscopy. Cell (2001)
  • P. Rosenthal et al. Optimal determination of particle orientation, absolute hand, and contrast loss in single-particle electron cryomicroscopy. J. Mol. Biol. (2003)
  • B. Rozycki et al. SAXS ensemble refinement of ESCRT-III CHMP3 conformational transitions. Structure (2011)
  • S. Scheres et al. Maximum-likelihood multi-reference refinement for electron microscopy images. J. Mol. Biol. (2005)
  • S.H.W. Scheres. A Bayesian view on cryo-EM structure determination. J. Mol. Biol. (2012)
  • S.H.W. Scheres. RELION: implementation of a Bayesian approach to cryo-EM structure determination. J. Struct. Biol. (2012)
  • S.H.W. Scheres et al. Modeling experimental image formation for likelihood-based classification of electron microscopy data. Structure (2007)
  • G.F. Schröder et al. Combining efficient conformational sampling with a deformable elastic network model facilitates structure refinement at low resolution. Structure (2007)
  • F. Sigworth. A maximum-likelihood approach to single-particle image refinement. J. Struct. Biol. (1998)
  • F.J. Sigworth et al. An introduction to maximum-likelihood methods in cryo-EM. Methods Enzymol. (2010)
  • H.D. Tagare et al. An adaptive expectation–maximization algorithm with GPU implementation for electron cryomicroscopy. J. Struct. Biol. (2010)