Classical detection theory and the cryo-EM particle selection problem

https://doi.org/10.1016/j.jsb.2003.10.025

Abstract

Particle selection is an essential but tedious step in the determination of macromolecular structures by single particle reconstruction. This paper presents an automatic, multi-reference particle detection scheme that is based on the classical matched filter principle. It makes use of a pre-whitening filter to standardize the noise, a reduced representation of the references by means of principal component analysis, and a statistic for distinguishing particles from image artifacts. Standardizing the noise allows the noise-induced false-positive frequency to be estimated, and also allows the distribution of the discrimination statistic to be calculated a priori. The method is demonstrated with an annotated dataset of cryo-EM images.

Introduction

The identification of individual particles in micrographs is a bottleneck in the high resolution determination of protein structures by electron cryomicroscopy (Nicholson and Glaeser, 2001). There are two goals that must be satisfied by a successful implementation of automatic particle picking. The first is to detect individual particle images in the presence of random noise. This is a problem that is equivalent to the classical problem of detecting symbols in a noisy communication channel. The second goal is to distinguish true particle images from artifacts or images of corrupted particles. This is a more difficult problem because in general its solution requires both information about the desired particles and also information about the characteristics of non-particles.
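The first goal, detecting a known signal in random noise, can be illustrated with a small correlation detector. The following is a minimal sketch, assuming white Gaussian noise of unit variance and a single known template; these are simplifying assumptions made for illustration, not the multi-reference scheme of the paper. The function names `cross_correlate`, `detect`, and `noise_false_positive_rate` are hypothetical.

```python
import math
import numpy as np

def cross_correlate(image, template):
    """Circular cross-correlation of `image` with `template` at every shift (via FFT)."""
    padded = np.zeros(image.shape, dtype=float)
    h, w = template.shape
    padded[:h, :w] = template  # zero-pad the template to the image size
    spectrum = np.fft.fft2(image) * np.conj(np.fft.fft2(padded))
    return np.fft.ifft2(spectrum).real

def detect(image, template, threshold):
    """Return the score map and the positions whose score exceeds `threshold`."""
    # Zero-mean, unit-norm template: in unit-variance white noise each score
    # is then distributed as N(0, 1) when no particle is present (assumption).
    t = template - template.mean()
    t = t / np.linalg.norm(t)
    score = cross_correlate(image, t)
    return score, np.argwhere(score > threshold)

def noise_false_positive_rate(threshold):
    """Probability that a pure-noise N(0, 1) score exceeds `threshold`."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))
```

Multiplying `noise_false_positive_rate(threshold)` by the number of positions searched gives a rough estimate of the number of noise-induced detections per image. It is only rough because scores at neighboring positions are correlated.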

This paper describes an implementation of automatic particle picking based on the classical “matched filter” or “correlation detector” method. The prerequisite is that some sort of 3D density map of the particle is already available; projections of this map are used to derive references for the detector. In the past, correlation-based schemes using a single reference (Frank and Wagenknecht, 1984; Lata et al., 1995) or a representative set of references (Ludtke et al., 1999; Roseman, 2003; Stoschek and Hegerl, 1997; Wong et al., 2003) have been presented. The approach followed here uses the classical matched filter rather than the modern statistical techniques employed by Stoschek and Hegerl (1997) and by Wong et al. (2003). The main difference from previous work is that the spectrum of the background noise of a micrograph is standardized through the application of a pre-whitening filter. With this standardization, the frequency of finding “false particles” can be estimated, and a statistic for discriminating “true” particles has a known distribution. Also described here is a method that uses principal component analysis (PCA) to reduce the computational burden of using many references. A closely related application of PCA is described by Ogura and Sato (2003) for their neural-network-based particle picker.
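The two ingredients named above, noise standardization and a reduced set of references, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's implementation: the noise spectrum is estimated from a stack of noise-only images of the same size as the image to be filtered, the reduction uses a truncated SVD of the reference stack (PCA without mean-centering, chosen here for simplicity), and all function names are hypothetical. In the classical formulation the same whitening filter would also be applied to the references.

```python
import numpy as np

def estimate_noise_psd(noise_images):
    """Averaged periodogram of noise-only images, shape (n, H, W) -> (H, W)."""
    _, h, w = noise_images.shape
    periodograms = np.abs(np.fft.fft2(noise_images)) ** 2  # FFT over last two axes
    return periodograms.mean(axis=0) / (h * w)

def prewhiten(image, noise_psd, eps=1e-12):
    """Divide the image spectrum by sqrt(PSD) so the noise becomes white, unit variance."""
    whitening = 1.0 / np.sqrt(noise_psd + eps)  # eps guards against division by zero
    return np.fft.ifft2(np.fft.fft2(image) * whitening).real

def reduce_references(refs, n_components):
    """Truncated SVD of references (K, h, w): returns basis images and coefficients."""
    k = refs.shape[0]
    u, s, vt = np.linalg.svd(refs.reshape(k, -1), full_matrices=False)
    basis = vt[:n_components].reshape((n_components,) + refs.shape[1:])
    coeffs = u[:, :n_components] * s[:n_components]  # refs ~ coeffs @ basis
    return basis, coeffs

def cross_correlate(image, kernel):
    """Circular cross-correlation of `image` with `kernel` at every shift (via FFT)."""
    padded = np.zeros(image.shape, dtype=float)
    h, w = kernel.shape
    padded[:h, :w] = kernel
    spectrum = np.fft.fft2(image) * np.conj(np.fft.fft2(padded))
    return np.fft.ifft2(spectrum).real

def multi_reference_scores(image, basis, coeffs):
    """Correlation maps for all K references from only n_components correlations."""
    basis_maps = np.stack([cross_correlate(image, b) for b in basis])  # (m, H, W)
    return np.tensordot(coeffs, basis_maps, axes=1)  # (K, H, W)
```

Because correlation is linear in the reference, the score map for each of the K references is a weighted sum of the maps for the basis images. The expensive FFT correlations are therefore computed once per retained component rather than once per reference, at the cost of an approximation error whenever the references are not exactly spanned by the retained components.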

Section snippets

The particle detection problem

To illustrate the algorithms described here, the results will be given with reference to an annotated keyhole limpet hemocyanin (KLH) dataset (Zhu et al., 2003) available at http://ami.Scripps.edu/prtl_data. This dataset consists of 82 micrographs, each 2k × 2k pixels in size, with the pixel size being 2.2 Å. For the processing described here the high-defocus “Exposure 2” micrographs were used, and binning of pixels was employed to increase the pixel size to 8.8 Å. The images show “side” views and

The particle discrimination problem

The correlation detector does not discriminate well between true particles and other objects: any image motif that provides a sufficiently large inner product will be counted as a particle. Stoschek and Hegerl (1997) have demonstrated a correlation detector that can discriminate well among different types of particles. However, in the present case where the objects to be discriminated against cannot be specified, what is needed instead is a test of similarity of the observed image to one of the

Discussion

Described here is an automatic particle selection algorithm that shows good performance on a simple dataset having an excellent signal-to-noise ratio. Its performance on more challenging datasets has not been tested, but there is hope that some of the underlying principles—a standardized noise model, a fast algorithm for multiple correlations, and a discrimination statistic t with a predictable distribution—may be combined with other approaches to yield a truly robust automatic particle

Acknowledgements

I am grateful to Shirley Wang (Yale College) for assistance in programming. I also thank Professors Peter Schultheiss (Yale), Eric Hansen (Thayer School of Engineering, Dartmouth College), and Marshall Bern (Palo Alto Research Center) for advice and discussions. The data used here were provided by the National Resource for Automated Molecular Microscopy (supported by National Center for Research Resources Grant No. RR17573). The author’s work was supported by NIH Grant No. NS21501.


