Optimizing weighting functions for cryo-electron microscopy

The frequency-dependent signal-to-noise ratio (SNR) of cryo-electron microscopy data varies dramatically with spatial frequency and with the type of data. Different steps of data processing operate on data with distinct SNRs, so each calculation calls for a weighting function designed for its particular SNR. Here, we derived these weighting functions by maximizing the SNR of the cross-correlation coefficients. Some of our weighting functions for refinement resemble those used in existing software packages. However, the weighting functions we derived for motion correction, particle picking, and refinement with overlapping densities differ from those employed by existing programs, and may improve the calculations in these steps.


INTRODUCTION
During imaging of biological samples for cryo-electron microscopy (cryo-EM) in a transmission electron microscope (TEM), proteins may undergo irreversible radiation damage (Baker and Rubinstein 2010). To minimize this damage, only a limited electron dose is permitted during imaging. A total dose of ~40 e⁻/Å² is typical for a cryo-EM image, much lower than the ~2,000 e⁻/Å² used in conventional high-resolution TEM (HRTEM) imaging. Low-dose imaging, however, produces images with low signal-to-noise ratios (SNRs), which become a major problem in subsequent cryo-EM data processing steps.
Most electrons scattered by an atom are deflected through small angles and therefore contribute to the low-frequency signal in TEM images. In cryo-EM images, electrons scattered by proteins thus produce low-frequency signals that are stronger than the high-frequency signals. In addition, the envelope function of the contrast transfer function (CTF) further attenuates the signal exponentially with increasing frequency (De Jong and Van Dyck 1993). In contrast, the shot noise in a cryo-EM image acquired in electron-counting mode may be treated approximately as white noise, whose power spectrum is constant across all frequencies. Consequently, the frequency-dependent SNR of a cryo-EM image decreases rapidly with increasing frequency.
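To make this argument concrete, the Python sketch below models the spectral SNR as a decaying signal power over a flat noise floor. The Gaussian fall-off of the structure-factor power, the B-factor form of the envelope, and every parameter value are illustrative assumptions, not values fitted to the data in this article.

```python
import numpy as np


def toy_ssnr(k, signal_power0=1e3, k0=0.05, b_factor=100.0, noise_power=1.0):
    """Toy spectral SNR: decaying signal power over flat (white) shot noise.

    k is spatial frequency in 1/Angstrom. All parameters are illustrative
    assumptions, not values fitted to any experimental data.
    """
    k = np.asarray(k, dtype=float)
    # Hypothetical Gaussian fall-off of the structure-factor power
    # (small-angle scattering dominates the signal).
    structure_power = signal_power0 * np.exp(-(k / k0) ** 2)
    # B-factor envelope: amplitude exp(-B k^2 / 4), hence power exp(-B k^2 / 2).
    envelope_power = np.exp(-b_factor * k ** 2 / 2.0)
    # Counting-mode shot noise treated as white: the same power at every k.
    return structure_power * envelope_power / noise_power


k = np.linspace(0.0, 0.3, 31)
ssnr = toy_ssnr(k)
```

Because the numerator falls monotonically while the noise power stays constant, the SSNR in this toy model decreases monotonically with frequency, spanning several orders of magnitude over 0 to 0.3 Å⁻¹.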
In several cryo-EM data processing stages, such as motion correction, particle selection, and alignment of 2D images against a reference, cross-correlation coefficients (CCCs) are calculated to compare two images. An accurate CCC calculation has to take the frequency dependence of the SNR into account: for instance, frequencies with low SNR should receive lower weights than those with high SNR. Programs such as Frealign and Jalign use weighting based on the CTF and the SNR to calculate CCCs during refinement (Grigorieff 2007; Sun et al. 2020). The maximum-likelihood method also uses data-driven, SNR-dependent weighting to optimize the likelihood (Scheres 2012). However, the data used in different stages of processing exhibit distinct SNRs. For particle selection, the protein signal is dominant because only low-frequency signals are involved. For motion correction, shot noise is dominant because each frame is generated by a dose of only 1-2 e⁻/Å². In more complicated cases, such as aligning images against a sub-volume (focused refinement or block-based reconstruction), densities of proteins other than the target protein enter the CCC calculation. These densities can be treated as noise, but doing so changes the SNR. Different weighting functions therefore need to be developed for the different stages of data processing, and how to calculate or optimize frequency-dependent weighting functions for each stage remains an open problem.

Here, we estimate the spectral SNR (SSNR) of a cryo-EM image and derive different weighting functions for the different SSNR regimes (SSNR ≫ 1, ≈ 1, and ≪ 1) by maximizing the SNR of the CCCs. Depending on the SNR regime of the data, applying the corresponding weighting function may improve motion correction, particle selection, and alignment.

The SSNR of a cryo-EM image
A typical SSNR of a protein in a cryo-EM image (Fig. 1A and B) was estimated from apoferritin data (see Methods). The SSNR (Fig. 1C) is much larger than 1 at spatial frequencies from 0 to 0.03 Å⁻¹ and decreases rapidly to ~1 at frequencies from 0.03 to 0.12 Å⁻¹. A further decrease in signal leads to an SSNR below 0.1 at frequencies above 0.12 Å⁻¹.
Because a motion-corrected micrograph is the average of a set of dose-fractionated frames (usually 30-50 frames), accumulating the dose of N frames enhances the SNR by a factor of N (Lee et al. 2014): the signal adds coherently, whereas the independent shot noise of the frames does not. Hence, the SSNR of a single frame may be estimated by dividing the SSNR of the averaged micrograph by N, so the SNR of a single frame is much lower than that of the averaged micrograph.
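A quick Monte Carlo check of the factor-of-N gain is sketched below in Python with synthetic white noise. The frame count, the signal amplitude, and the noise level are arbitrary assumptions chosen only to illustrate the scaling.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames = 40            # N, number of dose-fractionated frames (assumed)
n_pix = 50_000           # pixels per frame (assumed)
noise_sigma = 1.0        # per-frame shot-noise standard deviation (assumed)

# Fixed toy "protein" signal, identical in every frame (variance 0.04).
signal = rng.normal(0.0, 0.2, n_pix)
# Each frame adds its own independent white noise.
frames = signal + rng.normal(0.0, noise_sigma, (n_frames, n_pix))


def snr(observed, truth):
    """Signal power divided by residual noise power."""
    return truth.var() / (observed - truth).var()


frame_snr = snr(frames[0], signal)
average_snr = snr(frames.mean(axis=0), signal)
ratio = average_snr / frame_snr              # close to n_frames
frame_snr_estimate = average_snr / n_frames  # per-frame SNR recovered from the average
```

Averaging leaves the signal unchanged while the noise variance drops to σ²/N, so the measured ratio sits close to N = 40 up to sampling error.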
We partitioned the frequency range into three resolution intervals according to the value of the frequency-dependent SSNR: (1) the low-resolution interval, in which the SSNR is larger than 10; (2) the medium-resolution interval, in which the SSNR is between 0.1 and 10; and (3) the high-resolution interval, in which the SSNR is below 0.1. The derivations below assume SSNR ≫ 1 in the low-resolution interval and SSNR ≪ 1 in the high-resolution interval.
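The three-interval partition can be written as a small helper. The function name `resolution_interval` and the toy SSNR values are hypothetical; the thresholds are the ones stated above. The second call divides the toy micrograph SSNR by N = 40, as suggested for estimating a single frame's SSNR, and shows that the frame falls mostly into the SSNR ≪ 1 regime.

```python
import numpy as np


def resolution_interval(ssnr, upper=10.0, lower=0.1):
    """Label each SSNR value as the 'low', 'medium' or 'high' resolution interval.

    Hypothetical helper; thresholds follow the partition in the text.
    """
    ssnr = np.asarray(ssnr, dtype=float)
    return np.where(ssnr > upper, "low", np.where(ssnr < lower, "high", "medium"))


micrograph_ssnr = np.array([50.0, 1.0, 0.01])   # toy values, one per frequency
n_frames = 40
print(resolution_interval(micrograph_ssnr))             # ['low' 'medium' 'high']
print(resolution_interval(micrograph_ssnr / n_frames))  # ['medium' 'high' 'high']
```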

Weighting in motion correction
Ice-embedded proteins move during imaging, as a consequence of either irradiation by the electron beam (Brilot et al. 2012) or residual stage movement. Such movements degrade the quality of the cryo-EM image. To correct for them, the direct electron detector records in movie mode, producing a series of frames that capture the movement of the proteins. Motion-correction software (Timothy Grant and Grigorieff 2015; Li et al. 2013; Zheng et al. 2017) estimates the frame-to-frame trajectory of the proteins and compensates for it before all frames are averaged. Currently, these programs calculate the protein movement from the CCCs between frames, or between a frame and the averaged image, without applying any weighting for the CTF oscillations.

The distinguishing characteristic of the SSNR of a single frame is that SSNR ≪ 1 holds over most of the frequency range.
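The weighted CCC between two frames is expressed as

$$\mathrm{CCC}=\sum_{k} W(k)\,X_1(k)\,X_2^{*}(k) \tag{1}$$

$$X_1(k)=C(k)F(k)+N_1(k) \tag{2}$$

$$X_2(k)=C(k)F(k)+N_2(k) \tag{3}$$

with $W(k)$ denoting the weighting function, $F(k)$ the protein structure factor, $C(k)$ the CTF oscillation with envelope damping, "*" the complex-conjugate operation, and $N_1(k)$ and $N_2(k)$ the shot noise of the two frames, respectively. Substituting Eqs. 2 and 3 into Eq. 1 gives

$$\mathrm{CCC}=\sum_{k} W(k)\left[C^{2}(k)|F(k)|^{2}+C(k)F(k)N_2^{*}(k)+C(k)F^{*}(k)N_1(k)+N_1(k)N_2^{*}(k)\right]$$

Only the first term is correlated signal; treating the protein signal and the two noise realizations as mutually uncorrelated, the other three terms act as noise. We derive the weighting function by maximizing the SNR of the CCC, specifically,

$$\mathrm{SNR}_{cc}=\frac{\left[\sum_{k} W(k)C^{2}(k)|F(k)|^{2}\right]^{2}}{\sum_{k} W^{2}(k)\left[C^{2}(k)|F(k)|^{2}\left(|N_1(k)|^{2}+|N_2(k)|^{2}\right)+|N_1(k)|^{2}|N_2(k)|^{2}\right]},\qquad \frac{d\,\mathrm{SNR}_{cc}}{dW}=0$$

where $|F|^{2}$ and $|N|^{2}$ denote powers averaged over each frequency ring. This yields

$$W(k)\propto\frac{C^{2}(k)|F(k)|^{2}}{C^{2}(k)|F(k)|^{2}\left(|N_1(k)|^{2}+|N_2(k)|^{2}\right)+|N_1(k)|^{2}|N_2(k)|^{2}}$$

Weighting functions for optimizing cryo-EM reconstruction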

RESEARCH ARTICLE
When processing a single frame, we assume that the noise intensity $|N|^{2}$ is far greater than the signal intensity $C^{2}|F|^{2}$. With this assumption, we obtain

$$W(k)\propto\frac{C^{2}(k)|F(k)|^{2}}{|N_1(k)|^{2}|N_2(k)|^{2}}$$

Because the data are collected in counting mode, the noise power is the same at all frequencies and can be normalized to 1, so this weighting function simplifies to

$$W(k)\propto C^{2}(k)|F(k)|^{2}$$

When the motion is estimated from the CCC between a frame and the averaged image, $X_2$ in Eq. 3 is replaced by the averaged image $C(k)F(k)+\bar{N}(k)$, with $N_1$ denoting the noise of the frame and $\bar{N}$ the noise of the average of $n$ frames, $|\bar{N}(k)|^{2}=|N_1(k)|^{2}/n$. With $|N_1|^{2}$ normalized to 1, the weighting function becomes

$$W(k)\propto\frac{C^{2}(k)|F(k)|^{2}}{\left(1+\frac{1}{n}\right)C^{2}(k)|F(k)|^{2}+\frac{1}{n}}$$

Currently, during preprocessing of cryo-EM micrographs, motion correction is performed before the CTF parameters are estimated. Therefore, the weighting function used for motion correction contains only a term related to the envelope function (Li et al. 2013; Zheng et al. 2017; Timothy Grant and Grigorieff 2015). We propose that an additional weighting term for the CTF oscillations (Fig. 2) be applied in a motion-correction pass performed after preprocessing, once the CTF parameters are known.
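The weighting above is the Cauchy-Schwarz optimum of a ratio of the form (Σ W a)²/Σ W² b, which is reached at W ∝ a/b. The Python sketch below checks this numerically for the frame-to-frame and frame-to-average cases. The toy CTF (300 kV wavelength, 1.5 μm defocus, no spherical aberration or amplitude contrast, Gaussian envelope), the flat structure-factor power, the per-frame signal level, and the helper names are assumptions made only for illustration.

```python
import numpy as np


def ccc_noise_var(s, n1, n2):
    """Per-ring variance of the CCC noise terms: s(|N1|^2 + |N2|^2) + |N1|^2 |N2|^2."""
    return s * (n1 + n2) + n1 * n2


def optimal_weight(s, n1, n2):
    """W(k) proportional to C^2|F|^2 / [C^2|F|^2 (|N1|^2 + |N2|^2) + |N1|^2 |N2|^2]."""
    return s / ccc_noise_var(s, n1, n2)


def snr_cc(w, s, noise_var):
    """SNR of the weighted CCC: (sum W s)^2 / sum W^2 var."""
    return np.sum(w * s) ** 2 / np.sum(w ** 2 * noise_var)


# Toy per-frame signal s = C^2|F|^2 (assumptions: 300 kV, 1.5 um defocus,
# no Cs or amplitude contrast, flat |F|^2, Gaussian envelope).
k = np.linspace(0.005, 0.25, 200)                  # 1/Angstrom
wavelength, defocus = 0.0197, 15000.0              # Angstrom
ctf = np.sin(np.pi * wavelength * defocus * k ** 2) * np.exp(-50.0 * k ** 2)
s = 0.05 * ctf ** 2                                # far below the noise power
n1 = np.ones_like(k)                               # counting mode: noise normalized to 1
n_avg = 40                                         # frames in the averaged image

w_frames = optimal_weight(s, n1, n1)               # frame vs frame
w_frame_avg = optimal_weight(s, n1, n1 / n_avg)    # frame vs average of n frames
```

Because s stays well below 1 here, the frame-to-frame weight is within about 10% of C²|F|², as the single-frame approximation predicts; the frame-to-average weight instead saturates wherever C²|F|² approaches 1/n.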

Weighting in particle selection
High-resolution reconstruction of protein structures by single-particle analysis requires selecting particles from micrographs. Template-based particle selection with CCCs is widely used in single-particle analysis (Chen and Grigorieff 2007; Hall and Patwardhan 2004; Huang and Penczek 2004) and usually involves only low-frequency data (below 1/30 Å⁻¹, i.e., resolution coarser than 30 Å). We therefore assume SNR ≫ 1 for particle selection and calculate the optimized weighting by maximizing the SNR of the CCC.

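Here, $M(k)$ denotes the micrograph, which contains particles in various views together with noise, and $T(k)$ the noise-free, low-resolution template. The micrograph is expressed as

$$M(k)=C(k)\left[F_1(k)+F_2(k)+\cdots+F_n(k)\right]+N(k)$$

with $F_1$ pertaining to the first particle in the micrograph and $F_n$ to the $n$-th particle. Taking the template to be the CTF-modulated view that matches the first particle, $T(k)=C(k)F_1(k)$, the CCC becomes

$$\mathrm{CCC}_{pick}=\sum_k W(k)M(k)T^{*}(k)=\sum_k W(k)\left[C^{2}(k)|F_1(k)|^{2}+C^{2}(k)\sum_{j=2}^{n}F_j(k)F_1^{*}(k)+C(k)N(k)F_1^{*}(k)\right]$$

in which only the first term is signal; the other particles are uncorrelated with the template and act as noise, so that

$$\mathrm{SNR}_{cc}^{pick}=\frac{\left[\sum_k W(k)C^{2}(k)|F_1(k)|^{2}\right]^{2}}{\sum_k W^{2}(k)C^{2}(k)|F_1(k)|^{2}\left[C^{2}(k)\sum_{j=2}^{n}|F_j(k)|^{2}+|N(k)|^{2}\right]}$$

By applying the condition $d\,\mathrm{SNR}_{cc}^{pick}/dW=0$, we obtain

$$W(k)\propto\frac{1}{C^{2}(k)\sum_{j=2}^{n}|F_j(k)|^{2}+|N(k)|^{2}}$$

Because SNR ≫ 1, we have $C^{2}(k)|F(k)|^{2}\gg|N(k)|^{2}$, and the raw micrograph may be represented by CTF(k)·F(k). With many particles in the micrograph, the sum over $j\ge 2$ approximates the power spectrum of the whole micrograph, so $W(k)\approx 1/|M(k)|^{2}$. The optimized weighting is therefore approximately a whitening of the power spectrum of the micrograph (Fig. 3), and we propose that a whitening filter be applied to both the micrograph and the reference to achieve better particle selection.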
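One way to realize the proposed whitening is to divide each Fourier coefficient by the ring-averaged amplitude of the micrograph and to apply the same filter to the micrograph and the template, so that their product carries the weight 1/|M(k)|² (ring-averaged). The Python sketch below is an illustration under these assumptions, not the filter of any particular picking program; the integer ring binning, the synthetic micrograph, and the helper names are hypothetical.

```python
import numpy as np


def ring_index(shape):
    """Integer Fourier-ring radius (in pixels) for every FFT grid point."""
    ny, nx = shape
    ky = np.fft.fftfreq(ny)[:, None]
    kx = np.fft.fftfreq(nx)[None, :]
    return np.rint(np.hypot(ky, kx) * max(ny, nx)).astype(int)


def whitening_filter(micrograph):
    """1 / sqrt(ring-averaged power of the micrograph), on the FFT grid."""
    ft = np.fft.fft2(micrograph)
    rings = ring_index(micrograph.shape)
    power = np.bincount(rings.ravel(), weights=(np.abs(ft) ** 2).ravel())
    counts = np.bincount(rings.ravel())
    amplitude = np.sqrt(power / np.maximum(counts, 1))[rings]
    return np.divide(1.0, amplitude, out=np.zeros_like(amplitude), where=amplitude > 0)


def whitened_cc_map(micrograph, template):
    """Cross-correlation map with the same whitening filter applied to both inputs.

    The product of the two filtered spectra carries the weight
    1 / (ring-averaged |M(k)|^2).
    """
    filt = whitening_filter(micrograph)
    m = np.fft.fft2(micrograph) * filt
    t = np.fft.fft2(template, s=micrograph.shape) * filt   # zero-padded template
    return np.real(np.fft.ifft2(m * np.conj(t)))


# Synthetic example: one copy of a random template pasted into white noise.
rng = np.random.default_rng(0)
template = rng.normal(size=(16, 16))
micrograph = rng.normal(scale=0.5, size=(128, 128))
micrograph[40:56, 70:86] += template
cc = whitened_cc_map(micrograph, template)
peak = np.unravel_index(np.argmax(cc), cc.shape)   # top-left corner of the pasted copy
```

Deriving the filter from the micrograph alone and reusing it for the template keeps the reference on the same spectral scale as the data, which is what makes the product of the two spectra equal to the 1/|M|² weight.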

Weighting in refinement
Accurate refinement is important for achieving high-resolution structures. Currently, refinement usually comprises two searches: a global search with a coarse step size against a medium-resolution model, and a local search around a specific set of parameters with a fine step size against an improved, high-resolution model. In local searches, the medium-resolution signal of the 2D images discriminates poorly between candidate parameters, because the 2D references are nearly identical at this resolution. Thus, the high-resolution signal of a 2D image, for which SNR ≪ 1, plays an important role in determining the angular and translational parameters. This assumption does not hold when the reconstruction reaches only medium resolution.

Fig. 2 Power spectrum of a frame after applying the weighting function for two cases of motion correction: similarity calculated between frames (navy) and between a frame and the averaged micrograph (olive).
In a typical single-particle reconstruction, particles extracted from cryo-EM images contain only the target protein. However, in focused refinement without subtraction of the surrounding densities, or in block-based reconstruction, densities of overlapping proteins, rather than only those of the target protein, enter the CCC calculation. Assuming that these densities are uncorrelated with the model used to calculate the CCCs, we treat them as noise in the derivation, specifically,

$$X(k)=C(k)\left[F_t(k)+F_o(k)\right]+N_X(k),\qquad T(k)=F_t(k)+N_T(k)$$

with $X(k)$ denoting the raw particle image, $T(k)$ the reference, $F_t$ the signal from the target protein, and $F_o$ the signal from other proteins. The noise in the raw image, $N_X$, is far greater than the noise of the template, $N_T$. With this approximation, $W$ is derived by maximizing the SNR of

$$\mathrm{CCC}=\sum_k W(k)X(k)T^{*}(k)$$

yielding

$$W(k)\propto\frac{C(k)}{C^{2}(k)|F_o(k)|^{2}+|N_X(k)|^{2}}$$

Because $F_o$ equals 0 for a typical single-particle refinement, $W$ is then

$$W(k)\propto\frac{C(k)}{|N_X(k)|^{2}}$$

A similar weighting function has been used in the program Jalign (Sun et al. 2020), in defocus refinement (Su et al. 2017; Zivanov et al. 2018), and in the program cisTEM (Grant et al. 2018). In high-resolution refinement, $C^{2}|F_o|^{2}$ is much smaller than $|N_X|^{2}$, so the weighting function for particles with overlapping densities becomes the same as that for the typical case (Fig. 4).
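The following Python sketch compares the overlapping-density weight with the single-protein weight C/|N_X|² at one low and one high frequency. The toy target power, the reading of "three times the target density" as a threefold amplitude (ninefold power), the simplified CTF without spherical aberration or amplitude contrast, and the helper names are illustrative assumptions.

```python
import numpy as np


def refinement_weight(ctf, overlap_power, noise_power):
    """W(k) proportional to C(k) / (C(k)^2 |F_o(k)|^2 + |N_X(k)|^2)."""
    return ctf / (ctf ** 2 * overlap_power + noise_power)


def toy_target_power(k):
    """Hypothetical |F_t(k)|^2: Gaussian fall-off times a B-factor (B = 100) envelope."""
    return 1e3 * np.exp(-(k / 0.05) ** 2) * np.exp(-100.0 * k ** 2 / 2.0)


def toy_ctf(k, wavelength=0.0197, defocus=15000.0):
    """Simplified CTF (Angstrom units): no spherical aberration or amplitude contrast."""
    return np.sin(np.pi * wavelength * defocus * k ** 2)


overlap_factor = 3.0      # overlapping density = 3x target density (amplitude, assumed)
noise_power = 1.0         # white noise, normalized

ratios = {}
for k in (0.02, 0.3):     # 1/Angstrom: one medium and one high frequency
    c = toy_ctf(k)
    fo2 = overlap_factor ** 2 * toy_target_power(k)
    w_overlap = refinement_weight(c, fo2, noise_power)
    w_single = refinement_weight(c, 0.0, noise_power)   # equals C / |N_X|^2
    ratios[k] = w_overlap / w_single
```

At 0.02 Å⁻¹ the overlapping density suppresses the weight by about three orders of magnitude in this toy model, whereas at 0.3 Å⁻¹ the two weights coincide, mirroring the statement that high-resolution refinement is insensitive to the overlap.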
Thus, for local refinements that lead to high-resolution reconstructions, overlapping densities do not change the weighting function. However, if the reconstruction is limited to medium resolution, we expect refinement to benefit from the new weighting function.

Fig. 4 Power spectrum of an image after applying the weightings derived for refinement: olive for the overlapping case and navy for the single-protein case. Here we assume that the sub-volume subjected to focused refinement is one third of the remaining part, so the overlapping density is set to three times the target-protein density.

We have analyzed how the structure factors of proteins decay with frequency in cryo-EM and divided the frequencies into three intervals according to the SNR: the low-resolution interval with SSNR ≫ 1, which applies to particle selection; the medium-resolution interval with SSNR ≈ 1, which applies to medium-resolution alignments; and the high-resolution interval with SSNR ≪ 1, which applies to high-resolution alignments. For motion correction, SSNR ≪ 1 holds over most of the frequency range. Because the different stages of cryo-EM data processing operate in different frequency ranges, we derived an optimized weighting function for each stage by maximizing the SNR of the CCCs. We believe that these optimized weighting functions may improve several stages of cryo-EM data processing.

Estimation of SSNRs of cryo-EM images
An accurate calculation of the SSNR is necessary to study how the frequency-dependent SNR varies in a cryo-EM image. The SSNR of an image is estimated from the Fourier ring correlation (FRC) between two independent, aligned data sets (Sindelar and Grigorieff 2012). FRC(k), the 2D analog of the Fourier shell correlation, is

$$\mathrm{FRC}(k)=\frac{\sum_{(i,j)\in k}P_1(i,j)\cdot P_2(i,j)}{\sqrt{\sum_{(i,j)\in k}|P_1(i,j)|^{2}\sum_{(i,j)\in k}|P_2(i,j)|^{2}}}$$

where $P_1$ and $P_2$ are the Fourier transforms of the two data sets, i and j index the location of a grid component in Fourier space, the sums run over the Fourier pixels on the ring at frequency k, and the black dot "·" signifies the dot product of the corresponding Fourier components on the ring. The conversion from FRC to SSNR (Eq. 23) additionally involves the number of Fourier pixels contained in the frequency ring and the mean-squared value of the soft-edged mask.
We calculated the FRC between projections of the two half-map reconstructions of apoferritin. The images span defocus values from 0.4 to 2.1 μm, so that the CTF oscillations average out. After converting the FRC to SSNR using Eq. 23, we obtained the final estimate of the SSNR.
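The FRC defined above can be computed with NumPy as sketched below, where the dot product of two complex Fourier components is taken as Re(P₁P₂*). In place of Eq. 23, the sketch uses the textbook half-set relation SSNR = FRC/(1 - FRC) (the full data set has twice this SSNR); it ignores the mask and pixel-count terms of Eq. 23 and should not be read as the conversion used for Fig. 1C. The ring binning and the helper names are assumptions.

```python
import numpy as np


def ring_index(shape):
    """Integer Fourier-ring radius (in pixels) for every FFT grid point."""
    ny, nx = shape
    ky = np.fft.fftfreq(ny)[:, None]
    kx = np.fft.fftfreq(nx)[None, :]
    return np.rint(np.hypot(ky, kx) * max(ny, nx)).astype(int)


def frc(img1, img2):
    """Fourier ring correlation; Re(P1 P2*) is the dot product of the two components."""
    p1, p2 = np.fft.fft2(img1), np.fft.fft2(img2)
    rings = ring_index(img1.shape).ravel()
    num = np.bincount(rings, weights=np.real(p1 * np.conj(p2)).ravel())
    d1 = np.bincount(rings, weights=(np.abs(p1) ** 2).ravel())
    d2 = np.bincount(rings, weights=(np.abs(p2) ** 2).ravel())
    denom = np.sqrt(d1 * d2)
    return np.divide(num, denom, out=np.zeros_like(num), where=denom > 0)


def ssnr_from_frc(frc_values):
    """Textbook half-set relation SSNR = FRC / (1 - FRC); not the mask-corrected Eq. 23."""
    f = np.clip(np.asarray(frc_values, dtype=float), 0.0, 1.0 - 1e-12)
    return f / (1.0 - f)


# Two half-sets sharing a white signal with independent noise of equal power (SSNR = 1).
rng = np.random.default_rng(0)
signal = rng.normal(size=(128, 128))
half1 = signal + rng.normal(size=(128, 128))
half2 = signal + rng.normal(size=(128, 128))
curve = frc(half1, half2)            # close to 0.5 in well-populated rings
ssnr_half = ssnr_from_frc(curve)     # close to 1 per half-set
```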