Robust ab initio solution of the cryo-EM reconstruction problem at low resolution with small data sets

Single-particle cryo-electron microscopy has become a critical tool in structural biology over the last decade, able to achieve atomic-scale resolution in three-dimensional models from hundreds of thousands of (noisy) two-dimensional projection views of particles frozen at unknown orientations. This is accomplished by using a suite of software tools to (i) identify particles in large micrographs, (ii) obtain low-resolution reconstructions, (iii) refine those low-resolution structures, and (iv) finally match the obtained electron scattering density to the constituent atoms that make up the macromolecule or macromolecular complex of interest. Here, we focus on the second stage of the reconstruction pipeline: obtaining a low-resolution model from picked particle images. Our goal is to create an algorithm that is capable of ab initio reconstruction from small data sets (on the order of a few thousand selected particles). More precisely, we propose an algorithm that is robust, automatic, and fast enough that it can potentially be used to assist in the assessment of particle quality as the data is being generated during the microscopy experiment.


Introduction
In single-particle cryo-electron microscopy (cryo-EM), a purified preparation of particles (proteins or macromolecular complexes) is frozen in a thin sheet of ice, with each particle held in an unknown orientation. Under a weak-scattering approximation, the image obtained can be viewed as a noisy projection of the particle's electron scattering density, which we denote by F(x), with optical aberrations modeled by what is called a contrast transfer function (CTF) [1,2]. The task at hand is to take the individual particle images, determine their orientations, and recover a high-resolution characterization of F(x). This is typically accomplished through the Fourier slice theorem [3,4], which states that the two-dimensional Fourier transform of each projection image is a slice through F̂(k), the three-dimensional Fourier transform of F(x) (modulated by the CTF).
The specific slice obtained from a single image is determined by the orientation of the individual particle. With many particles at many distinct orientations, F̂(k) is well sampled and F(x) can be obtained through the inverse Fourier transform. Thus, if the orientations are known, the reconstruction problem is linear and easily solved. The orientations, however, are not known, and the reconstruction problem is both nonlinear and non-convex. In practice, not only are the orientations unknown, but the particles must also be correctly centered in order to invoke the Fourier slice theorem; the unknown displacement vector must be learned for each particle image as well. We postpone a proper discussion of the mathematics to section 2 and the appendices (Supporting Information).
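The slice property underlying this discussion can be checked numerically. The following is a minimal NumPy sketch (an illustration only, ignoring the CTF, noise, and off-axis orientations): the 2D FFT of an ideal projection along z equals the central (k_z = 0) plane of the 3D FFT of the volume.

```python
import numpy as np

# Fourier slice theorem, simplest case: projecting along z and taking
# the 2D FFT gives the k_z = 0 slice of the 3D FFT of the volume.
rng = np.random.default_rng(0)
vol = rng.standard_normal((32, 32, 32))   # stand-in for the density F(x)

projection = vol.sum(axis=2)              # ideal (noise-free) projection along z
slice_2d = np.fft.fftn(vol)[:, :, 0]      # central slice of the 3D FFT

assert np.allclose(np.fft.fft2(projection), slice_2d)
```

For a general orientation the slice is an arbitrarily rotated equatorial plane, which requires off-grid interpolation in Fourier space; the axis-aligned case above captures the identity itself.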
From an experimental perspective, the initial stages of cryo-EM involve some hours of imaging, resulting in the collection of multiple micrographs, each containing a number of particle images. After sufficiently many micrographs are collected, the next steps involve particle picking (identifying individual particles in the micrographs) and the construction of a low- to medium-resolution model of the molecule for further analysis and refinement. The latter step typically involves class averaging (finding particles that appear to have the same orientation and averaging them to improve the signal-to-noise ratio), although some algorithms proceed from the raw particle images themselves (as we do here).
Despite recent advances in sample preparation, imaging and analysis, the cost of data collection can still be quite significant, and the availability of microscope time is a limited resource. In order for the picked-particle images to be useful, they must meet certain criteria: each should contain a single well-centered particle with a minimum of clutter, and the full set must cover a wide variety of viewing angles, sufficient to determine the full three-dimensional structure. Assuming that these criteria are met, these images (hopefully the least noisy and cleanest projection views) can then be used in downstream analyses. If the sample or the images are not of sufficient quality, it may be necessary to terminate the current experiment and repeat with a new sample, hoping to produce higher-quality particle images the next time.
To reduce the time and cost of such experiments, there are many quality-control tasks that would be useful to carry out as early as possible. These include:
1. determining whether any picked-particle image contains a reasonably well-centered and isolated molecule;
2. estimating the viewing angles of the images (and checking that the range of viewing angles is sufficient to reconstruct the molecule);
3. estimating the displacements/shifts required to more precisely center the images;
4. estimating the image quality and the number of particles that will be needed to reconstruct the desired molecule at the desired resolution;
5. estimating the molecular structure itself at low resolution (which feeds back into the estimates in items 1-4 above).
In current practice, many of these issues are addressed after carrying out class averaging, which typically requires dozens of particles per class and tens of thousands of picked particles, although a number of reconstruction algorithms work directly on the raw data. In this paper, we describe an algorithm which can serve as a complement to more standard strategies. Our algorithm operates on a relatively small set of picked-particle images (say, a few thousand), and attempts to directly reconstruct (without class averaging) a low-resolution version of the imaged molecule, along with estimates of the viewing angles, shifts, and quality of the images involved. As we demonstrate below, our method is significantly faster, more accurate, and more reliable than many existing de-novo reconstruction algorithms. We hope that it is robust enough to permit the assessment of data quality as the experiment is being carried out. Moreover, our algorithm runs quickly and automatically with no 'tuning' required, allowing multiple small pools of approximately 1000 picked particles to be independently assessed without supervision (each corresponding to ∼30 micrographs). This would help set the stage for more robust statistical validation techniques such as bootstrap or jackknife estimation.

Remark 1
We should emphasize that the starting point of our experiments here is a collection of picked particle images (obtained from the EMPIAR database), not the raw movies or entire micrographs that are sometimes deposited as well. Particle picking is itself a non-trivial task. Restricting our attention to molecules for which particles are curated allows us to focus exclusively on the problem of reconstruction from a small data set.
At its core, our algorithm is a modification of the simplest expectation-maximization method. We use the framework of classical iterative refinement, alternating between a reconstruction step and an alignment step. In the reconstruction step, we use the current estimate of the viewing angles and displacements for each particle image to approximate the molecule. In the alignment step, we use the current estimate of the molecule to better fit the viewing angles and displacements for each image. We will refer to this approach as alternating minimization (AM).
It is well-known that standard AM does poorly in this setting; an initialization with randomized viewing angles typically fails to converge to a useful low-resolution approximation of the molecule. To address these issues and to guide AM to a better structure, we introduce two changes: (i) we compress and denoise the individual images by computing their two-dimensional Fourier transform on a polar grid and focusing on carefully selected rings in the transform domain with maximal information content, and (ii) we increase the entropy of the viewing-angle distribution used during the reconstruction step (in a manner to be described in detail). These modifications both reduce the computational cost and improve the robustness of AM in the low-resolution low-image-number regime.
In the numerical experiments below, we show that our modified method (alternating minimization with entropy maximization and principal modes or 'EMPM') performs surprisingly well on several published data sets. Our reconstructions are often on par with the best results one could expect from an 'oracle-informed' AM (i.e., standard AM starting with the 'ground-truth' viewing-angles and displacements for each image). Moreover, our results are similar to (or better than) what one might expect from a reconstruction using 'optimized' Bayesian inference (i.e., using a ground-truth informed estimated noise level). Importantly, our strategy is efficient, running about twenty times faster than standard Bayesian inference or the de-novo reconstruction in the widely used, open source package Relion [5,6,7]. We also make direct comparisons with two other open source methods: the neural network approach in cryoDRGN [8] and the recent low-resolution reconstruction pipeline in Aspire [9]. We have not made a direct comparison with the stochastic gradient descent approach in the GPU-based cryoSPARC package [10].
In the remainder of this paper, we briefly summarize the EMPM method and demonstrate its performance on a variety of data sets. The details of our method, as well as some additional motivating examples, are deferred to the Supporting Information. It is currently implemented for multi-core CPUs, but we believe it should be straightforward to implement on GPUs as well, and to integrate into any of the existing software packages for cryo-EM.

Overview of the EMPM iteration
Our general goal is to solve the following problem: given a set of N_A picked-particle images (2D projections) from a handful of micrographs (each with their own CTF), estimate the unknown viewing angles τ_j ∈ SO(3) and displacements δ_j ∈ R² for each image, as well as the underlying volume F̂(k), where k ∈ R³ denotes a point in the Fourier transform domain. In addition to determining the volume F̂(k), we would also like an estimate of the viewing-angle distribution µ_τ and displacement distribution µ_δ (taken across the images). If the estimates of F̂(k), µ_τ, and µ_δ are accurate, they will clearly be useful for quality control, determining (for example) whether the experimental conditions are suitable for larger-scale data collection and subsequent high-resolution reconstruction.
The main tool we employ is a form of alternating minimization (AM), which carries out some number of iterations of the following two-step procedure:
Reconstruction: Given the estimated viewing angles {τ_j^estim} and displacements {δ_j^estim} for the N_A 2D picked-particle images {Â_j}, reconstruct the 3D molecular volume F̂_estim (using the Fourier slice theorem and least-squares inversion of the Fourier transform).
Alignment: Given the current volume estimate F̂_estim, update the viewing angles {τ_j^estim} and displacements {δ_j^estim} for each image.
In a standard AM scheme, the reconstruction step solves for F̂_estim using a least-squares procedure, and the alignment step assigns the {τ_j^estim} and {δ_j^estim} using some kind of maximum-likelihood procedure (see sections A.18 and A.22 in the Supporting Information). While there are many more sophisticated strategies for molecular reconstruction (see, for example, [10,8]), we focus on AM for a few reasons, first and foremost its simplicity, transparency, and ease of automation. AM has modular steps, few parameters, and is easily interpretable.
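The AM loop can be illustrated with a toy one-dimensional analogue (our construction, not the authors' code): observations are cyclically shifted, noisy copies of a signal, reconstruction averages the back-shifted observations, and alignment picks the best shift for each observation by correlation. We initialize near the true shifts, in the spirit of an oracle-informed AM; from a fully random initialization a plain loop like this often stalls, which is precisely the failure mode discussed below.

```python
import numpy as np

# Toy AM for 1D multi-reference alignment: recover a signal from
# cyclically shifted noisy copies with unknown shifts.
rng = np.random.default_rng(1)
n, n_obs = 64, 200
truth = np.convolve(rng.standard_normal(n), np.ones(8) / 8, mode="same")
shifts = rng.integers(0, n, size=n_obs)
obs = np.stack([np.roll(truth, s) for s in shifts])
obs += 0.1 * rng.standard_normal(obs.shape)

# Near-oracle initialization (true shifts perturbed by up to +/- 2).
est_shifts = (shifts + rng.integers(-2, 3, size=n_obs)) % n

for _ in range(10):
    # Reconstruction step: least-squares solution given the current
    # shifts reduces to averaging the back-shifted observations.
    recon = np.mean([np.roll(obs[j], -est_shifts[j]) for j in range(n_obs)], axis=0)
    # Alignment step: maximum-correlation cyclic shift for each observation.
    rolled = np.stack([np.roll(recon, s) for s in range(n)])   # all candidate shifts
    est_shifts = np.argmax(obs @ rolled.T, axis=1)

# The answer is only defined up to a global cyclic shift.
score = max(np.corrcoef(np.roll(recon, k), truth)[0, 1] for k in range(n))
```

The global-shift ambiguity in the score mirrors the global-rotation ambiguity of the cryo-EM problem.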
At the same time, however, standard AM does have serious limitations. First, it can be computationally expensive, particularly if a broad range of frequencies and/or translations are considered in the alignment step. Second, AM can fail to converge to the correct solution, especially during the initial iterations, when the initial guess is far from the true molecule.
To address these two issues, we propose a modified version of AM which we refer to as alternating minimization with entropy-maximization and principal modes (EMPM). The two main modifications we introduce are: Maximum entropy alignment: We use both the traditional maximum likelihood alignment strategy, which finds the best estimate for each image from the current molecular reconstruction, and its opposite. That is, we also use a 'maximum-entropy' alignment strategy: for each uniformly chosen viewing angle, we find the experimental image which best matches the corresponding projection from the current structure. The subsequent reconstruction step makes use of this uniform distribution of viewing angles whether or not it corresponds to the true distribution. Use of this counter-intuitive step is important in stabilizing our low resolution reconstruction from a small set of images (see section A.22 in the Supporting Information).
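The distinction between the two assignment rules amounts to taking the argmax of an image-template correlation matrix along different axes. A small NumPy sketch (with a hypothetical correlation matrix Z, rows indexed by images and columns by templates):

```python
import numpy as np

# Hypothetical image-template correlation matrix:
# Z[j, l] = correlation of image j against template l.
rng = np.random.default_rng(2)
n_images, n_templates = 5, 3
Z = rng.random((n_images, n_templates))

# Maximum likelihood: each IMAGE gets its best-matching template;
# some viewing angles may receive no images at all.
ml_assign = Z.argmax(axis=1)     # one template index per image

# Maximum entropy: each uniformly sampled TEMPLATE direction gets its
# best-matching image, so every viewing angle contributes to the
# subsequent reconstruction.
me_assign = Z.argmax(axis=0)     # one image index per template
```

In the maximum-entropy assignment a single image may serve several viewing angles, and the reconstruction step then treats the angle distribution as uniform regardless of the true distribution.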
Principal-mode projection: Each image is transformed onto a polar grid in Fourier space. The data on each circle in the transform domain at a fixed modulus |k| will be referred to as a Fourier ring. Rather than considering all Fourier rings when comparing 2D projections and image data, we consider only a handful of carefully-chosen combinations of such rings, which we refer to as principal modes. This yields significant compression of the data (and computational efficiency) while retaining information useful for alignment (see section A.13 in the Supporting Information).
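A generic stand-in for this compression step can be sketched with an SVD (the actual selection of rings and modes is described in section A.13 of the Supporting Information; the synthetic "ring coefficients" and mode count below are assumptions for illustration):

```python
import numpy as np

# PCA-style compression: stack each image's Fourier-ring coefficients
# into a vector, estimate dominant modes by SVD, project onto them.
rng = np.random.default_rng(3)
n_images, n_ring_coeffs, n_modes = 500, 256, 12

# Synthetic stand-in: ring data with low-dimensional structure + noise.
basis = rng.standard_normal((n_ring_coeffs, n_modes))
rings = rng.standard_normal((n_images, n_modes)) @ basis.T
rings += 0.1 * rng.standard_normal(rings.shape)

# "Empirical" principal modes estimated from the images themselves.
mean_ring = rings.mean(axis=0)
_, _, vt = np.linalg.svd(rings - mean_ring, full_matrices=False)
U = vt[:n_modes]                                # top principal modes
principal_images = (rings - mean_ring) @ U.T    # compressed representation

# Here: 256 coefficients per image -> 12 mode amplitudes (~20x smaller).
```

All subsequent image-template comparisons can then be carried out on the compressed representation, which is where the computational savings come from.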
In the EMPM pipeline, we first estimate the principal modes using the images themselves. Using these 'empirical' principal modes, we proceed with AM, alternating between maximum-likelihood and maximum-entropy alignment. Once a preliminary volume estimate has been obtained, we use that estimate to recalculate the principal modes. We then restart AM using the new principal images and, again, alternate between maximum-likelihood and maximum-entropy alignment until we have a final volume estimate. The algorithm is summarized below (see Supporting Information for a more detailed description):
1. Use the experimental images Â_j to calculate 'empirical' principal modes U_empir.
2. Compute the principal images, defined to be the projections of the images onto the principal modes, P(U_empir, Â_j).
3. Set random initial viewing angles for each image and run AM, alternating between maximum-likelihood and maximum-entropy alignment, to obtain a preliminary volume estimate.
4. Use the preliminary volume to recalculate the principal modes and principal images.
5. Restart AM with the new principal images, again alternating between maximum-likelihood and maximum-entropy alignment, to obtain the final volume estimate.
Note that, each time we update the displacements δ_j^empir and δ_j^estim, we bound the maximum permitted displacement magnitude by δ_max (typically δ_max ∼ 0.1 in our dimensionless units). In our studies with small amounts of moderately noisy experimental data, we have observed that the EMPM strategy is more effective than Bayesian inference or maximum-likelihood alignment alone, even with probabilistic alignment where a distribution of viewing angles is assigned to each image. Roughly speaking, we believe this is due to the fact that the incorrect viewing angles assigned in the maximum-entropy step are washed out by destructive interference, while the correctly assigned ones reinforce the true structure. One simple model problem which illustrates these points is discussed in the Supporting Information (see sections B and C). It is worth repeating, however, that we only propose this approach for obtaining a low-resolution initial guess. Existing refinement methods are still crucial for high resolution.

Once we have run our EMPM algorithm, we have the estimated volume F̂_estim(k) as output. Using F̂_estim(k) as an approximation of F̂(k), we can readily compute other quantities that may be of interest, such as estimates of the true viewing-angle distribution in the data set and other types of correlations. We will use these quantities in our numerical experiments below to compare the results of EMPM to other methods, such as Relion's de-novo molecular reconstruction and full Bayesian inference.

Templates, correlations, and ranks
For each centered molecular reconstruction F̂ and set of Euler angles τ = (α, β, γ), we define the inner product X and normalized inner product X̃ of the reconstruction with the "ground-truth" model F̂_true by

X(τ; F̂, F̂_true) = ⟨R(τ) ∘ F̂, F̂_true⟩,  X̃(τ; F̂, F̂_true) = X(τ; F̂, F̂_true) / (‖F̂‖ ‖F̂_true‖).

Here (α, β) denotes the viewing angle of an image, with α the azimuthal angle and β the polar angle; γ is the final in-plane rotation (see section A.6). R(τ) ∘ F̂ is the volume obtained from F̂ using the rotation (the element of SO(3)) defined by τ. We define the volumetric correlation Z(F̂, F̂_true) by

Z(F̂, F̂_true) = max_τ X̃(τ; F̂, F̂_true).

This measurement Z(F̂, F̂_true) represents the optimal correlation (over all alignments) between the two volumes F̂ and F̂_true. In order to compare our results with Relion's de-novo reconstruction, we will also need to calculate the 'masked' correlation Z_mask(F, F_true), defined as

Z_mask(F, F_true) = Z(G · F, G · F_true),

where G is a sharp spherical mask (i.e., the indicator function of a ball centered at the origin) with a radius determined by the support of F_true.
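A masked correlation of this kind can be sketched as follows (assumptions: the two volumes are pre-aligned, so the search over rotations τ is omitted, and the normalization convention is illustrative rather than the paper's exact one):

```python
import numpy as np

# Correlation of two real-space volumes restricted to a sharp
# spherical mask G (indicator of a centered ball).
def masked_correlation(f, f_true, radius):
    n = f.shape[0]
    ax = np.arange(n) - n // 2
    x, y, z = np.meshgrid(ax, ax, ax, indexing="ij")
    mask = x**2 + y**2 + z**2 <= radius**2     # sharp spherical mask G
    a, b = f[mask], f_true[mask]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
vol = rng.standard_normal((24, 24, 24))        # stand-in volume
```

The full Z_mask would additionally maximize this quantity over rotations of one of the two volumes.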
We will also need to define inner products and correlations of single images and volumetric slices. For this, given a Fourier-space volume F̂, we generate noiseless 2D projections (in the Fourier domain) that incorporate the CTF, which we refer to as templates. We denote these by Ŝ(τ, δ; CTF; F̂), where τ is the specific set of Euler angles and δ is the displacement. When the displacement is zero, we write these templates as Ŝ(τ; CTF; F̂). Formally, we have

Ŝ(τ; CTF; F̂)(k) = CTF(k) · Ŝl(R(τ) ∘ F̂)(k),

where CTF(k) is the given pointwise value of the CTF in Fourier space and Ŝl denotes taking the equatorial slice in the Fourier domain. These CTF-modulated templates are used to calculate the inner products

X(τ, δ; Â_j; CTF_j; F̂) = ⟨T(+δ, ·) Ŝ(τ; CTF_j; F̂), Â_j⟩,

where T(+δ, k) = e^{iδ·k} is the Fourier signature of displacement by the vector δ. These are used, in turn, to calculate the correlations

Z(τ, δ; Â_j; CTF_j; F̂) = X(τ, δ; Â_j; CTF_j; F̂) / (‖Ŝ(τ; CTF_j; F̂)‖ ‖Â_j‖).

For convenience, we generally reduce this to a function of the viewing direction k̂ = (α, β), the first two components of τ, alone:

Z(α, β; Â_j; CTF_j; F̂) = max_{γ, δ} Z(τ, δ; Â_j; CTF_j; F̂).

The parameters γ and δ are associated with a particular choice of α and β implicitly through the arg max. When the context is clear, we will often abuse notation and refer to these correlations simply as Z(α, β; Â_j). Finally, for each image Â_j and a set of N_S templates defined by the viewing directions (α_l, β_l), l = 1, ..., N_S, we can order the values Z(α_l, β_l; Â_j; CTF_j; F̂), l = 1, ..., N_S.
This yields what we will refer to as the template rank for the image. Conversely, for each template defined by (α_l, β_l), one can order the values Z(α_l, β_l; Â_j; CTF_j; F̂), j = 1, ..., N_A.
This defines the image rank for that template. Using the terminology of linear algebra, we can define the N_A × N_S matrix Z with entries Z_{jl} = Z(α_l, β_l; Â_j; CTF_j; F̂). Template-ranking then corresponds to sorting the rows of Z, while image-ranking corresponds to sorting the columns.
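In matrix terms, both rankings are sorts of the same correlation matrix along different axes; a small sketch (with a hypothetical Z):

```python
import numpy as np

# Correlation matrix Z with rows = images, columns = templates:
# Z[j, l] = correlation of image j against template l.
rng = np.random.default_rng(4)
n_images, n_templates = 6, 4
Z = rng.random((n_images, n_templates))

# Template rank: within each image (row), rank the templates.
template_rank = Z.argsort(axis=1).argsort(axis=1)

# Image rank: within each template (column), rank the images.
image_rank = Z.argsort(axis=0).argsort(axis=0)
```

The double argsort converts sorted positions into ranks (0 = worst match, n-1 = best match) along the chosen axis.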

Results
We will consider a number of examples in this paper, drawn from the electron-microscopy image archive (EMPIAR). We set K = 48 as the bandlimit for our low-resolution reconstruction, corresponding to a 'best' possible resolution of about 20 Å. With these image pools, we construct a variety of different 'reference volumes' for F̂(k).
Ground truth: First, we center and restrict the published volume to the bandlimit K. This defines our 'ground-truth' volume F̂_true(k).
Oracle: Next, we measure the correlations Z(τ; Â_j; CTF_j; F̂_true) for a sufficiently dense set of Euler angles τ and displacements, and use maximum likelihood to align the images to F̂_true(k). This assigns 'ground-truth' viewing angles τ_j^true and displacements δ_j^true to each image. More precisely, we start by calculating Z using a small displacement disc of δ_local = 0.03, and then iterate multiple times with δ_max = 0.25 until the correlation between the reconstructed and true volumes plateaus (this usually takes fewer than 8 iterations). We then take the final viewing angles and displacements to define τ_j^true and δ_j^true. Given these ground-truth-based viewing angles and displacements, we use a least-squares reconstruction to compute F̂_Oracle(k).
Oracle-AM (OAM): Starting with the ground-truth viewing angles and displacements, as well as F̂_Oracle(k), we apply standard AM, iterating until the resulting volume converges (this, again, usually requires fewer than 8 iterations). We define this final volume to be F̂_OAM(k). We expect the volume F̂_OAM to be close to the best reconstruction one could realistically achieve with standard AM.
Oracle-Bayesian (OBI): Next, we apply Bayesian inference reconstruction (as described in [7,11]). This reconstruction depends on (i) the initial starting volume, (ii) the estimated noise level σ_BI, (iii) the resolution K and range of displacements δ_max considered, and (iv) the error tolerance used when calculating the integrals required for estimating the posterior described in [7]. We fix K = 48 and δ_max = 0.10, and set the error tolerance to the same value used in the rest of our numerical experiments (typically tol = 0.01). To create a reference volume associated with the ground truth, we use F̂_true as the initial starting volume. The resulting volume, denoted by F̂_OBI(k; σ_BI), will depend on the noise estimate σ_BI. Note that, as σ_BI → 0, F̂_OBI(k; σ_BI) → F̂_OAM(k), while as σ_BI → ∞, F̂_OBI(k; σ_BI) converges to a spherically symmetric distribution. In our experience, the quality of the reconstructed volume depends quite sensitively on the choice of σ_BI. To build a good noise estimate, we scan over several values of log(σ_BI) ∈ [−5, 0], choosing the σ_BI for which F̂_OBI(k; σ_BI) has the highest correlation with F̂_true after the first iteration. Fixing σ_BI^opt to be this optimal value, we define F̂_OBI(k) := F̂_OBI(k, σ_BI^opt) after iterating to convergence. Thus, F̂_OBI is close to the best reconstruction possible using Bayesian inference, assuming that the optimal estimated noise level σ_BI is known ahead of time.
In addition to the above reference volumes, we also create several de-novo molecular reconstructions. RDN and EMPM below are oracle-free, while for Bayesian inference we use the optimal noise-level estimate σ_BI^opt from above.
Bayesian Inference (BI): We run the Bayesian inference reconstruction described in [7,11] to produce F̂_BI(k). As mentioned above, this reconstruction depends on (i) the initial starting volume, (ii) the estimated noise level σ_BI, (iii) the resolution K and the range of displacements δ_max considered, (iv) the error tolerance used when computing the integrals in the posterior, and (v) the number of iterations applied. We fix the noise level σ_BI^opt to be the value obtained in the Oracle-Bayesian method above. We fix the resolution at K = 48, set δ_max = 0.10, set the error tolerance to the same tol used in the other methods, and set the number of iterations at 16. For the initial starting volume, we assign each of the available images a viewing angle randomly and uniformly, and then use a least-squares reconstruction to generate a 'random' initial starting volume. While random, this starting volume has roughly the same diameter and bandwidth as the true molecule. We use the same procedure when initializing our strategy below.

DRG:
We run the open-source neural-network-based algorithm cryoDRGN described in [8], with the suggested default parameters for ab initio reconstruction.

ASP:
We run the open-source software package ASPIRE [9], which uses a common-lines-based algorithm, with the suggested default parameters for ab initio reconstruction.

RDN:
We run Relion's de-novo molecular reconstruction algorithm (RDN), a variant of Bayesian inference, to produce F̂_RDN. (For speed, RDN uses only subsets of the images in its internal iterations.) We use the parameters listed in section D in the Supporting Information. We set the parameter --particle_diameter to be commensurate with the true molecular diameter, rounded up to the nearest 50 Å.
EMPM: We carry out the reconstruction using the strategy described above, and discussed in more detail in the Supporting Information, to produce F̂_EMPM(k). In addition to setting the maximum frequency K and displacement δ_local, many of the computed quantities require a numerical tolerance tol (see section A). We fix K = 48, and scan over tol ∈ [0.001, 0.010] and δ_local ∈ [0.01, 0.10]. Our strategy begins with the same random initial viewing angles for each image as used for Bayesian inference. As we shall see below (see, e.g., Fig. 3), our results are quite robust, provided that tol ≤ 0.01 and δ_local ≥ 0.01 or so.
For trpv1, as well as rib80s and ISWINCP, the data sets are quite clean. The first N_A = 1024 picked-particle images are well-centered, devoid of clutter, and of high quality. Consequently, many of the reconstructions we attempted were successful, and RDN reconstruction was also able to achieve Z_mask > 0.5. For ps1, LSUbl16dep, MlaFEDB, LetB1, and TMEM16F, however, the individual images are not quite as clean. In the case of ps1 and TMEM16F, many of the images are particularly noisy. For LSUbl16dep (and to a lesser extent ps1), a notable fraction of the images are poorly centered. For MlaFEDB (and to a lesser extent TMEM16F), many of the images appear to contain fragments of other particles. For LetB1, the viewing-angle distribution of the images is highly concentrated around the equator. Our reconstructions for these molecules are not as successful. Nevertheless, we still recover volumes that are comparable in quality to Bayesian inference with an optimal estimated noise level, and usually better than RDN reconstruction (all at a fraction of the computational time).
Results for four molecules are shown in Fig. 1. Focusing on trpv1 for the moment, we see that F̂_EMPM captures the coarse features of F̂_true, and is not too different from F̂_OAM in terms of overall quality. Even though F̂_EMPM is a coarse estimate of F̂_true, the reconstruction process has recovered a significant amount of information regarding the viewing angles of each image. To illustrate this point, we compare the image-template correlations (as defined above) estimated from F̂_EMPM with the 'ground-truth' image-template correlations measured from F̂_true. For each molecule of interest, three panels are presented in Fig. 2. The left-most subplot is a scatterplot of the image-template correlations. That is, for each image Â_j and each template (α_l, β_l), we compute a point in two dimensions with coordinates (Z(α_l, β_l; Â_j; CTF_j; F̂_EMPM), Z(α_l, β_l; Â_j; CTF_j; F̂_true)).
With N_A = 1024 images and N_T = 993 templates, this yields a data set with about one million points. The color in the heatmap corresponds to the log-density of the joint distribution at that location (see adjacent colorbar).

Figure 1: Level sets from the volumes F̂_true, F̂_Oracle, F̂_OAM, and F̂_OBI, as well as one of the F̂_EMPM reconstructions (left to right, respectively). From the top down, the panels correspond to trpv1, rib80s, ISWINCP and ps1.

Note that there is a significant correlation between the estimated and ground-truth image-template correlations. In the middle subplot for each molecule, we show a scatterplot of the template ranks for each image, rescaled to [0, 1] and aggregated over all images. In the right-most subplot, we show a scatterplot of the image ranks for each template, rescaled to [0, 1] and aggregated over all templates. Once again, the scatterplots are presented as heatmaps, with color corresponding to log-density. This view of the data sheds light on the differences between maximum-likelihood and maximum-entropy alignment. Recall that, in standard maximum-likelihood alignment, one assigns τ_j^est (for a particular image Â_j) to the template with the highest template rank. By contrast, in maximum-entropy alignment, one does the reverse: assigning the image with the highest image rank to each sampled template direction. It is interesting to note that the estimated and ground-truth image ranks are typically more tightly correlated than the estimated and ground-truth template ranks. We believe that this phenomenon contributes to the success of our strategy. Indeed, when the estimated and ground-truth template ranks are highly correlated (as for ISWINCP), the maximum-likelihood alternating minimization method (i.e., Bayesian inference with noise level 0) succeeds, performing roughly as well as our strategy.
However, when the estimated and ground-truth template ranks are poorly correlated (as for ps1), maximum-likelihood alternating minimization does very poorly, often converging to a volume that is worse than random.
The illustrations in Figs. 1 and 2 are obtained using just one trial of F̂_EMPM. We have carried out extensive simulations; these results are fairly typical, with qualitatively similar outcomes across repeated runs. For illustration, we show the values of the volumetric correlations collected over multiple trials in Fig. 3. The values of Z_mask(F̂_EMPM, F̂_true), shown in magenta, are taken over a range of values of tol ∈ [0.001, 0.010] and δ_local ∈ [0.01, 0.10]. We did not see a large correlation between either tol or δ_local and Z_mask, provided that tol is sufficiently small and δ_local is sufficiently large (i.e., tol ≤ 0.01 and δ_local ≥ 0.01). Note that, for these data sets, our strategy reliably produces reconstructions that are similar in quality to F̂_OAM, and superior to the results obtained with existing open-source packages (DRG, ASP, RDN). Finally, we remark that a higher-quality low-resolution reconstruction can help improve the results of subsequent refinement. As an example, we use Relion's relion_refine to produce a medium-resolution reconstruction from the first 8192 images of this dataset using two different initial volumes. First, we set the initial volume to Relion's de-novo reconstruction F̂_RDN (referenced in blue in Fig. 3). Second, we set the initial volume to the EMPM reconstruction F̂_EMPM shown in Fig. 1. Fig. 4 illustrates the Fourier shell correlations (FSCs) between the resulting medium-resolution reconstructions and the published F̂_true. Note that the higher-quality initial volume F̂_EMPM results in a better medium-resolution reconstruction. Moreover, as we shall discuss below, better low-resolution reconstructions can be useful for other quality-control tasks in the cryo-EM pipeline.
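The FSC used for these comparisons is the shell-wise normalized cross-correlation of two volumes' Fourier transforms. A minimal sketch (conventions such as shell spacing and normalization vary between packages; this version is illustrative):

```python
import numpy as np

# Fourier shell correlation between two cubic volumes: correlate the
# 3D Fourier transforms shell by shell in |k|, up to the axis Nyquist.
def fsc(vol_a, vol_b, n_shells=8):
    fa, fb = np.fft.fftn(vol_a), np.fft.fftn(vol_b)
    freqs = np.fft.fftfreq(vol_a.shape[0])
    kx, ky, kz = np.meshgrid(freqs, freqs, freqs, indexing="ij")
    k = np.sqrt(kx**2 + ky**2 + kz**2)
    edges = np.linspace(0.0, 0.5, n_shells + 1)
    curve = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        shell = (k >= lo) & (k < hi)
        num = np.real(np.sum(fa[shell] * np.conj(fb[shell])))
        den = np.sqrt(np.sum(np.abs(fa[shell]) ** 2) * np.sum(np.abs(fb[shell]) ** 2))
        curve.append(num / den if den > 0 else 0.0)
    return np.array(curve)
```

Identical volumes give an FSC of 1 in every shell; in practice one reads off the resolution at which the curve between two independent reconstructions drops below a chosen threshold.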
We have focused here on a subset of the proteins listed above. Additional results are deferred to the Supporting Information.
Remark 2
Of note is that the EMPM method works reasonably well even when the particle orientations are highly non-uniform (see Fig. 15). One might expect that the entropy-maximization step (which forces images to be assigned to each viewing angle) would cause difficulties. For de novo low- to moderate-resolution reconstructions, however, it appears that EMPM benefits from the prevention of spurious clustering that can develop in pure AM iteration. At higher resolution, we do not recommend the use of EMPM, as the entropy-maximization step would result in a loss of resolution and is no longer necessary once a good initial model has been created.

Remark 3
We have implemented our strategy in Matlab, using standard (double-precision) libraries. Even for our relatively naive implementation, the total computation time for each of our trials is 30–50 minutes on a desktop workstation. By comparison, the de-novo reconstruction method within Relion typically requires 20–25 hours on the same machine, while a trial of Bayesian inference (also in Matlab) requires 36–48 hours. It is also worth noting that, when constructing F̂_BI, we chose the optimal noise estimate σ_BI^opt. Scanning over multiple values of σ_BI would require additional computation time. In summary, we believe that our strategy has the potential to be extremely efficient, especially when properly optimized.

On-the-fly particle rejection
Focusing on the ps1 data set, we now consider whether our algorithm can be used to curate particles from small samples. As shown in Figs. 2 and 11, the estimated image-template correlations Z(α, β; Â_j; CTF_j; F̂_EMPM) are often quite strongly correlated with the true image-template correlations Z(α, β; Â_j; CTF_j; F̂_true). Motivated by this observation, we define

Q_j^EMPM = max_{(α,β)} Z(α, β; Â_j; CTF_j; F̂_EMPM)

to be the maximum image-template correlation for each image. We can use Q_j^EMPM as a proxy for the 'true image quality' Q_j^true. We see from Figs. 2 and 11 that Q_j typically lies in the interval [0.1, 0.4]. It is reasonable to conjecture that those images with a low estimated quality do not align well to the reconstructed molecule and could be discarded when attempting to reconstruct the molecule. To investigate this hypothesis, we perform a numerical experiment with the first 1024 · 8 = 8192 images in the EMPIAR data set, dividing these images into 8 batches of N_A = 1024. For each batch we produce a reconstruction, using our strategy above. The Z_mask values for these single-batch runs are shown as magenta dots in the 'IND' column of Fig. 5. The average Z_mask for these single-batch runs is ∼0.67 (magenta line), with relatively little variation.
Next, we use the estimated image quality Q_j to sort the images into octiles (each with 1024 images): those with the lowest quality scores define the first octile and those with the highest quality scores define the eighth. We then perform a reconstruction for each octile separately, with the Z_mask shown as light-pink dots. Note that the first few octiles produce reconstructions that are quite a bit worse than the typical reconstructions shown in the 1K column. It is interesting to note that, for this image pool, using exclusively the highest-quality octile also results in a poor reconstruction, since the highest-quality images are concentrated around a small subset of viewing angles. The result obtained using all N_A = 8192 images together is shown as the dark-purple dot in the '1st' column. We then show successive reconstructions using different quality cut-offs. Using the top seven octiles (7168 images), we obtain the improved Z_mask shown as a dark-purple dot in the '2nd' column. Using the top six octiles, we get the dark-purple dot in the '3rd' column, and so forth. In short, excluding the worst images improves the reconstruction quality, but excluding too many images causes the quality to drop, presumably as a result of losing coverage from the noisier projection directions.

Figure caption: Fourier shell correlation (FSC) of medium-resolution reconstructions produced by Relion's relion_refine applied to the first N_A = 8192 images from the trpv1 data set (i.e., the first N_A picked particles from the MRC 1901 file). The reconstruction is run twice: once using as initial volume the RDN reconstruction F̂_RDN obtained from the first 1024 images, and once using as initial volume the EMPM reconstruction F̂_EMPM. Each FSC curve compares the reconstructed volume with the published ground truth F̂_true. The blue (lower) curve is from the RDN initialization and the magenta (higher) curve is from the EMPM initialization.
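As an illustration of the curation rule described above, the following sketch computes the quality proxy Q_j, sorts the images into octiles, and drops the lowest-quality octile. The correlation array Z here is randomly generated; in practice it would come from aligning each image to the templates of the current EMPM reconstruction.

```python
import numpy as np

rng = np.random.default_rng(0)
n_images, n_templates = 8192, 200

# Hypothetical image-template correlations Z[j, t] (stand-in random values).
Z = rng.uniform(0.0, 0.5, size=(n_images, n_templates))

# Quality proxy: the maximum image-template correlation for each image.
Q = Z.max(axis=1)

# Sort by estimated quality and split into 8 octiles of 1024 images each.
order = np.argsort(Q)
octiles = np.array_split(order, 8)

# Curation rule: drop the lowest-quality octile, keep the remaining 7168 images.
keep = np.concatenate(octiles[1:])
assert keep.size == 7168
```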

Molecules with poor performance on small data sets
While our results can be considered moderately successful for many of the molecules, our reconstructions of LetB1, LSUbl17dep and MlaFEDB using 1024 particles are typically quite mediocre. For LetB1 and LSUbl17dep, different runs of EMPM yielded reconstructions that were quite variable. We suspect that this inconsistency stems from some underlying heterogeneity in the data set itself (see [12]). For MlaFEDB, on the other hand, different trials of EMPM yielded reconstructions F̂_EMPM that were not close to F̂_true, but were often quite close to one another. This consistency across trials suggests that the structures observed in the typical reconstruction might actually exist in the image pool we are using. A recurring motif in the reconstructions is a cylindrical shape with a disjoint structure off to one side. In this case, we suspect that the image pool contains artifacts that are at least partially responsible for the mediocre performance of EMPM (as well as BI, RDN, DRG and ASP) for this molecule.

Discussion
As demonstrated above, the EMPM iteration (a modification of alternating minimization) can produce a reasonable low-resolution reconstruction from a small set of raw images. As seen in Figs. 3 and 12, it is more robust than many currently used de novo molecular reconstruction techniques, and at least as accurate as Bayesian inference with an optimally chosen estimated noise level. We did not make an exhaustive comparison with all available packages, since our main purpose here is to provide a new approach that is only relevant in the low-resolution regime. We believe that the maximum-entropy step of EMPM accounts for its success, effectively preventing spurious clustering of the assigned viewing angles. It should also be noted that the principal-mode reduction described here can be used to accelerate image alignment and volume registration in more sophisticated, higher-resolution reconstruction algorithms.
EMPM also produces useful estimates of image quality and viewing-angle distribution, even when the image pool contains only a few thousand images. These estimates could play a role in online quality control: identifying poor images to be discarded, or possibly terminating imaging sessions that seem to be yielding poor-quality data.
Because EMPM is efficient (our Matlab implementation requires less than 5% of the computation time of RDN and Bayesian inference), it should make cross-validation more compelling. Our reconstruction requires about 40 minutes, RDN about 20 hours, full Bayesian inference about 40 hours, cryoDRGN about 5 hours, and Aspire (the fastest) about 15 minutes (all on the same laptop). Strong correlation between volumes reconstructed from different subsets of images can reasonably be interpreted as an indication that the data set is of high quality. On a more speculative note, the image curation presented here uses only image-template correlations. It is possible that image cross-correlations, as well as the cross-correlations of the residuals of the least-squares reconstruction step, could also be valuable in settings involving discrete or continuous heterogeneity.
Because EMPM does not require any fine-tuning of parameters, it should be straightforward to incorporate into existing, widely-used, high-quality software pipelines and we hope to do so in the near future. The code is available from [13].

Figure caption: 'IND' denotes independent batches; the octiles run from lowest to highest estimated quality. First we divide these images into batches of N_A = 1024 and produce multiple reconstructions (magenta, leftmost column). The average Z_mask for these reconstructions is ∼ 0.67 (magenta line). Next, we use the individual estimates of image quality (see text) to divide the images into 8 octiles, and perform a reconstruction using the images within each octile (light pink). Note that the images of poorest quality (i.e., in the first few octiles) produce inferior reconstructions. For this image pool, the images of the very best quality (i.e., in the final octile) tend to be attributed to a small subset of viewing angles; a reconstruction using only those images is also of rather poor quality. We then perform reconstructions using all the images except those below a particular octile (dark purple): the reconstruction using all 8192 images is shown in dark purple within the '1st' column, and the reconstruction using the 8192 − 1024 = 7168 images that exclude the first octile is shown in dark purple within the '2nd' column. Note that excluding the worst images improves the reconstruction quality. As the number of excluded images grows, the number of images used for the reconstruction shrinks; the dark-purple circles are not much higher than the light-pink circles when the octile index is large.

A Supporting Information
Various aspects of the EMPM iteration are described here in greater detail. We begin with the discretization of images and volumes in the Fourier domain, and the techniques used for image alignment and volumetric reconstruction. Many of the techniques require the specification of a 'global tolerance' tol, which could be considered a user-specified parameter. In practice, it appears to be sufficient to set tol = 10^{-2}, corresponding to two digits of accuracy. In our experiments, we actually scan over tol ∈ [0.001, 0.01] but find that the quality of the reconstruction does not improve markedly for smaller values of tol. In short, EMPM is not limited by the global tolerance, but presumably by the experimental error and the structural features of the nonconvex landscape over which we are trying to optimize.

A.1 Manipulating images in the Fourier domain
We use x, k ∈ ℝ² to represent spatial position and frequency, respectively. In polar coordinates these vectors can be represented as

x = x (cos θ, sin θ), k = k (cos ψ, sin ψ),

for appropriately chosen angles θ and ψ (with x = ‖x‖ and k = ‖k‖).

Definition 1 The Fourier transform of a two-dimensional function A ∈ L²(ℝ²) is

Â(k) = ∫_{ℝ²} A(x) e^{−i k·x} dx.

We recover A from Â using the inverse Fourier transform:

A(x) = 1/(2π)² ∫_{ℝ²} Â(k) e^{+i k·x} dk.

The inner product between two functions A, B ∈ L²(ℝ²) is written as

⟨A, B⟩ = ∫_{ℝ²} A(x)† B(x) dx,

where z† is the complex conjugate of z ∈ ℂ. We will also use Plancherel's theorem [14],

⟨A, B⟩ = 1/(2π)² ⟨Â, B̂⟩.

We will ignore the factors of 2π associated with the Fourier transform when they are not relevant. We represent any given picked-particle image as a function A ∈ L²(ℝ²), with values corresponding to the image intensity at each location. As a consequence of Plancherel's theorem, any inner product between A and another image (or 2D function) B can be calculated in either real or frequency space, the latter of which is often more convenient when aligning images to projections of a volume [15,16,17].
With this notation, each Â(k, ·) for fixed k (with ψ ∈ [0, 2π)) corresponds to a 'ring' in frequency space of radius k.

A.2 Rotation and translation
Using the notation above, a rotation R(γ) of an image A by angle γ ∈ [0, 2π) can be represented as

[R(γ) • A](x) = A(R(−γ) · x).

Since rotation commutes with the Fourier transform, we have

[R(γ) • Â](k, ψ) = Â(k, ψ − γ).

In this manner, a rotation of any image by +γ can be represented as an angular shift of each image ring by ψ → ψ − γ.
Likewise, a translation T(δ) of an image A by the shift vector δ ∈ ℝ² can be represented as

[T(δ) • A](x) = A(x − δ).

In the Fourier domain, this action can be expressed as

[T(δ) • Â](k) = T(δ, k) Â(k),

where the translation kernel T(δ, k) is

T(δ, k) = e^{−i k·δ}.

A.3 Fourier-Bessel coefficients
Over the course of our molecular reconstructions, we calculate many image-image inner products (see section 2.1). To ease the computational burden associated with these calculations, it is convenient to use the Fourier-Bessel basis [15,18,16,17].
To define the Fourier-Bessel coefficients of an image, recall that each image ring Â(k, ψ) (for fixed k) is a 2π-periodic function of ψ, which can be represented as a Fourier series:

Â(k, ψ) = Σ_{q ∈ ℤ} a(k; q) e^{+i q ψ}.

The Fourier-Bessel coefficients a(k; q) of the image ring Â(k, ·) are given by

a(k; q) = 1/(2π) ∫_0^{2π} Â(k, ψ) e^{−i q ψ} dψ.

These coefficients can be represented in a more traditional fashion by recalling that the Bessel function J_q can be written as

J_q(z) = 1/(2π i^q) ∫_0^{2π} e^{+i z cos ψ} e^{−i q ψ} dψ,

which, when combined with the definition of the Fourier transform, implies that a(k; q) is (up to a phase factor depending only on q) the inner product between the original image (in real space) and the function J_q(k‖x‖) e^{+i q θ}. Note that, using the Fourier-Bessel representation of an image, the rotation R(γ) can now be represented as

[R(γ) • Â](k, ψ) = Σ_{q ∈ ℤ} a(k; q) e^{−i q γ} e^{+i q ψ},   (33)

such that the Fourier-Bessel coefficients of the rotated image ring R(γ) • Â(k, ·) are given by the original coefficients a(k; q), each multiplied by the phase factor e^{−i q γ}. Note that (33) easily allows for arbitrary rotations by any γ, requiring only a pointwise multiplication by the appropriate phase factor. This observation allows for the efficient alignment of one image to another (see [16,17]).
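Since each image ring is a periodic function of ψ, the coefficients a(k; q) and the phase-factor rotation of (33) can be sketched with the FFT. The ring profile below is an arbitrary smooth test function (not one of the paper's images), used only to verify that multiplying by e^{−iqγ} rotates the ring.

```python
import numpy as np

Q = 128                                   # angular nodes on one frequency ring
psi = 2 * np.pi * np.arange(Q) / Q

# A smooth test ring A(k, psi) at fixed k (values are arbitrary).
ring = np.exp(np.cos(psi)) + 0.3 * np.sin(3 * psi)

# Fourier-Bessel coefficients a(k; q) of the ring (trapezoidal rule = FFT / Q).
a = np.fft.fft(ring) / Q

# Rotate the ring by gamma: multiply each coefficient by exp(-i q gamma),
# using the signed integer frequencies q.
gamma = 0.7
q = np.fft.fftfreq(Q, d=1.0 / Q)
ring_rot = np.real(np.fft.ifft(a * np.exp(-1j * q * gamma)) * Q)

# Compare against direct evaluation of the ring at psi - gamma.
ring_direct = np.exp(np.cos(psi - gamma)) + 0.3 * np.sin(3 * (psi - gamma))
assert np.allclose(ring_rot, ring_direct, atol=1e-8)
```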

A.4 Image discretization
We denote by Ω(1) and Ω(K) the balls of radius 1 and K, respectively, in either real or frequency space:

Ω(K) = {x : ‖x‖ ≤ K},

and we will assume that all the images considered are supported in Ω(1). This support constraint implies that Â(k) will have a bandlimit of 1, so that Â can be accurately reconstructed from its values sampled on a frequency grid with spacing O(1) [19]. We also assume that any relevant signal within the images has a maximum effective spatial-frequency magnitude of K; i.e., that the salient features of Â are concentrated in k ∈ Ω(K). Consequently, we expect the inversion to accurately reconstruct A to the resolution associated with K (ignoring high-frequency ringing artifacts). When these assumptions hold, we won't lose much accuracy when applying these transformations in a discrete setting (as discussed below). Note that the largest K we can reasonably expect to attain is related to the number of pixels N across the original image. More specifically, once we assume that A(x) is supported in x ∈ Ω(1), the pixel spacing will be Δx = 2/N, implying that the Nyquist spatial frequency is π/Δx = Nπ/2. Since we are using N_A picked-particle images A_j to produce a low-resolution estimate of the imaged molecule (denoted F below), we will typically choose K to be significantly smaller than Nπ/2, often in the range of K ∼ 50. Given the pixel spacing, this corresponds to a low-resolution reconstruction of approximately 20 Å resolution. Additionally, the fact that ‖x‖ ≤ 1 implies that the Bessel coefficients a(k; q) will be concentrated in the range |q| ≲ k, meaning that the Bessel coefficients a(k; q) across all k ∈ [0, K] will be concentrated in q ∈ [−Q/2, +Q/2 − 1] for Q = O(K) = O(N).
With the notation above, we can consider an N × N image as a discrete set of pixel-averaged samples within [−1, +1]²:

A_{n1,n2} ≈ A(x_{n1,n2}), for indices n1, n2 ∈ {0, . . . , N − 1}.

We approximate the Fourier transform Â at any k ∈ ℝ² via the simple summation

Â(k) ≈ Δx² Σ_{n1,n2} A_{n1,n2} e^{−i k·x_{n1,n2}},

where x_{n1,n2} is the appropriately chosen pixel center (−1, −1) + Δx (n1 + 1/2, n2 + 1/2). Because we have assumed that the image is sufficiently well sampled (i.e., that Â contains little relevant frequency content above K), we expect the simple sum above to be accurate.
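The simple summation above can be sketched directly; here a Gaussian test image (whose Fourier transform is known in closed form, an assumption made for testing only) is used to check the pixel-center Riemann sum at one polar frequency node:

```python
import numpy as np

N = 64
dx = 2.0 / N
centers = -1.0 + dx * (np.arange(N) + 0.5)       # pixel centers in [-1, 1]
X1, X2 = np.meshgrid(centers, centers, indexing="ij")

# Test image: an isotropic Gaussian with a known Fourier transform.
s = 0.2
A = np.exp(-(X1**2 + X2**2) / (2 * s**2))

def ft_direct(k1, k2):
    """Plain sum  A_hat(k) ~ dx^2 * sum_n A_n exp(-i k . x_n)."""
    phase = np.exp(-1j * (k1 * X1 + k2 * X2))
    return dx**2 * np.sum(A * phase)

# Evaluate at a polar node k = k (cos psi, sin psi).
k, psi = 10.0, 0.3
val = ft_direct(k * np.cos(psi), k * np.sin(psi))

# Analytic transform of the Gaussian: 2 pi s^2 exp(-s^2 |k|^2 / 2).
exact = 2 * np.pi * s**2 * np.exp(-(s**2) * k**2 / 2)
assert abs(val - exact) < 1e-5
```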
We will typically evaluate Â(k) for k on a polar grid, with k- and ψ-values corresponding to suitable quadrature nodes, using the non-uniform FFT (NUFFT) to compute Â(k, ψ) at those nodes (see [19,17]). The quadrature in k corresponds to a set of R nodes k_1, . . . , k_R and radial weights w_1, . . . , w_R so that

∫_0^K g(k) k dk ≈ Σ_{r=1}^R w_r g(k_r)

to high accuracy for a smooth function g(k). The exact radial quadrature nodes we select are usually those inherited from a volumetric grid (see section A.6 below). For convenience, we will sometimes refer to the weight function w(k) as a continuous interpolant of the quadrature weights, with w(k_r) = w_r. The Q angular nodes ψ_0, . . . , ψ_{Q−1} are equispaced in the periodic interval [0, 2π), with spacing Δψ = 2π/Q and ψ_q = q Δψ. These equispaced ψ-nodes allow for spectrally accurate trapezoidal quadrature in the ψ-direction, and we approximate the Fourier-Bessel coefficients of each image ring Â(k, ψ) as

a(k; q) ≈ (Δψ/2π) Σ_{n=0}^{Q−1} Â(k, ψ_n) e^{−i q ψ_n},

with the index q considered periodically in the interval [−Q/2, . . . , +Q/2 − 1] (so that, e.g., the q-value of Q − 1 corresponds to the q-value of −1).
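The defining property of the radial rule can be sketched with a Gauss-Legendre quadrature mapped to [0, K], with the 'k dk' factor folded into the weights (the paper instead inherits its nodes from the volumetric Gauss-Jacobi grid; this is only a stand-in with the same property):

```python
import numpy as np

K, R = 48.0, 20

# Radial rule for  int_0^K g(k) k dk : Gauss-Legendre nodes mapped to [0, K],
# with the 'k dk' area factor folded into the weights w_r.
t, wt = np.polynomial.legendre.leggauss(R)
k_r = (t + 1.0) * K / 2.0
w_r = wt * (K / 2.0) * k_r               # w_r approximates k dk near k_r

# The rule integrates smooth radial profiles accurately:
approx = np.sum(w_r * k_r**2)            # int_0^K k^2 * k dk = K^4 / 4
assert np.isclose(approx, K**4 / 4)
```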

A.5 Inner products
The inner product between an image A and a rotated and translated version of image B is denoted by

X(γ, δ; Â, B̂) = ⟨Â, T(δ) • R(γ) • B̂⟩,

and (ignoring factors of 2π) is approximated by

X(γ, δ; Â, B̂) ≈ Σ_{r=1}^R Σ_{q=0}^{Q−1} w_r Δψ Â(k_r, ψ_q)† T(δ, k_{r,q}) B̂(k_r, ψ_q − γ),

where k_{r,q} = k_r (cos ψ_q, sin ψ_q). Because the sum extends only over k ≤ K, the expression X can be thought of as a bandlimited inner product.
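In the Fourier-Bessel representation, X(γ, 0; Â, B̂) at all grid angles γ_m = m Δψ can be computed with a single length-Q FFT over the order q. A minimal sketch, using synthetic rings and stand-in radial weights (both are illustrative choices, not the paper's data):

```python
import numpy as np

R_nodes, Q = 16, 64
dpsi = 2 * np.pi / Q
psi = dpsi * np.arange(Q)

# Synthetic polar-grid images: B is A rotated by a known angle gamma_true.
c_r = np.linspace(0.5, 2.0, R_nodes)[:, None]    # per-ring profile parameter
gamma_true = 10 * dpsi
A = np.exp(c_r * np.cos(psi)[None, :])
B = np.exp(c_r * np.cos(psi - gamma_true)[None, :])
w_r = np.full(R_nodes, 1.0 / R_nodes)            # stand-in radial weights

# Fourier-Bessel coefficients of every ring (trapezoidal rule = FFT / Q).
a = np.fft.fft(A, axis=1) / Q
b = np.fft.fft(B, axis=1) / Q

# X(gamma) = 2 pi sum_r w_r sum_q conj(a) b exp(-i q gamma); evaluating at all
# gamma_m = m * dpsi at once is a single FFT over the q index.
c = np.sum(w_r[:, None] * np.conj(a) * b, axis=0)
X = 2 * np.pi * np.fft.fft(c)

# B is A rotated by +gamma_true, so the best alignment rotates B by -gamma_true.
m_star = int(np.argmax(X.real))
assert np.isclose((Q - m_star) * dpsi, gamma_true)
```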

A.6 Volume notation
We use x, k ∈ ℝ³ to represent spatial position and frequency, respectively. In spherical coordinates, the vector k is represented as k = k · k̂, with

k̂ = (cos φ sin θ, sin φ sin θ, cos θ),

where the polar angle θ and azimuthal angle φ parametrize the unit vector k̂ on the surface of the sphere S².
Using a right-handed basis, a rotation about the third axis by angle α is represented as

R_z(α) = [cos α, −sin α, 0; sin α, cos α, 0; 0, 0, 1],

and a rotation about the second axis by angle β is represented as

R_y(β) = [cos β, 0, sin β; 0, 1, 0; −sin β, 0, cos β].

A rotation R(τ) of a vector k ∈ ℝ³ can be represented by the vector of Euler angles τ = (α, β, γ):

R(τ) = R_z(α) · R_y(β) · R_z(γ).

We represent a given volume as a function F ∈ L²(ℝ³), with values corresponding to the intensity at each location x ∈ Ω(1). We will refer to F̂(k) in spherical coordinates as F̂(k, k̂). With this notation, each F̂(k, ·) corresponds to a 'shell' in frequency space of radius k. The rotation of any volume, R(τ) • F̂(k, k̂), corresponds to the function F̂(k, R(τ^{−1}) · k̂), where τ^{−1} corresponds to the vector of Euler angles τ^{−1} = (−γ, −β, −α).
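A sketch of these rotation matrices in the z-y-z convention consistent with τ^{−1} = (−γ, −β, −α) (the specific angle values below are arbitrary test inputs):

```python
import numpy as np

def Rz(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def Ry(b):
    c, s = np.cos(b), np.sin(b)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def R(tau):
    """R(tau) = Rz(alpha) Ry(beta) Rz(gamma) for tau = (alpha, beta, gamma)."""
    alpha, beta, gamma = tau
    return Rz(alpha) @ Ry(beta) @ Rz(gamma)

tau = (0.4, 1.1, -0.7)
tau_inv = (0.7, -1.1, -0.4)              # tau^{-1} = (-gamma, -beta, -alpha)

# The inverse rotation is realized by reversing and negating the Euler angles.
assert np.allclose(R(tau) @ R(tau_inv), np.eye(3))
```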

A.7 Spherical harmonics
Using the notation above, we can represent a volume F̂(k, k̂) as

F̂(k, k̂) = Σ_{l=0}^∞ Σ_{m=−l}^{+l} f(k; l, m) Y_l^m(k̂),

where Y_l^m(k̂) represents the spherical harmonic of degree l and order m:

Y_l^m(k̂) = Z_l^m P_l^m(cos θ) e^{+i m φ},

with P_l^m representing the usual (unnormalized) associated Legendre function, and Z_l^m the normalization constant

Z_l^m = sqrt( (2l + 1)/(4π) · (l − m)!/(l + m)! ).

The coefficients f(k; l, m) define the spherical harmonic expansion of the k-shell F̂(k, ·). Given a rotation τ = (α, β, γ), we can represent the rotated volume Ĝ := R(τ) • F̂ via its coefficients

g(k; l, m1) = Σ_{m2=−l}^{+l} e^{−i m1 α} d^l_{m1,m2}(β) e^{−i m2 γ} f(k; l, m2),

where d^l_{m1,m2}(β) represents the Wigner d-matrix of degree l associated with the second Euler angle β.

A.8 Volume discretization
As with our discretization of images, we assume that the volume F (x) is supported in Ω(1), and that the relevant frequency content is contained in Ω(K).
We then discretize the radial component of Ω(K) ⊂ ℝ³ using a Gauss-Jacobi quadrature in k built with a weight function corresponding to the radial weighting k² dk; the number of radial quadrature nodes R will be of the order O(K). On each of the k-shells, we build a quadrature for k̂ that discretizes the polar angle β using a Legendre quadrature on [0, π] and the azimuthal angle α using a uniform periodic grid on [0, 2π), with associated weights designed for integration on the surface of the sphere; the total number of spherical-shell quadrature nodes T(k_r) for any particular k_r need only be O(k_r²) [16]. Each of the k-shells F̂(k_r, ·) can be accurately described using spherical harmonics of degree l = O(k_r). Thus, the number of spherical-harmonic coefficients required for each shell F̂(k_r, ·) is O(k_r²), and the total number of spherical-harmonic coefficients required to approximate F̂(k, k̂) over Ω(K) is O(K³), with a maximum degree of L = O(K). The associated number of orders will be M = 1 + 2L, which is also O(K).
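A sketch of such a shell quadrature (Legendre nodes in the polar direction, a uniform periodic grid in the azimuth), checked on low-degree polynomials of k̂ whose integrals over the sphere are known:

```python
import numpy as np

L = 8                                     # polar (Legendre) nodes
M = 2 * L                                 # uniform azimuthal nodes

# Legendre nodes in cos(theta) on [-1, 1], uniform grid in phi on [0, 2 pi):
ct, w_ct = np.polynomial.legendre.leggauss(L)
phi = 2 * np.pi * np.arange(M) / M
w_phi = 2 * np.pi / M

CT, PHI = np.meshgrid(ct, phi, indexing="ij")
W = np.outer(w_ct, np.full(M, w_phi))     # weights for integration over S^2

# Components of k_hat on the shell:
st = np.sqrt(1.0 - CT**2)
kx, ky, kz = st * np.cos(PHI), st * np.sin(PHI), CT

# The rule integrates low-degree polynomials on the sphere exactly:
assert np.isclose(np.sum(W), 4 * np.pi)              # int 1 dOmega
assert np.isclose(np.sum(W * kz**2), 4 * np.pi / 3)  # int kz^2 dOmega
assert np.isclose(np.sum(W * kx * ky), 0.0)
```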

A.9 Template generation
Given any volume F̂(k), we can generate templates efficiently using the techniques described in [16], which make use of the spherical harmonic representation f(k; l, m). For our purposes, we generate the templates described in Eq. 7 for a collection of viewing directions (α_t, β_t), for t ∈ {1, . . . , T}. The particular choice of the third Euler angle γ_t for each template is dictated by the strategy for template generation in [16], and does not play an important role, since it corresponds to an in-plane rotation of the relevant slice. We discretize the set of viewing directions to correspond with the k̂ discretization on the largest k-shell in our volumetric (spherical) quadrature grid: that is, T := T(k_R). These viewing directions will be used to enumerate the templates (i.e., N_S = T). Each of the templates is stored on a polar or Fourier-Bessel grid identical to those used to store the images.

A.10 Image noise model
We assume that each of the picked-particle images Â_j is a noisy version of a 'signal' corresponding to a two-dimensional projection of some (unknown) molecular electron density function F_true:

Â_j = Ŝ(τ_j^true, δ_j^true; CTF_j; F_true) + Â_j^noise,

where τ_j^true and δ_j^true correspond to the true (but unknown) viewing angle and displacement associated with image Â_j. In this context, we say the image's 'true viewing direction' is defined by the first two Euler angles in τ_j^true, namely the azimuthal and polar angles α_j^true and β_j^true. We will model the noise in real space as independent and identically distributed (iid), with a variance of σ² on a unit scale in two-dimensional real space. This is a simple model of detector noise [20,21,22,23], and does not take into account more complicated features, such as structural noise associated with image preparation (see [24]). We assume that the image values are the sum of signal plus noise:

A_{n1,n2} = A_{n1,n2}^signal + A_{n1,n2}^noise,

where the signal and noise are represented by the arrays A_{n1,n2}^signal and A_{n1,n2}^noise, respectively. Because each entry in A_{n1,n2}^noise corresponds to an average over the area element Δx² associated with a single pixel, we expect the variance of each A_{n1,n2}^noise to scale inversely with Δx². That is to say, we will assume that each element of the noise array A_{n1,n2}^noise is drawn from the normal distribution N(0, σ²/Δx²), with zero mean and variance σ²/Δx². With these assumptions, downsampling the image by averaging neighboring pixels will correspond to an increase in the effective pixel size and a simultaneous reduction in the variance of the noise associated with each (now larger) pixel.
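The pixel-area scaling of the noise variance can be checked numerically; the sketch below draws iid pixel noise with variance σ²/Δx² and verifies that 2×2 downsampling cuts the per-pixel variance by a factor of four:

```python
import numpy as np

rng = np.random.default_rng(2)
N, sigma = 256, 1.0
dx = 2.0 / N

# Pixel noise with variance sigma^2 / dx^2 (iid on a unit scale in real space).
noise = rng.normal(0.0, sigma / dx, size=(N, N))

# Downsample by 2x2 averaging: pixel size doubles, per-pixel variance drops 4x.
down = noise.reshape(N // 2, 2, N // 2, 2).mean(axis=(1, 3))

var_full = noise.var()
var_down = down.var()
# Averaging 4 iid pixels divides the variance by 4 (up to sampling error).
assert abs(var_down / var_full - 0.25) < 0.05
```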
Given the assumptions above, Â will be modeled by

Â(k) = Â^signal(k) + Â^noise(k),

where Â^noise is now complex. Because the noise term A^noise is real, the noise term Â^noise will be complex, with the conjugacy constraint Â^noise(+k) = Â^noise(−k)†.

A.11 Variance Scaling
We define σ̃² = π² σ² Δx² as the variance on a unit scale in two-dimensional frequency space. We expect that the noise term Â^noise(k), integrated over any area element Δk², will have a variance of σ̃² Δk². Recalling the polar quadrature described above, we typically sample k along R radial quadrature nodes k_1, . . . , k_R and Q angular quadrature nodes ψ_0, . . . , ψ_{Q−1}, with radial and angular weights w_r and Δψ, respectively. In this quadrature scheme, each quadrature node k_{r,q} = (k_r, ψ_q) is associated with the area element w_r Δψ, which approximates the 'k dk' integration weight. Consequently, we expect that the noise term Â^noise(k_{r,q}), evaluated at (k_r, ψ_q) on our polar quadrature grid, will have a variance of σ̃²/(w_r Δψ). Given the radially dependent variance associated with each degree of freedom in the images, we will typically consider the rescaled images Â(k_r, ψ_q) √(w_r Δψ), rather than the raw image values Â(k_r, ψ_q). For simplicity, we will often write this as Â(k) √(w(k) Δψ), where we assume that Δψ is the same for every k-value, and w(k) is some functional interpolant of the quadrature weights with w(k_r) = w_r. This rescaling allows us to equate the standard vector 2-norm with the L² integral (up to quadrature error):

Σ_{r=1}^R Σ_{q=0}^{Q−1} |Â(k_r, ψ_q)|² w_r Δψ ≈ ∫ |Â(k)|² dk.

This rescaling allows our definition of the radial objective function below to correspond to a standard log-likelihood (see (55) in section A.14). In a similar fashion, the least-squares formulation we will use for volume reconstruction corresponds to a maximum-likelihood estimate, with similar properties holding for the inner products used for alignment (see sections A.18 and A.22 below).

A.12 Radial Principal Modes
In this section, we briefly review the notion of radial principal modes for image compression, described in [25]. The premise is simple: given an image Â(k, ψ), not all the frequency rings Â(k, ·) are equally useful. By selecting appropriate linear combinations of frequency rings, we can compress the images while retaining the information useful for alignment. We choose the coefficients of these linear combinations by solving an eigenvalue problem, which is why we refer to them as principal modes.
This strategy has two benefits. The first is to make the overall computation more efficient: rather than retaining all R ≈ 50 radial quadrature nodes, we can often compress the majority of the relevant information into a smaller number of principal modes (typically 10-16). The second benefit is that, by retaining only those principal modes which are useful for alignment, we explicitly avoid those degrees of freedom within the original images that are dominated by noise. The same set of modes can be used for the experimental images and the estimated volumes. We make use of this compression for both the alignment and reconstruction steps of our EMPM iteration.

A.13 Principal image rings and principal volume shells
Let us assume that we are given a unit vector u = [u_1, u_2, . . . , u_R] ∈ ℝ^R. We define the linear combination [u · Â](ψ_q) as

[u · Â](ψ_q) = Σ_{r=1}^R u_r Â(k_r, ψ_q) √(w_r Δψ),

where the rescaling factor √(w_r Δψ) is designed to equalize the variances associated with the different k-values (see section A.11).
In a moment, we will choose u to be an eigenvector of an R × R matrix, and we will refer to the linear combination [u · Â](ψ_q) as the 'principal image ring' associated with u. Note that, based on our noise model, the rescaled image noise is iid across image rings. Consequently, the variance of the noise [u · Â^noise](ψ_q) is equal to Σ_r u_r² σ̃² = ‖u‖² σ̃². Because u is a unit vector, this last expression is simply σ̃², the same as the variance of any individual rescaled term Â^noise(k_r, ψ_q) for any particular k_r. Moreover, if we consider two orthonormal vectors u_1 and u_2, then [u_1 · Â^noise](ψ_q) and [u_2 · Â^noise](ψ_q) will be independent random variables, each drawn from N(0, σ̃²).
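A Monte Carlo sketch of this variance bookkeeping, with random orthonormal vectors u_1, u_2 standing in for the principal modes and the rescaled ring noise drawn directly as iid N(0, σ̃²):

```python
import numpy as np

rng = np.random.default_rng(3)
R_nodes, n_trials, sigma_t = 24, 200000, 0.8

# Two orthonormal vectors u1, u2 in R^R (Gram-Schmidt on random vectors).
u1 = rng.normal(size=R_nodes); u1 /= np.linalg.norm(u1)
u2 = rng.normal(size=R_nodes); u2 -= (u2 @ u1) * u1; u2 /= np.linalg.norm(u2)

# Rescaled ring noise: after the sqrt(w_r * dpsi) rescaling, each entry is
# iid with variance sigma_t^2 (see the variance-scaling discussion).
ring_noise = rng.normal(0.0, sigma_t, size=(n_trials, R_nodes))

p1 = ring_noise @ u1                      # principal ring [u1 . A_noise]
p2 = ring_noise @ u2

# Unit vectors preserve the noise variance; orthonormal pairs are uncorrelated.
assert abs(p1.var() - sigma_t**2) < 0.01
assert abs(np.mean(p1 * p2)) < 0.01
```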
The same strategy can be used to define a principal volume shell [u · F̂](k̂) as

[u · F̂](k̂) = Σ_{r=1}^R u_r F̂(k_r, k̂) √(w_r Δψ),

using the same weights as those used for the principal image rings. In most situations, we will assume that the angular discretization Δψ is uniform across all the image rings, allowing us to ignore the √Δψ factor in the rescaling above.

A.14 Choice of principal modes
To select the principal modes used above, we construct an objective function C(u) for the vector u (Eq. 55), built from the noiseless image templates Ŝ(τ, δ; CTF; F) associated with viewing angle τ, displacement δ and contrast transfer function CTF(k). The integral in (55) is taken over the distributions of viewing angle, displacement and CTF (i.e., μ_τ, μ_δ and μ_CTF, respectively). Given a volume F, as well as distributions μ_τ, μ_δ and μ_CTF, the objective function C(u) is (up to an affine transformation) equal to the negative log-likelihood of mistaking one noiseless principal image ring (constructed using the principal vector u) for a randomly rotated version of another (also constructed using u). Those vectors u for which C(u) is high correspond to linear combinations of frequencies which are useful for alignment (i.e., for which the typical principal image rings look very different from one another). Conversely, those vectors u for which C(u) is low correspond to linear combinations of frequencies which are not useful for alignment (i.e., for which the typical principal image rings look quite similar to one another).
The objective function C(u) is quadratic in u, which means that it can be optimized by finding the dominant eigenvector of the Hessian C := ∂_{uu}C, an R × R matrix obtained by differentiating (55) twice with respect to u. Moreover, the corresponding eigenvalue will be equal to the value of the objective function C for that eigenvector. To compute the Hessian C in practice, we use one of two strategies.

A.15 Empirical principal modes
At first, when we have no good estimate of either F or the distributions μ_τ, μ_δ and μ_CTF, we simply use the images Â_j themselves to form a Monte Carlo estimate of the integral in (55). This boils down to a simple approximation in which we replace the noiseless images Ŝ(τ, δ; CTF; F) and Ŝ(τ', δ'; CTF'; F) in the integrand of Eq. 55 with randomly selected images Â_j and Â_{j'} from the original image pool, and use the empirical distribution of the images themselves to stand in for the unknown μ_τ, μ_δ and μ_CTF (after averaging over the in-plane rotations associated with γ and γ'). This yields the empirical kernel C_empir of (57).

A.16 Estimated principal modes
Once we have a reasonable estimate of the imaged volume F_estim, we can use it (along with the current estimated distributions μ_τ^estim and μ_δ^estim) to calculate C_estim via (55). In practice, we typically assume that μ_τ^estim is uniform and that μ_δ^estim is an isotropic Gaussian centered at the origin, with a standard deviation σ_δ estimated using the current δ_j^estim. With these assumptions, the associated kernel C_estim can be computed analytically (as Q → ∞) from the spherical harmonic coefficients f(k; l, m), in terms of a CTF-modulated weight W_r and correction terms E_+ and E_− involving the modified Bessel functions of the first kind I_q evaluated at k̃_r = k_r σ_δ/2.

A.17 Multiple principal modes
We select the principal modes u_1, . . . , u_H to be the dominant H eigenvectors of the kernel C. We typically choose H such that the (H + 1)-st eigenvalue of C is less than the global tolerance times the first (dominant) eigenvalue of C. When we set the global tolerance to tol ∼ 10^{-2} in our numerical experiments below, this selection criterion typically corresponds to selecting 12-16 principal modes from C_empir, and slightly fewer (10-14) from C_estim. We will refer to the matrix U = [u_1, . . . , u_H] as the collection of H principal modes, and refer to the collection of respective principal images as U · Â, such that [U · Â](h, ψ) = [u_h · Â](ψ).
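The selection rule can be sketched as follows; the kernel C below is a random symmetric positive semidefinite stand-in with a decaying spectrum, not the alignment Hessian itself:

```python
import numpy as np

rng = np.random.default_rng(4)
R_nodes, tol = 32, 1e-2

# A stand-in symmetric positive semidefinite kernel C with decaying spectrum,
# playing the role of the Hessian of the alignment objective.
B = rng.normal(size=(R_nodes, R_nodes))
decay = 0.5 ** np.arange(R_nodes)
C = (B * decay) @ (B * decay).T

# Principal modes = dominant eigenvectors of C.
evals, evecs = np.linalg.eigh(C)
evals, evecs = evals[::-1], evecs[:, ::-1]         # descending order

# Keep H modes: stop once an eigenvalue falls below tol times the largest.
H = int(np.searchsorted(-evals, -tol * evals[0]))
U = evecs[:, :H]                                   # R x H matrix of modes

assert np.all(evals[:H] >= tol * evals[0])
assert evals[H] < tol * evals[0]
assert np.allclose(U.T @ U, np.eye(H))             # modes are orthonormal
```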
We use the same convention for principal volumes: [U · F̂](h, k̂) = [u_h · F̂](k̂).
We will also refer to the Bessel and spherical harmonic representations of the principal images and principal volumes as U · a and U · f, respectively.

A.18 Volume reconstruction
Most approaches to estimating molecular volumes in cryo-EM involve some kind of reconstruction process which is performed multiple times over the course of iterative refinement. For standard AM, the inputs to the reconstruction process are the collection of N_A picked-particle images Â_j(k) and associated CTF functions CTF_j(k), along with estimates of the viewing angles τ_j^estim and displacements δ_j^estim. The output is the estimate F̂_estim(k). In our case, we will make use of both the standard least-squares reconstruction and a more computationally efficient approximate Fourier inversion which we refer to as 'quadrature back-propagation' (section A.20 below).

A.19 Least-squares reconstruction
The standard strategy for reconstruction involves solving a linear system in a least-squares sense. In real space, the system asks that the projections of the volume F match the observed images; in the Fourier domain (via the Fourier slice theorem), it can be written, more conveniently, as

F̂_estim = argmin_{F̂} Σ_{j=1}^{N_A} ‖ Ŝ(τ_j, δ_j; CTF_j; F̂) − Â_j ‖²,   (66)

where the noiseless templates Ŝ (which are linear in F̂) are CTF-modulated slices of the rotated volume. In this expression, the N_A viewing angles τ_j and displacements δ_j are assumed to be given as input, as are the image- or micrograph-specific CTF functions CTF_j(k) and images Â_j(k). The unknown (to be determined) is the volume F̂(k). This least-squares problem is standard in maximum-likelihood estimation, and versions of it are solved in many software packages, often via conjugate gradient iteration applied to the normal equations.
As written, the least-squares problem above has a block structure: different k values are independent from one another. Thus, the problem can be solved by determining the k-shell F̂(k, k̂) from the k-rings Â_j(k, ψ) one k value at a time. However, the problem stated in (66) is slightly misleading, as the residuals for the different k values are not directly comparable: different k-shells have different noise variances associated with them (see section A.11). To account for this, we rewrite (66) using the rescaled residuals:

F̂_estim = argmin_{F̂} Σ_{j=1}^{N_A} Σ_{r,q} | Ŝ(τ_j, δ_j; CTF_j; F̂)(k_r, ψ_q) − Â_j(k_r, ψ_q) |² w(k_r) Δψ,   (67)

where the function w(k_r) is the radial quadrature weight. With this formulation, the residual (calculated using the standard vector 2-norm) is, up to quadrature error, equal to an integral representing the log-likelihood that the volume F̂(k) could have produced the observed images Â_j(k) (under the assumptions about the noise model described in section A.10).

A.20 Principal-mode least-squares reconstruction
The same general methodology can be applied to the principal images directly. To describe the modified least-squares system, we first define a low-rank decomposition of the radial component of the CTF functions. Let CTF(k, j) denote the radial component of the CTF function CTF_j(k). Using the discretized analog of a functional SVD, we represent CTF(k, j) as

CTF(k, j) ≈ Σ_{h'=1}^{H'} σ_{h'} v_{h'}(k) s_{h'}(j),

which we truncate at rank H' in accordance with the global tolerance. In the numerical experiments we present below, the image pools typically span 30 or so micrographs, usually resulting in H' of about 2-3.
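A sketch of this truncation, with a synthetic one-parameter family of radial CTF profiles (the functional form and defocus values below are illustrative only, not a physical CTF model):

```python
import numpy as np

rng = np.random.default_rng(5)
R_nodes, n_images, tol = 48, 1024, 1e-2
k = np.linspace(0.0, 48.0, R_nodes)

# A stand-in CTF array CTF[r, j]: one radial profile per image, with a
# per-micrograph defocus drawn from a handful of distinct values.
defocus = rng.choice(np.linspace(0.9, 1.1, 30), size=n_images)
CTF = np.sin(-0.002 * defocus[None, :] * k[:, None] ** 2 - 0.1)

# Truncated SVD of the CTF array at the global tolerance.
U_svd, s, Vt = np.linalg.svd(CTF, full_matrices=False)
Hp = int(np.sum(s > tol * s[0]))
CTF_lr = (U_svd[:, :Hp] * s[:Hp]) @ Vt[:Hp, :]

# With ~30 distinct defocus values the array is numerically low-rank.
assert Hp < 10
assert np.linalg.norm(CTF - CTF_lr) <= 10 * tol * np.linalg.norm(CTF)
```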
Using this rank-H' decomposition of the CTF array (across the image pool), we can approximately represent the noiseless principal images. After using the radial principal modes to compress both this approximation and the associated component of the right-hand side of (67), we obtain a compressed system. Defining the CTF-modulated principal volumes Ĝ(k; h', F̂) (Eq. 70), we can write the compressed least-squares problem as (71), where the radial reweighting by w(k) in (67) is accounted for by our definition of U. Note that, when H = H' = R, the compressed problem is equivalent to (67); when H < R and H' < R, it corresponds to a projected version of (67).
Note also that, with this representation, we need not actually solve for F̂(k). Instead, we can use the principal images (on the right-hand side) to solve the problem above (in a least-squares sense) for the collection of CTF-modulated principal volumes Ĝ(k; h', F̂). Solving (71) can be less computationally costly than solving (67) when the total number of degrees of freedom within the various Ĝ is less than the total number of degrees of freedom within F̂ (i.e., when H × H' < R). Even when this is not the case, the CTF-modulated principal volumes Ĝ (the solution to (71)) can still be more accurate (in terms of comparison to the ground truth) than the solution F̂ to (67). We see this improvement most acutely when the principal-mode reduction U effectively removes noise from the images Â_j (e.g., if there are many frequency rings which contain little to no information).
We remark that, when dealing with least-squares reconstruction and principal-mode reduction, 'the diagram commutes'. That is, because the principal-mode projection U is orthonormal, the principal projection U · CTF · F̂ of the full volume which solves the original least-squares problem in (67) will be the same as the solution Σ_{h'} v_{h'} σ_{h'} Ĝ of the principally-projected least-squares problem in (71), provided of course that the H'-rank decomposition of the CTF array is accurate (e.g., if H' is taken to be R). Thus, if H × H' is indeed larger than R, then instead of solving (71) we can obtain Ĝ more easily by solving the original least-squares problem for F̂ and then projecting down to the CTF-modulated principal volumes Ĝ.
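The commuting-diagram property can be checked numerically in a simplified setting: for a square orthonormal change of basis U (here without rank truncation or CTF modulation, which the full construction above includes), solving the least-squares problem in the rotated coordinates gives exactly the rotation of the full solution. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 40, 8
A = rng.standard_normal((m, n))   # toy forward operator
b = rng.standard_normal(m)        # toy data (stand-in for principal images)

# Route 1: solve the full least-squares problem, then rotate the solution.
x, *_ = np.linalg.lstsq(A, b, rcond=None)
U, _ = np.linalg.qr(rng.standard_normal((n, n)))  # square orthonormal basis

# Route 2: solve the least-squares problem directly in the rotated basis.
y, *_ = np.linalg.lstsq(A @ U, b, rcond=None)

# The two routes agree: y == U.T @ x up to round-off.
diff = np.linalg.norm(y - U.T @ x)
```

With rank truncation (H' < R) the two routes differ only by the discarded singular components, which is the caveat noted above.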
Note also that, due to linearity, the noiseless images associated with the CTF-modulated principal volumes Ĝ will approximately equal the noiseless principal images associated with the volume F̂, with errors controlled by the rank-H' truncation of the CTF values. That is to say, this relationship is a principal-mode compression of (69). It can be used to align the principal images U·Â_j directly to the solution Ĝ, without necessarily needing to reconstruct F̂ (see section A.22).
Reconstruction via quadrature back propagation (QBP)

In our experience, an accurate approximation of the solution to the least-squares problem above can require a significant amount of computation time when the conjugate gradient method requires many iterations to converge. As a more efficient (but typically less accurate) alternative, we employ a slightly different strategy, which we refer to as 'quadrature back propagation' (QBP). The basic observation underlying QBP is that, if there were no image noise and no CTF attenuation, each template Ŝ_l ∘ R(τ_j) ∘ F̂ would equal the corresponding image; that is, each image provides a cross-section of the volume F̂(k). Each of these cross-sections can be viewed as a sample of the overall volume F̂(k), with values given by the appropriate entry on the right-hand side. With sufficiently many cross-sectional samples, a functional representation of the volume F̂ can be recovered via quadrature in the spherical harmonic basis, i.e., via a spherical harmonic transform. That is, we have the approximate formula

f(k; l, m) ≈ Σ_p w_p F̂(k, k̂_p) conj(Y_l^m(k̂_p)),

where {k̂_p, w_p} is a quadrature rule on the sphere. The number of such nodes is of order O(k²) on the spherical shell of radius k. (Equispaced points in the azimuthal direction and Legendre points in the polar angle, for example, are spectrally accurate.) Because the data points will not generally coincide with the necessary quadrature nodes, we interpolate from the given data to the nodes in some local neighborhood. A simple rule is to take the local average over all data points that lie within some distance of each node. Note that this strategy is not iterative and is typically an order of magnitude (or so) faster than a least-squares reconstruction.
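The quadrature rule {k̂_p, w_p} can be sketched as follows (a hypothetical sphere_quadrature helper, following the equispaced-azimuth / Gauss-Legendre-polar construction above). As a sanity check, the rule recovers low-degree spherical moments exactly:

```python
import numpy as np

def sphere_quadrature(L):
    """Quadrature {k_p, w_p} on the unit sphere: L Gauss-Legendre nodes
    in cos(polar angle) and 2L equispaced azimuthal nodes; spectrally
    accurate for band-limited integrands."""
    x, wx = np.polynomial.legendre.leggauss(L)   # nodes in cos(theta)
    polar = np.arccos(x)
    az = 2.0 * np.pi * np.arange(2 * L) / (2 * L)
    P, A = np.meshgrid(polar, az, indexing="ij")
    # tensor-product weights: Legendre weight times azimuthal spacing
    W = np.repeat(wx, 2 * L) * (2.0 * np.pi / (2 * L))
    return P.ravel(), A.ravel(), W

polar, az, w = sphere_quadrature(8)   # 8 * 16 = 128 nodes, O(K^2)
kx = np.sin(polar) * np.cos(az)
ky = np.sin(polar) * np.sin(az)
kz = np.cos(polar)

total = w.sum()                              # surface area, 4*pi
moment = np.sum(w * kx**2 * ky**2 * kz**2)   # analytic value: 4*pi/105
```

In QBP itself, the integrand values F̂(k, k̂_p) at these nodes would come from local averages of nearby cross-sectional samples rather than from exact evaluations.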
In the absence of image noise, the error in the reconstruction of f(k; l, m) comes from the interpolation error associated with interpolating from the cross-sectional sample data to the quadrature nodes. When the image noise is nonzero and the interpolation regions are fixed in size, the approximation to F̂(k, k̂_p) exhibits two sources of error: in addition to the interpolation error, there is a sampling error due to the image noise at each cross-sectional sample point. In practice, we generally choose the local neighborhood of each quadrature node to be sufficiently large to contain roughly 10−100 sample points. This is usually sufficient for two digits of accuracy (coinciding with our usual global error tolerance).
The above strategy can easily be generalized to deal with image-specific CTF values. The QBP approach essentially treats each sample within the local interpolation region of a quadrature node as a noisy observation of F̂(k, k̂_p), and the average can be thought of as a local maximum-likelihood estimate. Since different data points may correspond to different CTF functions, we simply compute CTF-weighted local averages instead. This strategy coincides with the CTF weighting used in the Bayesian inference of [7,11].
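A minimal sketch of the CTF-weighted local average at a single quadrature node, assuming samples a_i = c_i · F̂(k, k̂_p) + noise with known CTF values c_i (the helper name is hypothetical):

```python
import numpy as np

def ctf_weighted_estimate(samples, ctf_vals):
    """Local maximum-likelihood estimate of F(k, k_p) from samples
    a_i = c_i * F + noise within a node's neighborhood: the
    CTF-weighted average sum(c_i a_i) / sum(c_i^2)."""
    c = np.asarray(ctf_vals)
    a = np.asarray(samples)
    return np.sum(c * a) / np.sum(c * c)

# Samples from images with varying (even sign-changing) CTF values:
rng = np.random.default_rng(2)
truth = 1.7
c = rng.uniform(-1.0, 1.0, 50)
a = c * truth + 0.05 * rng.standard_normal(50)
est = ctf_weighted_estimate(a, c)
```

If per-image noise variances were known, they would enter as additional 1/sigma_i^2 factors in both sums, matching the Bayesian weighting cited above.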

A.21 Remark regarding further template compression
If the ranks H and H' are both sufficiently small (e.g., 2−4 and 1−2, respectively), then the dimensionality of the space of CTF-modulated principal templates (as measured via the numerical rank) is very small, and usually quite a bit smaller than the actual number of principal templates (i.e., the number of viewing directions). This would allow for additional compression, using a reduced set of 'basis' templates (which span the space of templates) rather than the actual templates themselves. In our numerical experiments we do not implement this compression because our H and H' are typically too large for it to be efficient.

A.22 Image alignment
Just as for the reconstruction step, image alignment is often performed multiple times during molecular refinement. For standard AM, the inputs to the alignment process are the estimated molecular volume F̂^estim(k), the picked-particle images Â_j(k), and the associated CTF functions CTF_j(k). The outputs are the estimated viewing angles τ_j^estim and displacements δ_j^estim for the various images.
In our case we make use of both the standard maximum-likelihood alignment and a more stable 'maximum-entropy' alignment (described below).

A.23 Alignment via maximum likelihood
For this, we calculate the quantities X, X̂ and Z, described in section 2.1, for a collection of T = O(K²) viewing directions (α_t, β_t) corresponding to our spherical-shell quadrature grid for k̂ at k = k_R ≈ K. For each of these viewing directions we consider a range of O(K) values for γ in [0, 2π). The displacements are typically limited to a small disk of radius δ^local ∼ 0.01 or so, taking advantage of the low numerical rank of the set of translation operators within this disk [17,25].
Given the correlations Z(α_t, β_t; Â_j), a common strategy for updating the viewing angles τ_j^estim and displacements δ_j^estim, referred to as 'maximum-likelihood' alignment, is to select the τ_j^estim (and, as a result, the δ_j^estim) for each image j that maximizes Z(α_t, β_t) over the T viewing directions for that image. In the ideal situation where the image formation model is accurate, this choice corresponds to the viewing angle that is most likely to have produced the particular image Â_j, given that the molecule is indeed F̂^estim. In practice we do not define τ_j^estim to be the very best α and β for each image; instead we randomly select an (α_t, β_t) for which Z(α_t, β_t) is within the top 5th percentile of its values for that image [16]. The results of our numerical experiments are quite insensitive to this value of 5%; any choice from 1% to ∼10% achieves similar performance for maximum-likelihood alternating minimization.
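The randomized top-percentile selection can be sketched as follows (the helper name is hypothetical; Z_j holds the correlations over the T viewing directions for one image):

```python
import numpy as np

def sample_top_percentile(Z_j, rng, pct=5.0):
    """Pick a viewing direction uniformly at random from those whose
    correlation is in the top pct percent for this image."""
    cutoff = np.quantile(Z_j, 1.0 - pct / 100.0)
    candidates = np.flatnonzero(Z_j >= cutoff)
    return rng.choice(candidates)

rng = np.random.default_rng(3)
Z = rng.standard_normal(400)        # correlations over T = 400 directions
t_star = sample_top_percentile(Z, rng)
```

This injects a small amount of randomness into the alignment while still restricting to near-optimal viewing directions.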
Let us define the function Sort[·], which maps lists of real numbers to lists of positive integers such that, for any vector a ∈ R^L, the value of Sort[a]_l is the integer rank of a_l (ranging over 1, . . . , L) within the entries of a. As noted in section 2.1, the template ranks are obtained by applying Sort[·] to each 'column' of the array Z(α_t, β_t; Â_j) (fixing j), while the image ranks are obtained by applying Sort[·] to each 'row' of the same array (fixing t). The maximum-likelihood alignment described above depends on the template ranks, while the maximum-entropy alignment (described below) depends on the image ranks.
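A sketch of Sort[·] and its column-wise and row-wise application (using scipy's rankdata for the integer ranks):

```python
import numpy as np
from scipy.stats import rankdata

def sort_ranks(a):
    """Sort[a]_l: the rank (1..L) of a_l within the entries of a."""
    return rankdata(a, method="ordinal").astype(int)

# Z[t, j]: correlations for T = 3 viewing directions and 2 images.
Z = np.array([[0.3, 0.9],
              [0.7, 0.1],
              [0.5, 0.4]])

# Template ranks: apply Sort[] down each column (fixed image j).
template_ranks = np.apply_along_axis(sort_ranks, 0, Z)
# Image ranks: apply Sort[] across each row (fixed direction t).
image_ranks = np.apply_along_axis(sort_ranks, 1, Z)
```

For the first column (image j = 0), the correlations (0.3, 0.7, 0.5) receive template ranks (1, 3, 2); for the second row (direction t = 1), the correlations (0.7, 0.1) receive image ranks (2, 1).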

B Multi-Reference Alignment (MRA)

In this context the image displacements and the CTF modulation are typically ignored, and we assume (i) the imaged molecule is translationally invariant in the x_3-direction, and (ii) the imaged molecule is only observed from the x_3-axis. Because of (i), the function F(x) is determined by F(x_1, x_2), and F̂(k) is restricted to the equatorial plane, where we denote it by F̂(k, ψ). Additionally, due to (ii), each 'image' Â_j is a noisy version of F̂(k, ψ), subject to an unknown in-plane rotation γ_j:

Â_j(k, ψ) = F̂(k, ψ + γ_j) + noise.

If we further assume that the true volume F̂ involves only a single k-shell, then each image is restricted to a single k-ring, and can be considered a periodic function on [0, 2π].
The MRA problem has been well studied [26,27,28], and can be used to gain insight into the behavior of many reconstruction strategies. It can be used, for example, to investigate the sensitivity of Bayesian inference (BI) to mismatches between the estimated and true noise levels (see Fig 6).

C Multi-Slice Alignment (MSA)
An 'orthogonal' sub-problem to MRA, which we refer to as Multi-Slice Alignment (MSA), can be described as follows. As in MRA, we ignore displacements and the CTF and assume that the volume is invariant in the x_3-direction. However, in MSA, we assume that the imaged molecule is only observed from the equatorial plane. With this assumption the images correspond to projections of F(x) onto one-dimensional lines in the x_1-x_2 plane (i.e., samples of the two-dimensional Radon transform of F(x_1, x_2)). Thus, retaining the notation Â_j, each image corresponds to a one-dimensional slice of F̂(k, ψ), expressed as a function of k ∈ R for a specific (but unknown) ψ_j. To simplify the scenario further, we restrict each image to the ray of positive k:

Â_j(k) = F̂(k, ψ_j) + noise, for k > 0.
If we further assume that the true volume F can take on arbitrary complex values, and that F̂ involves only a single k-shell, then F̂ corresponds to a closed curve in C, written as the function Ĝ(ψ), parametrized by ψ. Each image is now simply a point in C, corresponding to a noisy observation of Ĝ(ψ_j). If instead the true volume F̂ involves a discrete set of R k-values {k_1, . . . , k_R}, then Ĝ is a closed curve in C^R (once again parametrized by ψ), and each image is a vector in C^R corresponding to a noisy observation of Ĝ(ψ_j). An example illustrating the MSA problem for R = 1 is shown in Fig 7.

Numerical experiments: We believe that MSA can be a useful testbed for studying the performance of various strategies for cryo-EM reconstruction. As a simple example, let R ≡ 1 above and define Ĝ^true : [0, 2π) → C using a fixed number of Fourier modes:

Ĝ^true(ψ) = Σ_{q=−Q}^{Q} g_q^true exp(iqψ),

with the shape of Ĝ^true determined by the 2Q + 1 coefficients g_q^true.
In the experiments below, we generate Ĝ^true randomly by drawing each of the g_q^true from a (complex) Gaussian distribution with variance 1.
Having drawn a particular Ĝ^true, we sample this molecule at N_A points, producing a collection of simulated data points Â_j defined by

Â_j = Ĝ^true(ψ_j) + ε_j,

where the ψ_j are drawn independently and uniformly from [0, 2π), and each ε_j is an independent complex random variable drawn from a normal distribution with variance σ. Given the collection of N_A images {Â_j}, we can then try to reconstruct Ĝ^true using the strategies discussed in the main text. Due to the simplicity of MSA, much of the apparatus described in the main text simplifies dramatically: reconstruction involves a subsampled and/or non-uniform Fourier transform, and alignment involves calculating the distance between each image point and the curve Ĝ as a function of ψ.
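The MSA data-generation step described above can be sketched as follows (variable names are illustrative):

```python
import numpy as np

def make_G(coeffs):
    """Closed curve G(psi) = sum_q g_q exp(i q psi), q = -Q..Q."""
    Q = (len(coeffs) - 1) // 2
    q = np.arange(-Q, Q + 1)
    return lambda psi: np.asarray(coeffs) @ np.exp(1j * np.outer(q, psi))

rng = np.random.default_rng(4)
Q, N_A, sigma = 8, 32, 0.5

# Draw the 2Q + 1 complex Fourier coefficients of the true molecule.
g_true = (rng.standard_normal(2 * Q + 1)
          + 1j * rng.standard_normal(2 * Q + 1)) / np.sqrt(2)
G_true = make_G(g_true)

# Sample the curve at random viewing angles and add complex noise.
psi = rng.uniform(0.0, 2.0 * np.pi, N_A)
noise = sigma * (rng.standard_normal(N_A)
                 + 1j * rng.standard_normal(N_A)) / np.sqrt(2)
A = G_true(psi) + noise   # the N_A simulated image points in C
```

Reconstruction then amounts to fitting a bandlimited closed curve to the point cloud A, with the ψ_j unknown.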
Many features of this toy problem are as expected. For example, when σ = 0 the images lie exactly on the curve Ĝ^true. If we begin the alignment process with the initial guess Ĝ^init = Ĝ^true for the molecule, then each image will be aligned exactly to its correct viewing angle ψ_j^true (up to a global phase shift). Subsequent reconstruction steps will produce Ĝ^estim = Ĝ^true, so long as the number of observed points N_A is at least as large as the number of modes 1 + 2Q, and the reconstruction operator is well conditioned (i.e., if the image points are not too closely clumped together). Put more simply, if σ = 0 and N_A ≥ 1 + 2Q then, generically speaking, maximum-likelihood alternating minimization starting from Ĝ^init = Ĝ^true is stable.
However, even in this simple scenario (i.e., with σ = 0), maximum-likelihood alternating minimization started from a different initial function is not guaranteed to converge to Ĝ^true. Indeed, if N_A is not too large, there will typically be many different functions with the same bandlimit Q that pass close to the observed points Â_j; which of these functions we converge to depends on the details of the reconstruction process. The situation becomes more complicated once we add image noise: when σ > 0, even the initial guess Ĝ^init = Ĝ^true will not necessarily lead to convergence to a function Ĝ^estim that is close to Ĝ^true.
To probe this phenomenon, we perform a set of numerical experiments. We initialize our reconstruction with a function of the form

Ĝ^init = λ · Ĝ^true + (1 − λ) · Ĝ^rand,

where Ĝ^rand is a random function drawn in the same way as (but independently from) Ĝ^true, and λ is a parameter describing how close Ĝ^init is to the ground truth. Given a particular Ĝ^init, we perform the reconstruction, iterating until the resulting Ĝ^estim converges. For each trial we then measure the error E between Ĝ^estim and Ĝ^true.
In order to reward trials where Ĝ^estim and Ĝ^true are close to one another, even if they are parametrized differently, we define the error E as follows. First we define d(Ĝ^estim ← Ĝ^true(ψ)) to be the distance between Ĝ^true(ψ) (i.e., a point in C) and the curve Ĝ^estim, and define the error E(Ĝ^estim ← Ĝ^true) as the integral of d²(Ĝ^estim ← Ĝ^true(ψ)) over ψ. Similarly, we define d(Ĝ^true ← Ĝ^estim(ψ)) to be the distance between Ĝ^estim(ψ) and the curve Ĝ^true, and define E(Ĝ^true ← Ĝ^estim) as the integral of d²(Ĝ^true ← Ĝ^estim(ψ)) over ψ. Finally, we define the overall error E = max(E(Ĝ^true ← Ĝ^estim), E(Ĝ^estim ← Ĝ^true)) to be the larger of the two.

Figure 8 illustrates the average E (accumulated over 512 random trials) for three different reconstruction strategies across a range of λ and σ values. On the left we show the results for Bayesian inference, where we set the estimated noise level equal to the true noise level σ when reconstructing the molecule. Note that Bayesian inference is quite accurate when the initial guess is accurate (i.e., when λ = 1). However, when λ < 1 and there is a mismatch between the initial guess and the true function, Bayesian inference typically performs quite poorly.
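The symmetric error E can be sketched as follows, discretizing both curves and approximating each distance-to-curve by a nearest-point search (a simplification of the construction above):

```python
import numpy as np

def curve_error(G_a, G_b, n=1024):
    """E = max over both directions of the integral over psi of the
    squared distance from one curve to the nearest point of the other:
    a symmetric, parametrization-independent error."""
    psi = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    a, b = G_a(psi), G_b(psi)
    # distance from each point of curve a to the (discretized) curve b
    d_ab = np.min(np.abs(a[:, None] - b[None, :]), axis=1)
    d_ba = np.min(np.abs(b[:, None] - a[None, :]), axis=1)
    dpsi = 2.0 * np.pi / n
    return max(np.sum(d_ab**2) * dpsi, np.sum(d_ba**2) * dpsi)

# A curve compared with itself gives zero error; a re-parametrized
# version of the same curve still gives (near-)zero error:
circle = lambda psi: np.exp(1j * psi)
shifted = lambda psi: np.exp(1j * (psi + 1.0))
E_same = curve_error(circle, circle)
E_shift = curve_error(circle, shifted)
```

The second comparison illustrates why the max-of-integrals definition is used: a pointwise L² error between circle and shifted would be large, even though the two trace out the same curve.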
Similarly, we can apply standard maximum-likelihood alternating minimization (AM), which is equivalent to Bayesian inference with an estimated noise level of 0. The results for AM are identical to those shown on the left when σ = 0. However, as soon as a little image noise is added (σ > 0), AM typically converges to a Ĝ^estim which is wildly different from Ĝ^true, with errors an order of magnitude larger than those shown for Bayesian inference (which was run with an estimated noise level of σ). We do not display the (large) errors observed for standard AM.
In the middle, we show the results for a simple alternating-minimization scheme using maximum-entropy alignment at each step. On the right we show the results for our preferred EMPM strategy: namely, alternating between maximum-likelihood and maximum-entropy alignment. Note that, while our preferred strategy is less accurate than Bayesian inference when σ is small and Ĝ^init = Ĝ^true, it is more accurate in the regime where σ > 0 and λ < 1.

Figure 9 illustrates the results of a similar numerical experiment, where the images are drawn from a non-uniform distribution of viewing angles. For this experiment, we consider viewing-angle distributions of the form exp(β cos(ψ))/(2π I_0(β)), controlled by a parameter β. When β = 0 the viewing-angle distribution is uniform; as β increases, the distribution becomes more and more sharply peaked around ψ = 0. Results for λ = 0 are shown in Fig 9. The results with β = 0 are the same (in distribution) as the λ = 0 scenario in Fig 8. Note that EMPM is typically more accurate than Bayesian inference in this regime.
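The viewing-angle density exp(β cos ψ)/(2π I_0(β)) is exactly a von Mises distribution centered at ψ = 0, so samples can be drawn directly (a sketch):

```python
import numpy as np
from scipy.special import i0, i1

rng = np.random.default_rng(5)
beta = 3.0

# Draw viewing angles from exp(beta*cos(psi)) / (2*pi*I0(beta)),
# i.e. a von Mises distribution with mean 0 and concentration beta.
psi = rng.vonmises(mu=0.0, kappa=beta, size=20000)

# The density ratio between the most likely angle (psi = 0) and the
# least likely (psi = pi) is exp(2*beta), e.g. e^6 for beta = 3.
ratio = np.exp(beta * np.cos(0.0)) / np.exp(beta * np.cos(np.pi))
```

The sample mean of cos(ψ) should match the analytic value I_1(β)/I_0(β), which provides a quick check on the sampler.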

D Relion De-Novo parameters:
When using Relion's de-novo molecular reconstruction, we use the following parameters. Additionally, we set --particle_diameter commensurate with the true molecule diameter (rounded up to the nearest 50 or 100 Å).

E EMPM parameters:
When performing the EMPM reconstructions shown in the main text, we fixed K = 48, corresponding to a maximum resolution of ∼20 Å, as is typical for low-resolution estimates. Our EMPM algorithm also requires two other user-specified parameters: the overall tolerance tol and the width of the local displacement disk δ^local. As shown in Fig 13, the typical quality of our results does not change appreciably if these parameters are varied across an order of magnitude. We also run our algorithm for 64 iterations. As shown in Fig 14, the median quality of our results does not deteriorate appreciably if we reduce the number of iterations by a modest factor.
As mentioned in the main text, the overall computation time for our algorithm is roughly constant per iteration (i.e., between 30 and 50 minutes on a desktop workstation for all 64 iterations). In principle, we expect the overall computation time to grow as tol becomes very small or δ^local becomes very large. The reason we do not see a strong dependence of computation time on the parameters in the range shown here is that, within this regime, the array dimensions involved are not so large (i.e., ∼10²−10³ per dimension). As a consequence, roughly half of the overall computational effort is spent on memory movement rather than floating-point operations.

Figure 6: In this figure we illustrate the typical errors observed when using Bayesian inference to tackle the MRA problem. For this numerical experiment we set Ĝ^true to be a random function with Q = 8 (i.e., 1 + 2Q = 17 degrees of freedom). The real (red) and imaginary (blue) parts of Ĝ^true are shown on the left. We set N_A = 2048, with ψ_j^true uniformly distributed in [0, 2π]. We vary the true noise level σ^true (horizontal) and the estimated noise level σ^BI (vertical). On the right-hand side we show the L²-error between Ĝ^true and Ĝ^estim. We calculate Ĝ^estim via Bayesian inference, starting with a random function with the same number of degrees of freedom as Ĝ^true, and iterating until convergence. The L²-error on the right is scaled so that '1' corresponds to the error between Ĝ^true and a constant function with the same mean. Note that when the true noise level is moderate (e.g., around 0.5) there is only a narrow range of estimated noise levels which admit accurate reconstructions (i.e., when σ^estim is close to σ^true). Outside this narrow band the reconstruction can be quite poor, even when σ^estim is only 20% higher or lower than σ^true.

Figure 7: In this figure we illustrate the MSA problem with a single shell at k = 1. In the first subplot we show one example of a function Ĝ^true(ψ) plotted as a closed curve in C (black). N_A = 32 images are sampled from Ĝ^true(ψ), with each ψ_j^true drawn independently from a uniform distribution on [0, 2π] (circles, colored periodically according to ψ_j^true). The true noise level used when generating the images is σ^true = 0.5. Starting with the ground truth Ĝ^init = Ĝ^true, we apply Bayesian inference (BI) with the estimated noise level σ^BI set equal to σ^true. The resulting Ĝ^BI are shown in the second subplot, with early iterations in cyan and later iterations in magenta. We also apply the EMPM strategy, again starting with the ground truth; the resulting Ĝ^EMPM are shown in the third subplot. Errors for BI (black) and EMPM (gray) are shown in the fourth subplot. Note that the error for EMPM oscillates as the iterations alternate between maximum-likelihood and maximum-entropy alignment. In Fig 8 below we use as the error for EMPM the higher of the two alternating values. The data here correspond to one trial with σ = 0.5 and λ = 1 in Fig 8.

Figure 8: In this figure we illustrate the recovery of Bayesian inference (left), maximum-entropy alternating minimization (center) and EMPM (right) as applied to the MSA problem (see text). For each trial in this numerical experiment we define the true molecule using a Ĝ constructed with Q = 8, corresponding to 1 + 2Q = 17 degrees of freedom (each of the coefficients of Ĝ is drawn independently from a normal distribution). We sample N_A = 32 images from Ĝ (i.e., roughly 2 times the number of modes) and add noise with a magnitude given by σ ranging from 0 to 1 (defined relative to the norm of Ĝ) (horizontal). For each noise level σ we perform each reconstruction starting with an initial function of the form λ·Ĝ + (1 − λ)·Ĝ^noise, with Ĝ^noise drawn randomly (and independently from Ĝ) and λ ranging between 0 and 1 (vertical). For each value of (σ, λ) we simulate 512 randomly generated trials; the heatmaps above average over these trials. Bayesian inference (left) is run with estimated noise level equal to the true noise level. In the middle, we show results for a simple alternating minimization using only maximum-entropy alignment, rather than maximum-likelihood. (Maximum-likelihood-based AM fared even more poorly.) On the right we show the results for EMPM, with the error measured at the final maximum-entropy alignment step (which is typically higher than the error after the last maximum-likelihood alignment step).

Figure 9: As in Fig 8, we compare various reconstruction methods for the MSA problem, but here we focus on the viewing-angle distribution. For this, we introduce a new parameter β ∈ [0, 3], which controls the nonuniformity of the viewing angles, with samples ψ_j drawn from the distribution exp(β · cos(ψ))/(2π I_0(β)). Examples of this distribution (multiplied by 2π) are shown on the far right. When β = 0 the viewing angles are uniformly distributed (cyan). When β = 1 the viewing-angle density ranges (from highest to lowest) by a factor of e². When β = 3 the density ranges by a factor of e⁶, and is mostly supported in an interval of diameter ∼π (magenta). We fix λ ≡ 0 (i.e., we start with a random function), and illustrate the errors using the same scale as in Fig 8.

Figure 12: As in Fig 3, we illustrate the aligned correlation Z^mask between our various reconstructions and the ground truth. Here we examine reconstructions for LSUbl16dep, MlaFEDB, LetB1 and TMEM16F. The different horizontal bars indicate the values of Z^mask for different reconstruction strategies (coded by color), with different trials from the same strategy adjacent to one another (sorted by Z^mask). At the top of each panel (centered) is the 'Oracle' low-resolution reconstruction for the indicated molecule. To the right in each panel we show an EMPM reconstruction, and to the left a reconstruction obtained using one of the other pipelines which achieves poorer correlation scores. Note that the Z^mask values do tend to correlate with reconstruction quality. LetB1 has a highly non-uniform distribution of ground-truth Euler angles; the distributions in the other cases are more uniform (see Fig 15).

Figure 13: In this figure we illustrate the sensitivity of our results to perturbations in tol (left) and δ^local (right). In each case we collect the results from all our reconstructions (across 8 molecules), transform each Z^mask into a normalized z-score (i.e., subtracting the mean Z^mask for that molecule, and then dividing by the standard deviation across reconstructions for that molecule), and then plot the normalized z-score as a function of the selected parameter. Each dot corresponds to a reconstruction (633 in total), with the red line illustrating a linear fit. While there is a slight benefit to larger values of δ^local, and an even slighter benefit to smaller values of tol, the variation in quality across reconstructions is more than an order of magnitude larger than the variation that can be attributed to the parameters (see R² values).

Figure 14: We collect the results from all our reconstructions, then transform each (unmasked) Z into a normalized score (i.e., subtracting the final (unmasked) Z for that reconstruction, and then dividing by the standard deviation of the final Z taken across reconstructions for that molecule). We then plot the 50th, 15th and 5th percentiles of the distribution of normalized scores as a function of the even iterations (i.e., those corresponding to an entropy-maximizing alignment). Note that the first 32 iterations correspond to 'phase one', where the principal-mode compression uses the empirical principal modes, while the last 32 iterations correspond to 'phase two', where the principal-mode compression uses the volumetric principal modes estimated after phase one. Note also that the median performance plateaus well before the final iteration of phase two; if we were to halt our program earlier (after, say, only 16 iterations of phase two) our results would not change considerably.

Figure 15: We illustrate the distribution of the polar viewing angle β (ranging over [0, π]) for the images used in reconstructing LetB1 (left) and trpv1 (right). In each case, the left-hand histogram shows the ground-truth distribution, while the right-hand histogram shows the distribution estimated from the reconstructions shown earlier. Each distribution is normalized by sin(β) so that a uniform distribution of viewing angles on the sphere appears constant as a function of β (rather than proportional to sin(β)). Note that, even after this normalization, the viewing angles for LetB1 and trpv1 are strongly concentrated in certain regions of the sphere. The ground-truth distribution of β for LetB1, for example, has more than 90% of its mass concentrated within π/14 of the equator.
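The sin(β) normalization used for the viewing-angle histograms in Figure 15 can be sketched as follows (the helper name is hypothetical):

```python
import numpy as np

def normalized_polar_histogram(beta_angles, n_bins=30):
    """Histogram of polar viewing angles beta in [0, pi], divided by
    sin(beta) at each bin center so that a uniform distribution of
    directions on the sphere plots as a constant."""
    counts, edges = np.histogram(beta_angles, bins=n_bins,
                                 range=(0.0, np.pi))
    centers = 0.5 * (edges[:-1] + edges[1:])
    # guard against division by ~0 in the polar bins
    return counts / np.maximum(np.sin(centers), 1e-12), centers

# Uniform directions on the sphere: beta = arccos(u), u ~ Uniform(-1, 1).
rng = np.random.default_rng(6)
beta = np.arccos(rng.uniform(-1.0, 1.0, 200000))
h, centers = normalized_polar_histogram(beta)
```

For uniformly distributed viewing directions the normalized histogram h is flat (up to sampling noise), so any remaining structure, as for LetB1 and trpv1, reflects genuine concentration of viewing angles.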