An expectation–maximization approach to quantifying protein stoichiometry with single-molecule imaging

Abstract Motivation Single-molecule localization microscopy (SMLM) is a super-resolution technique capable of rendering nanometer scale images of cellular structures. Recently, much effort has gone into developing algorithms for extracting quantitative features from SMLM datasets, such as the abundance and stoichiometry of macromolecular complexes. These algorithms often require knowledge of the complicated photophysical properties of photoswitchable fluorophores. Results Here, we develop a calibration-free approach to quantitative SMLM built upon the observation that most photoswitchable fluorophores emit a geometrically distributed number of blinks before photobleaching. From a statistical model of a mixture of monomers, dimers and trimers, the method employs an adapted expectation–maximization algorithm to learn the protomer fractions while simultaneously determining the single-fluorophore blinking distribution. To illustrate the utility of our approach, we benchmark it on both simulated datasets and experimental datasets assembled from SMLM images of fluorescently labeled DNA nanostructures. Availability and implementation An implementation of our algorithm written in Python is available at: https://www.utm.utoronto.ca/milsteinlab/resources/Software/MMCode/. Supplementary information Supplementary data are available at Bioinformatics Advances online.


Introduction
In recent years, a variety of single-molecule imaging methods have been developed that are capable of resolving sub-diffraction limited, nanometer scale cellular features. These techniques include (d)STORM (Heilemann et al., 2008(Heilemann et al., , 2009Rust et al., 2006), (f)PALM (Betzig et al., 2006;Hess et al., 2006) and DNA PAINT (Jungmann et al., 2010), among others, and are collectively referred to as single-molecule localization microscopy (SMLM). SMLM relies upon harnessing the stochastic blinking of fluorescent labels to sparsify a fluorescent signal. The coordinates of each labeled molecule can then be precisely determined and, from these localizations, a detailed image rendering can be constructed.
In addition to generating super-resolved images, the localization datasets acquired by SMLM contain the necessary information for counting single-molecules. If interpreted correctly, these datasets would reveal quantitative properties of macromolecular complexes, such as their abundance and stoichiometry or oligomerization state. And these could be inferred for proteins expressed at much higher densities, or for complexes within a much closer spatial proximity to one another, than is possible with alternative single-molecule approaches, such as counting photobleaching steps (Das et al., 2007;Ulbrich and Isacoff, 2007).
Since most cellular pathways are regulated by protein complexes, being able to determine the composition, structural organization and dynamics of these complexes is essential to understanding biological function (Ishikawa, 2021;Marsh and Teichmann, 2015). For example, protein stoichiometry is a critical modulator of biological processes from stem cell division (Clark et al., 2020), to viral entry by HIV during infection (Brandenberg et al., 2015), to initiating signaling by G proteincoupled receptors (GPCRs). GPCRs, which are the largest family of transmembrane proteins in the human genome and a major pharmacological target, provide a powerful example of the need to develop better tools for measuring protein stoichiometry. Despite at least two decades of intense study, whether these proteins exist as monomers, dimers or other higher order oligomers, and the functional importance of these dynamic complexes, remains controversial to this day (Asher et al., 2021;Felce et al., 2017Felce et al., , 2018Tian et al., 2017).
Utilizing the empirical fact that many photswitchable fluorophores emit a geometrically (i.e. discrete exponentially) distributed number of blinks before photobleaching (Fricke et al., 2015;Lee et al., 2012;Nieuwenhuizen et al., 2015;Wang et al., 2016), we previously developed a statistical model for molecular counting with SMLM (Nino et al., 2017(Nino et al., , 2019. The advantage of this approach is that one only needs to calibrate for the characteristic number of blinks k defining the blink distribution for a particular fluorophore. Unfortunately, it is not always practical to perform this calibration, which limits the utility of the method. For example, we have shown that organic dyes can display an acute sensitivity to their environment, requiring repeated calibration from sample to sample within the cellular milieu under investigation (Nino et al., 2019).
In this manuscript, we show how the blinking distribution can be inferred directly from SMLM datasets by applying a learning algorithm known as expectation-maximization (EM) to a statistical mixture model of an ensemble of fluorescently labeled, macromolecular complexes. This advance obviates the need for a separate calibration of k, so greatly expands the range of experimental conditions under which qSMLM-based molecular counting can be performed. We first present the theory behind the learning algorithm, generalize it to account for labeling inefficiency, and then illustrate the accuracy of the method on both simulated datasets and experimental datasets assembled from dSTORM images of Alexa-647 labeled DNA nanostructures.

Methods
For N fluorophores that emit B blinks, the geometric, single-fluorophore blink distribution generalizes to a negative-binomial distribution of the form where we assume that each fluorophore detected emits at least one blink (B ! N) (Nino et al., 2017(Nino et al., , 2019. Now consider a mixture of macromolecular complexes built from a maximum of h monomeric subunits, with each monomer labeled by a single, photoswitchable fluorophore. The probability of observing B blinks from this ensemble is given by the weighted sum of negative-binomial distributions: with weights p k ¼ ½p 1 ; p 2 . . . p h defining the fraction of monomers, dimers, etc., within the ensemble. Since the number of blinks B is the experimental observable, Equation (2) defines a mixture model for the probability p(B). It turns out that the unobserved, or latent, parameters k and weights p k can all be estimated from the experimental data via EM. The EM algorithm is a popular learning algorithm for determining latent variables within a statistical model (Dempster et al., 1977). The EM-algorithm iterates between two modes, until convergence, to perform a maximum likelihood estimation. For our purposes, these are composed of an E-step to estimate the values of the mixture weights p k , and an M-step that estimates k by maximizing the likelihood of the model.
For each complex, we image, up to N T in total, one might observe B ð1Þ ; B ð2Þ . . . ; B ðNT Þ blinks. For a given value of the latent variable k, we define the posterior probabilities, or expectations, r ðiÞ k ¼ PrðN ðiÞ ¼ kjB ðiÞ Þ as the probability that complex i has N ðiÞ ¼ k monomeric subunits given that it emitted B ðiÞ blinks. The expectations may be written as follows: which follows from Bayes' theorem. The weights p k are then simply estimated as p k ¼ ð1=N T Þ P NT i¼1 r ðiÞ k , which defines the E-step of the algorithm.
Next, for the M-step of the algorithm, we need to compute the maximum likelihood estimate of k. Toward this end, we define the log-likelihood function for our mixture model as The maximum likelihood estimate for k can be calculated by maximizing the log-likelihood (i.e. dL=dk ¼ 0). From Equation (3), this may be written in terms of the expectations r ðiÞ k found in the previous E-step which may be solved for k after substitution of Equation (1): We have so far assumed a perfect one-to-one labeling of each monomer. To account for imperfect labeling, consider a protein complex with h subunits. If each subunit may accommodate at most one active fluorophore, the probability of having N active labels on the complex is given by a binomial distribution: where h is defined as the labeling efficiency. For covalent technologies that label proteins with organic dyes, such as SNAP-tags, h can usually be obtained through appropriate control experiments, with values often exceeding $0:9 (Calebiro et al., 2013). Fluorescent proteins, on the other hand, may be directly co-expressed with a protein of interest, but h must now account for incomplete maturation and the photoconversion efficiency. Fortunately, for many fluorescent proteins, h may be measured a priori and appears to be a fairly robust property of the fluorophore (Baldering et al., 2019;Durisic et al., 2014;Renz and Wunder, 2018). Now let us incorporate the labeling efficiency into our mixture model of Equation (2), which may be rewritten as where the normalization term in the denominator arises because pðBjNÞ is defined for N ! 1, whereas qðNjhÞ spans N ! 0. Equation (8) is equivalent to our original mixture model, Equation (2), with the weights of the latter scaled as follows: To account for fluorophore labeling inefficiencies, we simply need to perform an EM-algorithm on the original mixture model to obtain the weights ½p 1 ; p 2 . . . p h , then invert Equation (9) to obtain the corrected fraction of monomers, dimers, etc. (i.e. ½p 0 1 ; p 0 2 . . . p 0 h ). For h ¼ 2, a mixture of monomers and dimers, the scaling is and for h ¼ 3, a mixture of monomers, dimers and trimers, the scaling is where, CðpÞ ¼ p 3 ½qð1j1Þðqð2j2Þ À qð2j3ÞÞ þ qð1j2Þqð2j3Þ À qð1j3Þqð2j2Þ þqð3j3Þ½p 2 ðqð1j1Þ À qð1j2ÞÞ þ p 1 qð2j2Þ: 3 Results To assemble a control against which to benchmark the performance of this method, we acquired a series of dSTORM images of single Alexa-647 dyes centered within a DNA origami grid (to mitigate surface effects) and sparsely bound to a glass coverslip (see Supplementary Materials). The number of emitted blinks from 1095 individual dyes were first extracted from the image stacks. The blinking statistics were well described by a discrete exponential and the characteristic number of blinks was found to be k ¼ 3:660:2. Single-dye blinks were then assembled into subpopulations of monomers, dimers and trimers, of relative target weights p 1 ; p 2 ; and p 3 , respectively. For each mixture, 1000 protomers were selected (each assembled from dyes chosen at random, but allowing for repetition) and the analysis was repeated 50 times to estimate both the mean and the uncertainty. Figure 1 displays the results of this experimental benchmark. For the purely monomeric case [ Fig. 1A (i)], although overparameterized, both the monomer/dimer (M/D) and the monomer/ dimer/trimer (M/D/T) models estimate that the majority of the population are monomers, while admitting a small fraction of higher order protomers. Note, a simple one-state monomer model is not shown since p 1 ¼ 1 by default. In Figure 1A (ii), for a 0.8/ 0.2/0 mix (M/D/T), we find that both models accurately estimate the monomeric and dimeric fractions. Once again, this is despite the over-parameterization of the M/D/T model, which tends to also yield a small trimeric fraction. In Figure 1A (iii) and (iv), we challenged the algorithm to a target population at ratios 0:3=0:35=0:35 and 0:4=0:4=0:2, respectively. In both cases, the under-parameterized M/D model fails by estimating an abundance of dimers, while the M/D/T model accurately estimates each population.
In Figure 1E, we show the mean relative error in estimating the dimeric fraction p 3 by the M/D/T model over a range of protomer fractions. Because p 1 þ p 2 þ p 3 ¼ 1, we can specify all three weights in this two-dimensional heat-map, where the diagonal lines represent increments of the trimeric fraction p 3 . Indicated within the  Figure 1A (i-iv). We see that, on average, the algorithm is able to learn p 3 to within roughly 10% of the true value across the majority of the parameter space. In the insert to Figure 1B, we display the associated coefficient of variation (CV), which is the standard deviation normalized to the mean. The CV quantifies the relative uncertainty in the estimation of the trimeric fraction p 3 for an individual measurement. As expected, the CV grows substantially as p 3 ! 0. For the two examples of a system containing a trimeric population shown here (iii and iv), hp 3 i ¼ 0:36 and 0.23 with a CV of 0.21 and 0.35, respectively.
In many stoichiometric measurements, the actual degree of oligimerization is unknown. In these situations, the Akaike information criterion (AIC) may serve as a guide to selecting the appropriate probabilistic model for the data (Burnham et al., 1998). The AIC uses the maximum likelihood estimate of a model as a measure of fit, while penalizing the fit for each additional independent variable within the model. It is defined as follows: where k is the number of independent variables and Lðp; kÞ is the likelihood as given by Equation (4). Note, while the variables here are the weights p 1 ; p 2 . . . p h and k, since the weights sum to 1, k ¼ h and not h þ 1. Given multiple candidate models, the preferred model is the one with the lowest AIC score. In Figure 1C, we calculated the AIC scores across all values of p 1 ; p 2 ; and p 3 , then indicate the ranges in which the AIC score supports the M/D, M/D/T or a purely monomeric model. The AIC selection for the cases illustrated in Figure 1A (i-iv) are explicitly labeled within the figure. The AIC selects the appropriate model for fractions (i-iii); however, for the mixture with the smaller trimeric fraction (iv), the AIC inaccurately selects for the M/D model. This is despite the fact that the M/D/T model accurately estimates the protomeric fractions. Like any data limited method, the AIC score is accurate over a larger range of parameters with increasing sample size or decreasing k (see Supplementary Materials). In practice, if the AIC score between two models is close, acquiring more data may help select between models, although the amount of data needed may not always be feasible. So, while the AIC score may be used to support a particular model, it is not a definitive selection criterion.
We next consider the effects of imperfect labeling. Clusters are again assembled from the experimentally acquired dSTORM localization data of single Alexa-647 dyes, but this time, we remove dyes at random dependent upon the labeling efficiency h. The analysis is now repeated 200 times for each mixture to generate a convergent mean and uncertainty in the estimate. In Figure 2, we consider the M/D/T population fractions 0.3/0.35/0.35 and 0.4/0.4/0.2. The algorithmic accuracy at estimating the mean of the fractions p 1 ; p 2 and p 3 converges to within 20% of the true fraction down to a labeling efficiency of h ¼ 0:8, and is often well below that. Below h ¼ 0:8, the algorithm begins to struggle with the 0.3/0.35/0.35 population, although stays within 20% for the 0.4/0.4/0.2 population. The uncertainty of any single measurement is also strongly affected by h. For the smallest trimeric fraction considered here, p 3 ¼ 0:2, the CV is already $0:4 for perfect labeling increasing to $0:7 at h ¼ 0:7 (Fig. 2C). However, the larger trimeric fraction, p 3 ¼ 0:35, displays a CV that sharply grows from $0:3 when perfectly labeled to $0:7 at h ¼ 0:7 (Fig. 2B). As an aside, the AIC consistently favors the incorrect model at h ¼ 0:9 or below.
Finally, we wish to understand how k, which describes a chosen fluorophore's propensity for blinking, affects the performance of this method. While the photoswitching of Alexa-647 can be tuned to a certain degree, we found it easier to increase k than to decrease it much below $4, so we instead resort to a series of numerical simulations (see Supplementary Materials). In Figure 3, we simulated the photoswitching of an M/D/T population at various fractions p 1 ; p 2 and p 3 for k ¼ 2; 4 and 8. For each set of parameters, we performed 50 simulations of a population composed of 1000 protomers in total. The mean relative error in the estimates of p 3 is shown in the figures, while the inserts display the relative uncertainty (CV) in these estimates. Note, for k ¼ 4 the results are similar to our experimental results (Fig. 1B) with the estimates falling within $10% of the target values across most of the parameter space. This helps support any inferences obtained through these simple simulations. We find that decreasing (increasing) k decreases (increases) the errors both in the estimates of p 3 as well as the uncertainty in those estimates. While results for only the trimeric fraction p 3 are shown, this trend holds for the monomeric and dimeric fractions p 1 and p 2 .
In Figure 4, we incorporate the added effects of an imperfect labeling efficiency. Here, we only consider the M/D/T model for mixtures of 0.3/0.35/0.35 or 0.4/0.4/0.2, and contrast k ¼ 4 and k For each set of parameters, we performed 200 simulations of a population composed of 1000 protomers in total to obtain converged values for the means and uncertainties of the estimates. Once again, a reduced labeling efficiency degrades the estimations of p 1 ; p 2 and p 3 , but has its most pronounced effects on the uncertainty in the measurement. We find that while a smaller k may reduce the uncertainty in a well labeled system, these gains are modest and unable to compensate for the growing uncertainty that results with decreased labeling efficiency.

Conclusion
In conclusion, we have presented an EM approach to quantifying the monomer, dimer and trimer fractions of an oligomeric protein from SMLM datasets. This is a significant extension of our previous method for SMLM based molecular counting that was built upon the geometrically distributed number of blinks emitted by many photoswitchable fluorophores. Both the novelty and power of this new method is that the algorithm is capable of learning the photophysical parameter k needed to convert the observed number of blinks of a photoswitchable fluorophore to an estimate of the protomer fractions. We found that the resulting statistical estimates of these fractions are quite accurate when the molecules are well labeled, but the uncertainty in these estimates increase significantly with increasing labeling inefficiency. This would suggest that the method is more suited for SMLM measurements on proteins covalently labeled with organic dyes (e.g. dSTORM), where the labeling efficiency can be well over 90%, but is less relevant when working with fluorescent proteins (e.g. PALM) that often display much lower labeling efficiencies. While some of the uncertainty introduced by labeling inefficiency can be modulated by working with fluorophores that blink less, which is typically the case for fluorescent proteins, we found that reducing k yielded only modest gains.
We end by stressing that, when applying this method, it is important to validate the underlying photophysical assumptions that the technique is built upon. Current organic dyes used for dSTORM are notoriously sensitive to their environment, and the intracellular milieu could alter the photophysical properties of the dyes in a heterogeneous manner so that the blinking statistics no longer follow a single, discrete exponential distribution. Likewise, even fluorescent proteins, which are often thought to be shielded by the b-barrel, have shown environmental sensitivity (Nienhaus and Nienhaus, 2016). This can only be checked by incorporating the proper controls, say by considering a purely monomeric or dimeric population, as should be done in any molecular counting experiment. The signal from a monomeric control would directly validate the assumption of a geometric blinking distribution. And a dimeric control would further validate the methodology as the number of dimers should be approximately k=2. Fortunately, new photoswitchable fluorophores are continually being developed for SMLM, both FPs that display more consistent and improved maturation and photoconversion efficiency and dyes whose blinking properties are better shielded from the environment (Gong et al., 2019;Grimm et al., 2017). In conjunction with such technical advances, the advances presented here should make quantifying protein stoichiometry with SMLM more accessible and practical within a wider range of biological systems.