Eigen-analysis reveals components supporting super-resolution imaging of blinking fluorophores

This paper presents eigen-analysis of image stack of blinking fluorophores to identify the components that enable super-resolved imaging of blinking fluorophores. Eigen-analysis reveals that the contributions of spatial distribution of fluorophores and their temporal photon emission characteristics can be completely separated. While cross-emitter cross-pixel information of spatial distribution that permits super-resolution is encoded in two matrices, temporal statistics weigh the contribution of these matrices to the measured data. The properties and conditions of exploitation of these matrices are investigated. Con-temporary super-resolution imaging methods that use blinking for super-resolution are studied in the context of the presented analysis. Besides providing insight into the capabilities and limitations of existing super-resolution methods, the analysis shall help in designing better super-resolution techniques that directly exploit these matrices.

• All emitters are similar, with the same photon counting statistics for the frame acquisition time T. Every emitter's photon emission distribution has mean value of μ and standard deviation of σ. • Each emitter's photon emission is independent of any other emitter's behaviour. As an implication, the covariance of the photon emission of any two emitters is zero. • The number of frames is large enough that the mean and standard deviations of the actual photo emissions from the emitters converge to μ and σ, respectively. Additionally, the emitters do not photo-bleach or transit into irreversible or extremely dark states during image stack acquisition. • Imaging system is diffraction limited. The point spread function of the system spans at least a few pixels.
These assumptions are quite practical and implicitly used in most super-resolution imaging methods that exploit blinking. Important mathematical notations used in this paper are listed in Supplementary Table S1. We perform eigen-analysis of the matrix = J II (1) T where I is a matrix of size N × K containing the image stack of M emitters located at ′ r m , m = 1 to M at N pixels centered at r n , n = 1 to N in K frames (see supplementary information S1 on imaging model) and the superscript T is the transpose operator. Using assumptions A1-A3, J may be written as (see supplementary information S2 for derivation from eq. (1) to eq. (2)): and G is a matrix of size N × M and its mth column G m column represents the image of an emitter. G m is related to the optical PSF (as shown in supplementary information S1).  and  denote an identity matrix and an all-ones matrix, respectively. Their dimensions are specified in their subscript. O is a symmetric circulant matrix with all diagonal elements equal to σ 2 + μ 2 and all non-diagonal elements equal to μ 2 . Eigen-analysis of O, given in supplementary information S3, reveals that eigenvectors of O can be interpret as Fourier transform operators 23 . Thus, eigen-decomposition of J comprises of spatial frequencies of G. Upon substitution of eigen-decomposition of O in eq. (2) and further algebraic manipulation (details in supplementary information S4), we obtain: G is proportional to the mean image of the image stack as discussed in supplementary information S1. An extension to the case of non-independent emitters (assumption A2 relaxed) is given in supplementary information S5 and we note that only the coefficients c 1 and c 2 change in this situation.
The above result provides the insight that the spatial distribution of the pixels and the temporal distribution of blinking can be separated from each other. While the matrices C {1,2,3} contain information pertaining the spatial distribution only of the emitters, the temporal blinking characteristics weigh the contribution of these matrices and thus modulate the content pertaining super-resolution in the measured image stack. Spatial distribution and the roles of C {1,2,3} in super-resolution. In the following, we analyse the implications of C {1,2,3} on super-resolution. We refer to 〈f(n, m)〉 m as 'raw moment over emitters' for verbal simplicity instead of the more rigorous 'raw moment of the bivariate function f(n, m) over the variable m. Similarly, we refer to 〈f(n, m)〉 n as 'raw moment over pixels' . Further, we refer to f(n, m) f(n′, m) as 'cross-pixel' term and f(n, m) f(n, m′) as 'cross-emitter' term. The elements C {1,2,3} (n′, n) of matrices C {1,2,3} comprise of the following raw moments: m m m 3 , Thus, C 1 corresponds to cross-pixel product of first order raw moment over emitters, C 2 corresponds to second order cross-pixel raw moment over emitters, and C 3 corresponds to second order cross-pixel cross-emitter raw moment over emitters. Specifically, C 1 corresponds to cross-pixel product of mean image of image stack (as also noted in Fig. 1). It essentially emulates the effect of all emitters emitting the same number of photons simultaneously. Thus, C 1 does not support super-resolution. However, the diagonal elements of C 1 (i.e. n = n′) give auto-correlation of the mean image with itself and through this, C 1 supports contrast enhancement. C 2 corresponds to the raw moment over emitters of cross-pixel product of image of each emitter individually. As shown in Fig. 1, the cross-pixel terms in C 2 are computed before the averaging over emitters is performed. Computationally, it makes the effective PSF as squared of the optical PSF, and thus 2 times sharper than the optical PSF. Through this effect, it contributes towards resolution better than the optical resolution. It essentially supports super-resolution through autocorrelation terms with respect to the emitters. C 3 contains cross-pixel cross-emitter terms, which are then averaged over all emitter pairs (m, m′); ≠ ′ m m . Thus, in comparison to C 2 , this term sharpens the PSF with respect to not only the pixels, but also the emitters. Thus, it supports super-resolution through cross-correlation terms with respect to the emitters. Also note that While C 2 has higher spatial frequencies than C 1 due to narrower PSF, C 3 suppresses the high frequency component from C 2 to yield C 1 . Thus, C 3 has a high frequency component antisymmetric to C 2 . Lastly, while a certain additive combination of C 2 and C 3 suppresses high frequency component as noted in eq. (12), the subtraction C 2 − C 3 suppresses the lowest frequency component.

Examples illustrating C {1,2,3}
. We further illustrate the roles of C {1,2,3} using two synthetic 1-dimensional examples. Simulation details are provided in supplementary information S6) and it suffices to say that the full width at half maximum (FWHM) of the imaging system is 187 nm and the resolution limit according to Rayleigh criterion 24 is 222 nm. Since the matrices are independent of emission kinetics, simulation of the properties of matrices does not require simulation of emission kinetics. In the first example (example 1), an ideal one-dimensional camera with infinitesimally small pixel size is considered. The emitters are separated by distance Δy = 100 nm. The pixel array is along the y-axis and symmetric to the emitters. The matrices C 1 , C 2 , C 3 , and C 2 − C 3 are shown in Fig. 2(a-d) respectively. It is seen that C 2 and C 3 indicate the presence of more than one emitters; C 3 is antisymmetric, and; C 2 (n, n) − C 3 (n, n) is small at pixels between the emitters. Figure 2(e) plots the mean image  G (Q1) and the diagonal elements (n = n′) of matrices C 1 , C 2 , C 3 , and C 2 − C 3 . It is seen that the plot of C 3 is the sharpest, followed by C 1 , C 2 , and  G , respectively. Figure 3(c) shows their FWHM as a function of Δy. It confirms that the FWHM of Q2 (C 1 ) is smaller than that of Q3 (C 2 ) when Δy is so small (<118 nm) that neither C 1 nor C 2 can resolve the emitters (i.e. Q2 and Q3 have only one maximum each). This indicates better sharpening of image through C 1 but not the resolvability of the emitters.
The width corresponding to C 3 (Q4) is smaller than both Q2 and Q3, thus corroborating sharpening due to both cross-emitter and cross-pixel terms. As the distance between emitters become large, the cross-emitter   components attenuate and Q4 becomes even sharper. In the limiting case, Δy→0, the widths (135 nm) of Q2-Q4 are all 2 times the width of the PSF and the width of the mean image (both equal to 187 nm). Sharpening of PSF in C 2 is illustrated through minima to maxima intensity ratio R defined in Fig. 3(b) and plotted in Fig. 3(d) as a function of Δy. Here, it becomes obvious that C 1 merely enhances the contrast of the mean image  G of emitters that are already resolvable by the optical system whereas C 2 actually improves the resolution by a factor of 2 .
Lastly, we note that C 2 − C 3 incorporates higher spatial frequencies than C {1,2,3} and information about resolvability of emitters, as noted in both the Q5 and P5 plots in Fig. 2(e,d) in comparison to Q1-Q4 and P1-P4 plots.
In the second example (example 2), we include the effect of practical pixel size. The image is taken along the y-axis using 10 pixels, each of size 100 nm, placed symmetric to the emitters. We consider two values of Δy and show G 1 , G 2 ,  G , C 1 , C 2 , C 3 , and C 2 − C 3 in Fig. 4. For Δy = 100 nm ( Fig. 4(a)), the nature of the matrices in unaffected except discretization due to finite pixel size. Even for Δy = 10 nm ( Fig. 4(b)), the nature of C 2 − C 3 is the same although its maximum value (referred to as magnitude of matrix for simplicity) is significantly smaller than the magnitudes of C 1 , C 2 , C 3 , as seen in Fig. 4(b). This implies that its contribution may be negligible in the presence of noise.
We examine the magnitudes of these matrices as a function of Δy in Fig. 5(a). Finite pixel size of 100 nm is used. Beyond the resolution limit of 222 nm, C 2 , C 3 , and (C 2 − C 3 ) converge to 0.5 whereas C 1 converges to 0.25. This is neither surprising nor interesting since the emitters are resolved anyway. The point Δy = 118 nm is interesting, since the minima to maxima intensity ratio R for Q3 (corresponding to C 2 ) falls below 1 at Δy = 118 in Fig. 3(d), indicating the minimum Δy such that C 2 can resolve the emitters. In Fig. 5(a), C 1 , C 2 , and C 3 have the same magnitudes up to Δy = 118 nm, after which the magnitude of C 1 is smaller than the magnitudes of C 2 and C 3 . C 2 and C 3 have an inflection point at Δy = 118 nm. The inflection point of C 1 and  G occurs at Δy = 172 nm which corresponds to the fall of the value of R for Q2 and Q1 below 1 in Fig. 3(d).
Since, C 2 − C 3 incorporates all the content pertaining super-resolution that cannot be captured by the optical system without blinking (see eq. (4)), we plot the ratio of the magnitude of C 1 to the magnitude of C 2 − C 3 in Fig. 5(c). It indicates that typically the magnitude of C 1 is orders of magnitude larger than C 2 − C 3 . Thus, direct exploitation of C 2 − C 3 is difficult in the presence of noise. Super-resolution techniques that use either C 2 or C 3 individually or linear combination aC 2 + bC 3 such that a, b are real numbers and b/a is not equal to −1 or M − 1 (see eq. (12)) can support super-resolution despite the presence of noise. We observed an interesting characteristic reported in Fig. 5(d) that the logarithm of ratio of the magnitude of C 1 to the magnitude of C 2 − C 3 decreases linearly with the logarithm of Δy. The observation may provide additional insight in the future.
In order to consider the effect of noise on the comparative magnitudes of matrices C {1,2,3} , we include Poisson noise with maximum signal-to-noise ratio (SNR) equal to 16 to the matrix G containing the images of individual emitters. We use this noisy matrix G to compute the matrices C {1,2,3} . We perform this operation for hundred independent executions of noise. The mean and standard deviations corresponding to the plots in Fig. 5(a) are plotted in Fig. 5(b). It is seen that noise has significant effect on the magnitudes of the matrices. The inflection points in the case of SNR = 16 are shifted by about 40 nm. We include the effect of noise in Fig. 5(c,d) as well. It is interesting to note that noise reduces the ratio of the magnitude of C 1 to the magnitude of C 2 − C 3 . This is because C 2 − C 3 retains the high frequency component characterizing the noise. This is evident in the increase in the magnitude of C 2 − C 3 in the presence of noise, as noted in Fig. 5(b). Thus, although the ratio is improved, C 2 − C 3 is not conducive to super-resolution because of the noise retained. Further, the effect of noise in degrading resolution is not negligible due to the shift in the inflexion points of C 2 and C 3 as noted in Fig. 5

(b).
Examples of super-resolution supported by C 2 and C 3 . Here, we show the ability of C 2 and C 3 and the inability of C 1 to support super-resolution. Similar to the previous example, we consider 1-dimensional sample and image regions. Image region has 40 pixels of 100 nm each. Sample has 8 emitters placed symmetrically along the y-axis at a separation of 10 nm between them. The other details are the same as the previous examples. We compute the one-dimensional eigenimages of C 1 and apply MUSICAL 21 on them. In order to study only the effect of the matrix, we do not use sliding window and the soft window function. This is equivalent to a single sliding window and Gaussian soft window function of variance equals to infinity. Further, we use the non-linear power factor of MUSICAL α = 1, so that α does not introduce additional non-linearity. Similar to C 1 , we compute MUSICAL images using C 2 and C 3 as well. The results are shown in Fig. 6(a). It is seen that MUSICAL image of C 1 cannot resolve any emitter, whereas MUSICAL images of both C 2 and C 3 are able to resolve all the emitters correctly.
We also include an example of eight emitters placed at a distance of 100 nm. Poisson noise with peak SNR 20 is used to create noisy matrix G, which is then used for computing the matrices C {1,2,3} . The results are shown in Fig. 6(b). It is seen that MUSICAL image of C 1 cannot resolve any emitter. Eight peaks are detected using both C 2 and C 3 . Additional peaks at approximately y′ = 700 nm are noted for the MUSICAL curves corresponding to C 1 and C 2 , which are due to noise. A peak at y′ = −700 nm is observed in MUSICAL curve for C 3 , which corresponds to the antisymmetry of C 3 .
Blinking statistics and the role of c 1 and c 2 in super-resolution. The ratio of c 1 and c 2 determine the ratio mean image (through C 1 ) to the content pertaining super-resolution (through C 2 and C 3 ). It is notable from Equation (5) that c 1 , c 2 > 0 and c 2 /c 1 < 1 in any situation, even if a single emitter is present. This implies that the mean image always forms the major component in the image stack. Nevertheless, the higher the ratio c 2 /c 1 , the better is the scope for super-resolution. The ratio c 2 /c 1 is characterized by the photon-counting statistics (which determines μ and σ) and the density of emitters through M. Smaller values of μ 2 /σ 2 and M are desirable for larger value of c 2 /c 1 .
First, we consider the ratio μ 2 /σ 2 . We use the blinking kinetics of quantum dots as an example for this purpose. The modeling and simulation of photon counting statistics of quantum dots 8 are given in the supplementary information S7. Characteristic photon-count distributions for three forms of photon-counting statistics due to the maximum time scales of dark states (τ d max, ) relative to maximum time scales of bright states (τ b max, ) are given in Fig. 7(a). Value of τ τ / b d max, max, less than 1 indicates longer dark states than bright states and vice versa. Thus, the ratio τ τ / b d max, max, indicates the temporal sparsity of blinking and translates to spatial sparsity of emissions if the frame acquisition time T is sufficiently low.
Further, the effect of the time scales of blinking with respect to T is illustrated in Fig. 7(b). The condition τ ∼ T b max, implies that blinking occurs at time scales comparable to the frame acquisition time.
implies that blinking occurs at time scales larger than the frame acquisition time, i.e. at slower rates in comparison to the frame acquisition rate. As a consequence, one stretch of bright state may be sampled by more than one frames. If blinking occurs at time scales comparable to the frame acquisition time, μ 2 /σ 2 is less than 0.1 irrespective of the relative lengths of the dark and bright states. Thus, if M is not too large, c 2 is comparable to c 1 . However, if blinking occurs slower than the frame rate, μ 2 /σ 2 varies significantly with the relative lengths of the dark and bright states. We include a study in which we investigate μ 2 /σ 2 as a function of the acquisition time per frame T for a given set of τ b max, and τ d max, in supplementary information S8. Localization methods explicitly exploit the condition of long dark states (see the curve for τ τ Fig. 7(a) for example). The reason is that the ratio μ 2 /σ 2 is significantly smaller than 1, as seen in Fig. 7(b). Consequently, the image stack is implicitly biased favorably towards the content pertaining super-resolution. In the explicit sense, the emitters remaining in dark state most of the time implies spatial sparsity of emissions in  each frame which allows for localization of the emitters. The implicit bias is the reason that long dark states are conducive for non-localization methods as well.
Another extreme condition corresponds to almost no dark states (such as τ ). This condition is unsuitable for localization microscopy because it implies that the emitters are emitting in most frames and thus spatial sparsity of emitters required in each frame for localization microscopy is unavailable. It is seen in Fig. 7(b) that the μ 2 and σ 2 are comparable in such situation if the blinking occurs at larger time scales in comparison to the frame rate.
In most quantum dots, τ b max, and τ d max, are comparable (see Fig. 7

(a) for example
. While such states are undesirable for localization microscopy unless multiple emitter localization techniques such as DAOSTORM 25 are employed, methods that use fluctuations only can still be conveniently applied to these techniques.
We note that the above example considered the simplified photon emission model of quantum dots whereas diverse effects at multiple time scales participate in the actual photo-kinetics 10,26 . Fluorescence correlation spectroscopy (FCS) is a useful tool for studying the associated rate constants as well as statistics such as mean (μ) and standard deviation (σ) of single molecule photo-kinetics 27 . Thus, in practice, single molecule techniques such as FCS can be used to understand and choose situations which yield smaller value of μ 2 /σ 2 and consequently a larger value of c 2 /c 1 . Now, we discuss the role of M, which increases linearly with the density of emitters. The value of c 1 increases linearly with M, thus scaling up the contribution of the mean image. Here, we note that M is the number of emitters in the focal volume, which in context of optical system is approximately the size of one PSF. Also, unlike the convention in localization microscopy, where the emitter density is defined as emitters per unit surface (or volume) per frame, here the density of emitters is the number of emitters per unit surface or volume irrespective of their blinking dynamics. Consider the benchmark examples, MT0.N1.LD and MT0.N1.HD, from the single molecule localization microscopy symposium challenge 2016 28 . The emitter densities per frame for these examples are 0.2 emitters per μm 2 per frame and 2 emitters per μm 2 per frame, respectively. However, the value of M for these examples is more comparable at 298 and 364 emitters respectively (see supplementary information S9 for details).

Discussion
Here, we discuss the contemporary super-resolution methods in the framework of the presented eigen-analysis. [16][17][18]25 , use fluorophores and photo-chemical environment that sustain long dark states and comparatively short bright states (see the curve for τ τ Fig. 7(a) for example). It is seen in Fig. 7(b) that μ 2 /σ 2 is less than 1 for small values of τ τ /  29 , the correct localizations for a given noise minimize the distance between the diagonal elements of C 2 and ∼ C 2 . Similarly, maximum likelihood estimation (MLE) 30,31 , primarily uses ∼ C 3 in addition to other non-linear terms consisting of powers of G(n, m) (see supplementary information S10).

Localization Methods. Localization methods
We now discuss 3B 15 , which does not impose the restriction of spatio-temporal sparsity strictly. MLE provides an initial localization in 3B. In the absence of sufficient spatio-temporal sparsity, MLE in the initial stage estimates fewer emitters than actually present. From supplementary information S10, it implies that the component ∼ C 3 is not close to C 3 and contributes to poorer maximum value (local maximum) of the log likelihood term maximized in MLE. 3B then simulates blinking as a random Markov process for the previous localizations. The simulation of blinking helps in off-setting this issue as described here. Although one localization may be estimated using MLE instead of few actual emitters, the simulation of temporal blinking statistics of the estimated emitter does not match the blinking statistics of the actual emitters. This indicates the need of localizing more emitters than used in the previous MLE estimation. Thus, in subsequent iterations ∼ C 3 becomes closer to C 3 if the new localizations are more accurate. Typically, time lag κ is set to zero. In this case, the SOFI image of order 2 is given as the diagonal elements of the

Methods based on statistical moments.
3 (derivation in supplementary information S11). SOFI reduces the contribution of C 1 from c 1 to c 2 and thus increasing the weight of super-resolution supporting matrices C 2 and C 3 . However, it does not use cross-pixel terms since it uses diagonal components only. Cross-correlated second-order SOFI 12 and balanced SOFI 13 incorporate cross-pixel correlations and thus improves the resolution. ESI 14 as well as higher order cumulants used in all versions of SOFI [11][12][13] , include other non-linear terms which appear as powers of G(n, m). Nevertheless, the primary term in ESI and higher order SOFI is common with SOFI of second order and zero time lag. The error of combination of central moments and cumulant is limited by the cumulant of the lowest order 13 . Thus, the roles of C 1 , C 2 , and C 3 remain the same as second order SOFI.
Balanced SOFI 13 additionally estimates average emitter blinking characteristics (related to S(m, k), max, , and M) using non-linear functions of multiple orders of cumulants and applies deconvolution on cumulants to form the super-resolved image. Fourier SOFI 12 applies a non-linear reweighing function to the Fourier transform of cumulants. Discussion on the effect of non-linear operations is out of the scope of this paper.
Methods that use eigen-decomposition directly. MUSICAL 21 computes the eigenvectors of J (referred to as eigenimages in ref. 21) and partitions the eigen-space into signal subspace  and null subspace  based whether the singular value s j (which is the square root of eigenvalue) corresponding to an eigenvector u j is above a threshold s 0 or not, respectively. The spatial frequencies of the eigenvectors of J are the same as the eigenvectors of the circulant matrix O since the eigen-decomposition of O yields a Fourier transform operator 23 . Further, the spatial frequencies of the eigenvectors increase as the singular value decreases. The eigenvector with the largest eigenvalue is  G . It is the only eigenvector of C 1 and the leading eigenvector of C 2 and C 3 . Irrespective of s 0 , it is relegated to . Thresholding relegates some of the higher spatial frequency components to  . In the presence of noise and large density of emitters (condition M > N in ref. 21), such relegation is almost inevitable. However, if M < N and noise power is small, s 0 can be selected such that no eigenvector of C 2 and C 3 is relegated to  .
MUSICAL computes in denominator and α further increases the bias, but the essential exploitation of C 2 and C 3 occurs through the separation of the spatial frequencies in  and  . SCORE 19 , unlike MUSICAL uses only . Thus, while it exploits the spatial frequencies of C 2 and C 3 present in , it rejects the spatial frequencies in  . In the case of high emitter density, the spatial frequencies of C 2 and C 3 relegated to the null space have significant role in resolution as discussed above for MUSICAL.

Concluding remarks.
Through the eigen-analysis and the discussion, we establish the roles of emitter distribution, emitter density, and blinking characteristics of emitters in supporting super-resolution. We also identify the component matrices that are exploited by the existing super-resolution algorithms. The presented linear analysis framework can be used to identify the core conceptual drivers of super-resolution and their limiting conditions in the super-resolution techniques that employ blinking. A comprehensive framework that can accommodate non-linear components may be designed in the future. Nevertheless, we expect that new experimental and computational methods may directly exploit these cross-pixel cross-emitter components and push the envelope of super-resolution further. Data Availability. The codes used for generating the data in this manuscript are available at https://sites. google.com/site/uthkrishth/musical after the acceptance of the manuscript.