An alternative framework for fluorescence correlation spectroscopy

Fluorescence correlation spectroscopy (FCS), is a widely used tool routinely exploited for in vivo and in vitro applications. While FCS provides estimates of dynamical quantities, such as diffusion coefficients, it demands high signal to noise ratios and long time traces, typically in the minute range. In principle, the same information can be extracted from microseconds to seconds long time traces; however, an appropriate analysis method is missing. To overcome these limitations, we adapt novel tools inspired by Bayesian non-parametrics, which starts from the direct analysis of the observed photon counts. With this approach, we are able to analyze time traces, which are too short to be analyzed by existing methods, including FCS. Our new analysis extends the capability of single molecule fluorescence confocal microscopy approaches to probe processes several orders of magnitude faster and permits a reduction of photo-toxic effects on living samples induced by long periods of light exposure.

Here we provide supplementary materials and technical details that complement the main text. These include: (i) Additional analysis results that demonstrate the estimation of molecular brightness and background photon emission rates, joint posterior probability distributions, molecule locations, and additional results for multiple diffusive species. These results are repeated for simulated and experimental data.  The FCS estimate is shown by a magenta dashed line. c The joint probability distribution of the diffusion coefficient and the molecular brightness. d The posterior probability distribution of the molecular brightness. e The joint probability distribution of the diffusion coefficient and molecular brightness. f The joint probability distribution of the molecular brightness and background photon emission rates. g The posterior probability distribution of the background photon emission rate and the 95% confidence intervals of the posteriors are highlighted in cyan. The experimental fluorescent intensity trace was produced with a concentration of 100 pM of Cy3 in a 94% glycerol/water mixture. Figure 7. Estimated joint posterior probability distribution. a Experimental fluorescent intensity trace used in Fig. 5f with a length of 1000 data points and time step 100 µs. b The posterior of the diffusion coefficient. The FCS estimate is shown by a magenta dashed line. c The joint probability distribution of the diffusion coefficient and the molecular brightness. d The posterior probability distribution of the molecular brightness. e The joint probability distribution of the diffusion coefficient and the molecular brightness. f The joint probability distribution of the molecular brightness and background photon emission rates. g The posterior probability distribution of the background photon emission rate and the 95% confidence intervals of the posteriors are highlighted in cyan. The experimental fluorescent intensity trace produced is with a concentration of 1 nM of Cy3 in a 94% glycerol/water mixture. Supplementary Figure 9. Comparison with FCS and PCH. a Targeted experimental fluorescent intensity trace. The time step is 10 µs with a total time of 5 min. b PCH curve and the theoretical fit. c FCS curve and the best theoretical fit. d The portion of the trace analyzed by our method rebinned at 100 µs. e The concentration of Cy3 in the effective volume with =1, arising from the trace in (d). The experimental concentration is shown by the green line and the PCH estimated is shown by the pink line. g The posterior probability distribution of the molecular brightness with the PCH estimated of the molecular brightness shown by a solid green line. h The posterior probability distribution of the background photon emission rate with the PCH estimate of the background photon emission rate shown by a solid green line. f The posterior probability distribution of the diffusion coefficient obtained by analyzing the trace in (d). The FCS estimate of the diffusion coefficient obtained by analyzing the total time trace, shown in (a), is denoted by a pink solid line. The targeted experimental trace is generated by free diffusive Cy3 in a mixture of water and glycerol with 75% glycerol, a laser power of 100 µW and a concentration of Cy3 at 1 nM, excitation wavelength, NA and refractive index used are 532 nm, 1.42 and 1.4, respectively.

Supplementary
Supplementary Figure 10. Experimental traces of free Cy3B dyes using an elongated confocal volume. a Experimental fluorescent intensity trace used in FCS. The time trace is generated by 2.5 nM Cy3B dyes in glycerol/water mixture with 70% glycerol and laser power of 100 µW. b Auto-correlation curve of the trace in (a) and best theoretical fit. c Portion of the trace in (a) to be used as the input to FCS and our method. d Auto-correlation curve of trace in (c). e Posterior probability distribution over the diffusion coefficient estimated from the trace in (c). Traces shown in (a) and (c) are acquired at 100 µs for a total of 10 s and 0.1 s respectively. The laser power use to generate the signal (a) is 100 µW (measured before the beam enters the objective). The estimation of the diffusion coefficient as the results of autocorrelation fitting in (a) matched with Stokes-Einstein prediction, equal to 18.79 µm 2 s −1 and in (d) is 145.75 µm 2 s −1 .  Figure 11. Testing the experimental setup for effects of saturation. a Fluorescence count rate shown here is plotted with respect to the laser intensity. Laser powers are, conservatively, measured at the output of the laser. As such, we expect these to be lower at the entrance of the objective. b Auto-correlation curves of the time traces used in (a). Only three of the six acquired FCS decays are shown for clarity (they all overlap within experimental error). Fluorescent intensity time traces were obtained using a solution of Cy3B dyes in glycerol/water mixture with 70% glycerol at six different laser powers in the 10-100 µW range. Despite a slight nonlinearity in (a), there is no effect of saturation within the experimental error in (b).   Supplementary Note 3: Detailed methods description 1. Representation of molecular diffusive motion Consider a particle moving in 1D diffusion. The probability distribution p(x, t) of the particle's location obeys Fick's second law [1][2][3] and is given by the diffusion equation

Analysis of additional data for multiple diffusive species
where D is the particle's diffusion coefficient. Assuming the particle is located at x k−1 at a time t k−1 , i.e. assuming the initial condition p(x, t k−1 ) = δ(x − x k−1 ), and a free space boundary, i.e. lim x→±∞ p(x, t) = 0, we can solve this equation to obtain p(x, t) for any later time t. The solution is which equals to the probability density of a normal random variable with mean x k−1 and variance 2(t − t k−1 )D, see Supplementary Table 4. At time t = t k , we therefore have

Supplementary Equation 3
Similarly, solving the diffusion equation for particles following isotropic 3D diffusion in free space, we have which constitute the molecular motion model used throughout this study.

Description of Stokes-Einstein model
For the experimental data, we benchmark our estimates of the diffusion coefficient against the Stokes-Einstein prediction [2,3]. Namely, for a spherical particle in a quiescent fluid at uniform temperature where, D is the diffusion coefficient, k is Boltzmann's constant, T is the solution's absolute temperature, r is the hydrodynamic radius of the particle and η is the solution's dynamic viscosity [4].

FCS formulation
The formulation we used in this study to autocorrelate the synthetic and experimental time traces is where the I(t) is the number of detected photons at time t. The computational implementation uses the Wiener-Khinchin theorem [5].

Supplementary Equation 7
and, for the 2DGL PSFs, is where N is the average number of molecule in the effective volume, D is the diffusion coefficient, τ F is the triplet state relaxation time and F is the fraction molecules populating the triplet state.

Definition of molecular brightness
As the definition of molecular brightness in Eq. 2, we use the emission rate of detected photons of a single fluorophore. For a fluorophore located at (x, y, z) this is formulated as where, µ 0 is the maximum excitation intensity which occurs at the center of the confocal volume, ϕ d is the efficiency of the photon collection at the center of the confocal volume, ϕ qe is the quantum efficiency of the detector, ϕ f is the quantum efficiency of the fluorophore (i.e. quantum yield), σ is the absorption cross-section of the fluorophore, EXC(x, y, z) is the excitation profile and CEF(x, y, z) is the detection profile, i.e. collection efficiency function, which equals the fraction of the photons collected by the detector to the total photons emitted by a point source [13].

To obtain Eq. 2, we cast Supplementary Equation Supplementary Equation 10 in the simplified form
where µ mol = µ 0 ϕ d ϕ qe ϕ f σ, which we term molecular brightness at the center of the confocal volume [14], and PSF(x, y, z) = EXC(x, y, z) CEF(x, y, z), which we term the PSF.
To relate the parameter µ mol to the average photon count rate, which is commonly estimated in bulk experiments [15][16][17], we consider the spatial average of µ(x, y, z) as follows µ(x, y, z) = µ mol PSF(x, y, z) .

Supplementary Equation 12
For the specific choice of a 3DG PSF (see below), the average is computed as follows where V eff denotes the effective volume of 3DG PSF [8,18] and it is given by

Supplementary Equation 15
In other words, the molecular brightness is, by definition, approximately 2.8 times larger than the average photon count rate of a single molecule [15][16][17].
The definition of the PSF for the 3DG case is while, the definition of the PSF for the 2DGC case is

Supplementary Equation 17
For both cases, ω xy and ω z are the semi-axes lateral and parallel to the optical axis. These are represented in terms of the excitation wavelength λ exc , solution refraction index n sol , and numerical aperture NA of the microscope as ω xy = 0.61λ exc /NA and ω z = 1.5n sol λ exc /NA 2 ; for example see [27,28]. For more realistic representations, ω xy and ω z can be estimated directly based on calibration experiments with known diffusion coefficients; for example see [29]. The definition of the PSF for the 2DGL case is where ω xy , λ exc , and n sol are similar to the 3DG or 2DG cases and z R = n sol πω 2 xy /λ exc .

Description of the data simulation
To generate fluorescence intensity time traces that mimic a realistic confocal setup, we simulate molecules moving [1,30] through an illuminated 3D volume. The number of moving molecules N is prescribed in each simulation. To maintain a relatively stable concentration of molecules near the confocal volume, and so to avoid generating traces where every molecule eventually strays into un-illuminated regions, we impose periodic rectangular boundaries to our volume. The boundaries are placed at ±L xy perpendicular to the focal plane and ±L z perpendicular to the optical axis.
We assess the locations of the molecules x n k , y n k , z n k , where k = 1, . . . , K label time levels and n = 1, . . . , N label molecules, at equidistant time intervals t 1 , t 2 , . . . , t K . The time interval between successive assessments δt = t k − t k−1 , as well as the total trace duration T total = t K − t 0 , are prescribed.
Molecule locations at the first assessment x n 1 , y n 1 , z n 1 are sampled randomly from a uniform distribution with limits equal to the boundaries ±L xy and ±L z of our pre-specified volume. Subsequent locations are generated according to the diffusion model described above under a prescribed diffusion coefficient D.
Finally, we obtain individual photon emissions w k by simulating Bernoulli random variables of success probability q k = 1 − e −µ k δt , where the rate µ k gathers single photon contributions from the background and the entire molecule population according to where both background and molecular brightness, µ back and µ mol , are prescribed. The PSF model is also prescribed. To avoid artifacts induced by the periodic boundaries we impose in our volume, we ensure that L xy ω xy , L z ω z , or L z z R , where ω xy , ω z and z R characterize the geometry of the confocal volume, see Supplementary

Definition of normalized distance and numbers of molecules
As we need to estimate the positions of the molecules with respect to the center of the confocal volume, which is the point of origin, in order to ultimately estimate the number of molecules as a proxy for molecule concentration, for example Supplementary Figure 3 and Supplementary Figure 8, we must address difficulties associated with symmetries of the confocal PSF with respect to rotations around the optical axis or the focal plane. [31] For this, for a molecule at (x n k , y n k , z n k ), when the 3DG PSF is used, Supplementary Equation 16, we rely on where d n k is the normalized distance with respect to the optical axis of molecule n at time k.
These distances are obtained by setting the respective PSFs equal to exp(−(d n k ) 2 ) and are unaffected by the aforementioned symmetries, i.e. x n k → −x n k , y n k → −y n k , and z n k → −z n k .
For a given normalized distance , we define the number of molecules N k as the number of estimated (active) molecules within the corresponding distance. That is where H is the Heaviside step function, b n is the load of molecule n, and V is the volume of a designated effective region chosen to agree with the effective volume V eff used in FCS [8].

Description of the time trace preparation
The initial time trace consists of single photon arrival times which are computationally too expensive to analyze. Our method instead operates on photon intensity traces which are either obtained directly during an experiment or obtained from individual photon arrival time traces after binning. To transform single photon arrival time traces into intensity time traces, we use time bins of fixed size (main size) that typically span multiple photon arrival times. To speed up the computations further, as some bins have none of very few photons, over certain portions of the trace we use larger bins (auxiliary size). Briefly, the user specifies a minimum number of photons per bin as a lower threshold. As illustrated in Supplementary Figure 14, those bins, preselected at the main size, containing fewer photons than the specified threshold are enlarged uniformly in order to achieve an average of at least as many photons as specified by the threshold. This occasional adaptation, from the main to the auxiliary bins, becomes important in the analysis of traces from experiments held near single molecule resolution where molecule concentrations are low so that on average only one molecular passage through the confocal volume happens. Consequently, photon intensities are low, and thus the bulk of computational time otherwise would had been spent processing trace portions of poor quality (i.e. with few or no photons).
To carry our the necessary computations, as we detail shortly, we use the Anscombe transformation [32] to approximate the Poissonian likelihoods of photon intensities (see below). This approximation is robust as long as bins contain on average 4 photons or more. Thus, as a minimum requirement, we also use the aforementioned threshold to ensure the validity of the approximations.
Supplementary Note 4: Detailed description of the inference framework 1

. Description of prior probability distributions
The model parameters in our framework that require priors are: the diffusion coefficient D; the molecular brightness and background photon emission rates µ mol and µ back ; the initial molecule locations x n 1 ,y n 1 ,z n 1 ; and load prior weights q n . As we already mentioned in the main text, a prior on the population of diffusing molecules is implicitly defined by the prior on both b n and q n . Our choices are described below.

Prior on the diffusion coefficient
To ensure that the D sampled in our formulation attains only positive values, we place an inverse-Gamma prior

Supplementary Equation 24
Besides ensuring a positive D, this prior is also conjugate to the motion model we use which facilitates the computations (see below).

Priors on molecular brightness and background photon emission rates
To ensure that µ mol and µ back sampled in our formulation attain only positive values, we place Gamma priors on both

Supplementary Equation 25
Due to the specific dependencies of the likelihood (that we will discuss shortly) on the photon emission rates, conjugate priors cannot be achieved for µ mol and µ back . So, the above choice offers no computational advantage (see below) and could be readily replaced with more physically motivated choices if additional information on molecular brightness becomes available.

Priors on initial molecule locations
Due to the symmetries in the confocal PSF, i.e. a molecule at a location (x, y, z) emits the same average number of photons as a molecule at locations (±x, ±y, ±z), we are unable to gain insight regarding the octant of the 3D Cartesian space in which each molecule is located. To avoid imposing further assumptions on our framework that may determine each molecule's octant uniquely, but may limit the framework's scope to specific experimental setups, we place priors on the initial locations that respect these symmetries. Accordingly, in our framework, at the onset of the measuring period, molecules are equally likely to be located at any of the positions (±x n 1 , ±y n 1 , ±z n 1 ). To facilitate the computations (see below), we place independent symmetric normal distributions, see Supplementary  Table 4, on each Cartesian coordinate of the model molecules

Supplementary Equation 26
We want to emphasize that the symmetric priors above do not affect our estimates. According to the motion model we employ, no matter where molecules are initiated, they may subsequently move freely and eventually switch to a different octant if warranted by the data. Our symmetric priors merely indicate that for each individual molecular trajectory considered, there are another 7 symmetric trajectories that are equally likely to have occurred.

Priors and hyperpriors for molecule loads
To facilitate the computations (described next), we use a finite, but large, model population consisting of N molecules containing both active and inactive molecules. These model molecules are collectively indexed by n = 1, 2, . . . , N . As explained in the main text, estimating how many molecules are actually warranted by the data under analysis is equivalent to estimating how many of those N molecules are active, i.e. b n = 1, while the remaining inactive ones, i.e. b n = 0, have no impact whatsoever and are instantiated only for computational purposes.
To ensure that each load b n takes only values 0 or 1, we place a Bernoulli prior of weight q n . In turn, on each weight q n , we place a conjugate Beta hyperprior b n |q n ∼ Bernoulli(q n ) Supplementary Equation 27 q n ∼ Beta (A q , B q ) .

Supplementary Equation 28
To ensure that the resulting formulation avoids overfitting, we make the specific choices A q = α q /N and B q = β q (N − 1)/N . Under these choices [33][34][35][36], and in the limit that N → ∞; that is, when the assumed molecule population is allowed to be large, this prior/hyperprior converge to a Beta-Bernoulli process. Consequently, for N 1, the posterior remains well defined and becomes independent of the chosen value of N . In other words, provided N is large enough, its impact on the results is insignificant; while its precise value has only computational implications (see below). w k |{x n k , y n k , z n k , b n } n , µ mol , µ back ∼ Poisson (µ k ) , k = 1, . . . , K Supplementary Equation 40

Summary of model equations
For molecules diffusing in a confocal volume that is extremely elongated over the optical axis, the PSF approaches a cylindrical one. In this case, it is safe to eliminate the

Description of the computational scheme
The joint probability distribution of our framework is p(D, µ mol , µ back , {q n , b n , x n , y n , z n } n |w), where molecular trajectories and intensities (measurements) are gathered in Due to the nonlinearities in the PSF and the non-parametric prior on q n and b n , analytic evaluation or direct sampling of this posterior is impossible. For this reason, we develop a specialized Markov chain Monte Carlo (MCMC) scheme that can be used to generate pseudo-random samples [10,11,[37][38][39]. This scheme is explained in detail below.
In order to terminate the MCMC sampler, we need to determine when a representative number of samples has been computed. To do so, we divide the samples already computed into four portions and compare the mean values of the diffusion coefficient of the two last ones Supplementary Equation 48 where, η 1 and η 2 are the mean values of the two last portion of the sampled diffusion coefficients denoted D i and I is the total number of computed MCMC samples thus far. Following [37,38], we terminate the sampler when |η 1 −η 2 | < thr , where thr is a pre-specified threshold. Also, to avoid incorporating burn-in samples in the calculations, we ensure a minimum number of iterations I of no less than 10 4 . A working implementation of the resulting scheme in source code and GUI forms, see Supplementary Figure 15, are available through the provided source code.

Overview of the sampling updates
Our MCMC exploits a Gibbs sampling scheme [10,37,38]. Accordingly, posterior samples are generated by updating each one of the variables involved sequentially by sampling conditioned on all other variables and measurements w.
Conceptually, the steps involved in the generation of each posterior sample (D, µ mol , µ back , {q n , b n , x n , y n , z n } n ) are: (1) For each n in the molecule population  Figure 15. A working implementation of the framework described in this study is available through the provided source code. Along with this implementation, we provide a graphical user interface (GUI) that can be used to analyze intensity traces from confocal microscopy.
(5) Update jointly the loads b n for all model molecules (6) Update jointly the molecular brightness and background photon emission rates µ mol and µ back , respectively These steps are described in detail below.

Sampling of active molecule trajectories
For a given active molecule n, we update the trajectory x n by sampling from the corresponding conditional p(x n |D, µ mol , µ back , {b n , y n , z n } n , {x n } n =n , w), which we achieved through backward sampling [40][41][42].

Supplementary Equation 49
According to this factorization, we sample x n , starting from x n K and move backward towards x n 1 . To start the sampling steps, we need to determine each one of the individual densities p(x n K |D, µ mol , µ back , {b n , y n K , z n K } n , {x n K } n =n , w) and p(x n k |x n k+1 , D, µ mol , µ back , {b n , y n k , z n k } n , {x n k } n =n , w). We do this in a forward filtering approach [31,[40][41][42][43][44] which is described in detail below. where w 1:k is an abbreviation for w 1 , . . . , w k and excess parameters are shown by dots. Since the density p(x n k+1 |x n k , D) is already known, to sample x n k in backward sampling, we only need to determine the filter density p(x n k |D, . . . , w 1:k ). To be able to apply forward filtering and compute p(x n k |D, . . . , w 1:k ) efficiently [45], we use an approximate model [46], where Supplementary Equation 40, is replaced with T data (w k )|{x n k , y n k , z n k , b n } n , µ mol ∼ Normal (T mean (µ k ), 1) , k = 1, . . . , K.

Supplementary Equation 51
Here, µ k stems from Supplementary Equation 41 for 3D models and Supplementary Equation 43 for 2D models; while, T data (w) and T mean (µ) denote Anscombe transformed [32] variables defined as follows T mean (µ) = 2 µ + The Anscombe transform exploited here offers a way of transforming Poisson random variables into (approximately) normal ones [32] which facilitates the filtering process described next. The approximation we employ is highly accurate for µ 1, while acceptable accuracy is maintained so long as µ > 4 photons. Increasing the accuracy of the above approximation is achieved with either longer data acquisition times or higher laser powers.
More specifically, because the mean of the transformed observation distribution, T mean (µ k ) is a nonlinear function of the location x n k , to apply the Kalman filters we need to approximate the transformed observation distribution in such a way that its mean becomes a linear function of the location x n k . To do so, we use two common approaches: (i) extended Kalman filter (EKF) [47,[54][55][56][57], which locally approximate the transformed observation distribution around selected values; and (ii) unscented Kalman filter (UKF) [48][49][50], which globally approximate the transformed observation distribution.
As explained in detail in [31], the linearization alone is not sufficient to properly approximate the filter. This is because both EKF and UKF assume that the filter is a normal density. This assumption is problematic for our particular case which is symmetric across the origin, i.e. observations provide equal probabilities for the molecule to be in negative or positive side of the center of the PSF, i.e. ±x n k . Due to this symmetry across the yz-plane, the filtering distribution consists of two modes centered symmetrically across the origin [31]. Therefore, we compute an approximate filter distribution of the form p (x n k |D, . . . , w 1:k ) ≈ SymNormal (x n k ; m n k , c n k ) Supplementary Equation 54 where SymNormal (m n k , c n k ) denotes the symmetric normal distribution (see Supplementary Table 4). The filter's parameters m n k and c n k can be computed recursively according to p (x n k |D, . . . , w 1:k ) ∝ p w k |x n k , y n k , z n k , µ mol , µ back , {b n , x n , y n , z n } n which, for our model, reduces to p (x n k |D, . . . , w 1:k ) ∝ Normal (T data (w k ); T mean (µ k ), 1) SymNormal x n k ; m n k−1 , c n k−1 + 2D (t k − t k−1 ) Supplementary Equation 56 and, in turn, is approximated as p (x n k |D, . . . , w 1:k ) ≈ SymNormal (x n k ; m n k , c n k ) .

Supplementary Equation 57
To summarize, in the forward pass of the FFBS, we compute m n k and c n k of the filter of the molecule n, for all time levels k = 1, . . . , K, by linearizing the approximate model around x n 1 = µ xy for k = 1, and around x n k = m n k−1 for k = 2, . . . , K. Since our observation is nonlinear, to calculate the filter, we opt between two different methods: (i) Extended Kalman filter (EKF) and (ii) Unscented Kalman filter (UKF).
In the EKF, we linearize the observations to obtain a closed form for the filter (local approximation) and in the UKF we approximate the joint probability distribution of observations and locations with a multivariate normal distribution (global approximation). The reason to use either of these filters is that the EKF is computationally cheaper but less accurate. According to our analysis it may fail to provide unbiased estimates of the background photon emission rate. On the other hand, the UKF is more robust and provides background emission rate estimates, but these benefits come at an increased computational cost.
In this study, we provide both filters and allow the user to choose between them.

Extended Kalman filter
Within the EKF approximation, the normal probability distribution preceding the symmetric normal of Supplementary Equation 56 is linearized in order for their product to become a symmetric normal one.
In this case, we linearize the mean of the observation density Normal (T data (w k ); T mean (µ k ), 1), around the modes of the filter in the previous time step

Supplementary Equation 58
where the first term linearizes around x n k = −m n k−1 and the second term linearizes around x n k = +m n k−1 . Under these approximations, Supplementary Equation 56 attains an analytical solution. In detail Normal (x n k ; −m n k , c n k ) + 1 2 Normal (x n k ; +m n k , c n k ) = SymNormal (x n k ; m n k , c n k ) .

Supplementary Equation 59
The same calculations apply also for k = 1, where the starting density is replaced with the prior of Supplementary Equation 26. In this case

Supplementary Equation 61
Unscented Kalman filter The unscented Kalman filter [48,49] tries to fit the joint probability distribution of the observations and locations globally with a multivariate normal distribution to cope with the nonlinearity in Supplementary Equation 56. Specifically the product of Supplementary Equation 56 is approximated as follows Normal (x n k ; +m n k , c n k ) = SymNormal (x n k ; m n k , c n k ) Supplementary Equation 62 Since we are faced with a filter which has two symmetric modes, we calculate the filter's mean m n k and variance c n k for one of the modes only, while we recover the other mode's mean and variance by reflection.
The means, auto-and cross-covariances in one mode of the Supplementary Equation 62 are given by

Supplementary Equation 63
where q(x) = Normal x; m n k−1 , c n k−1 + 2D (t k − t k−1 ) is the probability density of one mode of the filter. The same formula applies to the other mode too.
To calculate the mean value m n k and variance c n k of each normal contributing to the symmetric normal shown above, we need to specify a set of sample points, termed sigma points in the UKF literature [48][49][50][58][59][60], to estimate the mean values and covariance matrix of the bivariate normal on which m n k and c n k depend. To specify sigma points, we first calculate sigma points x sn i and their weights g * i for a standard normal Normal(x; 0, 1) as following according to [61]. We then transform x sn i that will be used in this Normal(x; m k−1 , c k−1 + 2D(t k − t k−1 )). The transformed sigma points are

Supplementary Equation 64
Finally, given g * i , x * i , we calculate the mean and covariance of the bivariate normal previously introduced by

Supplementary Equation 65
After computing the means X n k and W n k and auto-covariances and cross-covariances xx Σ k , ww Σ k , xw Σ k , wx Σ k , the mean and variance of each mode of the filter are given by

Supplementary Equation 68
b. Backward sampling After we compute the filter densities p (x n k |D, . . . , w 1:k ) in the forward filtering step, through the EKF or UKF, we are able to sample the location x n k by using backward sampling as in Supplementary Equation 50. Specifically, given a computed filter, we sample sequentially x n k according to x n K ∼ p x n K |{x n k } k <K , D, µ mol , µ back , {b n , y n , z n } n , {x n } n =n , w Supplementary Equation 69 x n k ∼ p x n k |x n k+1 , {x n k } k <k , D, µ mol , µ back , {b n , y n , z n } n , {x n } n =n , w , k = 1, . . . , K − 1.

Supplementary Equation 70
Due to the specific choices of our problem these reduce to x n K ∼ SymNormal (m n K , c n K ) Supplementary Equation 71 x n k ∼ SymNormal (m n k , c n k ) × Normal x n k+1 , 2D(t k+1 − t k ) , k = 1, . . . , K − 1 Supplementary Equation 72 where m n k and c n k are the parameters of the filter which are calculated in the forward filtering step.

Sampling of inactive molecule trajectories
After updating the trajectories of the active molecules, we update the trajectories of the inactive ones. For this, we sample from the corresponding conditionals p({x n , y n , z n } n:b n =0 |D, µ mol , µ back , {q n , b n } n , w). Since the locations of inactive molecules are not associated with the observations in w and hyper-priors {q n } n , these conditionals simplify to p({x n , y n , z n } n:b n =0 |D, {b n } n ) which can be readily simulated jointly in the same manner as standard 3D Brownian motion.

Sampling of the diffusion coefficient
Now that we have updated the locations of active and inactive molecules, we update the diffusion coefficient D by sampling from the corresponding conditional p(D|µ mol , µ back , {q n , b n , x n , y n , z n } n , w), which, due to the spe-cific dependencies of the variables in our formulation, e.g. Supplementary Equation 24, Supplementary Equation 37, Supplementary Equation 38 and Supplementary Equation 39, simplifies to p(D|{x n , y n , z n } n ). Because of conjugacy, the latter reduces to D|{x n , y n , z n } n ∼ InvGamma (α , β ) Supplementary Equation 73 where α and β are given by 3.5. Sampling of the molecule prior weights and loads We update the load prior weights q n by sampling from the corresponding conditional p({q n } n |D, µ mol , µ back , {b n , x n , y n , z n } n , w), which simplifies to p({q n } n |{b n } n ).
For this, we use Supplementary Equation 33 and Supplementary Equation 32, and because of conjugacy, the latter distribution is sampled by sampling each q n separately according to

Supplementary Equation 75
Once the weights q n are updated, we update the loads b n by sampling from the corresponding conditional p({b n } n |D, µ mol , µ back , {q n , x n , y n , z n } n , w). We perform this sampling using a Metropolis-Hastings update with proposals of the form Supplementary Equation 76 In this case, by choosing the proposal distribution similar to the prior distribution, the acceptance ratio becomes where (b n ) old denotes the existing sample.

Joint sampling of the molecular brightness and background photon emission rates
Finally, after we updated the locations of molecules, and loads, we update the molecular brightness and background photon emission rates µ mol and µ back by sampling from the corresponding conditional p(µ mol , µ back |D, {q n , b n , x n , y n , z n } n , w), which simplifies to p(µ mol , µ back |{b n , x n , y n , z n } n , w). We carry over this sampling using a Metropolis-Hastings update where proposals for µ mol and µ back are computed according to where µ old mol and µ old back denote the existing samples. The acceptance ratio is

Supplementary Equation 79
We should emphasize, due to the weakness of the extended Kalman filter as compared to the unscented Kalman filter, we consider the background photon emission rate is fixed for EKF. So, in this case we update the molecular brightness µ mol only by sampling from the corresponding conditional p(µ mol |D, {q n , b n , x n , y n , z n } n , w), which simplifies to p(µ mol |{b n , x n , y n , z n } n , w). Again, we carry over this sampling using a Metropolis-Hastings update where proposals for µ mol are computed according to

Supplementary Equation 80
and the acceptance ration will reduces to Gamma(α, β)