Pitching single focus confocal data analysis one photon at a time with Bayesian nonparametrics

Fluorescence time traces are used to report on dynamical properties of biomolecules. The basic unit of information drawn from these traces is the individual photon. Single photons carry instantaneous information from the biomolecule, from which they are emitted, to the detector on timescales as fast as microseconds. Thus from confocal microscope it is theoretically possible to monitor biomolecular dynamics at such timescales. In practice, however, signals are stochastic and in order to deduce dynamical information through traditional means–such as fluorescence correlation spectroscopy (FCS) and related techniques–fluorescence signals are collected and temporally auto-correlated over several minutes. So far, it has been impossible to analyze dynamical properties of biomolecules on timescales approaching data acquisition as this demands that we estimate the instantaneous number of biomolecules emitting photons and their positions within the confocal volume. The mathematical structure of this problem demands that we abandon the normal (“parametric”) Bayesian paradigm. Here, we exploit novel mathematical tools, namely Bayesian nonparametrics, that allow us to deduce in a principled fashion the same information normally deduced from FCS but from the direct analysis of significantly smaller datasets starting from individual photon arrivals. We discuss the implications of this method in helping dramatically reduce phototoxic damage on the sample and the ability to monitor out-of-equilibrium processes.

Fluorescence time traces are used to report on dynamical properties of biomolecules. The basic unit of information drawn from these traces is the individual photon. Single photons carry instantaneous information from the biomolecule, from which they are emitted, to the detector on timescales as fast as microseconds. Thus from confocal microscope it is theoretically possible to monitor biomolecular dynamics at such timescales. In practice, however, signals are stochastic and in order to deduce dynamical information through traditional means-such as fluorescence correlation spectroscopy (FCS) and related techniques-fluorescence signals are collected and temporally auto-correlated over several minutes. So far, it has been impossible to analyze dynamical properties of biomolecules on timescales approaching data acquisition as this demands that we estimate the instantaneous number of biomolecules emitting photons and their positions within the confocal volume. The mathematical structure of this problem demands that we abandon the normal ("parametric") Bayesian paradigm. Here, we exploit novel mathematical tools, namely Bayesian nonparametrics, that allow us to deduce in a principled fashion the same information normally deduced from FCS but from the direct analysis of significantly smaller datasets starting from individual photon arrivals. We discuss the implications of this method in helping dramatically reduce phototoxic damage on the sample and the ability to monitor out-of-equilibrium processes.

I. INTRODUCTION
Methods to capture static biomolecular structures, such as super-resolution microscopy [33,48,62], provide only snapshots of life in time. Yet life is dynamical and obtaining a picture of life in action-one that captures * Second.Author@institution.edu diffraction-limited biomolecules as they move, assemble into and disassemble from larger bimolecular complexesremains an important challenge [55]. In fact, the creative insights directly leading to fluorescence correlation spectroscopy (FCS) [27,64]-and related methods such as FCS-FRET [78,95] and FCCS [84]-have shown that deciphering dynamical information from biomolecules does not demand spatial resolution or spatial localization.
The key idea behind these methods is to illuminate a small region of space inhomogeneously. As fluorescently-labeled biomolecules diffuse across this illuminated volume, they emit photons (i.e., they fluoresce) in a way that is proportional to the laser intensity at their respective locations [52]. Single photon detectors, often photomultiplier tubes or avalanche photodiodes, are then used to record these photons. In principle, with the appropriate electronics, photons can be recorded within µs-ms (depending on the detector, laser intensity, label emission rates and the concentration of labeled biomolecules) and this suggests that information on biomolecular diffusion could be drawn from the data on timescales approaching data acquisition.  FIG. 1. Confocal methods have the potential to reveal dynamical information on biomolecules on fast, photon-detection, timescales. (A) Schematic of an illuminated confocal volume (blue) aligned along the z-axis with labeled biomolecules emitting photons based on their location within that volume. A slice perpendicular to the z-axis is shown in green with the instantaneous position of biomolecules shown in red. (B) Sample synthetic photon arrival trajectory with diffusion coefficient of 1 µm 2 /s and its representation in terms of inter-arrival times. The synthetic trace is produced using 4 single biomolecules for a total time of 30 ms (with a total of about 1500 photon arrivals) and background and particle photon emission rates 10 3 photons/s and 4 × 10 4 photons/s respectively. (C) Comparison between diffusion coefficient estimates using our current method, explained later, and FCS as a function of the number of photons used in the analysis. (D) Auto-correlation curve, G(τ ), of sample synthetic trace in (B). Normally, in FCS analysis, much longer traces would be used to generate much smoother G(τ )'s that would then be fitted to determine diffusion coefficients.
The fundamental quantity measured in a single focus confocal setups are individual photon arrival times, from which photon inter-arrival times, i.e., the intervals between adjacent photon arrivals, can be readily obtained. If we were imaging biomolecules fixed in space and under homogeneous (uniform) illumination, these inter-arrival times-excluding other experimental and label artifacts such as detector noise, background photons, and label photo-physical kinetics-would be identically distributed and so uncorrelated with each other. However, interarrival times derived from conventional single focus con-focal experiments encode the number of biomolecules in the vicinity of the confocal volume, their diffusional dynamics, their position with respect to the confocal volume center in addition to experimental artifacts. Consequently, inter-arrival times are correlated with each other, and in principle these correlations can be exploited to infer dynamical quantities of the underlying biomolecular system.
In practice, these correlations are exploited by collecting billions of photons over several minutes [79] and temporally auto-correlating the fluorescence intensity measurements [14,27,32,64]. For sufficiently long time traces, the stochasticity in the number of labeled biomolecules contributing photons as well as their positions in the illuminated volume (and thus their photon emission rates) are averaged out. As such, the expression for the fluorescence intensity time-autocorrelation takes a simple form that-under strong assumptions on the illuminated confocal volume's geometry and the biomolecule's point spread function (PSF)-can be summarized in simple analytic formulas. As we will see, our method can treat any known confocal volume geometry.
In these formulas [14,27,32,64], for long enough time traces-and provided dynamical processes under observation are stationary over the long data acquisition timesonly the average number of illuminated biomolecules over the entire trace matters. This average number appears as a pre-factor in the fluorescence intensity time autocorrelation. This observation suggests that, if we were to analyze single photon arrivals, then we would need to determine the stochastic number of biomolecules contributing photons.
A critical disadvantage of auto-correlative methods across all of FCS and related methods remains the stark timescale separation between data collection and the timescale required to deduce dynamical information; see Fig. (1). A method that could take direct advantage of single photon arrivals could reveal dynamical information on timescales several orders of magnitude faster than traditional FCS analysis. As a result, rapid or nonequilibrium processes could be studied or phototoxicity damage could be reduced dramatically [62,65,77,94] simply by reducing the time during which the sample is kept under illumination. This would be especially relevant for in vivo FCS applications [25,74,85,96]. Alternatively, data collected on timescales far exceeding those typically required by auto-correlative analysis methods in FCS could be used to deduce subtle dynamical changes or slight differences in diffusion coefficients if multiple species diffuse simultaneously through the confocal volume.
Previously proposed methods to analyze single photon measurements [2,38,39,43,70,76,99] make assumptions that render them inappropriate for imaging biomolecules (or particles, more broadly speaking) moving through inhomogeneously illuminated volumes [38]. For example, they may assume that the biomolecules are fixed in space. Thus the photon inter-arrival times reflect the biomolecular conformational transitions in fluorescence resonance energy transfer (FRET) [38][39][40] but not the non-stationarity in the inter-photon arrival statistics inherently introduced by biomolecular diffusion across an inhomogeneously illuminated volume [23,24,37,39]. By contrast, to be able to estimate the diffusion coefficient of labeled biomolecules used in a confocal experiment, we must be able to determine the number of biomolecules responsible for the observed photon arrival time trace. So far, the tools to deduce this information from photon time traces have been unavailable. Otherwise, naively, many biomolecules with small diffusion coefficients emitting photons at the periphery of the illuminated confocal volume could be mistaken for fewer biomolecules with higher diffusion coefficients in the most illuminated region (the center) of the confocal volume. Misidentifying the number of biomolecules, say, or incorrectly assessing their positions may give rise to incorrect estimates of quantities such as diffusion coefficients; see Fig. (2).
More concretely, formulating a likelihood to estimate the diffusion coefficient [7] from photon arrival data demands that we know the number of biomolecules contributing photons as well as their locations across time. As the number of biomolecules instantaneously located within the confocal volume are unknown, all reasonable possibilities would need to be considered and rankordered using expensive pre-or post-processing model selection heuristics [55,92]. This has not been achieved yet, in part, because it is computationally infeasible to consider all possible biomolecule numbers across all possible positions with any possible diffusion coefficient.
Analyzing single photons arrivals from a confocal setup to derive dynamical information therefore demands fundamentally new tools. The single photon analysis methods mentioned earlier use parametric representations (i.e., fixed and known biomolecule numbers). Yet, those are precisely the assumptions we need to lift in analyzing single photon level data from confocal experiments.
The conceptually new mathematical framework that we propose can winnow down infinite possibilities (i.e., infinite populations of biomolecules potentially contributing photons) to a finite, manageable, number in a mathematically exact manner. Such a framework avoids compromising temporal resolution and allows us to directly deduce dynamical information from single photons. This framework is Bayesian nonparametrics (BNPs) [31], a powerful set of tools still in its infancy and largely unknown to the Physical Sciences. Within BNPs, we use mathematical devices, such as the beta-Bernoulli process, that have only been proposed in the last few years [1,15,71].
BNPs are poised to reshape our interpretation of inherently noisy biophysical data as they directly deal with model uncertainty (in this case, it is the uncertainty in the number of biomolecules). Contrary to traditional (parametric) Bayesian methods [34,46,55,98], BNPs begin by assuming models infinite in size. That is, they Estimates of diffusion coefficients from photon arrival traces strongly depend on the number of biomolecules we believe are contributing to the trace. Thus we need systematic tools to estimate biomolecular numbers on-the-fly if we are to deduce diffusion coefficients from mere thousands of photons. Here, in generating synthetic data we use a 3D Gaussian volume of size ωxy = 0.3µm nm and ωz = 1.5µm. The synthetic trace is produced using a total of 4 biomolecules for a total time of 30 ms (with a total of about 1800 photon arrivals) and generated by using a diffusion coefficient of 1µm 2 /s and background and particle photon emission rates of 10 3 photons/s and 4 × 10 4 photons/s respectively. To estimate the D parametrically, we start with the following number of biomolecules, N =1 (A), N=2 (B), N=3 (C), N=4 (D), and N=5 (E). The good fit provided by (D)-and the mismatch in the peak of the posterior distribution over the diffusion coefficient and correct value of the diffusion coefficient (dotted line) in all others-underscores why it will be critical for us, or any method analyzing single photon data in the context of confocal experiments, to correctly estimate the number of biomolecules contributing to the trace in order to deduce dynamical parameters such as diffusion coefficients. In the supplementary information, we redo the analysis shown here with a higher diffusion coefficient.
begin by assuming an infinite number of biomolecules contributing photons. Then, among other quantities, they deduce full posteriors over model sizes from the data available. Concretely, for the case of our single photon time traces otherwise analyzed using FCS (see description in caption of Fig. (1)), they assign posterior probabilities over a number of quantities including all possible number of biomolecules responsible for the data and their associated locations at each photon arrival. Thus, BNPs turn the otherwise difficult problem of model-selectionthat is, determining how many biomolecules, say, contribute photons-into a parameter estimation problem.

II. MATERIALS AND METHODS
Here we describe the mathematical formulation of our method.
We begin with the overall input which consists of photon inter-arrival times, ∆t = (∆t 1 , ∆t 2 , . . . , ∆t K−1 ) where ∆t k denotes the time interval between adjacent observations of photons which occur at times t k and t k−1 , for k = 1, . . . , K − 1. We also use as an input the confocal volume shape and background photon emission rate. The confocal volume shape is usually calibrated (normally under the restriction of a Gaussian PSF) experimentally by analyzing species of known diffusion coefficient (determined from other methods), while the background photon emission rate is determined by analyzing a time trace drawn from a control confocal experiment void of fluorophores within the sample.
To derive estimates for the diffusion coefficient, we need to determine intermediate quantities which include: i) particle photon emission rates of biomolecular labels; and, most importantly, ii) the unknown number biomolecules contributing photons to the trace as well as their location with respect to the center of the illuminated confocal volume. Below, we explain each one of these steps in some detail. More details and an implementation of the whole method are available in the Supplementary Information (SI). In addition, source code and a GUI version of our implementation are provided online through the SI.

A. Model Description
We begin with the distribution according to which the k th photon observation, ∆t k , is derived Accordingly, ∆t k follows an exponential probability distribution [40,75] with rate µ k . In fact, the rate µ k gathers the photon emission rates of all biomolecules which depend on their respective locations relative to the confocal volume center (see below) [12]. In addition to the biomolecule photon emissions rates, µ k also includes background photons where n µ n k is the photon emission rates µ n k gathered from individual biomolecules, that we index with n = 1, 2, . . . , and µ back is a background photon emission rate assumed uniform over time. In fact, µ n k and µ back are the emission rates of detected photons, which due to  Here, R n k = (x n k , y n k , z n k ) indicates the location of biomolecule n in Cartesian space at time t k ; µpart marks the particle photon emission rate obtained by individual biomolecular labels. During the experiment, only a single observation (inter-arrival time) ∆t k is recorded, thereby combining photon emissions between t k and t k−1 from every biomolecular label and background at each step. The diffusion coefficient D specifies the evolution of the biomolecular locations which influence the photon emission rates and eventually the recorded photon inter-arrival times ∆t k . The load, or auxiliary variables b n , and corresponding prior weights q n , are introduced in order to infer the unknown biomolecule population size. In the graphical model, the known or measured data is highlighted by dark shaded circles and model variables, which require prior probability, are showed by blue circles. For details about the graphical model see SI.
optical and detector limitations, are typically lower than the total photon emission rates [66,67].
Next, we incorporate the dependency of the emission rate on location [4,20,102] with other effects such as camera pinhole shape and size, the laser intensity, laser wavelength, and quantum yield [12] into a characteristic PSF that combines excitation and emission PSFs and, just as with traditional FCS applications [14,27,32,64], we use a 3D Gaussian geometry [52] µ n k = µ part exp −2 where the parameter µ part indicates the particle emission rate of detected photons for a single biomolecule, which happens when the biomolecule is at the center of the confocal volume where illumination is highest. We mention here that, in the Results section, we report by common convention the emission rates which differ from the particle emission rates (i.e., those achieved at the center of the confocal volume) by a factor of √ 8. The difference is explained in the SI.
In (3), ω xy and ω z are semi-minor and semi-major axes of the confocal volume.
Finally, the dynamics of the system are assumed purely diffusive where D is the diffusion coefficient, which we assume identical for all biomolecules. As we explain in the SI, these probabilities result directly from the solution to the diffusion equation with open boundaries. A graphical summary of the entire formulation is shown on Fig. (3).

B. Model Inference
All quantities which we infer-such as the diffusion coefficient, D, biomolecular locations through time, (x n k , y k n , z k n ) and the particle photon emission rate µ partare introduced as model variables in the preceding formulation. We estimate these variables within the Bayesian paradigm [34,55,92]. The model parameters such as D and µ part require priors. Additionally, we have to consider priors on the initial biomolecular locations, i.e., at the time of the very first photon arrival, (x n 1 , y n 1 , z n 1 ). Options for these priors are straightforward and, for computational convenience, we adopt the distributions described in the SI.
Before we proceed any further, we revise (7) as The variables b n , defined for each model biomolecule, take only values 1 or 0 where we have b n = 0 when the n th biomolecule does not contribute photons to the measurements. This indicator variable allows us to define an arbitrarily large population of model biomolecules, technically an infinite number. The ability to recruit, from a potentially infinite pool of biomolecules, the precise number that contributes to the measured trace is the chief reason we need to abandon the parametric Bayesian paradigm.
After introducing b n , we can estimate the number of biomolecules that contribute photons, i.e., those biomolecules where b n = 1, simultaneously with the rest of the parameters simply by treating each b n as a separate parameter and estimating its value.
To estimate b n , we consider a Bernoulli prior with a Beta hyper-prior in order to learn how many biomolecules are active where A q and B q are (hyper-hyper-)parameters specifically chosen to allow n → ∞. Both steps can be combined and are termed the Beta-Bernoulli process [1,15,71]; see SI for more details.
With these priors, we can now form a joint posterior probability p(D, µ part , (x n k , y n k , z n k , b n , q n ) n k |∆t) including all unknown variables which we want to determine. The nonlinearities in the PSF, in terms of variables (x n k , y n k , z n k ) n k , and the nonparametric prior on (b n , q n ) n exclude analytic forms for our posterior. For this reason, we adopt Markov Chain Monte Carlo (MCMC) computational schemes [34,53,80] that can be used to generate pseudo-random samples from this posterior. The scheme is described in the SI and a working implementation is given in the single biomolecule. We obtain the synthetic data presented in the Results by standard pseudo-random computer simulations [6,30,42,45,49] that simulate Brownian motion of point biomolecules through a typical illuminated confocal volume. We provide finer details and complete parameter choices in the SI. Biomolecule solutions were made by suspending Cy3 dye in glycerol/buffer (pH 7.5, 10 mM Tris-HCl, 100 mM NaCl and 10 mM KCl, 2.5 mM CaCl 2 ) at various v/v, to a final concentration of either 100 pM or 1 nM. The biomolecule solution was placed in a glassbottomed fluid-cell, assembled on a custom designed confocal microscope [58] and a 532 nm laser beam was focused to a diffraction-limited spot on the glass coverslip of the fluid-cell using a 60x, 1.42 N.A., oil-immersion objective (Olympus). Emitted fluorescence was gathered by the same objective and focused onto the detection face of a Single Photon Avalanche Diode (SPAD, Micro Photon Devices) that has a highest count rate of 11.8 Mc/s. A bandpass filter was set in front of the detector relayed only the fluorescence from Cy3 and blocked the back-scattered excitation light. TTL pulses, triggered by the individual photon arrivals on the SPAD, were timestamped and registered at 80 MHz using a field programmable gate array (FPGA, NI Instruments) using custom LabVIEW software [82]. The excitation source was a supercontinuum fiber laser Fianium WhiteLase SC480 (NKT Photonics, Birkerod, Denmark) operating at a repetition rate of 40 MHz. The excitation wavelength (550 nm) was selected by an acousto-optic tunable filter (AOTF), and the exiting beam was collimated and expanded by approximately a factor of three to slightly overfill the back aperture of the objective lens. The light was reflected into the objective lens (Zeiss EC Plan-Neofluar 100x oil, 1.3 NA pol M27, Thornwood, NY, USA) by a dichroic mirror (Chroma 89016bs). The same objective was used to collect the fluorescence from the sample, and passed through a band pass filter (Chroma ET575/50m) before being focused into a position motorized pinhole wheel set at 25 µm. The output of the pinhole was focused on a multimode hybrid fiber optic patch cable (M18L01, Thorlabs, NJ, USA) which was coupled to a single-photon avalanche diode (SPCM AQRH-14, Excelitas Technologies, Quebec, Canada). The detected photons were recorded by a TimeHarp 200 time-correlated single photon counting board (PicoQuant, Berlin, Germany) operating in T3 mode. The sample (≈50 µL) was contained in a perfusion chamber gasket (CoverWell) adhered on a glass coverslip.
The sample was 20 nM 5-Carboxytetramethylrhodamine (5-TAMRA, purchased from Sigma-Aldrich, USA) dissolved in doubly distilled water at room temperature. Acquisition time was 10 minutes; 4.61×10 6 photons were recorded in the experiment. Data was analyzed using the Sympho Time 64 software from PicoQuant (Berlin, Germany). The autocorrelation function was calculated from the photon arrival data and background and after pulsing correction were performed as implemented in the software using the microscopic delay times (measured relative to the onset of the excitation pulse).

III. RESULTS
Our goal is to characterize quantities that describe biomolecular dynamics at the data-acquisition timescales of conventional single focus setup in order to obtain diffusion coefficient estimates.
Our overall input consists of: i) the measured photon inter-arrival times ∆t = (∆t 1 , ∆t 2 , ..., ∆t K−1 ), where ∆t k−1 = t k −t k−1 with t k denoting the arrival time of the k th photon; ii) the geometry of the illuminated volume specified through a characteristic PSF; and iii) the background photon emission rate. The latter are determined through a pre-calibration step [12].
In order to estimate the biomolecules' diffusion coefficient, D, we also estimate intermediate quantities (namely, particle photon emission rates, biomolecule positions over time and biomolecule numbers) detailed in the methods section. These intermediate quantities demand that we use Bayesian nonparametrics to determine quantities that a priori may be arbitrarily large such as the number of the diffusing biomolecules contributing photons.
Within the Bayesian paradigm [55,98], our estimates take the form of posterior probability distributions over unknown quantities. These distributions combine parameter values, probabilistic relations among different parameters, as well as the associated uncertainties. To quantify this uncertainty, we compute a posterior vari-ance and use this variance to construct error-bars (i.e., credible intervals). According to the common statistical interpretation [34,98], the sharper the posterior, the more conclusive (and certain) the estimate.
Here, we first validate our method on synthetic data where ground truth is readily available. We then test our method on experimental data collected in two labs utilizing different FCS setups. For the latter cases, we compare our analyses to the results obtained from autocorrelative methods used in FCS.

A. Method Validation using Simulated Data
To demonstrate the robustness of our approach, we simulate raw traces under a broad range of: i) trace lengths, Fig. (4); ii) concentrations of labeled biomolecules, Fig. (5); iii) diffusion coefficients, Fig. (6); and iv) particle photon emission rates, Fig. (7). The parameters not varied are held fixed at the following values: diffusion coefficient of 1 µm 2 /s which is typical of slower in vivo conditions [5,63,81,96], particle photon emission rates of 4 × 10 4 photons/s [57,76], and 4 as the number of labeled biomolecules contributing to the trace. We chose a small number of biomolecules (as opposed to a larger number of biomolecules) as this scenario presents the greatest analysis challenge (as very few photons, and thus thus little information, is gathered). It is important to emphasize here that the particle photon emission rate is defined here at the center of the confocal volume (which is the maximum number of emitted photons per label emitted from a single fluorophore that happen to be located at the center of the confocal volume). For more detail please see the Supplementary Information (SI).
A critical and recurring point throughout this section will be that the traces we analyze are shorter than those that could be meaningfully analyzed using FCS as illustrated in Fig. (1). While we focus on the diffusion coefficient estimation here, we note that our framework supports more detailed parameter estimation. In particular, in the SI we illustrate simultaneous estimation of particle photon emission rates; locations of biomolecules relative to the confocal volume center; as well as numbers of biomolecules within given regions around the confocal volume center.

Trace Length
We evaluate the robustness of our method with respect to the length of the trace (i.e., the total number of photon arrivals) at fixed number of biomolecules, diffusion coefficient, and particle photon emission rates. The first important finding is that, for the values of parameters selected, we need 2 orders of magnitude less data than FCS; see Fig. (1C). For instance, to obtain an estimate of the diffusion coefficient within 10% of the correct result, we require ≈ 10 3 photons (directly emitted from the labeled biomolecule), while FCS requires ≈ 10 5 photons. To determine this percentile we chose the mean value of the diffusion coefficient posterior, and measure the percentage difference of this mean to the ground truth known for these synthetic traces.
In general, the difference in the photon numbers demanded by our method and traditional FCS analysis depend on a broad range of experimental parameter settings. This is the reason, we explore different settingsholding all other settings fixed-in subsequent subsections as well as the SI.
Another important concept-illustrated in Fig. (1C) and Fig. (4) that will keep re-appearing in subsequent sections-is the concept of a photon as a unit of information. The more photons we have, the sharper our diffusion coefficient estimates. This is true, as we see in Fig. (1C) and Fig. (4), for increasing trace length. Similarly, as we will see in subsequent subsections, we also collect more photons as we increase the concentration of labeled biomolecules (and thus the number of biomolecules contributing photons to the trace), increase the particle photon emission rates of biomolecular labels (or dyes in the case of free dye experiments), or decrease diffusion coefficients of biomolecules. In the latter case, slower diffusion coefficients provides more time for each biomolecule to traverse the illuminated region.

Labeled Biomolecule Concentration
To test the robustness of our method under different concentrations of labeled biomolecules at fixed diffusion coefficient, and particle photon emission rates, we simulate diffusing biomolecules with average concentrations of 10 biomolecules/µm 3 (Fig. (5A, B)), 4 biomolecules/µm 3 (Fig. (5C, D)), all the way down to 1 biomolecule/µm 3 (Fig. (5E, F)), diffusing at 1 µm 2 /s for a total time 30 ms. The particle photon emission rates per label of labeled biomolecules and background are taken to be 4 × 10 4 photons/s and 10 3 photons/s respectively, which are typical of confocal microscope [76]. Our method is general and can readily treat arbitrary confocal volume geometries such as a Lorentzians [10] or Airy patterns [105] though, for simplicity, in generating synthetic data we use a 3D Gaussian volume of size ω xy =0.3 µm and ω z =1.5 µm [54]). Fig. (5) summarizes our results and suggests that posteriors over diffusion coefficients are broader-and thus the accuracy with which we can pinpoint the diffusion coefficient drops-when the concentration of labeled biomolecules is lower. Intuitively, we expect this result as fewer biomolecules within the confocal volume provide fewer photons and each photon carries with it information that helps refine our diffusion coefficient estimate. In the opposite extreme, when biomolecular numbers are high and detectors themselves are saturated, both the experimental setup and one's choice of single-photon analysis methods are inappropriate.  3)) and the bottom panel is its related photon arrival trace that comes from summing the emission rates over all biomolecules plus the background emission. This trace is acquired for a total time of 30 ms (with a total of about 10 3 photon arrivals) and generated using a diffusion coefficient 1µm 2 /s with background and labeled particle photon emission rates of 10 3 photons/s and 4 × 10 4 photons/s respectively with about 1000 photons. The 80% dash line shows 80% of the original trace with about 800 photons. Similarly for the 50% and 30% dashed lines which include 500 and 300 photons respectively to their left. (B1-B4) Correspond to posterior probability distributions drawn from traces of differing length. The horizontal axis is shown in logarithmic scale for convenience.

Diffusion Coefficients
We repeat the simulations of the previous subsection to demonstrate, using synthetic data, the robustness of our method with respect to the diffusion coefficient magnitude; see the basis of the fact that photons carry information-we expect that faster moving species will give rise to broader posterior distributions as these species emit fewer photons, and thus provide less information, as they traverse the confocal volume.

Particle Photon Emission Rates
Fig. (7) illustrates the robustness of our method with respect to particle photon emission rates (i.e., set by the laser power used in the experimental setting or the choice of fluorescent label or dye) by fixing the num- ber of biomolecules, diffusion coefficient (1 µm 2 /s), and background emission, 10 3 photons/s. To accomplish this, we simulate increasingly dimmer biomolecules until the biomolecule signature is effectively lost as compared to the background. As expected, weaker traces lead to poor estimation and broader posterior estimates over diffusion coefficients since these traces are associated with greater uncertainty.

B. Estimation of Physical Parameters from Experimental Data
To evaluate our approach on real data, we used experimental data collected under a broad range of conditions. That is, we used measurements from two different experimental setups and different fluorescent biomolecules, namely Cy3 and 5-TAMRA dyes. Additional differences between the setups include different numerical apertures (NA), laser powers, and overall detection instrumentation as detailed in the methods section.
Figs. (8)-(10) were collected using the Cy3 dye and these results were used to benchmark the robustness of our method on dye concentration, diffusion coefficients, and laser power. The last figure, Fig. (11), was collected using a 5-TAMRA dye in order to test the robustness of our method on a different fluorophore.

Benchmarking on experimental data using Cy3
We began by verifying our method on mixtures of water and glycerol. In spite of the fact that we only utilize short segments in our analyses, the collected traces are long enough (≈5 min each) to be meaningfully analyzed by traditional auto-correlative analysis used in FCS. The result of the analysis of the full trace by FCS yielded a diffusion coefficient that we treated as an effective ground truth. We can then ask how long of a trace our method requires, as compared to FCS, in order for our diffusion coefficient estimate to converge to this ground truth.
The latter strategy addresses the following complication: we anticipate that the PSF may be distorted from the idealized shape assumed especially with increasing amounts of glycerol [36]. However, the same (possibly incorrect) PSF is assumed in both FCS and our method in order to compare both methods head-to-head. Thus, concretely, we will be asking: how many photons do we need to converge to the same result as FCS (irrespective of whether the FCS result is affected by PSF distortion artifacts)?
In part, we also do this as auto-correlative analysis in FCS cannot easily treat complex PSF shapes. Indeed, the fluorescence intensity time auto-correlation function loses its analytic form otherwise afforded by assumptions of PSF Gaussianity. We add that while we do not here learn the PSF shape using our method (instead, we assume its shape), our method can naturally treat any PSF shape provided it is known, i.e., has been pre-calibrated (see SI section 1).
Data were obtained under a range of conditions to test our method, namely different: i) dye concentrations, Fig. (8); ii) diffusion coefficients, Fig. (9); and iii) laser power, Fig. (10). As before, longer traces, higher concentrations, lower D's, and higher laser power result, on average, in sharper estimates with results on diffusion coefficient estimates still converging with at least 2 orders of magnitude less data than FCS for equal accuracy in Figs. (8), (9), and Fig. (10) respectively. We mention "on average" as individual traces are stochastic. Thus, some traces with higher concentrations of labeled biomolecules, say, may happen to have fewer biomolecules contribute photons to the traces than experiments with lower concentrations. Fig. (8) recapitulates our expectations derived from the synthetic data shown earlier (Fig. (5)), when dye concentrations are low yielding wider posterior for our diffusion coefficient and correspondingly sharper posteriors for the higher concentration. Here, similarly to Fig. (1) we compare our method's diffusion coefficient estimate to FCS as a function of the number of photons used in the analysis, Fig. (8A) and Fig. (8B), both in good agreement with FCS estimates, produced by the entire traces (i.e., 1000× longer).
Similar to the analysis of the synthetic data, by comparing different diffusion coefficients, the slower a diffusing biomolecule is, the more time it spends within the confocal volume, the more photons are collected providing us with a sharper posterior estimate of its diffusion coefficient (see Fig. (9)).
Similarly to the synthetic data shown earlier (Fig. (7)), Fig. (10) now considers the robustness of our method to lower laser power which, as expected, yields a wider posterior for our diffusion coefficient and correspondingly sharper posteriors for higher laser power. Here, we compare our method's diffusion coefficient estimate to FCS as a function of the number of photons used in the analysis, Fig. (10A) and Fig. (10B), both in good agreement with FCS estimates, produced from the entire trace.

Benchmarking on experimental data using 5-TAMRA
Here we switch to a different dye, different setup and acquisition electronics as detailed in the Methods section. Our sample now contains 20 nM of 5-TAMRA dissolved in water. We evaluate the diffusion coefficients versus the value obtained from both FCS and NMR [35], see Fig. (11). Although we used short segment for the anal- ysis, we compare our results to analyses on much longer (≈10 min) traces from FCS.

IV. DISCUSSION
The photon is the basic unit of information in all spectroscopic and imaging applications [68]. Our method takes a Bayesian nonparametric approach to tackling single photon arrival data to learn dynamical quantities from as few as hundreds to thousands of photons from confocal imaging data. This is by contrast to conventional auto-correlative methods used in FCS [27,64,78,95] that require dramatically more data, i.e., datasets several orders of magnitude larger in either total duration or total number of photon arrivals, to characterize dynamical quantities with similar accuracy. These quantities include diffusion coefficients, instantaneous number of biomolecules, the position of biomolecules within the illuminated confocal volume, and particle photon emission rates for each biomolecular label.
There have been partial solutions to the challenge of interpreting data at the single photon level often outside FCS applications. Indeed, existing methods make assumptions that would render them inapplicable to diffusion in an inhomogenesouly illuminated volume. For example, they assume uniform illumination [38,76], bin the data and thereby reduce temporal resolution to ex-  ploit existing mathematical frameworks such as the Hidden Markov model [2,16,39,70,97], or focus on immobile biomolecules [23,24,37,39]. As a result, correlative methods continue to dominate confocal data analysis as they have for the past 40 years [52]. To take full advantage of the single photon data- Diffusion coefficient estimates are obtained using 5-TAMRA dye.
Here we use 5-Carboxytetramethylrhodamine (5-TAMRA) at a fixed concentration (20 nM). Similarly to Fig. (1), we compare our method's diffusion coefficient estimate (circle green dots) to FCS (blue stars) as a function of the number of photons used in the analysis. For sake of comparison, the diffusion coefficient above was estimated from FCS (red dash line) using the entire trace (≈ 10 min with 6 × 10 6 photons). For more details on how the FCS analysis is achieved see the SI. and the inherent non-stationarity between photon arrivals both due to biomolecular diffusion in an inhomogeneously illuminated volume and the stochastic number of biomolecules contributing photons to the tracenew Mathematics are required. In particular, analyzing data derived from mobile biomolecules within an illuminated confocal region breaks down the perennial parametric Bayesian paradigm that has been the workhorse of biophysical data analysis [16,44,55,60,70,92,96]. We argue here that BNPs-which provide principled extensions of Bayesian logic [31,93]-show promise in Biophysics [46,55,87,88,90,92] and give us a working solution to prior parametric challenges.
Within BNPs, the mathematical device on which we most heavily rely, the Beta-Bernoulli process, is only half a decade old.
Our new tools open up the possibility to explore at the single photon level non-equilibrium processes resolved on fast timescales [3,69], say ms, that have been the focus of recent attention. In addition, armed with a rigorous framework, it is now possible to extend the proof-of-principle framework we put forward to treat different effects that lie beyond the current scope of this work. In particular, we can think of extending our framework to treat multiple colors [22], triplet effects, complex biomolecule photophysics [41] (such as biomolecular blinking [89,104] and photobleaching [56,91]), more complex biomolecule motion models [103] other than free diffusion [50], complex distorted PSF models [29], or even incorporate chemical reactions among the biomolecules [8,100].

SUPPLEMENTARY MATERIALS
In this supplement, we present additional analyses and technical details expanding upon the material presented in the main text. These include: i) additional analysis of synthetic and experimental traces that include the estimation of diffusion coefficient, average photon emission rates and the estimation of biomolecular locations/concentrations; ii) additional details on the theoretical approaches used; and iii) a complete description of the inference framework developed that includes choices for prior probability distributions and a computational implementation.

Analysis of additional simulated data
In the main text we focused on the estimation of: diffusion coefficients, D, with values ≈ 1µm 2 /s which are typical of slower in vivo applications [5,63,81,96], The trace is acquired just as before for the same length of time, 50ms, but with lower particle photon emission rate 4 × 10 4 photons/s (background fixed at 10 3 photons/s). (D) Posterior probability distribution over the diffusion coefficient estimated from the trace. Comparing (B) and (D) suggests our current method works with different particle photon emission rates. The higher the particle photon emission rate, the more photons we collect and the better the estimate we obtain (i.e., the sharper our posteriors).
biomolecule photon emission rates of ≈ 4×10 4 photons/s [57,76], and 4 biomolecules typically emitting photons over the course of a time trace. Here, we explore broader parameter ranges.

Analysis of additional Experimental data
Next, we used our method on real traces derived from a single focus confocal setup. Here, we used measurements from two different experimental setups and different fluorescent biomolecules, namely Cy3 and 5-TAMRA dyes. We used two separate experimental datasets: Fig. (15 Comparing these posterior shows the higher the particle photon emission rate, the more photons we collect and the better estimates we obtain. was a result using Cy3 dye to benchmark the diffusion coefficients and particle photon emission rate; Fig. (16) was using a different dye, 5-TAMRA dye, to evaluate the approach with different diffusion coefficients, and particle photon emission rate, with different experimental setup.
In the Methods section of the main text we have provided details of each setup. Briefly,the differences between these two setups are summarized in numerical aperture (NA), laser power, photon emission rate of biomolecule, and type of dye which used. Below, we illustrate additional results showing the es- timation of diffusion coefficient, particle photon emission rate and a comparison with conventional FCS analyses.

Detailed methods description
Brief description of fluorescence correlation spectroscopy (FCS) In FCS the primary quantity of interest are spontaneously fluctuating fluorescence intensities [26,83]. Correlations in fluorescence intensities are used to determine physical parameters such as local concentrations, diffusion coefficients or characteristic rate constants of interor intramolecular reactions.
The normalized time autocorrelation function of the  fluorescence intensity is defined as: where I(t) is the fluorescence intensity. For one freely diffusing species of biomolecules, this autocorrelation is where, D is the diffusion coefficient, ω xy and ω z are the confocal volume axes along the x, y and z directions. Here, the first term of Eq. (11) is the inverse of the average number of biomolecule in the confocal volume. Therefore, by knowing the dimensions ω xy and ω z from calibration measurements, the local concentration of fluorescent biomolecules can be determined. For details on correlative analysis are contained in the cited literature [14,26,27,32,64,83].

Representation of biomolecule diffusive motion
For a biomolecule diffusing along one direction, the probability distribution p(x, t) of its position x at time t satisfies the diffusion equation [51] ∂p where D is the diffusion coefficient of the biomolecule. We can solve this equation to obtain p(x, t) for any later time t by assuming that the biomolecule is located at x k−1 at time t k−1 , i.e., using the initial condition p(x, t k−1 ) = δ(x − x k−1 ), and a free space boundary, i.e., lim x→±∞ p(x, t) = 0. The solution is which is the probability density of a normal random variable with mean x k−1 and variance 2(t − t k−1 )D; for example, see Table II. At time t = t k , we therefore have where ∆t k−1 = t k − t k−1 in accordance with the notation in remaining on this study. Similarly, solving the diffusion equation for biomolecules following isotropic diffusion in free space along all three Cartesian directions we obtain that constitute the biomolecular motion model used throughout this study.

Explanation of data simulation
To generate traces imitating raw data generated from a realistic confocal setup, we simulate biomolecules mov-ing through a three dimensional (3D) illuminated volume. The number of moving biomolecules, N , is predefined in each simulation. We apply periodic boundaries to our volume of L xy and L z parallel to the focal plane and optical axis, respectively, to keep a relatively stable concentration of biomolecules near the confocal volume.
We denote the locations of the biomolecules as x n k , y n k and z n k , where k labels time levels and n = 1, 2, . . . labels biomolecules. The total trace duration T total = t K − t 1 , is predefined. The inter-arrival times or time intervals between successive assessments (recorded photons) ∆t k−1 = t k −t k−1 , are generated through pseudo-random computer simulations and recorded for subsequent analysis.
The locations of the biomolecules at the first assessment x n 1 , y n 1 and z n 1 are randomly sampled from the uniform distribution with limits equal to the boundaries ±L xy and ±L z of our prescribed simulation region. Subsequent locations are generated according to the diffusion model described above under a predefined diffusion coefficient D.
We obtain photon inter-arrival times, ∆t = (∆t 1 , ∆t 2 , . . . , ∆t K−1 ), by simulating exponential random variables of rate µ k . For independent background and particle photon emission rates, the corresponding exponential emission mean rates µ k depend on a Gaussian point spread function (PSF) as follows and both background, µ back , and the particle photon emission rate of detected photons for a single biomolecule, µ part , are predefined. Here, ω xy and ω z are the semi-minor and semi-major axes lateral and parallel to the optical axis [11,36]. Similarly, the dimensions of the PSF, ω xy and ω z , are also prescribed. To avoid artifacts induced by the periodic boundaries, we ensure that L xy ω xy and L z ω z .

Definition of normalized distance and biomolecule concentration
To estimate the diffusion coefficient, as we point out in the main text, we need to estimate the positions of the biomolecules with respect to the center of the confocal volume. Therefore, we have to address problems associated with the symmetries of the confocal PSF with respect to rotations around the optical axis or the focal plane. To bypass such difficulties, for a biomolecule at locations x n k , y n k , and z n k , we rely on the following normalized distance which is unaffected by the aforementioned symmetries. Now, we define the concentration of biomolecules c k as the number of (active, i.e., photon emitting) biomolecules within the corresponding confocal volume, for a given ratio of normalized distance, d n k , and distance respect to effective volume, .
where H is the Heaviside step function, b n is the load of biomolecule n, and V is the volume of the designated region of interest (ROI) which is given by Calculating the average biomolecular photon emission rate In this study the emission rate of detected photons for a single fluorophore at position x, y, z is used. This is formulated as the product µ(x, y, z) = µ 0 ϕ d ϕ de ϕ f σ EXC(x, y, z) CEF(x, y, z).
(22) Here, µ 0 and ϕ d are the maximum excitation intensity and the efficiency of the photon collection at the center of the confocal volume respectively, ϕ de is the efficiency of the detector, ϕ f is the quantum efficiency of the fluorophore, σ is the absorption cross-section of the fluorophore, EXC(x, y, z) is the excitation profile and CEF(x, y, z) is the detection profile known as collection efficiency function that equals to the portion of the detected photons emitted by a point source [28]. Now, by revising Eq. (22) to the form of µ(x, y, z) = µ part PSF(x, y, z) where µ part = µ 0 ϕ d ϕ de ϕ f σ is called the brightness at the center of the confocal volume and PSF(x, y, z) = EXC(x, y, z) CEF(x, y, z) as the detection volume, we obtain PSF(x, y, z) = exp −2 x 2 +y 2 To relate our single molecule photon emission rate µ part to the average photon count rate typically determined in bulk experiments, we compute the spatial average of µ(x, y, z) as follows µ(x, y, z) = µ part PSF(x, y, z) µ(x, y, z) = µ part exp −2 where V ef f is the effective volume of our PSF [54,79] and it is given by As a result, our particle photon emission rate µ part is related to µ(x, y, z) according to

Description of Wilson-Hiferty approximation
To perform the necessary computations of the next section, we use a Wilson-Hiferty transform [101] to approximate the probability density of exponential random variables. We use this approximation in section IV when we sample the locations of biomolecules within our overall Gibbs sampler.
To apply the Wilson-Hiferty approximation, first we transform our observation random variable ∆t k to a new random variable ρ, where ρ = 2µ k ∆t k . A change of variables, indicates that ∆t k |µ k ∼ Exponential(µ k ) implies where χ 2 (2) denotes the chi-square probability distribution with 2 degrees of freedom, see Table II. In other words, the above random variable ρ is a chi-square variate, ρ ∼ χ 2 (2). Now, by applying another transformation, where ξ = 3 ρ/2, according to [101], ξ follows an approximately normal probability distribution So, by ξ = 3 ρ/2 and ρ = 2µ k ∆t k , we conclude 3 √ ∆t k = ξ/ 3 √ µ k . Therefore, since 3 √ ∆t k = ξ/ 3 √ µ k , by transforming the distribution function in Eq. (28), we establish the approximation Detailed description of the inference framework

Prior probability distributions
Within the Bayesian paradigm, all unknown model parameters need priors. These parameters are: the diffusion coefficient D; the particle photon emission rate µ part ; the initial biomolecule locations x n 1 ,y n 1 ,z n 1 ; as well as the load prior weights q n . Our priors are detailed below.

Prior on the diffusion coefficient
To make sure that the D sampled in our formulation attains only positive values, we place an Inverse-Gamma prior This prior is conjugate to the motion model which simplifies the computations shown below.

Priors on particle photon emission rate
To guarantee that µ part sampled in our formulation also only attains positive values, we choose a Gamma prior Due to the likelihood's dependence on µ part , conjugate priors cannot be achieved for µ part . So the above choice offers no computational advantage (see below) and could readily be replaced with more physically motivated choices if additional information on photon emission rates is available.

Priors on initial biomolecule locations
Because of the symmetries inherent to the confocal volume, e.g., a biomolecule at a location (x, y, z) gives rise to the same photon emission rate as a biomolecule at location, (−x, −y, −z), we place priors on the initial locations that respect these symmetries. To eliminate further assumptions on our framework that may determine each biomolecule's position uniquely, but may limit the framework's scope to certain experimental setups, we apply a symmetric prior. Consequently, in our framework, at the beginning of the measuring period, biomolecules are equally likely to be located at any of the locations (±x n 1 , ±y n 1 , ±z n 1 ). To simplify the computations, we place independent symmetric normal distributions, on each Cartesian coordinate of the model biomolecules We emphasize that the above symmetric prior does not affect our estimation. According to the biomolecule's motion model we apply, no matter where biomolecules are initiated, eventually they may move freely and subsequently switch to a different octant if necessary. Our symmetric priors merely indicate that for each individual biomolecule trajectory considered, there are 7 other symmetric trajectories that are equally likely to have occurred.

Priors and hyperpriors for the loads
To simplify the computations described in the next section, we use a finite, but large, model population consisting of N biomolecules that contain active and inactive ones. These biomolecules are collectively indexed by n = 1, 2, . . . . As explained in the main text, estimating how many biomolecules are actually warranted by the data analyzed is the same as estimating how many of those N biomolecules are active, i.e., b n = 1, while the rest are inactive, i.e., b n = 0, and so have no effect and are instantiated only for computational purposes.
We use a Bernoulli prior of weight q n to make sure that each load b n takes only values 0 or 1. Moreover, on each weight q n , we place a conjugate Beta hyperprior b n |q n ∼ Bernoulli (q n ) (36) q n ∼ Beta (A q , B q ) .
To guarantee that the resulting formulation avoids overfitting, we make the specific choices A q = α q /N and B q = β q (N − 1)/N . Under these choices [1,15,71,72], and in the limit that N → ∞ (that is, when the assumed biomolecule population is large), this prior/hyperprior choice converges to a non-parametric Beta-Bernoulli process. Consequently, for N 1, the posterior remains well defined and becomes independent of the chosen value of N . In other words, provided N is large enough, its impact on the results is insignificant; while its precise value has only computational implications (see below) .

Summary of model equations
Below we list the equations used in our framework.
Posterior samples are generated according to Gibbs sampling [34,61,80,92,98] by updating each one of the involved variables sequentially. We achieve this by sampling a variable conditioned on all other variables and the given photon inter-arrival times ∆t. Conceptually, the steps involved in the generation of each posterior sample (D, µ part , {q n , b n , x n , y n , z n } n ) are described below.
Since the locations of the inactive biomolecules do not contribute in the photon inter-arrival times ∆t, see

Sampling active biomolecules locations
To sample the location of an active biomolecule n in 3D, (x n , y n , z n ), we use forward filtering and backward sampling (FFBS) [9,19,50,86]. In particular, we update each dimension sequentially from the following full conditional probability distributions P (x n |D, µ part , {b n , y n , z n } n , {x n } n =n , ∆t), P (y n |D, µ part , {b n , x n , z n } n , {y n } n =n , ∆t), and P (z n |D, µ part , {b n , x n , y n } n , {z n } n =n , ∆t) for x, y, and z coordinates, respectively. Bellow, we show in detail the calculation only for sampling x n , since for sampling y n and z n they are similar.
Overview To sample the trajectory x n , we rely on the factorization P (x n |D, µ part , {b n , y n , z n } n , {x n } n =n , ∆t) = P (x n K |D, µ part , {b n , y n , z n } n , {x n } n =n , ∆t) × P (x n K−1 |x n K , D, µ part , {b n , y n , z n } n , {x n } n =n , ∆t) × · · · (50) × P (x n 2 |x n 1 , D, µ part , {b n , y n , z n } n , {x n } n =n , ∆t) × P (x n 1 |D, µ part , {b n , y n , z n } n , {x n } n =n , ∆t) and, according to this factorization, we sample individual locations x n k sequentially x n K ∼ P (x n K |D, µ part , {b n , y n , z n } n , {x n } n =n , ∆t) (51) x n k ∼ P (x n k |x n k+1 , D, µ part , {b n , y n , z n } n , {x n } n =n , ∆t), Where, k = 1, . . . , K − 1. However, to be able to perform these steps, we first need to compute the involved probability distributions P (x n k |x n k+1 , D, µ part , {b n , y n , z n } n , {x n } n =n , ∆t). Below we describe a computationally efficient way to so that proceeds in a forward filtering and a backward sampling step that we present in detail.
Forward Filtering Before we start the sampling of the locations, we need to determine each one of the individual probability distributions that are needed in Eq. (52), P (x n k |x n k+1 , D, µ part , {b n , y n , z n } n , {x n } n =n , ∆t). To do this in a computationally tractable manner [13,19], we compute filter distributions P (x n k |D, µ part , {b n , y n , z n } n , {x n } n =n , {∆t k } k <k ). In our case, both dynamic (Eq. (43)) and observation (Eq. (44)) probability distributions provide equal probabilities for +x n k and −x n k . Therefore, the filter distribution consists of two modes symmetrically placed across the origin [50]. Accordingly, we compute an approximate bimodal symmetric filter of the form P x n k |D, µ part , {b n , y n , z n } n , {x n } n =n , {∆t k } k <k ≈ SymNormal (x n k ; m n k , c n k ) where SymNormal (m n k , c n k ) describes the symmetric normal distribution (see Table II). The filter, that is the values of m n k and c n k , is updated iteratively according to P x n k |D, µ part , {b n , y n , z n } n , {x n } n =n , {∆t k } k <k ∝ P ∆t k−1 |x n k , y n k , z n k , µ part , {b n , x n , y n , z n } n × x n k−1 P x n k−1 |D, µ part , {b n , y n , z n } n , {x n } n =n , {∆t k } k <k−1 P x n k |x n k−1 , D dx n k−1 .
To be able to carry out these computations efficiently, similar to [50], we work on an approximate model where the exponential emission equation, Eq. (44), is replaced by a normal one using the Wilson-Hiferty approximation as we discussed in section IV. Our approximate emission equation is T data (∆t k ) |{x n k , y n k , z n k , b n } n , µ part ∼ Normal T mean (µ k ), S(µ k ) 2 , k = 1, . . . , K − 1.
(55) where µ k is given in Eq. (45); while T data (∆t k ), T mean (µ k ) and S 2 (µ k ) are given by the Wilson-Hiferty approximation [101]. As explained in section IV, the approximation is given by Now, because of the specific choices of our problem (i.e. diffusive biomolecules, symmetric normal filter at the proceeding time, and normal likelihood), Eq. (54) reduces to (57) Finally, to obtain the values of m n k and c n k , we linearize the product in Eq. (57) as described below.
The density in Eq. (63) consists of two modes, one on the positive semi-axis of x and one on the negative semi-axis of x. Considering f (x n k ) = 3 µ(x n k )∆t k−1 ) and g(x n k ) = 1 3 log µ(x n k ) and linearizing them around the previous filter's mode, +m n k−1 , the fist mode of Eq. (63) is approximated by Similarly, linearizing f (x n k ) and g(x n k ) around −m n k−1 , the second mode of Eq. (63) is approximated by Combining both mode approximations, the density of Eq. (63), is approximated by where we use the following definitions .
The formula in Eq. (66) describes a probability density of a symmetric normal distribution. Equating this distribution with our filter, i.e., SymNormal (m n k , c n k ), we obtain These equations apply for k = 2, . . . , K and can be used to update the filter. To begin, we use Eq. (42) and set c n 1 = σ 2 xy and m n 1 = µ xy . Backward sampling Having computed the filter distributions above, we are able to sample the individual locations according to Eq. (51) and Eq. (52) by starting from x n K and moving backward towards x n 1 . In particular, by applying the Bayes' rule, each one of the individual distributions on Eq. (52) factorize as P (x n k |x n k+1 , D, µ part , {b n , y n , z n } n , {x n } n =n , ∆t) ∝ (71) P x n k |D, µ part , {b n , y n , z n } n , {x n } n =n , {∆t k } k <k × P x n k |x n k+1 , D .
In the above, the first term is given by the fil-ter distribution which is replaced by our approximate SymNormal(m n k , c n k ), and the second term is our motion model Normal (x n k , 2D∆t k−1 ), all of which are known at this stage. Therefore, the sampling distributions become

Sampling inactive biomolecule trajectories
When we update the trajectories of the active biomolecules, the next step is the update of the trajectories of the inactive ones.
To do this, we sample from the corresponding conditionals P (x n , y n , z n |D, µ part , {q n , b n } n , ∆t). Since the locations of inactive biomolecules are not associated with the observations in ∆t, these conditionals simplify to P (x n , y n , z n |D, {q n , b n } n ) which can be readily simulated jointly in the same manner as standard 3D Brownian motion [59].

Sampling the diffusion coefficient
We sample the diffusion coefficient from the conditional probability distribution P (D|µ part , {q n , b n , x n , y n , z n } n , ∆t).
Because of the specific dependencies of the variables in this formulation, e.g. Eq. (31) and Eq. (43), the conditional distribution simplifies to P (D|{x n , y n , z n } n , ∆t). Using Bays' rule, this distribution becomes where the parameters α D and β D are given by x n k − x n k−1 2 + y n k − y n k−1

Sampling biomolecule loads
For each biomolecule n we sample its load prior weight, q n , from the corresponding conditional distribution P (q n |µ part , D, {b n , x n , y n , z n } n , ∆t), which simplifies to P (q n |b n ). To do this we use Eq. (41) and Eq. (40). According to Bays' rule, the latter distribution becomes q n ∼ Beta(α , β ) where α = α N + b n and β = β N −1 N + 1 − b n . Once the weights q n or all biomolecules are updated from the above procedure, we update the loads b n by sampling from the corresponding conditional distribution P ({b n } n |D, µ part , {q n , x n , y n , z n } n , ∆t) using a Methropolis-Hasting algorithm [17,21]. For this, we use a proposal distribution with the form b n new ∼ Bernoulli(q n ).
With this proposal, the acceptance ratio becomes Sampling the particle photon emission rate In the last step, after updating the locations and loads of the biomolecules, we sample the particle photon emission rate from the corresponding conditional distribution P (µ part |D, {q n , b n , x n , y n , z n } n , ∆t). To sample this distribution, we also use a Metropolis-Hastings step. For this, we use proposals of the form where µ old part denotes the current sampled value. Using both Eqs. (44) and (45) In the above ratio, α part and β part are parameters of the prior and α is the parameter of the proposal.