Limits of accuracy for parameter estimation and localisation in Single-Molecule Microscopy via sequential Monte Carlo methods

Assessing the quality of parameter estimates for models describing the motion of single molecules in cellular environments is an important problem in fluorescence microscopy. We consider the fundamental data model, where molecules emit photons at random times and the photons arrive at random locations on the detector according to complex point spread functions (PSFs). The random, non-Gaussian PSF of the detection process and random trajectory of the molecule make inference challenging. Moreover, the presence of other nearby molecules causes further uncertainty in the origin of the measurements, which impacts the statistical precision of estimates. We quantify the limits of accuracy of model parameter estimates and separation distance between closely spaced molecules (known as the resolution problem) by computing the Cramer-Rao lower bound (CRLB), or equivalently the inverse of the Fisher information matrix (FIM), for the variance of estimates. This fundamental CRLB is crucial, as it provides a lower bound for more practical scenarios. While analytic expressions for the FIM can be derived for static molecules, the analytical tools to evaluate it for molecules whose trajectories follow SDEs are still mostly missing. We address this by presenting a general SMC based methodology for both parameter inference and computing the desired accuracy limits for non-static molecules and a non-Gaussian fundamental detection model. For the first time, we are able to estimate the FIM for stochastically moving molecules observed through the Airy and Born&Wolf PSF. This is achieved by estimating the score and observed information matrix via SMC. We sum up the outcome of our numerical work by summarising the qualitative behaviours for the accuracy limits as functions of e.g. collected photon count, molecule diffusion, etc. We also verify that we can recover known results from the static molecule case.


Introduction
In recent years, single-molecule microscopy has become a powerful tool in cell biology [62,61]. It has allowed significant insight to be gained into the behaviour of single molecules in cellular environments using fluorescence microscopy. Single-molecule fluorescence microscopy (see [63,45] for reviews) consists of using a suitable fluorophore to label the molecule(s) of interest, exciting said fluorophore with a specific light source and capturing the fluorescence or photons emitted by the molecule(s) through an optical microscope system onto a detector during a fixed acquisition time. Many biological applications rely on being able to accurately track and estimate parameters such as the location and/or movement of a single molecule, or the separation distance between two closely spaced molecules, observed via an optical microscope [62,44]. Indeed, when inferring the parameters relating to the state of single molecules, it is vital to be able to assess the accuracy of estimates. For instance, the interactions between multiple closely spaced molecules can often lead to a resolution problem, and the resolution limit of an optical microscope is crucial when trying to accurately estimate the separation distance between two molecules. In the past, Rayleigh's criterion [4] has been used to define the minimum distance between two point sources such that they can be distinguished in the image. However, Rayleigh's criterion ignores the statistical aspect of the separation distance estimation problem. For example, it doesn't account for the fact that each new observation (taking the form of a captured emitted photon) brings new information on the separation distance. In contrast, in estimation theory, the Cramér-Rao lower bound (CRLB) [14,59,30,15] establishes a lower bound on the variance of unbiased estimates, and is therefore often used as a benchmark for the quality of a given estimator. As a result, the CRLB plays an important part in experimental design for single-molecule microscopy [47,57]. For example, in [56,58], the authors present an improved microscope resolution measure in the form of the square root of the CRLB for the separation distance between two molecules, which is referred to as the limit of accuracy with which the separation distance between the two objects can be estimated based on the observed data. A particular advantage of this new resolution measure is that it predicts that increasing the photon count makes it possible to estimate a separation distance between two molecules that is shorter than Rayleigh's criterion.
In this paper, we consider the fundamental data model [47,57], which is crucial in that is provides accessible lower bounds for the limits of accuracy of more realistic practical models, where factors such as pixelisation and readout noise come into play and make inference more challenging. Indeed, the limits of accuracy derived for the fundamental model are often known as the fundamental limits of accuracy. In this model, the detection process of the emitted fluorescence presents its own challenges, as it is intrinsically random both in time and location. While many methods [7,6,8] have assumed that the arrival times of the photons on the detector were uniformly distributed, [47,57] suggest that the arrival times of photons follow a Poisson process. As for the arrival location of these photons on the detector, a wide range of measurement models exist − corresponding to the various types of detector. The typical measurement model used for an in-focus source is the Airy profile [66,11]. If the molecule is out of focus, 3D models are generally used instead, such as the Born and Wolf model [4]. Often, these models make parameter inference difficult, and researchers have often opted for a Gaussian approximation to these models, such as in [2,60,43]. However, [66] argue that in practice, assuming Gaussian distributed photon locations on the detector is not an accurate approximation of the underlying model.
While it is important to be able to accurately study the behaviours and interactions of single molecules within a cell, it is no easy task when those molecules have stochastic trajectories. The motion of an object in a cellular environment is affected by a multitude of deterministic, as well as random factors [5], and in many applications [66,6], the trajectories of single molecules are modelled by stochastic differential equations (SDEs) [51]. The CRLB is obtained by taking the inverse of the Fisher information matrix (FIM), and analytical expressions for the FIM, and thus the limit of accuracy (given by the square root of the CRLB) for the location of an in-focus static (or unmoving) molecule have been derived in [47,11]. Similar results for an outof-focus static molecule are available in [50], and analytical expressions have also been derived in the context of molecules with deterministic linear or circular trajectories in [67]. As for the resolution problem, it is addressed in [56,57] in a static molecule context and in [42] for two dynamic molecules with deterministic trajectories. However, when molecules have stochastic trajectories, the analytical tools to obtain the CRLB and tackle many of these problems are still for the most part missing. In this paper, we propose a numerical approach to address these problems.
In the context of stochastically moving molecules, [66] developed a method to obtain the FIM for a molecule whose trajectory is described by a linear SDE. For a 2D Gaussian approximation of the photon detection process, the authors take advantage of the Kalman filter formulae to obtain an analytical form for the FIM for a specific set of photon detection times. However, if the Airy profile is used instead, the computational cost of performing numerical integration becomes prohibitive for more than a single photon. Among other things, we build on [66] and provide effective methodological advances which enable the estimation of the FIM for the hyperparameters of models with Airy and Born and Wolf distributed photon locations.
In this paper, we develop an effective and general numerical framework to obtain sequential Monte Carlo (SMC) approximations of expectations of interest, including for stochastically moving molecules. The ability to approximate these expectations is important for estimating the score and observed information matrix (OIM) for the hyperparameters of interest, and can also be employed to obtain maximum likelihood (ML) estimates of said hyperparameters. Access to the score and/or OIM is vital in order to be able to estimate the FIM. To achieve this, the observation interval is first discretised and the problem reformulated as a discrete-time state space model, which takes into account the random arrival times of photons on the detector in the form of missing observations. Then, a particle filter is employed in conjunction with forward smoothing methods [17,54] to obtain particle approximations of the expectations of interest. Our work complements [1], in which the authors similarly employed time discretisation of the observation interval, but they did not attempt to estimate the CRLB for hyperparameters.
The numerical experiments in this paper consist first of applying the methodology to estimate the limit of accuracy for a single stochastically moving molecule with 2D Gaussian, Airy, and Born and Wolf photon detection models by using estimates of the score and OIM obtained by forward smoothing. This is repeated for various expected mean photon counts to verify that for molecules with stochastic trajectories, the limit of accuracy exhibits an inverse square root decay with respect to mean photon count, i.e. the uncertainty of the hyperparameter estimates decreases as the expected number of photons increases. This has already been proven for static molecules [49,50,56]. The methodology is also applied in the context of the optical microscope resolution problem to obtain estimates of the limit of accuracy for the mean separation distance between two closely spaced diffusing molecules. Thanks to our numerical approach, insights can be obtained into the generalisation to diffusing molecules of results proven in [58] on this resolution problem for two static molecules. For instance, in [47], it was shown that the limit of accuracy for the location of a static molecule has a linear relationship with the standard deviation of the photon detection profile. From our numerical results, we show that when molecules are diffusing, the appropriate relationship behaves qualitatively with the diffusion coefficient standard deviation in a similar way, i.e. it can be translated into additional observation uncertainty. The qualitative relationships observed through our numerical experiments for stochastically moving molecules are summarised in Table 1.
This paper is structured as follows. In Section 2, the model is presented, including the molecule trajectory, described by a stochastic differential equation (SDE), and the photon detection time and location processes. In Section 3, the model is formulated as a discrete-time state space model with a discretised observation interval. Then, Section 4 establishes the main parameter inference aims and methods, which consist of particle filtering and smoothing of additive functionals in order to estimate the score and OIM for hyperparameters, and methods to estimate the FIM from the score and OIM. Numerical experiments are run in Section 5 to first estimate the limit of accuracy for the drift and diffusion coefficients of the SDE for all photon detection profiles and then estimate the limit of accuracy for the separation distance between two dynamic molecules. Finally, Section 6 provides concluding remarks.

Model specification
For the purpose of this paper, a basic optical system is considered, also known in [66,11] as the fundamental data model. See Figure 1 for an overview of the optical system. Under the fundamental model, we assume that the photons are observed under ideal conditions, in which the detector Y = R 2 is non-pixelated. This model does not describe image data obtained from actual microscopy experiments the way more realistic, or practical models do. However, the fundamental model is crucial, in that it offers an obtainable lower bound to the CRLB of parameters of the more realistic practical model, which is much more difficult to obtain. In this section, the various aspects of the model are described. These include the true molecule trajectory, occurring in the object space, the photon detection locations in the image space, and the times at which photons arrive on the detector.
location of detected photon at time t Figure 1: Illustration of an optical microscope. At time t ≥ t 0 , the molecule is located at X(t) in the object space and might be moving along the object plane. If the molecule is out of focus, it will instead move along a plane parallel to the object plane but displaced along the optical axis. The molecule emits photons through the lens system into the image space and its image is acquired on the planar detector Y located on the image plane. The location of the detected photons at time t is denoted by Y (t).

Molecule trajectory
For notational simplicity, let X t := X(t) ∈ R d denote the true, d-dimensional location of the molecule at time t. Given hyperparameters θ, let f θ s,t (x t |x s ) denote the probability density function of X t given the previous location X s . Assume that the molecule trajectory (X t ) t 0 ≤t≤T follows a linear stochastic differential equation (SDE) where b(t, X t ) := b 0 + b(t)X t and σ(t, X t ) := σ(t) represent the drift and diffusion coefficients, respectively, b 0 is the zero order drift coefficient, and (dB t ) t 0 ≤t≤T is a Wiener process with E [dB t dB ⊺ t ] = I d×d . According to [36,29] the solution to the SDE in (2.1) at discrete time points t 0 < t 1 < . . . is given by where the fundamental matrix function Φ ∈ R d×d satisfies the following for all s, t, u ≥ t 0 the vector a(t i , t i+1 ) ∈ R d is given by and finally the process is a white noise sequence with mean zero and covariance Therefore, the transition density f θ t i+1 ,t i (x ′ |x) can be expressed as a Gaussian with mean (2.5) Example 2.1. Let the trajectory of a molecule be given by the following SDE where in the drift term b ∈ R, in the diffusion term σ > 0, and (dB t ) t 0 ≤t≤T is a Wiener process and let θ = (σ 2 , b). Assuming the time points t 0 , t 1 , . . . are equidistant, i.e. t i+1 − t i = ∆ for all i = 0, 1, . . ., let the fundamental matrix Φ ∆ := ϕ θ ∆ I d×d where ϕ θ ∆ ∈ R and the covariance matrix R ∆ := r θ ∆ I d×d where r θ ∆ > 0. Then, by solving (2.3) and plugging the result into (2.4), we obtain The initial distribution X t 0 ∼ N (x 0 , P 0 ) has covariance matrix P 0 = p 2 0 I d×d where p 0 ∈ R. In a 2D setting (i.e. d = 2), let the drift b = −10 s −1 , the diffusion σ 2 = 1 µm 2 /s and the initial covariance p 2 0 = 10 −2 µm 2 and mean x 0 = (4.4, 4.4) ⊺ µm. By simulating the molecule trajectory for the time interval [0, 0.1] seconds, we obtain the trajectory in Figure 2.

Photon detection locations
The true molecule trajectory cannot be observed directly. Instead, a fluorescence microscope is used: the molecule of interest is labelled using a suitable fluorophore, magnified through a lens system and the photons it emits arrive on a detector Y := R 2 for a fixed time period (see Figure 1). The arrival location of a photon on the detector is random, and using the typical approximation of the optical microscope from [32], it can be described as follows. Let Y ∈ Y denote the observed location of a detected photon. For an object located at (x 0,1 , x 0,2 , z 0 ) ∈ R 3 in the object space, its photon distribution profile [57] is given by the density where M ∈ R 2×2 is an invertible lateral magnification matrix and the image function q z 0 : R 2 → R describes the image of an object in the detector space when that object is located at (0, 0, z 0 ) in the object space. Note that the subscript θ is used in the left-hand side of (2.7) to include dependence on hyperparameters. Depending on the model considered and inference aims, the hyperparameter(s) of interest can be (x 0,1 , x 0,2 ) if the object is static and/or z 0 if an out-of-focus molecule is considered. Three types of image functions are considered. First of all, according to optical diffraction theory from [4], an in-focus point source (i.e. when z 0 = 0) will typically generate an image that follows the Airy profile, given by 2πnα λe where n α is the numerical aperture of the objective lens, λ e is the emission wavelength of the molecule and J 1 (·) represents the first order Bessel function of the first kind. Often, to simplify the problem, the 2D Gaussian approximation to the Airy profile has been used instead (see [12,65,69,64]): If the point source of interest is out of focus, then a 3D Born and Wolf model [4] is used instead: where z 0 ∈ R is the location of the object on the optical axis, n o is the refractive index of the objective lens immersion medium and J 0 (·) is the zero-th order Bessel function of the first kind. Note that the Airy profile is a special case of the Born and Wolf model. Indeed, if the object is in focus, then z 0 = 0 on the optical axis and (2.8) and (2.10) coincide.

Photon detection times
Just like the photon detection locations, the times at which the photons arrive on the detector Y are random. More specifically, in [66,11], the arrival of the photons on the detector, or photon detection process, can be modelled as a Poisson process. Let N (t) be the number of photons detected at time t ≥ t 0 for initial time t 0 ∈ R and let λ(t) be the photon detection rate, representing the rate at which the photons emitted by the object hit the detector at any given time t. For example, the detection rate of an object that has high photostability will simply be constant, while an exponentially decaying λ(t) can indicate that the object image is photobleaching, or fading over time. The arrival times of the photons on the detector Y are denoted t 1 , t 2 , . . . where t i denotes the arrival time of the i-th photon.

The observed data
Let n p = N (T ) − N (t 0 ) be the number of photons detected in the interval [t 0 , T ]. We have now established the two aspects of the data that can be observed in a basic optical system during this interval, namely the detection times t 1 , t 2 , . . . , t np of photons and the location of those detected photons Y t 1 , Y t 2 , . . . , Y tn p on the detector Y. Assume that, conditionally on the current object location X t i , the location of the i-th detected photon Y t i at time t i is independent of the previous locations and time points of the detected photons, i.e. for x t i ∈ X , where the density g θ is the photon distribution profile from (2.7). This is a reasonable assumption, as at any given time, processes such as photon emission and image formation only depend on the state of the emitting fluorescent molecule at that time, and not on any prior event.
Example 2.2. Let the trajectory of a molecule be given by the SDE in Example 2.1 and simulated using the same parameters and for the same time interval. Let Y be a non-pixelated detector. Then, let the photon detection rate be constant such that the mean number of photons is 500, and the photon distribution profile be given by (2.7), where the magnification matrix M = mI 2×2 with m = 100. The image functions for the Airy, 2D Gaussian and Born and Wolf profiles are given by (2.8), (2.9) and (2.10) respectively, where n α = 1.4, λ e = 0.52 µm, n o = 1.515, σ 2 a = 49 × 10 −4 µm 2 and z 0 = 1 µm. By simulating the detected photon locations based on the same molecule trajectory and according to these three models, we obtain the observed photon trajectories in Figure 3. Note that the parameters of the Airy and 2D Gaussian profiles have been chosen so that the Gaussian profile approximates the Airy profile.

The model as a state space model
It is possible to reformulate this model as a state space model that takes into account the random arrival times of photons. This is achieved by discretising the time interval during which photons are recorded.

Reformulation
For simplicity, we assume for the rest of this paper (unless stated otherwise) that the photon detection rate is constant, i.e. λ(t) = λ ∈ [0, 1] for all t ≥ t 0 . First of all, let X t = (x t,1 , x t,2 ) ∈ X where X := R 2 denotes the state of the molecule at time t ≥ t 0 , which includes its location x t,1:2 on the object plane. The location of the object on the optical axis is assumed to be constant and equal to the initial location parameter, i.e. z 0 for all t ≥ t 0 . The probability of recording an observation, i.e. detecting a photon in the small interval (t, t + h] is Let t i denote the arrival time of the i-th photon on a detector Y for i = 1, 2, . . . and Y t i ∈ Y be the location of the captured photon on the detector. Assume the location of a detected photon is distributed according to the probability density function where g θ is the photon distribution profile given in (2.7). The recorded data in the time interval [t 0 , T ], 0 ≤ t 0 < T comprises of n p observations with arrival times t 0 < t 1 < . . . < t np ≤ T and photon locations y t 1 , . . . , y tn p . The inference objective is to estimate the trajectory of the molecule (X t ) t 0 ≤t≤T given data (t i , y t i ), i = 1, . . . , n p . As seen in Subsection 2.1, the molecule evolves according to the probability density function where θ denotes the model parameters and f θ s,t for t > s ≥ t 0 is the homogeneous continuoustime Markov transition density given by the the Gaussian distribution in (2.5) for d = 2.

Non-constant photon detection rate
If the photon detection rate λ(t) is not assumed to be constant, then we redefine the state of an object at time 1]. The state at time t now includes the location of the molecule (x t,1 , x t,2 ) as well as the probability λ t of detecting a photon it emits. The Markov transition density p θ (x ′ |x) can be defined as follows where t i and t i+1 denote the arrival times of the i-th and (i + 1)-th photons, respectively, f θ is the Markov transition density for the object location defined above and l θ is the Markov transition density for the photon detection rate.

Time discretisation
Let (t 1 , y t 1 ), . . . , (t np , y tn p ) be a realisation of the photon arrival times and locations observed in the time interval [t 0 , T ]. Setting t 0 := 0 for convenience, we adopt a discrete time formulation where [0, T ] is divided into segments of length ∆. Let x k ∈ X denote the state of the molecule at time t = (k − 1) ∆ where k = 1, . . . , n for n := ⌈T /∆⌉. We assume the discretisation is fine enough so that an interval (k∆, k∆ + ∆] contains at most one arrival time t i . Then, for k = 1, . . . , n, let where y t i ∈ Y denotes the location of the i-th detected photon on the detector Y. The vector y k is assigned ∅ to indicate the absence of an observation in the corresponding interval. See Appendix A for details on why the time discretisation is a valid approximation of the Poisson process. If is the so called potential function. For k = 1, . . . , n, the probability density function of X k+1 given the previous state where Φ ∆ = Φ(k∆, k∆ + ∆) is now constant and similarly for a ∆ and R ∆ . To summarise, (X k ) ∞ k=1 and (Y k ) ∞ k=1 are X -and Y ∪ ∅-valued stochastic processes where the molecule trajectory in the object space (X k ) ∞ k=1 corresponds to the unobserved latent Markov process with Markov transition density f θ ∆ (x ′ |x) and initial density ν θ (x), and the photon detection locations (or lack of) (Y k ) ∞ k=1 represent the observed process with conditional density or potential function G θ k (x), i.e.
Note that if the object is static, so that the drift and diffusion coefficient in (2.1) are zero, the model simplifies from a state space model to a basic inference problem with independent observations. The observed process is still described by (3.2) but the location of the object x 0 becomes part of the hyperparameters.

Parameter inference 4.1 Inference aim
Now that we have formulated the problem in (3.1) and (3.2) as a state space model, the first aim is going to be to estimate the posterior probability density function of X 1:n := {X 1 , . . . , X n }, n ∈ N, given the observations Y 1:n , also known as the joint smoothing distribution, which is given by p θ (x 1:n |y 1:n ) = p θ (x 1:n , y 1:n ) p θ (y 1:n ) , where the numerator represents the joint density where ν θ (x 1 ) is the initial distribution of X 1 , and the denominator represents the marginal likelihood of the observed data Estimating p θ (x 1:n |y 1:n ) is what allows the molecule to be tracked and is done using a particle filter. The second aim is to obtain particle approximations of smoothed additive functionals, which in turn will allow for estimation of the score and OIM for of the hyperparameters θ, as well as other applications such as ML estimation of said hyperparameters via gradient ascent and Expectation-Maximization (EM). Finally, the third aim is to use the estimates of the score and OIM of the hyperparameters to obtain an approximation of their FIM.

Tracking the molecule using a particle filter
The particle approximation of the marginal posterior of X 1 , . . . , X n defined in (4.1) is given bŷ (i) n = 1 and δ v 0 (v) denotes the dirac delta mass located at v 0 . To obtain this particle approximation, we employ sequential Monte Carlo (SMC) methods in the form of a particle filter (see [9,23,26,13] for comprehensive reviews of SMC methods). There is flexibility in the specific choice of particle filter, but the general form they take follows three key steps, namely resample→propagate→weight. A generic particle filter is summarised in Algorithm 3.
The resampling step avoids weight degeneracy [24,39] and consists of drawing indices ι with probabilities corresponding to the normalised weights ω . The propagation and weighting steps consist of advancing the particle population (X and updating the importance weights as follows: wherew(x k−1 , x k ) is known as the incremental weight and is given bỹ .
, the particle filter becomes the well-known bootstrap filter, introduced in [33] and the computation of the incremental weights simplifies tow(x k ) = G θ t (x k ). Given weighted particle sample (X k−1 ) at step k, we denote an iteration of running the particle filter (steps 4-6 of Algorithm 3) as For this particular problem, we must also take into account the missing observations introduced by the time discretisation. Since a lack of observation does not bring any new information, it suffices to only run the particle filter at segments which contain an observation. A typical iteration of this approach is summarised in Algorithm 1. The interval counter is initialised at c 0 := 1 and counts the number of discrete intervals since (and including) the last observation. An example of particle filtering for stochastically moving molecules observed through the 2D Gaussian, Airy and Born and Wolf models is available in Example B.1.

Particle approximations of expectations of additive functionals
The second inference aim is to obtain estimates of the score and observed information matrix (OIM) for the hyperparameters θ. To achieve these aims, we make use of smoothed additive functionals. Assume that there exists a real-valued function S θ k , k ≥ 0 such that it is an additive functional given by where s θ 1 (x 0 , x 1 ) := s θ 1 (x 1 ) and s θ k k≥0 is a sequence of sufficient statistics which may depend on the value of the observations y 0:k . The main aim is to compute the posterior or smoothing expectation, given by If the model in question is linear and Gaussian or the state space X is finite, then the expectation S k (θ) can be computed exactly by recursion. However, this is not the case if the Airy or Born

Algorithm 1 Particle filter for SDE with missing observations
Input: weighted particle sample X and interval counter c k−1 at step k − 1.
1: if y k = ∅ then 2: Do not run the particle filter Run the particle filter with updated interval length, i.e.
and Wolf profiles are used to describe photon distribution. In this case, SMC methods can again be employed to approximate the expectation as followŝ where the weighted sample (X ) is a particle approximation of the joint smoothing distribution p θ (x 1:k |y 1:k ) obtained using a particle filter.
A simple way of estimating the smoothing expectation S n (θ) for a set of n observations y 1:n is to run the desired particle filter in a 'forward pass' through the whole data to obtain the particle approximation (X ) at the final step n, followed then by a 'backward smoothing' pass through the data, starting from the latest sample y n . This is the case of algorithms such as the fixed-lag smoother by [38,52,53], forward-filtering backward smoothing (FFBSm) by [25,35,37] and forward-filtering backward simulation (FFBSi) by [31]. However, if one wishes to avoid multiple passes through the data, it is also possible to take advantage of the form of the additive functional in (4.4) to estimate S k (θ) in an online or 'forward-only' fashion, as proposed in [17] and further developed in [54]. Introducing the auxiliary function the following recursion is then created: where T θ 0 := 0 and its particle approximation given the weighted sample (X (1:N ) k ) and previous state particle approximationT θ k−1 (X for all i ∈ {1, . . . , N }, and where Finally, using the recursion on the auxiliary function T θ k , the smoothing expectation in (4.5) can be rewritten as and its particle approximation isŜ This algorithm is known as Forward smoothing SMC (SMC-FS) and is summarised in the context of our experiments in Algorithm 2.

Algorithm 2 Forward smoothing SMC (SMC-FS)
Where (i) or (j) appears, the operation is performed for all i, j ∈ {1, . . . , N }. At k = 1, 1: Initialise the particle filter to obtain the weighted particle sample X if y k = ∅ then 6: c k := c k−1 + 1 7: else 8: Use the particle filter to update the weighted particle sample, i.e.

10:
Update the auxiliary function estimatê

Estimation of the score and observed information matrix (OIM)
The score and OIM have important applications to ML estimation, e.g. see [40,55]. They can also be instrumental in assessing the performance of such an estimator, either directly, as argued by [28], or as tools to estimate the FIM when the latter cannot be computed exactly, as we will see in this section. We aim to compute, recursively in time, the score vector G k (θ) := ∇ log p θ (y 1:k ) and OIM H k (θ) := −∇ 2 log p θ (y 1:k ) where p θ (y 1:k ) denotes the marginal likelihood at step 1 ≤ k ≤ n defined in (4.3), ∇ denotes the gradient and ∇ 2 the Hessian.

Establishing the sufficient statistics
First of all, assume that the regularity conditions allowing for differentiation and integration to be switched around in expressions are satisfied. Let us establish the Fisher and Louis identities for the score and OIM, respectively, from [9,23]: 11) and note that (4.10) and (4.11) can be rewritten as where the expectations are with respect to the density p(x k |y 1:k ), and the functions α θ k (x k ) := ∇ log p θ (x k , y 1:k ) and β θ k := ∇ 2 log p θ (x k , y 1:k ) are the additive functionals of interest. A recursion for α θ k and β θ k is straightforward to obtain, more details in [55]. For α θ k and β θ k , (4.6) becomes where the sufficient statistics are given by Finally, to approximate the score and OIM, adapt the particle approximation in (4.7) to the recursions in (4.12) and (4.13) to obtain the score estimatê In Example 4.1, we apply this framework to a possible application of the single-molecule tracking model. We focus for now on the case where the photon distribution is described by the Airy or 2D Gaussian profile.
Example 4.1. Let the trajectory of a molecule be given by the following SDE where in the drift term, b = 0, in the diffusion term, σ > 0, and (dB t ) t 0 ≤t≤T is a Wiener process. Let the photon detection process be described by the Airy or 2D Gaussian profile. Then, the parameters of interest are θ = (σ 2 , b). Recall from Subsection 3.2 and Example 2.1 that the solution to the SDE can be written as and since the potential function G k does not depend on θ in this case, it can be dropped from (4.12) and (4.13) and the components of the sufficient statistic s α k (x k−1 , x k ) for the additive functional α θ k are The components of the sufficient statistic s β k (x k−1 , x k ) for β θ k are given in Appendix C. Note that these derivatives can be evaluated for any value of ∆, and it is therefore possible to adapt them in order to only compute sufficient statistics when an observation is recorded as in Algorithm 1. This is reflected in Algorithm 2.

Estimating the Fisher information matrix (FIM)
The Fisher information matrix (FIM) is widely used in estimation problems as an indicator of the performance of a given estimator. Indeed, it is a key element of the Cramér-Rao inequality, or Cramér-Rao Lower Bound (CRLB) derived by [14,59,30,15], which states that for an unbiased estimateθ of the parameter θ, its covariance has lower bound where given matrices A and B, the inequality A B indicates that A − B is a positive semidefinite matrix, and I n (θ) denotes the FIM in a random sample Y 1 , . . . , Y n of size n [16], defined as where the second equality is proven in [27]. When the expectations in (4.15) and (4.16) are intractable − which is the case when the Airy profile is used to describe the photon detection locations in the single-molecule tracking model − there are several ways one can go about estimating the FIM.

Estimating the FIM for a single large sample using the OIM
Firstly, note that from (4.16), the relationship between the FIM and OIM is simply where H n (θ) = −∇ 2 log p θ (y 1:n ) denotes the OIM. Then, for a general state space model, in [3], it was proven that under mild assumptions, where I(θ) is the asymptotic FIM. See [34] for the corresponding result for multiple targets. So for a large enough sample size n, i.e. if the interval during which the molecule(s) of interest are observed is long enough, the OIM and FIM can be used interchangeably, i.e. for n ≫ 1, See Figure 4 for an illustration. Therefore, the first way of estimating the asymptotic FIM in the single-molecule tracking model is simply to obtain the OIM for a large sample size. For more details on the OIM as an estimate of the FIM, see [22].

Estimating the FIM using the mean outer product of the score
If the molecule(s) of interest are only observed for a short interval, then the size n of the sample of interest is not large enough to estimate the FIM using the OIM. It is then also possible to instead obtain a particle approximation of the expectation in (4.15) using the score as follows: generate D datasets y where for d = 1, . . . , D, the vector G

Estimating the FIM using the mean OIM
When multiple datasets are available, the OIM can also similarly be averaged over D datasets to estimate the FIM as follows:Î This third approach is the Monte Carlo estimator of the expectation in (4.17), and can be seen as averaging the first estimation method in (4.18). Now that the various methods for estimating the FIM have been established, it can be used in an experimental design setting to plan experiments with the aim of returning the most accurate parameter estimates. See Appendix D for details on how ML estimates can similarly be obtained via EM and gradient ascent methods with the use of smoothed additive functionals and SMC-FS.
Example 4.2. To verify these approaches to estimate the FIM, consider the straightforward special case of estimating the FIM for the location x 0 = (x 0,1 , x 0,2 ) parameters of a static molecule emitting photons at a constant rate. In [47,11], the analytical expression for the FIM is derived for the Airy profile, and its diagonal components given observations y 1:n are given by λe , N phot denotes the expected photon count, and I Airy (x 0,i ) denotes the (i, i)th element of the FIM, corresponding to parameter component x i , for the Airy profile. As mentioned in Subsection 3.2, having a static molecule simplifies the model. Since we have independent data, the true values of score G and OIM H can be derived as follows. Given a set of observations y 1:n distributed according to the Airy profile, , Using the same settings as in Example 2.2, we simulate D l = 40 'large' datasets according to the Airy profile consisting of observations obtained during the interval [0, 0.2] seconds. We also simulate D s = 400 'short' datasets consisting of observations obtained during the shorter interval [0, 0.02] seconds. The score and OIM are obtained for all datasets and the FIM for the large and short datasets is estimated in three ways: (i) using the OIM returned from a single dataset selected at random (4.18), (ii) using the mean outer product of the score (4.19) over all datasets and (iii) using the mean OIM across all datasets (4.20). Finally, the square root of the CRLB, also known as the (fundamental) limit of accuracy and defined as δ ϑ = CRLB ϑ for parameter ϑ is obtained. This is repeated for various expected photon counts in order to compare the evolution of the estimated limit of accuracy as the expected number of photons increases to the true limit of accuracy obtained using the true FIM. In Figure 4, it is apparent that, apart from very low photon counts, all approaches are able to return accurate estimates of the limit of accuracy. Comparing Figure 4a and Figure 4b, it also becomes apparent that for long datasets, approach (i) is slightly more accurate than (ii), and the opposite is true for short datasets. In both cases, approach (iii) is the most accurate. Similar results can be obtained for the 2D Gaussian profile and Born and Wolf model, as analytical expressions for the FIM are also available for a static object [49,50].

Numerical experiments
In this section, we apply the particle smoother known as SMC-FS to estimate the FIM, and thus the limit of accuracy, for various parameters in the context of one or multiple moving molecules with stochastic trajectories. Experiments are first run with photon detection locations described by the Gaussian and Airy profiles, and then the Born and Wolf model, where an additional hyperparameter, namely the optical axis location, must be considered as well. The methodology is then applied to the optical microscope resolution problem, where the limit of accuracy for the mean separation distance between two closely spaced diffusing molecules is assessed.
Unless stated otherwise, the FIM for any given settings is estimated according to (4.20), i.e. by generating several datasets according to the same settings, estimating the OIM for each  seconds and the intervals are discretised. The photon detection locations are generated according to the Airy profile, with parameters as in Example 2.2. The true limit of accuracy (blue solid line) is also computed as it is available analytically [49]. Estimates of the limit of accuracy based on a single dataset (approach (i)) are more accurate when the dataset is long, while taking the mean outer product of the score over all datasets (approach (ii)) yields more accurate estimates for a large number of short datasets. Approach (iii) provides a good balance between the two. In general, estimates of the limit of accuracy are relatively poor for very low mean photon counts but quickly improve as it increases. dataset using the SMC-FS algorithm (Algorithm 2) and averaging the estimated OIM over all generated datasets. The particle filter employed in the SMC-FS algorithm is the bootstrap filter. A large number of datasets is needed to minimise Monte Carlo error in FIM estimates, so to speed up computations we adopt a distributed computing approach: the datasets and repeat runs of the SMC-FS algorithm to estimate the OIMs are divided evenly among 60 to 64 CPUs and run in parallel. We note that for our methodology, access to a large number of CPUs is beneficial to both the accuracy of estimates and the speed at which they can be obtained. The wall clock speed of the SMC-FS algorithm is also affected by the mean photon count N phot considered. Indeed, as described in Algorithm 2, the filtering and smoothing steps only occur in segments where a photon is observed, so the expected complexity of a full run of the SMC-FS algorithm is O (N phot N 2 ) where N is the size of the SMC particle population (generally N = 500).

Limit of accuracy of drift and diffusion coefficients for the Gaussian and Airy profiles
Consider a molecule with trajectory described by the SDE in Example 2.1. In [66], the authors took advantage of the Kalman filter formulae to evaluate the FIM for the diffusion (σ 2 ) and drift (b) coefficients. However, it was only possible to obtain an analytic solution for a particular set of detection times t 1 , t 2 , . . . and for the 2D Gaussian photon distribution profile. Otherwise, the computational cost of performing numerical integration was too high for more than one photon. In our particle filtering framework, it is also possible to take advantage of the Kalman filter formulae when considering the 2D Gaussian model in order to obtain an accurate approximation of the true score and OIM by numerical differentiation, and for any detection times schedule. An estimate of the FIM is therefore obtained by evaluating the true OIM for 3000 datasets and taking their mean, as described in Subsection 4.5. The molecule trajectories are simulated for [0, 0.2] seconds, with diffusion coefficient σ 2 = 1 µm 2 /s, drift coefficient b = −10 s −1 , and initial location Gaussian distributed with mean x 0 = (5.5, 5.5) ⊺ µm and covariance P 0 = 10 −2 I 2×2 µm 2 . The observations for the first experiment are generated according to the 2D Gaussian profile (2.9) with parameters as in Example 2.2. It is not possible to employ the Kalman filter formulae for the Airy and Born and Wolf profiles, and we must resort to using the SMC-FS algorithm instead. First of all, to evaluate the performance of the SMC-FS algorithm, the algorithm is employed using N = 500 particles to estimate the score and OIM for the same 3000 2D Gaussian profile datasets, and we similarly take the mean OIM over all datasets to estimate the FIM. Next, we move on to the Airy profile, for which it was too computationally costly in [66] to obtain the FIM for more than a single photon. We estimate the OIM for the diffusion and drift coefficients using the SMC-FS algorithm with N = 500 particles for 2040 datasets, where the molecule trajectories are simulated using the same parameters as for the 2D Gaussian profile, and the observations are generated according to the Airy profile (2.8) with parameters as in Example 2.2. This is repeated for various mean photon counts ranging from 10 to 1250. Then, the limit of accuracy estimate, denotedδ ϑ for hyperparameter ϑ, is computed, and the results are displayed in Figure 5.
Both Figure 5a and Figure 5b display an inverse square root decay of the limit of accuracy with respect to the mean photon count. This is consistent with the results for a static molecule from Example 4.2, and means that the quality of diffusion and drift estimates improves as the mean photon count increases. In addition to that, comparing the limit of accuracy obtained from the estimated and true OIM for the 2D Gaussian profile in Figure 5a indicates that the SMC-FS algorithm is able to return accurate estimates of the score and FIM for a stochastically moving molecule. Indeed, apart from a very slight discrepancy for very low photon counts for the drift coefficient, the estimates of the limit of accuracy are almost indistinguishable.

Limit of accuracy of drift, diffusion and optical axis location for the Born and Wolf model
When the molecule is out of focus, which means the photon detection locations are distributed according to the Born and Wolf model (2.10), the FIM components for the diffusion and drift coefficients can be obtained as for the Airy and Gaussian profiles. However, a new hyperparameter must be considered, namely the optical axis location, denoted z 0 . While previously, differentiating the log potential function was not needed, the vector of hyperparameters is now θ = (σ 2 , b, z 0 ), and G θ k (x k ) depends on z 0 for k = 1, . . . , n. While it requires numerical integration, differentiating log q z 0 (x 1 , x 2 ) for a given x = (x 1 , x 2 ) ∈ R 2 with respect to z 0 is not impossible. For notational simplicity, let α := 2πnα λe , r := x 2 1 + x 2 2 and W := πn 2 α noλe and rewrite (2.10) as The first derivative was derived in [50] and is given by The second derivative with respect to z 0 is given by The potential function only depends on z 0 , so any cross terms in the FIM and OIM between z 0 and either σ 2 or b will be zero. The OIM is estimated for the diffusion (σ 2 ), drift (b) coefficients and optical axis location (z 0 ) using the SMC-FS algorithm with 500 particles for 2040 datasets, where the molecule trajectories are simulated using the same parameters as for the 2D Gaussian and Airy profiles, and the observations are generated according to the Born and Wolf model with parameters as in Example 2.2 (i.e. z 0 = 1 µm). Then, the limit of accuracy for mean photon counts ranging from 10 to 1250 is computed, and the results are displayed in Figure 6. Once again, there is an inverse square root decay of the limit of accuracy with respect to the mean photon count for all hyperparameters considered. E timated limit of accuracy (Born and Wolf modelF igure 6: Evolution of the estimated limit accuracy for mean photon counts ranging from 10 to 1250. The limit of accuracy is estimated for the diffusion (σ 2 ), drift (b) coefficients and optical axis location (z 0 ) for an out-of-focus molecule with stochastic trajectory. The estimates are obtained by taking the square root of the inverse of the FIM, obtained by estimating the OIM using the SMC-FS algorithm with 500 particles for 2040 simulated datasets. To generate each dataset, the molecule trajectories are simulated according to the SDE in Example 2.1 for the interval [0, 0.2] seconds, with σ 2 = 1 µm 2 /s, b = −10 s −1 , and initial location Gaussian distributed with mean x 0 = (5.5, 5.5) ⊺ µm and covariance P 0 = 10 −2 I 2×2 µm 2 . The observations are generated according to the Born and Wolf model with parameters as in Example 2.2, where z 0 = 1 µm. An inverse square root curve (orange) is fitted to the resulting estimated limits of accuracy for comparison.

Limit of accuracy of the separation distance between two molecules for the Airy profile
Being able to estimate the distance of separation between two closely spaced molecules is an important aspect of single-molecule microscopy. In the past, Rayleigh's criterion [4] has been used to define the minimum distance between two point sources such that they can be distinguished in the image. However, [56] treated the separation distance problem as a statistical estimation task and derived the CRLB (or inverse of the FIM) for the mean square error of the separation distance estimate. It was shown that Rayleigh's minimum distance can be surpassed by capturing more photons, e.g. by observing the molecules for a longer period. So far, the limit of accuracy has only been derived for static molecules. In this experiment, we apply our methodology to estimate the limit of accuracy for the locations and separation distance between two molecules that are not static, but diffusing independently at their respective stationary distributions, as illustrated in Figure 7. Let X t = (X t,1 , X t,2 ) ⊺ be the cartesian coordinates of a moving molecule with stationary distribution N (x 0 , σ 2 I 2×2 ) for all t, where x 0 is referred to as the mean state. The continuous time dynamics are given by From Subsection 2.1, it is straightforward to establish the solution to this SDE, which yields the conditional pdf f x 0 ∆ of X k+1 at the (k + 1)-th discrete segment, given X k = x at the k-th segment, as In this experiment, consider two independently diffusing molecules whose states are (X t , V t ), where X t is the state of the first molecule and V t is the state of the second. Assume that the initial state of each molecule is the same as its corresponding mean state, i.e. (x 0 , v 0 ) =: θ = (θ 1 , θ 2 , θ 3 , θ 4 ) ⊺ , and is non-random but unknown and to be estimated. The conditional probability density function of (X k+1 , V k+1 ) given ( denote an estimate of θ given observations Y 1:n . Recall that the FIM, denoted I n (θ), is given by For any scalar-valued function D(θ) ∈ R, we can estimate D(θ) using D(θ) whereθ is the estimate of θ. Assuming the estimate is unbiased, we have the following CRLB for the function D, where ∇D(θ) := (∂D/∂θ 1 , . . . , ∂D/∂θ 4 ) ⊺ . For example, to estimate the separation between the two molecules we have D(θ) = (θ 1 − θ 3 ) 2 + (θ 2 − θ 4 ) 2 , and as a result This experiment is essentially the dynamic version of the experiments on estimating the separation of two static molecules by [58]. The key difference here is that the molecules are diffusing. The observations Y 1:n are generated as in [58], i.e. according to the following mixture where g is the photon distribution profile given in (2.7) and λ θ = λ x + λ v . The measurement model considered in this experiment is the Airy profile (2.8), but it is straightforward to also apply the methodology to the 2D Gaussian profile and Born and Wolf model. In the first part of the experiment, we analytically replicate results similar to those in [56,58] for two static molecules, then observe how introducing diffusion affects the progression of the limit of accuracy δ D(θ) for the separation distance (obtained using (5.2)), as this separation distance between the two molecules increases. We set λ x = λ v = λ for simplicity. Evaluating δ D(θ) analytically for the static case is performed as in [56], with a mean photon count, denoted N phot , of 3000. For the dynamic case, the molecules are observed during an interval of [0, 1] seconds with the same mean photon count, and for diffusion coefficients σ 2 varying from 5×10 −3 to 10 −4 µm 2 /s. The parameters of the Airy profile are unchanged (i.e. n α = 1.4, λ e = 0.52 µm), as is the lateral magnification matrix (M = 100I 2×2 ). The estimate of the limit of accuracy is obtained by estimating the OIM for the mean locations x 0 and v 0 via the SMC-FS algorithm for 640 to 1024 datasets then applying (5.2). The resulting estimated limits of accuracyδ D(θ) are given in Figure 8a. The second part of the experiment involves similarly estimating the limits of accuracy δ D(θ) for various separation distances, but this time the diffusion coefficient remains fixed, i.e. σ 2 = 10 −4 µm 2 /s, and the mean photon count N phot is set to vary between 100 and 4500. The resulting estimated limits of accuracy are given in Figure 8b.
As the separation distance D(θ) gets closer to zero, the limit of accuracy increases, indicating that estimates would become less accurate. Additionally, an inverse square root curve was fit to each set of estimated limits of accuracy in Figure 8a and Figure 8b. This is consistent with results in [56] that showed an inverse square root relationship between separation distance and δ static D(θ) for two static molecules, and indicates that these results can be generalised to dynamic molecules. Additionally, in [47], it is suggested that the limit of accuracy for the location of a static molecule, known as localisation accuracy and denoted δ loc , is of the form σa √ N phot where N phot is the mean photon count and σ a the standard deviation of the photon detection profile. The interpretation for this is that the quality of location estimates of a single static molecule deteriorates as the measurement uncertainty σ a increases. Now in [58], it is proven that the limit of accuracy for the separation distance between two molecules δ static D(θ) and the localisation accuracy for each of these molecules are related as follows: where δ sta,loc x 0 and δ sta,loc v 0 denote the localisation accuracy for the first and second (static) molecule observed independently with cumulative mean photon count N phot , respectively. Even though the separation distance goes to infinity, its limit of accuracy δ D(θ) remains finite. This means that as D(θ) → ∞, evaluating the limit of accuracy for the separation distance between two (static) molecules becomes equivalent to two independent localisation accuracy problems. It also means that δ static D(θ) is similarly affected by measurement uncertainty σ a as are the localisation accuracies for the two molecules.
In this experiment, the introduction of diffusion negatively affects the improvement in estimation accuracy as the mean distance of separation between the two molecules increases. This is evidenced in Figure 8a by the more and more slowly decaying limits of accuracy as the value of σ 2 increases, and in Figure 9a by the linearly increasing trend inδ D(θ) for all values of D(θ) as σ increases. As a result, the diffusion coefficient in the dynamic model can be translated into additional observation uncertainty which affects δ D(θ) in a way reminiscent of how σ a affects δ static D(θ) . More generally, from our numerical results, we observe the relationship for our dynamic application behaves qualitatively as where, as above, σ a is the standard deviation of the photon detection process, also known as measurement uncertainty. We now investigate the relationship between δ D(θ) and the dynamic equivalent to the localisation accuracy, namely the limit of accuracy for the mean locations x 0 and v 0 of each individual, stochastically moving molecule, denoted δ sto,loc of each individual object with various (cumulative) mean photon counts N phot is illustrated as horizontal lines in Figure 8b, which appear to act as asymptotes, thus indicating that the relationship in (5.4) can be generalised to stochastically moving molecules. While the introduction of diffusion leads to less accurate estimates, Figure 8b displays a stronger decay in the limit of accuracy as the mean photon count N phot increases, thus indicating that increasing the mean photon count N phot improves those estimates, as was the case for static molecules in [56]. This is reinforced in Figure 9b, which also suggests that the relationship between δ D(θ) and N phot is an inverse square root. This is also a generalisation to the dynamic case of results in [56] which showed an inverse square root relationship between δ static D(θ) and N phot for two static molecules. In summary, this experiment employs the numerical framework developed in this paper for estimating the FIM of parameters of dynamic molecules using SMC in order to gain insights into generalising results from [57,58] about the effects of separation distance, measurement uncertainty and mean photon count to a context in which the two molecules considered follow a SDE rather than being static. These effects, as well at that of the measurement uncertainty, can all be observed by applying our methodology and are summarised in Table 1. We also summarise in Table 1 the results on the limits of accuracy for the drift and diffusion coefficients of a single stochastically moving molecule observed via the 2D Gaussian, Airy profiles and the Born and Wolf model from Subsection 5.1 and Subsection 5.2. Note that the limits of accuracy for the mean locations of each molecule, denoted δ θ := (δ θ 1 , δ θ 2 , δ θ 3 , δ θ 4 ) ⊺ , can also be estimated as part of our methodology (as their FIM is required for (5.2)) and return similar relationships with separation distance, mean photon count, diffusion coefficient and measurement uncertainty as δ D(θ) (not reported here).

Name
Parameter Relationship Reference [56,47] δ D(θ) , δ θ mean photon count  Table 1: Summary of the qualitative relationships between various parameters and the limits of accuracy δ θ := (δ θ1 , δ θ2 , δ θ3 , δ θ4 ) ⊺ and δ D(θ) for the mean locations θ = (x 0 , v 0 ) = (θ 1 , θ 2 , θ 3 , θ 4 ) ⊺ and separation distance D(θ), respectively, of two stochastically diffusing molecules observed simultaneously. Also included in the table is the relationship between mean photon count and the limits of accuracy for the hyperparameters of the SDE trajectory (drift b and diffusion σ 2 coefficients) and photon detection process (optical axis location z 0 ) of a single molecule. Note that when we increase the mean photon count N phot , the observation interval length remains fixed.   Figure 8: Comparison of the evolution of the estimated limit accuracy for separation distances ranging from 20 × 10 −3 to 2 µm for various (a) diffusion coefficient (σ 2 ) values (b) mean photon counts (N phot ). The limit of accuracy for the separation distance δ D(θ) , where θ = (x 0 , v 0 ) = (θ 1 , θ 2 , θ 3 , θ 4 ), is estimated using the square root of the CRLB obtained using (5.2) (in the dynamic case) and evaluated using analytical results from [56] (in the static case). The estimates of I n (θ) in (5.2) are obtained by running the SMC-FS algorithm with 500 particles for 640 to 1024 simulated datasets. For the dynamic case, the molecule trajectories are initialised at their respective mean locations x 0 and v 0 and each is propagated according to its corresponding SDE (5.1) during an interval of [0, 1] seconds with (a) fixed and mean photon count N phot = 3000 (b) fixed diffusion coefficient σ 2 = 10 −4 µm 2 /s. The observations are generated according to a mixture of Airy profiles (5.3) with parameters as in Example 2.2. This is repeated for (a) σ 2 varying from 10 −3 to 10 −4 µm 2 /s (b) N phot varying from 100 to 4500. Finally, an inverse square root curve is fitted to each of the resulting sets of estimated limits of accuracy for comparison purposes. Note that the pink set of estimates and their corresponding solid fitted curve in (a) coincide with those in (b). In (b), the horizontal lines correspond to the equivalent mean photon counts and represent the distances H sto N phot between the limits of accuracyδ sto,loc x0 andδ sto,loc v0 for the mean locations x 0 and v 0 of each individual object, estimated independently for each molecule using the SMC-FS algorithm. Note that any variation in estimates for low separation distances is due to Monte Carlo error, and can be reduced by increasing the number of simulated datasets.  Figure 9: Evolution of the estimated limit of accuracy for the separation distance δ D(θ) (obtained using (5.2)) between two stochastically moving molecules observed simultaneously for (a) σ ranging from √ 10 −4 to √ 5 × 10 −3 µm s −1/2 (b) N phot ranging from 100 to 4500. Estimates are obtained through the same algorithm and parameters as in (a) Figure 8a (b) Figure 8b, with separation distances ranging from 20 × 10 −3 to 2 µm. In (b), inverse square root curves are fitted to the resulting estimatesδ D(θ) for comparison.

Conclusion
In this paper, we introduced an SMC approach to performing parameter inference when tracking a molecule with stochastic trajectory for a fixed time interval. The three main aspects of this fundamental model in single-molecule microscopy were the true location of the molecule in the object space, which follows a linear SDE, the Poisson distributed arrival process of the photons it emits on the detector in the image space, and the arrival location of those photons on the detector, which follows either a 2D Gaussian, Airy profile, or Born and Wolf model.
First of all, we discretised the time interval in order to formulate the problem as a discretetime state space model, in which all states are equally spaced in time, but a number of observations are marked as missing. From this, SMC methods were applied for parameter inference. A general forward smoothing algorithm was employed to estimate the score and OIM of the data regardless of the distribution of the photon locations. For the first time, this allowed for the estimation of the FIM and hence the limit of accuracy (square root of the CRLB), which could not be done before for the Airy profile and Born and Wolf model, and could only be achieved analytically for a specific set of photon detection times for the 2D Gaussian profile. The methodology was subsequently applied to characterise the precision limits for estimating the separation distance between two moving molecules, thus providing new insights into results for the static case from [58]. The outcome of our numerical work was summarised in Table 1, which sums up the qualitative behaviours of the limits of accuracy as functions of the mean photon count, separation distance, diffusion coefficient and measurement uncertainty.
Although for the first time a method has been described to estimate the limit of accuracy for the hyperparameters of dynamic single molecules with non-uniform observation times and complex measurement models, such as the Airy profile or Born and Wolf model, there is scope to use the techniques developed here to provide a wider range of more computationally efficient approaches. Indeed, an advantage of the straightforward state space model formulation of the problem is access to the vast range of filtering and smoothing algorithms available. While we employed forward smoothing, any kind of particle smoothing algorithm would be suitable, and indeed, the SMC-FS algorithm of [17] employed for forward smoothing, even though it mitigates issues related to path degeneracy, is of O(N 2 ) complexity. For example, the PaRIS algorithm of [54] can reduce the complexity of the algorithm to linear.

A Validity of the time discretisation
Given a realisation of the observations (t 1 , y t 1 ), . . . , (t n , y tn ) observed in the time interval [t 0 , T ], we adopt a discrete time formulation in our methodology where [t 0 , T ] is divided into segments of length ∆. We assume the discretisation is fine enough so that an interval (k∆, k∆ + ∆] contains at most one arrival time t i . We now prove in Proposition A.1 that this discretisation is a valid approximation of the homogeneous Poisson process which is used in [47,48] to describe photon detection. Proposition A.1. Let the photon detection process {N (t)} t≥t 0 be a homogeneous Poisson process with photon detection rate λ > 0. The probability of observing k photons during the time interval [t, t + h] for t ≥ t 0 and h > 0 is Discretise the interval [t, t + h] into segments of length ∆, so that a single segment contains at most one arrival time. Then as ∆ → 0, the probability of observing k is also given by (A.1).
Proof. Discretising the interval into segments of length ∆ as such, we now have for this interval a Binomial random variable M ∆ ∼ Binomial( h ∆ , λ∆) with h ∆ trials, with probability of success (i.e. a photon is observed) λ∆ and probability of failure (i.e. no photon is observed) 1 − λ∆. The probability observing k photons in the interval [t, t + h] is Taking the limit as ∆ → 0, we have Employing the following property of the ceiling function Finally, employing the following results This paper mainly considers the situation in which λ is a scalar, but this proof can be generalised to the situation where the Poisson process is inhomogeneous. As suggested in Proposition A.1, the approximation of the Poisson process becomes increasingly more accurate as the discrete segment length ∆ becomes smaller.

B Particle filtering in single-molecule microscopy
Given our reformulation of the fundamental model as a discrete state space model, a particle filter, summarised in Algorithm 3, can be applied to track the state of stochastically moving particles. There are several approaches to resampling, studied in [21,19,18]. In this paper, we refer to the resampling step of algorithms as An example of particle filtering for for stochastically moving molecules observed through the 2D Gaussian, Airy and Born and Wolf models is available in Example B.1.
Example B.1. Let the trajectory of a molecule be given by the SDE in Example 2.1 and simulated three times (one for each measurement model) using the same parameters and for the same time interval. Observations are generated as per in Example 2.2 for the 2D Gaussian, Airy profiles and Born and Wolf model and the molecules are tracked using the bootstrap filter. The resulting estimated trajectories for each measurement model are given in Figure 10.

C Sufficient statistic for estimating the OIM by forward smoothing
In Example 4.1, recall that the molecule trajectory is described by the following SDE in ddimensional space dX t = bI d×d X t dt + √ 2σdB t ,
Given weighted particle sample X (Propagate) Sample X , where η k is the proposal density.

6:
(Weight) Compute the incremental weights then update and normalise the importance weights to obtain ω Example D.1. Building on Example 4.1, note that given the model specification in (4.14), it is impossible to compute the maximum of Q(θ, θ i ) for the parameter b = 0 directly. However, as seen previously, the equation can also be written such that we simply have where the auxiliary parameters are given by ϕ θ := e ∆b and r θ := σ 2 b e 2∆b − 1 .
It is straightforward to maximise Q(θ, θ i ) for the auxiliary parameters ϕ θ and r θ as follows: let {S l,k (x 1:k )} 3 l=1 denote the additive functionals of interest at time k and {s l,k (x k−1 , x k )} 3 l=1 their corresponding sufficient statistics. Luckily, the sufficient statistics are easily obtained, since for the Gaussian and Airy profiles, the likelihood G k does not depend on θ: The maximisation function is given by Finally, to obtain maximum likelihood estimates for b and σ 2 , simply use the following transformation: b = ∆ −1 log ϕ θ and σ 2 = r θ log ϕ θ ∆(ϕ 2 θ − 1) .
Note that when dealing with measurements distributed according to the Born and Wolf model, we must also estimate the optical axis location parameter z 0 , which is done via gradient ascent.  Figure 11: Estimates of the diffusion (σ 2 ) and drift (b) coefficient over 150 EM iterations or passes through the data. The blue and red dashed lines represent the true parameter values σ 2 = 1 µm 2 /s and b = −10 s −1 , respectively. The red and blue solid lines and bands correspond to the mean estimates and their corresponding 95% confidence intervals over 50 datasets generated during the time interval [0, 0.2] seconds, with initial location x 0 = (5.5, 5.5) ⊺ µm and a mean photon count of 1000. The observations were generated according to the Airy profile with parameters as in Example 2.2 and the sufficient statistics were estimated using the PaRIS algorithm [54].