Optimal estimation of drift and diffusion coefficients in the presence of static localization error

We consider the inference of the drift velocity and the diffusion coefficient of a particle undergoing a directed random walk in the presence of static localization error. A weighted least-squares fit to mean-square displacement (MSD) data is used to infer the parameters of the assumed drift-diffusion model. For experiments which cannot be repeated, we show that the quality of the inferred parameters depends on the number of MSD points used in the fitting. An optimal number of fitting points p_opt is shown to exist, which depends on the time interval between frames Δt and on the unknown parameters. We therefore also present a simple iterative algorithm which converges rapidly toward p_opt. For repeatable experiments the quality depends crucially on the measurement time interval over which measurements are made, reflecting the different timescales associated with drift and diffusion. An optimal measurement time interval T_opt exists, which depends on the number of measurement points and the unknown parameters, and so again we present an iterative algorithm which converges quickly toward T_opt and is shown to be robust to initial parameter guesses.

Derivation of the variance of the MSD

The calculation of the variance of the MSD is more involved and uses properties of Wiener processes. First, we calculate the variance of the MSD for a single particle, denoted by $\sigma^{2(S)}_n$. Then, due to the independence of the $N_S$ trajectories, the variance of the MSD for an ensemble of particles is given by
$$\sigma^2_n = \frac{\sigma^{2(S)}_n}{N_S}. \tag{2}$$
Denoting by $|x_{i+n} - x_i|^2$, $n = 1, \dots, N$, the overlapping squared displacements for a single particle, the variance of the MSD is defined as the variance of their time average. Letting $K = N + 1 - n$ be the number of samples of squared displacements of length $n\Delta t$, then, due to the correlation between overlapping displacements, we have
$$\sigma^{2(S)}_n = \frac{1}{K}\,\mathrm{Var}(|X_t|^2)(n\Delta t) + \frac{2}{K^2}\sum_i \sum_{j > i} \mathrm{Cov}(i,j), \tag{3}$$
where $\mathrm{Var}(|X_t|^2)(n\Delta t)$ is the variance of the squared displacement at time point $n\Delta t$ and $\mathrm{Cov}(i,j)$ is the covariance between the squared displacements $|x_{i+n} - x_i|^2$ and $|x_{j+n} - x_j|^2$. The first term in (3) is known, so to calculate the variance of the MSD we need only calculate the covariance of the squared displacements. For this we have to consider the different ways the displacements $|x_{i+n} - x_i|^2$ and $|x_{j+n} - x_j|^2$ can overlap. This splits into two cases, as shown in Figure 1, where $R_1$, $R_2$ and $R_3$ are the displacements between the given vertices. We can then write the covariance as
$$\mathrm{Cov}(i,j) = \mathbb{E}\big(|R_1 + R_2|^2 |R_2 + R_3|^2\big) - \mathbb{E}\big(|R_1 + R_2|^2\big)\,\mathbb{E}\big(|R_2 + R_3|^2\big). \tag{4}$$
Notice that when $j > i + n$ (case (a) in Figure 1) the displacements do not overlap and so $R_2 = 0$.
Since the remaining displacements $R_1$ and $R_3$ are independent, $\mathrm{Cov}(i,j) = 0$. Therefore, we need only consider the case when $j \le i + n$ (case (b) in Figure 1). In this case, we can write the displacement vectors as
$$R_k = \alpha t_k \hat{v} + \sqrt{2D}\,dW_k + \psi_k, \qquad k = 1, 2, 3,$$
where $\hat{v}$ is the unit vector in the drift direction, $dW_1$, $dW_2$, $dW_3$ are Wiener increments over the time lengths $t_1$, $t_2$, $t_3$ respectively, and the $\psi_k$ are the differences of the static noise terms at the corresponding vertices. Note that when $j = i + n$, we have $R_2 = 0$. Expanding the right-hand side of equation (4) results in a sum of expectations of products of the $R_k$; the first term is $\mathbb{E}(|R_1|^2 |R_2|^2)$. Multiplying out the brackets and taking the expectation of each term, we note that $\mathbb{E}(|dW_k|^2) = 2t_k$ for $k \in \{1, 2, 3\}$, and since the Wiener increments are independent, $\mathbb{E}(dW_i \cdot dW_j) = 0$ for $i \ne j$. Furthermore, since the Wiener increments and the noise terms are independent, $\mathbb{E}(dW_i \cdot \psi_j) = 0$ for $i, j \in \{1, 2, 3\}$. However, because noise terms can share a common vertex, $\mathbb{E}(\psi_i \cdot \psi_j) \ne 0$ for $|i - j| = 1$. Using these properties, many of the terms in the expansion are zero. We have $dW_k = (dW^1_k, dW^2_k)$ for $k \in \{1, 2, 3\}$, where $dW^1_k, dW^2_k$ are independent and identically distributed (i.i.d.) normal random variables with zero mean and variance $t_k$, i.e. of the form $N(0, t_k)$. Similarly for the noise terms, we have, for example, $\psi_1 = \eta_j - \eta_i$, where $\eta_i = (\eta^1_i, \eta^2_i)$ and $\eta^1_i, \eta^2_i$ are also i.i.d. random variables of the form $N(0, \eta^2)$. These properties can be used to calculate the remaining expectation terms. This gives
$$\mathbb{E}(|R_1|^2 |R_2|^2) = \alpha^4 t_1^2 t_2^2 + 4D\alpha^2 t_1^2 t_2 + 4\alpha^2\eta^2 t_1^2 - 4\alpha^2\eta^2 t_1 t_2 + 4D\alpha^2 t_1 t_2^2 + 16D^2 t_1 t_2 + 16D\eta^2 t_1 + 4\alpha^2\eta^2 t_2^2 + 16D\eta^2 t_2 + 20\eta^4.$$
The rest of the expectations follow from similar algebra and will not be shown here; we simply present the results, for example
$$\mathbb{E}\big(|R_2|^2 (R_1 \cdot R_3)\big) = \alpha^4 t_1 t_2^2 t_3 + 4D\alpha^2 t_1 t_2 t_3 + 4\alpha^2\eta^2 t_1 t_3 + 4\eta^4 - 2\alpha^2\eta^2 t_1 t_3 - 2\alpha^2\eta^2 t_2 t_3.$$
In the remaining expressions, $\delta$ represents the Kronecker delta function. In the absence of drift we recover the same expressions as found in the supplementary material of Michalet's paper [? ]. Substituting these results into (5) and simplifying yields the covariance as a function of $l = j - i$. Now that the covariance has been calculated, to obtain the variance of the MSD we must sum all the non-zero terms appearing in the double sum in (3). This splits into two cases. When $n \le K$ we sum the non-zero terms along the diagonals $l = 1, \dots, n - 1$ of the covariance matrix. The $l$th diagonal contains $K - l$ identical elements, and therefore the non-zero terms sum to
$$2\sum_{l=1}^{n-1} (K - l)\,\mathrm{Cov}(l), \tag{10}$$
where $\mathrm{Cov}(l) = 16D^2 ((n - l)\Delta t)^2 + 8D\alpha^2 n^2 (n - l)(\Delta t)^3$. When $l = n$ this corresponds to the case $j = i + n$, and so the Kronecker delta term is now non-zero. This diagonal contains $K - n = N + 1 - 2n$ elements, and so contributes $(N + 1 - 2n)\big(4\eta^4 - 4(\alpha\eta n\Delta t)^2\big)$. When $n > K$ there are no segments without overlaps, so only overlapping contributions to the covariance matrix remain; in this case the upper limit in the sum (10) is $N - n$ rather than $n - 1$. Summing all these terms in either case ($n \le K$ or $n > K$) allows us to calculate $\sigma^2_n$; the final closed-form result is given in equation (11). In Figure 2 we plot the theoretical variance of the MSD (11) together with an empirical estimate averaged over 1000 and 10,000 samples for the parameter values D = 2 µm²/s, α = 1 µm/s, η = 2 µm, N_S = 10, N = 100 and T = 100 s. As the number of samples increases, the empirical estimate approaches the theoretical expression (11).
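The empirical estimate used in this comparison can be sketched as follows. This is our own minimal Python sketch (function names are not from the paper): we simulate the drift-diffusion model with static localization noise, drift taken along x for concreteness, and estimate the variance of the overlapping MSD across many independent tracks.

```python
import numpy as np

def simulate_tracks(D, alpha, eta, N, dt, n_samples, rng):
    """2D drift-diffusion tracks observed with static Gaussian noise."""
    steps = rng.normal(0.0, np.sqrt(2 * D * dt), size=(n_samples, N, 2))
    steps[..., 0] += alpha * dt                      # drift along x
    true = np.concatenate([np.zeros((n_samples, 1, 2)),
                           np.cumsum(steps, axis=1)], axis=1)
    return true + rng.normal(0.0, eta, size=true.shape)

def overlapping_msd(x):
    """Per-track time-averaged (overlapping) MSD for lags n = 1..N."""
    N = x.shape[1] - 1
    return np.stack([np.mean(np.sum((x[:, n:] - x[:, :-n])**2, axis=-1), axis=1)
                     for n in range(1, N + 1)], axis=1)

rng = np.random.default_rng(0)
D, alpha, eta, N, dt = 2.0, 1.0, 2.0, 50, 1.0
msd = overlapping_msd(simulate_tracks(D, alpha, eta, N, dt, 5000, rng))
mean_emp = msd.mean(axis=0)          # empirical mean MSD per lag
var_emp = msd.var(axis=0)            # empirical variance of the MSD per lag

# The mean should follow alpha^2 (n dt)^2 + 4 D n dt + 4 eta^2
t = dt * np.arange(1, N + 1)
msd_model = alpha**2 * t**2 + 4 * D * t + 4 * eta**2
```

The array `var_emp` is the quantity compared against expression (11) in Figure 2; increasing `n_samples` tightens the agreement.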

Derivation of the covariance of the MSD
A theoretical expression for the covariance is needed to calculate the uncertainties in the regression parameters. As before, we first consider the covariance of the MSD for a single particle, denoted by $\sigma^{2(S)}_{nm}$. The covariance of the MSD for an ensemble of particles is then, by independence of the trajectories, $\sigma^2_{nm} = \sigma^{2(S)}_{nm}/N_S$. The process of calculating the covariance from its definition is similar to that used to derive the variance of the MSD. First, we assume that $m > n$; then, letting $K = N + 1 - n$ and $P = N + 1 - m$, we must, as before, consider the different ways the trajectories can overlap. This time there are five cases, as shown in Figure 3, and the covariance in each case can be rewritten in terms of expectations of products of displacements. These expectations can be calculated using the results of (8). We then sum the resulting covariance values to obtain the overall covariance of the MSD in the two cases $m + n \le N$ and $m + n > N$. As before, this is done by summing the non-zero terms along diagonals in the covariance matrix; note that the Kronecker delta terms must be considered separately. We used the software Maple to evaluate these sums and obtain the final expression (12). Figure 4 shows the comparison between the theoretical covariance of the MSD (12) and its empirical estimate, along with a cross-section of the covariance of the MSD averaged over 1000 and 10,000 samples, for the parameter values D = 2 µm²/s, α = 1 µm/s, η = 2 µm, N_S = 10, N = 100 and T = 100 s. We provide only one plot of the full covariance because the plots for the two sample sizes are very similar. Again, we see good agreement between the theoretical covariance of the MSD (12) and its empirical estimate.
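An empirical check of this kind underlies Figure 4 and can be sketched in a few lines. This is our own sketch (not the paper's code): the sample covariance of the per-track overlapping MSD across independent tracks.

```python
import numpy as np

# Empirical covariance matrix of the overlapping MSD across independent
# samples of the drift-diffusion model with static noise (drift along x).
rng = np.random.default_rng(1)
D, alpha, eta, N, dt, S = 2.0, 1.0, 2.0, 40, 1.0, 4000

steps = rng.normal(0.0, np.sqrt(2 * D * dt), size=(S, N, 2))
steps[..., 0] += alpha * dt                       # drift along x
x = np.concatenate([np.zeros((S, 1, 2)), np.cumsum(steps, axis=1)], axis=1)
x += rng.normal(0.0, eta, size=x.shape)           # static localization noise

msd = np.stack([np.mean(np.sum((x[:, n:] - x[:, :-n])**2, axis=-1), axis=1)
                for n in range(1, N + 1)], axis=1)     # shape (S, N)

cov_emp = np.cov(msd, rowvar=False)               # (N, N) covariance matrix
# The diagonal is the MSD variance; neighbouring lags are strongly
# positively correlated because their displacements overlap.
```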

Existence of p_opt for different parameter values
In the main paper, all ensemble-averaged calculations used N_S = 10 trajectories measured at N = 100 time points. To test the robustness of our calculations, we now test with N_S = 100 and N = 10. Figure 5 shows the theoretical and simulated uncertainty σ_b/b + σ_c/c as a function of the number of fitting points. These experiments were for D = 2 µm²/s, α = 1 µm/s, η = 2 µm, N_S = 100 and N = 10, with ∆t = 1 s giving T = 10 s for the left plot and ∆t = 10 s giving T = 100 s for the right plot. We see a similar result to that in Figure 2 of the main paper: for a small value of ∆t, the optimal estimate of the parameters comes from fitting with all the MSD points, while as ∆t is increased, the optimal inference comes from fitting with a subset of points; specifically, for all the cases tested, p_opt = 9. We also looked for the existence of an optimal number of fitting points when inferring the diffusion coefficient D, the drift magnitude α and the observational noise η. Analogous to the two-parameter case, we look to minimise the uncertainty σ_a/a + σ_b/b + σ_c/c, since the noise term is related to the regression coefficient a. Figure 6 shows the theoretical and simulated value of σ_a/a + σ_b/b + σ_c/c as a function of the number of fitting points p for D = 2 µm²/s, α = 1 µm/s, η = 2 µm, N_S = 10 and N = 100, with ∆t = 1 s giving T = 100 s for the left plot and ∆t = 10 s giving T = 1000 s for the right plot. Here we see that the level of uncertainty, as well as the optimal number of fitting points, cannot be easily predicted by the value of η. The weighted sums of powers of time used in the least-squares fit are accumulated as in the following code fragment:

    for n = 1:p
        tn = n*T/N;
        variover = varmsd(n);
        s0over = s0over + 1/variover;
        s1over = s1over + tn/variover;
        s2over = s2over + tn^2/variover;
        s3over = s3over + tn^3/variover;
        s4over = s4over + tn^4/variover;
    end
    del_over = s0over*s2over*s4over - s0over*(s3over)^2 - ...
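The same weighted fit and p-sweep can be sketched in Python. This is our own illustration, not the paper's code: the weights use an empirical MSD variance as a stand-in for the closed-form expression, and the coefficient ordering (offset, linear, quadratic) is our choice; correlations between MSD points are ignored in the weight matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
D, alpha, eta, N, dt, S = 2.0, 1.0, 2.0, 50, 1.0, 3000

steps = rng.normal(0.0, np.sqrt(2 * D * dt), size=(S, N, 2))
steps[..., 0] += alpha * dt                       # drift along x
x = np.concatenate([np.zeros((S, 1, 2)), np.cumsum(steps, axis=1)], axis=1)
x += rng.normal(0.0, eta, size=x.shape)           # static localization noise

msd = np.stack([np.mean(np.sum((x[:, n:] - x[:, :-n])**2, axis=-1), axis=1)
                for n in range(1, N + 1)], axis=1)
t = dt * np.arange(1, N + 1)
mean_msd, var_msd = msd.mean(axis=0), msd.var(axis=0)

def wls_fit(p):
    """Weighted quadratic fit to the first p MSD points.

    Returns the coefficients (offset, linear, quadratic) and their standard
    errors from (X^T W X)^{-1}.  The overall weight scale drops out of the
    argmin over p, so per-track variances are used directly as weights.
    """
    X = np.vander(t[:p], 3, increasing=True)      # columns: 1, t, t^2
    W = np.diag(1.0 / var_msd[:p])
    C = np.linalg.inv(X.T @ W @ X)
    coef = C @ X.T @ W @ mean_msd[:p]
    return coef, np.sqrt(np.diag(C))

# Sweep p and keep the minimiser of the summed relative uncertainty of the
# linear (diffusion) and quadratic (drift) coefficients.
errors = {p: sum(se[k] / abs(c[k]) for k in (1, 2))
          for p in range(4, N + 1)
          for c, se in [wls_fit(p)]}
p_best = min(errors, key=errors.get)
```

The linear coefficient recovers roughly 4D and the quadratic coefficient roughly α² for these parameters.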

Additional iterative algorithm experiments for p opt
The iterative algorithm was also tested for different values of D and α to check its robustness. First we test for D = 2 µm²/s, α = 7 µm/s, η = 2 µm, N_S = 10 and N = 1000, for ∆t = 1 s, ∆t = 10 s and ∆t = 100 s. The results, shown in Figure 7, are similar to the case shown in the main paper. The value of p_i converges to p_opt in no more than three iterations in all the cases shown. The value of |α_i/α − 1| does not decrease between iterations but does decrease as ∆t increases. The value of |D_i/D − 1| decreases when we fit with p_opt points rather than all the points, and the final value of |D_i/D − 1| decreases very slightly from ∆t = 1 s to ∆t = 10 s and then increases again from ∆t = 10 s to ∆t = 100 s.
The algorithm was also tested for the same parameter values given above but with D = 6 µm²/s and α = 1 µm/s. The results are shown in Figure 8. Again, the value of p_i converges quickly to p_opt, in this case after two iterations each time. The value of |α_i/α − 1| remains roughly the same between iterations but again decreases as ∆t increases. Here, the value of |D_i/D − 1| decreases after one iteration in each case, but its final value does not change much between ∆t values. Although not shown here, if ∆t were increased further, the value of |D_i/D − 1| would begin to increase again.
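The structure of this iteration can be sketched as a fixed-point loop: fit with p_i points, re-estimate (D, α), recompute p_{i+1} as the minimiser of the fitting uncertainty under the current estimates, and stop when p settles. The sketch below is our own construction; in place of the paper's closed-form MSD variance (11) it uses a Monte Carlo surrogate, and all function names are ours.

```python
import numpy as np

rng = np.random.default_rng(3)

def tracks(D, alpha, eta, N, dt, S):
    st = rng.normal(0.0, np.sqrt(2 * D * dt), size=(S, N, 2))
    st[..., 0] += alpha * dt                      # drift along x
    x = np.concatenate([np.zeros((S, 1, 2)), np.cumsum(st, axis=1)], axis=1)
    return x + rng.normal(0.0, eta, size=x.shape)

def msd_of(x):
    N = x.shape[1] - 1
    return np.stack([np.mean(np.sum((x[:, n:] - x[:, :-n])**2, axis=-1), axis=1)
                     for n in range(1, N + 1)], axis=1)

def wls(mean_msd, var_msd, t, p):
    X = np.vander(t[:p], 3, increasing=True)      # columns: 1, t, t^2
    W = np.diag(1.0 / var_msd[:p])
    C = np.linalg.inv(X.T @ W @ X)
    return C @ X.T @ W @ mean_msd[:p], np.sqrt(np.diag(C))

def popt_surrogate(D, alpha, eta, N, dt):
    """Monte Carlo stand-in for minimising the theoretical uncertainty."""
    m = msd_of(tracks(D, alpha, eta, N, dt, 500))
    t = dt * np.arange(1, N + 1)
    errs = [sum(se[k] / abs(c[k]) for k in (1, 2))
            for p in range(4, N + 1)
            for c, se in [wls(m.mean(0), m.var(0), t, p)]]
    return 4 + int(np.argmin(errs))

D_true, alpha_true, eta, N, dt = 2.0, 1.0, 2.0, 50, 1.0
data = msd_of(tracks(D_true, alpha_true, eta, N, dt, 10))      # "experiment"
weights = msd_of(tracks(D_true, alpha_true, eta, N, dt, 500)).var(0)
t = dt * np.arange(1, N + 1)

p = N                                      # iteration 0: fit all points
for _ in range(6):
    coef, _ = wls(data.mean(0), weights, t, p)
    D_i = max(coef[1] / 4.0, 1e-6)         # linear term -> diffusion
    alpha_i = np.sqrt(max(coef[2], 1e-12)) # quadratic term -> drift
    p_new = popt_surrogate(D_i, alpha_i, eta, N, dt)
    if p_new == p:                         # p has settled: converged
        break
    p = p_new
```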

Additional single particle experiments for p_opt
We now test the iterative algorithm in the single-particle case for the same parameters as in Section 4.2. First we look at D = 2 µm²/s, α = 7 µm/s, η = 2 µm, N_S = 1 and N = 1000, for ∆t = 1 s, ∆t = 10 s and ∆t = 100 s. The results, shown in Figure 9, have similar dynamics to those of Figure 4 in the main paper. In all three cases, the value of p_i converges close to p_opt, the value of |α_i/α − 1| only decreases significantly when ∆t is increased, and the value of |D_i/D − 1| is significantly improved when fitting with p_opt points rather than all the points. Notice, however, that for corresponding ∆t values the final values of |α_i/α − 1| are always smaller, while the final values of |D_i/D − 1| are always larger, than those given in the main paper. This is expected, as the larger value of α causes the MSD to increase more quickly at larger times, so a smaller value of ∆t would be required for better inference of D. The case D = 6 µm²/s, α = 1 µm/s, η = 2 µm, N_S = 1 and N = 1000, for ∆t = 1 s, ∆t = 10 s and ∆t = 100 s, was also tested and is shown in Figure 10. Again, the results show the same dynamics as those in the main paper. This time, however, the final values of |α_i/α − 1| are always larger for corresponding values of ∆t. Again, this is expected, as the larger value of D in turn requires a larger value of ∆t to optimise the inference of α.

Additional iterative algorithm experiments for T_opt
The iterative algorithm was tested for other values of the adaptation parameter ψ, as well as further values of D and α, to test its robustness. First, we test the iterative algorithm with the adaptation parameter (Algorithm 2, Step 10 in the main paper) changed to ψ = 0.5 and ψ = 0.2. Both of these experiments were for D = 2 µm²/s, α = 1 µm/s and η = 2 µm with the two initial measurement time intervals T_0 = 10⁷ s and T_0 = 10⁻³ s. The results are shown in Figures 13 and 14. The reduction of the adaptation parameter does not appear to affect the inference in either case: the value of T_i still converges to a time close to T_opt, and the final values of both |D_i/D − 1| and |α_i/α − 1| are under 10% in both cases. We now vary the values of D and α. The first experiment is for D = 2 µm²/s, α = 7 µm/s and η = 2 µm; for these parameters T_opt ≈ 32 s. The second experiment is for D = 6 µm²/s, α = 1 µm/s and η = 2 µm; for these parameters T_opt ≈ 2195 s. The adaptation parameter for both experiments was set back to ψ = 0.8. The results, plotted in Figures 15 and 16 respectively, are very similar to those seen before. As both cases converge to T_opt the relative errors are reduced, particularly for a smaller value of the measurement time interval, where the reduction is more significant.
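The qualitative behaviour of such a damped iteration, convergence from both T_0 = 10⁷ s and T_0 = 10⁻³ s in a handful of steps, can be illustrated with a toy sketch. This is a hypothetical construction, not a reproduction of Algorithm 2, Step 10: we assume a geometric relaxation with adaptation parameter ψ and hold the target fixed for illustration, whereas in the paper T_opt is re-estimated from the current (D_i, α_i) at each iteration.

```python
import math

def update_T(T_i, T_target, psi):
    """Move a fraction psi of the way toward T_target in log-space
    (an assumed update rule, not the paper's)."""
    return T_i * (T_target / T_i) ** psi

def iterate_T(T0, T_target, psi, tol=0.05, max_iter=50):
    """Iterate the damped update until T is within tol of the target
    (in log units) and return the whole history."""
    T, history = T0, [T0]
    for _ in range(max_iter):
        T = update_T(T, T_target, psi)
        history.append(T)
        if abs(math.log(T / T_target)) < tol:
            break
    return history

hist_hi = iterate_T(1e7, 2195.0, psi=0.8)   # T_opt ~ 2195 s (D = 6, alpha = 1)
hist_lo = iterate_T(1e-3, 2195.0, psi=0.8)
```

With ψ = 0.8 the log-distance to the target shrinks by a factor of five per step, so both starting points converge in a few iterations; smaller ψ merely slows, but does not prevent, convergence, consistent with the ψ = 0.5 and ψ = 0.2 experiments above.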

Additional single particle experiments for T_opt
We now use the iterative algorithm in the single-particle case for different values of D and α. First we test for D = 2 µm²/s, α = 7 µm/s, η = 2 µm, N_S = 1 and N = 100, again for both T_0 = 10⁷ s and T_0 = 10⁻³ s. For these parameter values, T_opt ≈ 32 s. The results are given in Figure 17. The algorithm takes a few more iterations to converge compared with the ensemble-of-particles case; however, we still see convergence in a small number of iterations, especially for a smaller measurement time interval. The final parameter values tested are D = 6 µm²/s, α = 1 µm/s, η = 2 µm, N_S = 1 and N = 100, again for both T_0 = 10⁷ s and T_0 = 10⁻³ s. For these parameter values, T_opt ≈ 2195 s. The results are given in Figure 18. The algorithm continues to converge in a small number of iterations and reduces the errors significantly in both cases.

Motion blur
It is of interest what effect motion blur has on the MSD. To answer this question we follow the approach of Goulian and Simon [? ], who considered the effect of full-frame motion blur on the MSD of purely diffusive motion. We assume that the particle position in the nth image, $x(t)$ with $t = n\Delta t$, is the average of the true position of the particle $\tilde{x}(t)$ over the frame, plus static Gaussian measurement noise. If the time interval $\Delta t$ is divided into $M$ smaller microsteps of size $\delta t = \Delta t/M$, then the measured position is
$$x(n\Delta t) = \frac{1}{M}\sum_{k=0}^{M-1} \tilde{x}(n\Delta t - k\,\delta t) + \eta_n.$$
Computing the required first and second moments of these frame averages (equations (13) and (14)), taking $t - t' = n\Delta t$ and letting $M \to \infty$, we finally obtain
$$\mathrm{MSD}(n\Delta t) = \alpha^2 (n\Delta t)^2 + 4Dn\Delta t + 4\left(\eta^2 - \frac{D\Delta t}{3}\right). \tag{15}$$
It is clear that motion blur only affects the offset of the MSD curve at zero time lag. In particular, if the MSD is fitted by a quadratic polynomial then the quadratic term can still be used to estimate the drift velocity and the linear term to estimate the diffusion coefficient. Note that the same MSD expression (15) can be obtained from the general formula for the MSD subject to dynamic and static localization errors derived in Savin and Doyle [? ]. The effect of motion blur and static measurement noise was simulated by first generating true particle trajectories using microsteps of size $\delta t = \Delta t/M$, where $\Delta t$ is the time between frames and $M$ is the number of microsteps between frames. For the jth particle the true position was updated over each microstep by adding the drift displacement and a Gaussian diffusive increment of variance $2D\delta t$ per coordinate, with $\tilde{x}_1 = 0$. The measured position at $t = (n-1)\Delta t$ was obtained by averaging the true position of the particle over the previous $M$ microsteps to simulate full-frame motion blur, and Gaussian static measurement noise was then added, so that
$$x^{(j)}_n = \frac{1}{M}\sum_{i = M(n-2)+1}^{M(n-1)} \tilde{x}^{(j)}_i + \eta_n, \qquad n = 2, \dots, N+1,$$
and $x^{(j)}_1 = \eta_1$. The following results concern the optimal number of fitting points. First, we look for the existence of an optimal number of fitting points which minimises the value of σ_b/b + σ_c/c. Figure 19 shows a comparison between the theoretical value of σ_b/b + σ_c/c assuming no motion blur and its simulated value, which includes motion blur. We see that for small ∆t, corresponding to using all the MSD points in the fitting, motion blur does not seem to have an effect on the results. However, when ∆t is increased, motion blur does appear to affect the value of σ_b/b + σ_c/c: while the theoretical optimal number of fitting points is 8, the simulated optimal number of fitting points is 5. Since motion blur has an effect on the value of p_opt, we examine whether using our theoretical value for p_opt on data which includes motion blur leads to a change in results.
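The microstep simulation above can be used to check the motion-blur MSD (15) directly. The sketch below is our own check (names are ours): frame positions are averages over M microsteps, static noise is added, and the ensemble MSD is compared with α²(nΔt)² + 4DnΔt + 4(η² − DΔt/3).

```python
import numpy as np

rng = np.random.default_rng(4)
D, alpha, eta = 2.0, 1.0, 2.0
N, dt, M, S = 20, 1.0, 20, 5000
ddt = dt / M                                   # microstep size

micro = rng.normal(0.0, np.sqrt(2 * D * ddt), size=(S, N * M, 2))
micro[..., 0] += alpha * ddt                   # drift along x
pos = np.cumsum(micro, axis=1)                 # true microstep positions

# Full-frame blur: measured frame n is the average over its M microsteps,
# plus static Gaussian localization noise.
frames = pos.reshape(S, N, M, 2).mean(axis=2)
frames += rng.normal(0.0, eta, size=frames.shape)

# Ensemble MSD at lag n, using the first measured frame as reference
disp = frames[:, 1:] - frames[:, :1]
msd_emp = np.mean(np.sum(disp**2, axis=-1), axis=0)

t = dt * np.arange(1, N)
msd_blur = alpha**2 * t**2 + 4 * D * t + 4 * (eta**2 - D * dt / 3)
```

Only the offset changes relative to the blur-free expression, which is the content of (15).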
To test this, we use the iterative algorithm to see the effect of motion blur on the inference of D and α. For this, the simulated data include motion blur but are driven towards the theoretical p_opt, which does not account for motion blur (see Section 4.2). In the absence of motion blur, the regression coefficient is $c = 4\eta^2$. However, when motion blur is present, the regression coefficient becomes $c = 4\eta^2 - \tfrac{4}{3}D\Delta t$, and so the value of η is inferred with this in mind. The results are shown in Figure 20. For this we have D = 2 µm²/s, α = 1 µm/s, η = 2 µm, N_S = 10 and N = 1000 for ∆t = 1 s, ∆t = 10 s and ∆t = 100 s. Note that these plots are comparable with Figure 3 of the main paper. Even though we drive p_i to the theoretical optimal number of fitting points, which does not account for motion blur, we do not see a significant difference in the inference of D and α when motion blur is included. This suggests that our theory for the optimal number of fitting points can still be used when motion blur is present.
To further test the effects of motion blur, we also tested the iterative algorithm for a single particle. The parameter values were chosen to be the same as those above but with N_S = 1. The results are shown in Figure 21. Note that these plots are comparable with Figure 4 of the main paper. Again, we do not see a significant difference with the inclusion of motion blur, further suggesting that our theoretical p_opt value works when motion blur is present.

Determination of the drift direction
Until now we have assumed that the main interest is the determination of the magnitude of the drift velocity and the diffusion coefficient. If the drift angle is also of interest then it too can be inferred from the trajectory data. Calculating the mean direction, and measures of the spread about the mean, requires circular statistics, details of which can be found in, for example, Fisher [? ] or Mardia and Jupp [? ]. Following the same vein as the estimation of the MSD, we first calculate the ensemble overlapping time-averaged quantities
$$C_n = \sum_{j=1}^{N_S}\sum_{i=1}^{N+1-n} \cos\theta^{(j)}_{i,n}, \qquad S_n = \sum_{j=1}^{N_S}\sum_{i=1}^{N+1-n} \sin\theta^{(j)}_{i,n},$$
where $\theta^{(j)}_{i,n}$ is the direction of the displacement $x^{(j)}_{i+n} - x^{(j)}_i$. The resultant length using data with a time lag of $n\Delta t$ is $R_n = \sqrt{C_n^2 + S_n^2}$, from which we can define the cosine and sine of the average angle at time lag $n$ by $\cos\theta_{d,n} = C_n/R_n$ and $\sin\theta_{d,n} = S_n/R_n$.
The average angle using displacements with a time lag of $n\Delta t$ is given by
$$\theta_{d,n} = \begin{cases} \tan^{-1}(S_n/C_n) & S_n > 0,\ C_n > 0,\\ \tan^{-1}(S_n/C_n) + \pi & C_n < 0,\\ \tan^{-1}(S_n/C_n) + 2\pi & S_n < 0,\ C_n > 0. \end{cases}$$
To help quantify the uncertainty in inferring the drift angle, the circular variance $V_n$ is defined as $V_n = 1 - \bar{R}_n$, where $\bar{R}_n = R_n/(N_S(N+1-n))$ is the mean resultant length. Finally, the first and second central trigonometric moments,
$$m_{k,n} = \frac{1}{N_S(N+1-n)} \sum_{j=1}^{N_S}\sum_{i=1}^{N+1-n} \cos\!\big(k(\theta^{(j)}_{i,n} - \theta_{d,n})\big), \qquad k = 1, 2,$$
can be used to define the sample circular dispersion $\delta_n = (1 - m_{2,n})/(2m_{1,n}^2)$.
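These estimators follow standard circular statistics and can be sketched compactly; the names below are our own, and `arctan2` handles the quadrant cases of the piecewise definition of the mean angle in one call.

```python
import numpy as np

def circular_summary(angles):
    """Return (mean angle in [0, 2*pi), circular variance, dispersion)
    for a sample of angles, using standard circular statistics."""
    C, S = np.sum(np.cos(angles)), np.sum(np.sin(angles))
    n = angles.size
    R_bar = np.hypot(C, S) / n                  # mean resultant length
    theta_d = np.arctan2(S, C) % (2 * np.pi)    # mean direction
    V = 1.0 - R_bar                             # circular variance
    m1 = np.mean(np.cos(angles - theta_d))      # 1st central trig moment
    m2 = np.mean(np.cos(2 * (angles - theta_d)))  # 2nd central trig moment
    delta = (1.0 - m2) / (2.0 * m1**2)          # circular dispersion
    return theta_d, V, delta

rng = np.random.default_rng(5)
# Displacement angles concentrated around a drift direction of pi/4
angles = (np.pi / 4 + rng.normal(0.0, 0.3, size=2000)) % (2 * np.pi)
theta_d, V, delta = circular_summary(angles)
stderr = np.sqrt(delta / angles.size)           # circular standard error
```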
The error bars in the following plots use the circular standard error $\hat{\sigma}_n = \sqrt{\delta_n/(N_S(N+1-n))}$. Figure 22 shows the calculated circular variance and mean drift angle using N_S = 10 particles and N = 100 measurements, where we have assumed that the optimal measurement time has been used. The variance in the estimation of the drift angle decreases rapidly over time, which would suggest that the optimal time to infer the drift angle is the maximal measurement time. However, due to the shortage of samples in the time averaging at large time lags, we find very little difference in the quality of the inferred drift angle other than when very short lags are used. We leave to future study a theoretical analysis of the optimal strategy for inferring the drift direction in combination with the other parameters of the drift-diffusion model.