Optimal calibration of optical tweezers with arbitrary integration time and sampling frequencies: a general framework [Invited]

Optical tweezers (OT) have become an essential technique in several fields of physics, chemistry, and biology as precise micromanipulation tools and microscopic force transducers. Quantitative measurements require the accurate calibration of the trap stiffness of the optical trap and the diffusion constant of the optically trapped particle. This is typically done by statistical estimators constructed from the position signal of the particle, which is recorded by a digital camera or a quadrant photodiode. The finite integration time and sampling frequency of the detector need to be properly taken into account. Here, we present a general approach based on the joint probability density function of the sampled trajectory that corrects exactly the biases due to the detector’s finite integration time and limited sampling frequency, providing theoretical formulas for the most widely employed calibration methods: equipartition, mean squared displacement, autocorrelation, power spectral density, and force reconstruction via maximum-likelihood-estimator analysis (FORMA). Our results, tested with experiments and Monte Carlo simulations, will permit users of OT to confidently estimate the trap stiffness and diffusion constant, extending their use to a broader set of experimental conditions.

the fact that, at thermodynamic equilibrium, the probability distribution of the particle position follows the Boltzmann distribution [1,2,13]. Other methods can additionally estimante the diffusion constant by relying on the mean square displacement (MSD) [1], the autocorrelation function (ACF) [14], the power spectral density (PSD) [15] and the force reconstruction via maximum likelihood estimator (FORMA) [2,16]. Generally speaking, the accurate and precise calibration of OT will depend on several parameters involved in the detection of the position of the particle, such as the signal-to-noise ratio (SNR), the sampling frequency or bandwidth, the integration time, and the length of the acquired data.
Typically, an OT setup includes a microscope digital camera or a quadrant photodiode (QPD) to measure the position of the optically trapped particle. QPDs represent an excellent solution to calibrate OT when dealing with a single static optical trap; they work at very high sampling frequencies (on the order of hundreds of kilohertz) and short intergation times (on the order of microseconds) with high SNR. Nevertheless, when a larger field of view is needed (e.g., in holographic optical tweezers trapping multiple particles at the same time), digital cameras are commonly used to visualize the whole sample, permitting the user to select the region of interest and the particular particle(s) to be analyzed [1,13]; however, standard microscopy cameras have the disadvantage that have low sampling frequencies (usually up to some kilohertz) and long integration times (up to milliseconds), which can affect the quality of the OT calibration with standard methods.
Several works have discussed the effect of the integration time and sampling frequency on the calibration of OT [17][18][19][20][21][22][23][24][25][26][27][28]. The EP method is arguably the simplest one and, often, the natural starting point for more complex analyses, with the limitation that it only provides the stiffness of the optical trap but not the diffusion constant of the particle [17][18][19]; a finite integration time and a limited sampling frequency have been shown to introduce artifacts in the reconstructed trapping potentials [17]. There has also been some work exploring the effects of the integration time on different geometries of the trapping potential and in cases of anisotropic diffusion, which may have applications in nuclear magnetic resonance imaging techniques [19]. For MSD-based methods, the effect of the integration time has been discussed in the context of rheological studies measuring the viscoelastic properties of the liquid where the particles are immersed [17,19,[21][22][23][24][25], relevant for the study of processes intertwined with, e.g., cellular dynamics (such as cell growth, stem cell differentiation, cell crawling, wound healing, protein regulation, cell malignancy, and even cell death [24,26]). When using the PSD method, the finite integration time and limited sampling frequency result in an effective low-pass filtering of the signal whose signature can be identified in the acquired power spectra [18,20,27,28].
Recently, there has been a growing interest in applying techniques of classical information theory to the calibration of OT. For instance, estimations using posterior distributions and maximum posterior estimators (e.g., FORMA with an uninformative prior [16]) are two of the recently developed alternatives, which, nonetheless, still consider restrictive ideal conditions, such as high sampling frequency and negligible integration time [3,16,29,30]. Using maximum likelihood estimators, Refs. [23,25,28] were able to incorporate into the mathematical analysis a limited sampling frequency and integration time and, then, use concepts from information theory, such as Fisher information [25] and the Cramer-Rao bound [23], to ascertain the bona fide of their techniques. Other works have considered also a covariance-based estimator [31] While many works and methodologies are available, a general and comprehensive approach to tackle the effects of the integration time, sampling rate and trajectory duration in standard calibration methods is still lacking. Also, some of the works mentioned previously rely on some intermediate procedures to fit the model's functions, typically by means of non-linear fitting algorithms, with the disadvantage of being time consuming and potentially adding errors to the final estimators.
Here, we develop a general framework from which we derive analytical solutions that incor-porate sampling frequency and integration time in their descriptions for the most commonly used methods to estimate the stiffness and the diffusion constant in OT (namely EP, MSD, ACF, PSD, and FORMA). We provide generalized analytical solutions for these methods, enabling accurate and precise estimations, independent of the frame rate and of the integration time of the camera. This will offer a framework for future research to tackle problems where fast or real time estimations of the stiffness and diffusion are required, such as those dealing with non equilibrium thermodynamics or molecular biology. This article is organized as follows: First, in Sec. 2, we highlight the problems that arise due to a finite integration time and a limited sampling frequency on the estimates derived from the standard calibration methods (EP, MSD, ACF, PSD, and FORMA), which assume ideal conditions (i.e., high sampling frequency and negligible integration time). In Sec. 3, we derive analytical expressions that incorporate arbitrary integration time and sampling frequency for all the methods, and we further use them to estimate the stiffness and diffusion. A summary of the methods in their standard and generalized forms, including practical considerations, is presented in Table 1. Then, in Sec. 4, we compare the performance of these new formulas, to which we will refer as generalized formulas, to estimate the stiffness and diffusion constant against standard ones in a broad range of experimental conditions, going from low sampling frequencies and large integration times to high sampling frequencies and short integration times. Finally, in Sec. 5, we show the performance of the generalized formulas to estimate stiffness and diffusion constant when controlling total time and sample number. At the end of the article, in Secs. 6 and 7, we discuss our results and give general conclusions and future research directions.

Problems due finite sampling frequency and integration time
Let us start with a reminder of the standard calibration methods to later dwell on the effects that a finite sampling frequency and integration time may have on them. We assume that the effect of the OT on a particle is well described by the Langevin equation modelling the motion of a Brownian particle in the overdamped regime subject to a Hookean restoring force. For simplicity, we also restrict ourselves to the one-dimensional case: The diffusion constant is = B / , with B the Boltzmann constant, the absolute temperature, = 3 p the drag coefficient, p the diameter of the particle, the viscosity of the medium, and ( ) a white noise with zero mean and Dirac-delta-correlated variance ) ( ′ ) = ( − ′ ) [32]. A formal solution to Eq. (1) reads where 0 is the initial position at time = 0 and ot = / is the characteristic relaxation time of the particle in the optical trap. From this theoretical model, one can easily derive exact formulas for physical quantities of interest that will depend on the two parameters and . To infer these parameters, one must contrast the theoretical formulas with sample estimators constructed from a dataset containing a time series of the particle's position measured experimentally. Let our dataset be D ≡ { } s =1 , where ≡ ( ) are the particle's positions taken at regular times = Δ = / s , for = 1, 2, ..., s , and s = 1/Δ denotes the sampling frequency. A summary of the implementation and relevant functions involved in these methods are (for further details see Refs. [1,2]): • Potential and EP analyses. At thermodynamic equilibrium, the probability of finding the particle around a position is given by the Boltzmann distribution: where 0 is a normalization factor. In this case, the statistical estimator is, fairly simply, the histogram of the probability of finding the particle at a given position constructed from the dataset D. The experimental potential is estimated by means of the histogram ℎ( ) of the position, with the bins' coordinates, If we further assume that the potential is harmonic, with eq the stable equilibrium position, we can fit this model function to the estimates in Eq. (4) and obtain as a free parameter. Equivalently, by the equipartition theorem we know that This implies that a simple way to infer the value of is to use the dataset D to construct an estimate of the position's variance ( − eq ) 2 . Its statistical estimator is the so-called sample variance, usually denoted as 2 s . Thus, with 2 s = 1 s −1 s =1 ( − eq ) 2 and eq = 1 s s

=1
. Since the potential and the EP formulas are essentially equivalent, very similar results are expected when the number of data is high enough; therefore, in the following, we will focus only in the EP method.
• MSD analysis. The MSD is theoretically defined as for larger than ot . Given the dataset D, its statistical estimator is given by with = ( Δ ) and ℓ = ℓΔ . The theoretical formula and its estimator are then fitted to infer and ot (from which can be straightforwardly obtained).
• ACF analysis. The autocorrelation function is defined as: again for larger than the characteristic relaxation time. In this case, its estimator is given by which upon comparison with its theoretical counterpart allows us to infer and (through ot ).
• PSD analysis. For a continuous and infinitely long signal (solution to Eq. (1)), the PSD is given by which depends on c = /(2 ) and . If, however, the signal is taken at discrete time steps, the above formula must be replaced by [2]: where Δ = ((1 − 2 ) /2 c ) 1/2 and = −2 c / s . The corresponding sampled estimator of the PSD is computed by means of the Fast Fourier Transform (FFT), with = / s for = 1, . . . , ( s − 1)/2. In this case, (· · · ) stands for the expected value estimated by averaging many experimental replicas or by data compression (typically obtained by moving average).
• FORMA analysis. The values of and are directly computed from the experimental dataset D by maximizing the likelihood of the linear model. The linear model comes about discretizing the Langevin equation to linear order, and rescaling the noise accordingly, yielding: with Gaussian variables of zero mean and unit variance. From here, the model's likelihood is: maximizing with respect to the parameters we obtain the maximum likelihood estimators: All these methods assume that each sample contained in the dataset D is measured instantaneously. Additionally, the MSD, ACF, PSD and FORMA methods need the sampling frequency to be high enough in comparison with the characteristic time of the trap to fully access the time dependence of these observables. However, a real sampled trajectory will be distorted by the time the detector takes to collect the data, called integration time, as schematically shown in Fig. 1(a). This will skew the estimations of and . A way to model the effect of the integration time is to assume that the sampled position at a given time is averaged around a time window of size , which represents the integration time, that is: The new datasetD ≡ {˜ } s =1 captures better the trajectory obtained in an experiments. This is equivalent to a low-pass filter, with an upper bound, ≤ Δ . In a digital camera with global shutter, the integration time (exposure time) can be set in a range of values defined by the camera maker and is limited by the maximum sampling frequency (frame rate) allowed by the device. In a QPD, the effective integration time depends on the electronic bandwidth and typically is not a controllable parameter (in contrast to standard digital cameras, QPDs may reach quite easily sampling frequencies of the order of ∼ 10 5 Hz, allowing very short integration times of the order of 10 −5 s).
To appreciate the effect of the sampling frequency and the integration time on the standard methods previously described, we perform some Monte Carlo simulations of the dynamics of a particle trapped in an OT varying the sampling frequency and the integration time (see Refs. [1] and Appendix A for more details on the Monte Carlo simulations). For this purpose, we used a particle of diameter p = 1.  Fig. 1(b) shows the ideal case where the data points corresponding to each estimator follow the theoretical predictions of the standard formulas. In this case, the sampling frequency is more than one order of magnitude higher than the characteristic frequency of the trap, s = 5000 Hz ≫ f ot = 1/ ot = / = 297 Hz.
A less ideal condition with still zero integration time but with a low sampling rate is shown in Figs. 1(c). In this case, the sampling frequency, s = 100 Hz, is slightly lower than the characteristic frequency of the trap. Broadly speaking, we can already notice some important effects on the estimators that could have repercussions on the estimated values of and . Probably, the least affected is the potential, since this method does not depend on the sampling frequency (as the main condition to properly recover the potential or the variance of the position is that the total elapsed time of the trajectory is much larger than the relaxation time of the trap ( s ≫ ot ) [2], which is easily accomplished in the majority of cases since typically ot is of the order of some milliseconds and s could be on the order of seconds or more).
Regarding the MSD (second column in Figs. 1(c)), the data of the estimator do not appear appreciably distorted. In this respect, the only condition needed to properly resolve the MSD with the estimator relies on a high enough sampling frequency to recover the first part of the function. Since this curve reaches its maximum value at a characteristic time = ot , Δ ot would enable enough data points in the first region of the plot.
The figure in the third column of Figs. 1(c) shows the case of the ACF. The estimator in this case, similarly to what happens with the MSD, is not distorted by the low sampling frequency, following the standard formula. Nevertheless, since this function falls rapidly to zero with time, with a characteristic time given by ot , from a statistical point of view, the behavior of the ACF would be better reproduced by the estimator if Δ ot , similarly to what happens with the MSD.
The estimator of the PSD shown in the fourth column of Figs. 1(c) follows its corresponding analytical expression. Since the estimator of the PSD depends on the discrete Fourier transform, the aliasing of the data turns more evident as the sampling frequency gets lower, which is the case in this example. In this case, the analytical solution of the aliasing PSD is more adequate, which is in turn the one that is shown in these figures. Under these still ideal conditions, the only restrictions for the PSD are to have a sufficiently long trajectory, with s ≫ ot , and high enough sampling frequency s ot in order to have a good representation of the PSD at low and high frequencies around the corner frequency c , which in our example is c = /(2 ) = 47.3 Hz.
The last figure in Figs. 1(c) depicts the behavior of FORMA. In this figure, in contrast to the one described above, where the sampling frequency was high, it is evident that the trend of the scattered plot, particularly its slope, is very different from the analytical formula. This was already expected since FORMA in its simple form was designed for data sets taken at sampling frequencies much higher than the characteristic frequencies of the trap [16], giving rise to wrong estimates of and when these formulas were applied to experimental data under these conditions. Figs. 1(d-e) show the cases with low sampling frequency and a finite integration time, different from zero. These figures illustrate how the estimators of the different methods fail as the integration time increases. In the case of the potential, as the integration time increases, the estimated values follow a steeper parabola than the expected one, which, according to Eq. (3), would indicate an apparent higher stiffness than the real one. In the same way, this effect would affect the estimation of the stiffness via the equipartition formula in its standard form (see Eq. (7)).
In these same conditions, the estimator of the MSD is distorted at all times, getting lower values than expected, with a lower initial slope than the theoretical prediction. According to the ideal formula (see Eq. (8)), the MSD behaves linearly for short times with a slope of 2 , while it reaches a plateau for long times at a value 2 B / . Hence, comparing directly the estimated values of the MSD with a finite integration time and the corresponding standard formula (Eq. (8)) would give rise to biased values of and , namely lower diffusion constant and higher stiffness than the expected ones.
As the integration time increases, surprisingly, the ACF is not drastically distorted, as can be seen in the third column of Figs. 1(d-e). The difference between the estimated values and the analytical standard description is quite subtle, albeit clearly visible in the data points evaluated at = 0. We will show in the following section that the zero lag time of the ACF is actually the one that is mainly affected by the integration time while the following terms may present negligible distortions when has a low value. Considering this property, the ACF would enable more reliable estimations of and than the other methods, particularly if the integration time is short or moderate and if the first data points of the ACF are ignored in the fitting procedure.
For the case of the PSD, the integration time distorts the estimated PSD by reducing its expected values at short frequencies and increasing the slope of the decay at large frequencies.
Since at short frequencies the dominant behavior of the PSD, according to the analytical expression Eq. (13), is given by /(2 ) −2 c and at large frequencies the PSD is expected to decay predominantly as /(2 ) −2 when the aliasing is ignored, similarly to the MSD, this give rise to an apparent increment of and a reduction of .
In the case of FORMA, as the integration time increases, we see that the scatter plot becomes thinner and tilts counterclockwise, indicating again apparent higher and lower valued estimates of the stiffness and the diffusion constant, respectively.

Generalized methods
As we have mentioned before, the dynamics of a particle of diameter p , immersed in a fluid with viscosity and undergoning the action of an OT with stiffness , can be modelled by the Langevin equation (1). Recall that the standard formulas of the methods, i.e., EP (Eq. (5) and (7)), MSD (Eq. (8)), ACF (Eq. (10)), PSD (Eq. (13)) and FORMA (Eq. (17)) described above, do not account for the integration time, which implies that the inferred parameters will be biased. To incorporate this effect in a general framework, we should find the probability of observing a whole particle's trajectory already corrected for the integration time. If we were able to derive such an expression, then we could easily derive any physical quantity of interest from it. In the following, we show that this is indeed possible.
Starting from Eq. (18), using the formal solution Eq. (2), and after some calculations based on the path integral formalism (see Appendix C), it can be shown that the joint probability density function, denoted here as {˜ } s =1 of observing the particle at sampled positions˜ 1 , . . . ,˜ s at times 0 < 1 < · · · < s < s , is given by : corresponds to the ( , )-entry of the inverse covariance matrix between two particle's blurry positions at time and , where denotes the Kronecker delta and = /2 ot is half the ratio between the integration time and the relaxation time. For the derivation of this formula, we have solely assumed that | − | > , which is quite realistic. Besides, since we assume that the process runs within an interval starting at zero, we must have that the first time 1 at which we have the first point recorded must obey 1 − > 0.
In the limit of going to zero, which corresponds to either a very fast shutter velocity or a very long relaxation time, we recover the standard covariance matrix, and the joint PDF given by Eq. (19), that can be written as ( +1 | ), with ( +1 | ) the propagator of the standard Ornstein-Uhlenbeck (OU) process. Generally, the joint PDF given by Eq. (19) cannot be factorized in this way, since the corrected process is non-Markovian. Other than that, this expression contains all the statistical information of the system and we can now proceed to rederive the theoretical formulas used in the calibration methods. Henceforth, we will take the initial condition 0 = 0 and, moreover, for MSD and ACF we will assume that enough time has elapsed so that we only keep the time-translational invariant term in the covariance matrix.
• Potential and EP analysis. From Eq. (19), we can evaluate the expectation value ˜ 2 s which equals to . This automatically implies that This implies, in turn, that the effective harmonic potential is given by the following formula: When = 0, Eqs. (21) and (22) reduce to their standard form (Eqs. (7) and (5)). Notice that the formulas of the standard methods do not depend on ot , so can be directly determined from these formulas without any previous knowledge of the fluid properties. This must be contrasted to Eqs. (21) and (22), which depend on ot , through , and hence on the drag coefficient of the particle in the fluid, . In this sense, to directly use the generalized equations to predict the stiffness of the trap from a single experiment, previous knowledge of has to be incorporated in the solution. A different approach, as it was pointed out in Ref. [18], would be to carry out some experiments, at least two, with different exposure times in order to obtain a data set of the variance ˜ 2 s ( ) = F ( /2 ot )/ , and from these determine and ot either by solving the resulting equations or by fitting the data points to the analytical formula. A similar approach can be followed in the potential method, with the difference that the variance in this case is obtained through the fitting of the potential, Eq. (22). For simplicity, in the following sections, we apply the EP using = 3 p , where we assume that the viscosity, , is the one of water at the laboratory temperature and p is the diameter of the particle reported by the fabricant.
• MSD analysis. The mean squared displacement for a lag time ℓ = +ℓ − is defined as MSD( ℓ ) = (˜ +ℓ −˜ ) 2 . Now, from the new joint PDF we have that and gathering these results yield Fitting this equation to the experimental estimators allows us to infer the three free parameters = 2 F ( )/ , = 2 (sinh( )/ ) 2 / , and = ot . Interestingly, we could infer ot and from these three equations without knowing in advance, which actually could be inferred as well. In our case, as we will see later, we use as a known parameter in all the realizations discussed below.
• ACF analysis. Recall that the autocorrelation function is defined as ˜ ˜ = , and therefore Interestingly, since F ( ) ≤ sinh( ) 2 there is a jump from the first term at equal times to the following ones at different times in the autocorrelation function. Notice also that, for small, the first leading correction to F ( ) is linear in , while it is quadratic for the factor sinh( ) 2 . Comparing the generalised equation Eq. (25) with the standard one, Eq. (10), the terms corresponding to zero lag time ( = ) is the one that differs the most, while the behaviour for non-zero lag times has a more moderate correction to ≠ 0. Overall, ignoring the zero-time term makes the ACF more resilient to trajectories sampled with a finite integration time. This explains the robustness of the standard ACF as the integration time increases (Fig. 1(b-e)).
Looking at Eq. (25), similarly to what happens to the MSD in its generalized form, one could divide the inference problems into two parts corresponding to zero and non-zero lag time and get the values of ot , and . For instance, we could use the non-zero lag time values to fit Eq. (25) and obtain the free parameters = (sinh( )/ ) 2 / and = ot and the zero lag-time value to estimate = F ( )/ , then, we can compute , ot and , or equivalently , from these three parameters. Importantly, this implies that we do not need to know in advance in order to infer and accurately. In our case, as we will see later, we use as a known parameter in all the realizations discussed below.
• PSD analysis. Given the datasetD of the blurry trajectory, the PSD is defined as follows: which can be rewritten as: where we have used the fact that ˜ ˜ = is the covariance matrix. Gathering the previous results, we get the generalised formula for the PSD: After a bit of an unrelentingly tedious algebra, one shows that the double sum reads This allows us to get the final expression for the generalised PSD formula: Notice that the leading term for large s is: • Bayesian inference and maximum likelihood estimator (FORMA). For the model at hand, the likelihood L of observing the datasetD given the model's parameters, denoted here collectively as , is: Using Bayes' rule, the posterior distribution of observing the parameters of the model given the dataset is: Unfortunately, the model's likelihood L (D| ) is fairly complicated since the corrected process is non-Markovian, which implies that the inverse of the covariance matrix is not tridiagonal. Nevertheless, since ({˜ } s =1 | ) is multivariate Gaussian, any marginal is multivariate Gaussian with the corresponding reduced covariance matrix. This allows us very easily to write the probability (˜ +1 ) of observing the particle at the corrected position˜ +1 at at time +1 conditioned that at a previous time it was at the corrected position . Let us denote Δ = +1 − . Then, one can show that and, therefore, where we have introduced the function From here, we can approximate the model's likelihood as where we have considered as parameters = ( 1 , 2 , 3 ) = ( , 1/ ot , /2). Let us next introduce the estimators: Then, maximizing with respect to 1 and 2 we obtain the MLEs If we want to plot the dataset D in a two-dimensional scatter plot where the -axis represents˜ and the -axis represents (˜ −˜ )/Δ , then, looking at the argument of the exponential in Eq. (37), the cloud of experimental points will have an elliptical form with the major axis having a slope given by − (1 − T 2 /T 3 ) /Δ and a width proportional to In the following section, we show the performance of these generalized solutions and compare them to the standard ones on experimental measurements.

Experimental estimation of the stiffness and diffusion
To demonstrate the performance of the generalized formulas presented in the previous section, we performed a series of experiments with an OT. The details of the experimental setup are described in Appendix B. Briefly, a silica particle of diameter p = 1.54 ± 0.10 m was trapped in water with an OT obtained by focusing a laser beam of 532 nm wavelength. The laboratory temperature was = 22 ± 0.5 • C. The trapped particle was recorded at three different sampling frequencies and four integration times In all these experiments, we acquired trajectories with s = 10 5 samples and in all the cases where fitting is required (e.g., MSD, ACF and PSD), we used standard non-linear least square fitting routines provided by MatLab.
The estimated values of and are shown in Figs. 2, 3, and 4 for the three sampling frequencies s = 500, 1500 and 3500 Hz. For each of these experiments, the values of and were estimated as a function of following standard and generalized formulas. In Figs. 2, 3, and 4, the dots represent the experimental results. To compare our results with theory, we performed Monte Carlo simulations of the OU proccess (Eq.(1)) taking the mean of the experimental values of and retrieved by means of the generalized analyses, giving¯ = 4.08 ± 0.05 pN/ m and¯ = 0.299 ± 0.008 m 2 /s. Performing 50 realizations with s = 10 5 samples for each case, the Monte Carlo simulations enable to estimate the confidence intervals obtained by means of the standard deviation of all the replicas as a function of , which are depicted in colored shaded areas in Figs. 2, 3, and 4. The gray shaded areas of the diffusion constant depict the confidence interval of the expected diffusion computed from the size of the particle and the viscosity of water at the laboratory temperature ( = 0.95 ± 0.01 mPa s, with its error estimated from the error range of ), * = 0.295 ± 0.020 m 2 /s, which is useful as a reference parameter to evaluate our estimations. Additionally, using¯ and¯ , together with the Stokes-Einstein relation = / , we estimate the characteristic time of our OT to be ot = / = 3.4 ms, which corresponds to the characteristic frequency ot = 1/ ot = 297 Hz.
As already illustrated in Fig. 1, the standard formulas fail to give a good description of the main quantities as increases. This gives rise to wrong estimations of and in all the analyses, as can clearly be seen in Figs. 2(a) and 2(c). While increases, the values of and linearly move away from the expected value: is more and more overestimated, while is increasingly underestimated. An exception is the ACF method, which shows an opposite behavior and gives the best results under these conditions. These values were obtained from the fitting procedure ignoring the first data point of the ACF (ACF( ℓ=0 )), which, as we saw in previous section, carries most of the bias arising from the integration time. Moreover, in the MSD and in the ACF only the first data points corresponding at most to 6 ot ( ℓ ≤ 6 ot ) were taken into consideration for the fitting.
We can see from the estimated values that the bias with the integration time gets less important as the sampling frequency raises, as shown in Figs. 3 and 4, since in the extreme case with s = 3500 Hz and = 150 s the sampling frequency is at least one order of magnitude larger than the characteristic frequency of the trap and the integration time is one order of magnitude shorter than the characteristic time of the trap ( s ≫ ot and ≪ ot ). The bias of and with respect to and sampling frequency disappear for all the methods when the generalized formulas are used (see Figs. 2(b,d), 3(b,d), and 4(b,d)). Interestingly, all the diffusion constants estimated by the generalized analyses fall within the confidence interval of * .
All of the experimental data points shown in Figs. 2, 3, and 4 were estimated with the same number of samples ( s = 10 5 ). This means that the data obtained for one experiment at

Influence of the trajectory length
While it is intuitive to assume that both the total time s and the number of data points s have to be large in order to obtain accurate and precise values, the increase of these two parameters do not necessarily lead to better estimates, as we show in Fig. 5. At first glance, as Figs. 5(a-c) show, the convergence of the stiffness with respect to the total sampled time s does not show any significant differences for the three considered sampling frequencies. After a long enough sampled time of about s = 10 s, all the methods yield good results within an error of 10% (dashed-black lines). Since these results are independent of the sampling frequency, for a given require enough time only, much larger than the characteristic relaxation time of the trap, and a faster camera or QPD will not necessarily improve the experimental accuracy. Moreover, this result could play a very important role when the experiment requires a long integration time, for instance when the light used for detection is very dimmed, whether it is for practical reasons or because the interesting sample could be photodamaged or altered with high intensity illumination.
On the contrary, a good estimation of the diffusion constant requires a large number of samples independently of the total time acquired. Figs. 5(d-f) exhibit the behavior of the diffusion with the number of samples s for all the methods, showing no dramatic dependence for each sampling frequency. In this case, we fixed the maximum number of samples to s = 10 5 corresponding to s = 28.6 s for the highest sampling frequency (Fig. 5(f)) and to s = 3.3 min for the lowest sampling frequency (Fig. 5(d)). Due to this property, the estimation of the diffusion constant is not limited by the total acquisition time, so it can be retrieved much faster than the stiffness by increasing the speed of the camera or detector. This is important in practice if the main goal is to estimate the diffusion, drag coefficient or the viscosity in conditions where these parameters are not stationary. For instance, using the highest sampling frequency in our experiments of s = 3500 Hz (Fig. 5(f)), the time taken to have a good estimation, under the gray shaded area in Figs. 5(d-f), will require only a few seconds, or even a fraction of a second using FORMA.
Comparing the performance of the generalized methods, in general all of them have a very similar behavior when estimating the stiffness. Looking at the diffusion as a function of the number of samples s (Fig. 5(d-f)), the rhythm at which the various methods converge is very different. While the MSD, ACF, and PSD methods have an error of about 10% at s = 2 × 10 4 samples, FORMA achieves a much lower error, lower than 1%. The PSD seems to be more sensitive to sampling frequency, showing larger errors at the highest sampling frequencies. This is probably an error arising from a deficient representation of the PSD at low frequencies, biasing the fitting in this case, meaning that a longer total time could benefit this estimation. It is important to stress that these estimations were realized with standard non-linear fitting, but weighted non-linear fitting would improve the estimations making them closer to those obtained with FORMA, at the expense of using more computing time and more sophisticated algorithms. Moreover, the complete expression for the PSD shown in Eq. (30) may also contribute to improve these results.

Discussion
Integration time, sampling frequency and data length of the trajectory of the particle in an OT affect the estimation of stiffness and diffusion constant in all the standard methods normally employed to extract these observables. All the standard methods enable accurate results when (i) the sampling frequency s is much higher than the characteristic frequency of the trap ot , (ii) the integration time is much shorter than the characteristic relaxation time ot , and (iii) the total acquisition time is several orders of magnitude larger than the characteristic relaxation time of the optical trap.
When these ideal conditions start to weaken, the estimators for and based on standard methods diverge from the real values. First, thinking of an ideal experimental situation where the integration time is negligible and only the sampling frequency is reduced until a value close or even lower than the characteristic frequency of the trap, most of the methods seem to give reasonable predictions as long as the sampling frequency remains similar to the characteristic frequency. Under this condition, FORMA shows less reliable results, which is understandable since this method is based on the assumption of high sampling frequencies [16]. If the sampling frequency gets much lower than the characteristic frequency, then an important loss of information at short times in the MSD and ACF could bias the fitting of their respective analytical functions, affecting particularly the prediction of . The predictions by means of the PSD may also be affected if the Nyquist frequency (corresponding to half the sampling frequency) does not reach values around the corner frequency in order to properly define the PSD and obtain a good fitting, enabling a good estimation of and . The potential and equipartition methods are probably the least affected under these circumstances as long as the sampled trajectory is long enough in order minimize the statistical error in the experimental histogram or the variance.
Second, going a bit farther from the ideal conditions, as the integration time increases, the deviation of and from their estimated values obtained using the standard methods increase as well. For the EP, MSD, and PSD the trend is more or less similar: the stiffness gets overestimated and diffusion underestimated as the integration time increases, as can be seen in Figs. 2, 3, and 4. As can be seen in Fig. 1, the main deviation of the ACF from the theoretical value occurs at ACF( ℓ=0 ). Ignoring this value allows to get very good estimations of the stiffness and diffusion, even when the data set are far from the ideal conditions, as can be seen in Figs. 2, 3, and 4. On the opposite end, there is FORMA, which is a reliable, simple and fast method when the sampling frequency is much higher than the characteristic time of the OT [16]. In this respect, low sampling and non-zero integration time drastically degrade the estimations obtained by means this method in its standard form.
Incorporating both the integration time and frequency sampling into the estimators in all the common methods improve the estimates drastically, making them more accurate and precise. For a sampled trajectory with a large number of data taken over a long time, the estimations for all these generalized methods are comparable, showing a high degree of accuracy and precision, even if the integration time is large or the sampling frequency low, as can be seen in Figs. 2, 3, and 4, for the generalized formulas. At this level, the main differences in the methods will come from the way in which the experimental estimators are used or compared with model functions in order to extract and . MSD, ACF, and PSD require an intermediate routine to fit their experimental estimators, commonly using well-established non-linear fitting routines, or some intermediate procedures such as data selection or windowing in the case of the PSD, requiring in many cases preliminary knowledge of the system to optimize the solutions and computing time. In this respect, EP and FORMA contrast with the other methods in the sense that there is not any fitting involved in their solutions and it is not necessary to incorporate any extra information or any intermediate routine, making them very fast and reliable. This can clearly be seen in Figs. 2, 3, and 4, where EP and FORMA in their generalized form, overall, retrieve the most accurate estimations of and .
In general, the estimations are expected to improve as the amount of information contained in the trajectories increases. This information is given by the number of data sampled and the total acquisition time. One might be tempted to increase them by recording longer times and/or by increasing the number of data points in the trajectory by means of the sampling frequency. However, one must make considerations about the specific requirements imposed by the experiment; for instance, there are many situations where the sampling frequency can in principle be increased to very high values (e.g., using a fast camera system or a quadrant photodiode), but degrading the signal-to-noise ratio and hence the estimations of and .
In practice, the total acquisition time cannot be extended indefinitely because the experimental conditions may be non-stationary or this would increase considerably the computational resources to record, store and analyze the data. In this respect, there are better approaches depending on whether one needs an accurate value of the stiffness or of the diffusion constant . For instance, if one wants to determine the stiffness with high accuracy, this will require enough acquisition time only, much larger than the characteristic time of the trap, while a faster camera or detector will not necessarily improve the estimation. On the other hand, contrary to the stiffness, the diffusion constant can be retrieved much faster by increasing the speed of the camera or detector (Fig. 5). Additionally, we can notice important discrepancies between the methods when insufficient information is contained in the trajectories. Particularly, in Fig. 5 we notice that the stiffness for all the methods behaves very similarly, but for the case of the diffusion constant, FORMA in its generalized form shows much better accuracy and precision than the other ones.
The analysis shown in this work is directly applicable to data acquired by means of a standard digital camera, where the integration time is typically well defined. When dealing with quadrant photodiodes, the integration time may not be accessible, but a slightly adapted version of the analysis presented here may be still applicable. For example, the generalized MSD and ACF allow one to estimate in addition to and . Similar considerations apply for the cases where an effective integration time is indirectly implemented in the data processing by means of low-pass filtering.

Conclusions
In this work, we have derived generalized formulas for the most common OT calibration methods by incorporating the integration time and sampling frequency. The effects of these factors on the estimates of the stiffness and diffusion were analyzed theoretically, numerically, and experimentally.
We have shown that the integration time has important consequences for all the standard methods, leading to errors of up to 50% in some extreme cases. The ACF seems to be an exception, behaving better than the other methods if the zero lag time (ACF( ℓ=0 )) is ignored in the fitting procedure . On the other hand, the generalized formulas give accurate and precise results independently of the technique, with errors that are less than 10% for a few seconds, or a thousand of data sampled under our experimental conditions. The results can be drastically improved by incorporating either more data or sampling for a longer time, independently of the integration time or the sampling frequency. Particularly, we show that the generalized formulas give very good estimations of the stiffness if the sampled time is long enough, longer than an order of magnitude of the characteristic relaxation time, without worrying about the sampling frequency. In stark contrast, the diffusion constant can be estimated very fast and accurately with all generalized methods, even when the sampled time is very short or with very few data. More interestingly, the generalized expression for FORMA enables to retrieve very good estimations of the diffusion within an error of less than 10% with some thousands, or even hundreds, of data points, making feasible to estimate the diffusion extremely fast with a high accuracy, depending mainly on the speed of the detector. In our case, we showed that a sampling frequency of 3500 Hz enables to obtain the diffusion within less than a second and with much less relative error than 10% for a stiffness of 4.08 pN/ m. This work paves the way to extend the applicability of common methods to contexts where the experimental conditions or the limitations of the setup do not allow measurements in ideal conditions, i.e., at short integration time, high frequency sampling and large amount of data, to obtain fast and reliable information about the stiffness and diffusion in OT.

Autocorrelation function (ACF) analysis. The statistical estimator is ACF
. These data points are fitted to the model function, with the free parameters and ot , Standard: Generalized: .

Generalized:
Force reconstruction via maximum-likelihood-estimator (FORMA) analysis. The stiffness and diffusion (or ) are directly computed from D via the formulas Standard: Generalized: Table 1. Overview of the calibration analyses in the overdamped regime. These methods allow to estimate the stiffness of the optical trap and, in most of the cases, the diffusion constant of the optically trapped particle from the sampled trajectory of the particle D ≡ {˜ } s =1 at sampled time Δ and integration time . The standard model functions assume much shorter sampling and integration times than the characteristic time of the optical trap. On the other hand, the generalized model functions take into account arbitrary sampling frequency and integration time, allowing to drastically improve the calibration of OT even if data acquisition is very far from the ideal conditions. For the second term, the idea is to change the order of integration to express it as an integral over the whole trajectory of a function coupled with the noise. Thus, we write where we have defined being Θ( ) the Heaviside step function. The two integrals appearing in the above definition of ( ) can be easily done, viz.
We finally obtain the following expression for the stochastic dynamics of the corrected trajectory: Now, we are in position to derive the joint PDF, denoted here {˜ } s =1 , of observing the corrected particle's trajectories at the blurry points˜ 1 , . . . ,˜ s and times 1 , . . . , s . Mathematically, this is given by: where (· · · ) denotes the average over all possible realization of the noise's trajectory ( ), whose probability measure is given by the following path integral: where is a normalization constant. Let us now derive an exact expression for the joint PDF.
To do this, we write The average over all possible realizations of the noise ( ) has the following form: But the path integral over ( ) is easy to evaluate yielding: where we have defined Gathering the results, we have that: With respect to the auxiliary variables this is a multi-variate Gaussian integral, easy to carry out. The final result for the joint PDF of observing a blurry trajectory is: where the matrix elements are given by Eq. (54).
To carry out the integral (2) , we will assume that | − | > , that is, the intervals [ − /2, + /2] and [ − /2, + /2] never overlap (even for consecutive ones) unless = . Next, looking at the expression of (2) , we see that the three Heaviside functions indicate that for the integral to be non-zero we must have that the intersections of the intervals [0, − /2] ∩ [ − /2, + /2] and [0, − /2] ∩ [ − /2, + /2] must be non-zero. This automatically implies that whenever = , the integral is zero. We are left to consider the two cases > or < . This automatically implies that: Doing the integrals and rearranging terms, we finally arrive at the following expression Finally, looking at the expression of the integral (3) , we observe that the integration interval is given by I[ − /2, + /2] ∩ [ − /2, + /2], so unless = the integral is zero as the intervals do not overlap. Thus, We can now add up the three contributions to obtain the following expression for , viz.
Finally, using the following identities allow us to simplify the covariance matrix, yielding: .