Fisher information theory for optimised lifetime estimations in time-resolved fluorescence microscopy

Measuring the lifetime of fluorescent emitters by time-correlated single photon counting (TCSPC) is a routine procedure in many research areas spanning from nanophotonics to biology. The precision of such measurement depends on the number of detected photons but also on the various sources of noise arising from the measurement process. Using Fisher information theory, we calculate the lower bound on the precision of lifetime estimations for mono-exponential and bi-exponential distributions. We analyse the dependence of the lifetime estimation precision on experimentally relevant parameters, including the contribution of a non-uniform background noise and the instrument response funtion (IRF) of the setup. We also provide an open-source code to determine the lower bound on the estimation precision for any experimental conditions. Two practical examples illustrate how this tool can be used to reach optimal precision in time-resolved fluorescence microscopy.


I Introduction
Nowadays, the precise determination of the lifetime of fluorescent emitters has become essential for a wide range of applications. Indeed, enhancing the spontaneous rate of single emitters is a topical challenge in nanophotonics [1,2], leading to the measurement of strongly reduced fluorescence lifetimes [3][4][5][6][7][8]. In biology, the contrast induced by lifetime variations is used to map different parameters on biological samples [9][10][11] such as the viscosity, the potential of hydrogen (pH) or the interaction between two emitters due to Förster resonance energy transfer (FRET). Whatever application one is interested in, the best precision that can be achieved depends on both the experimental conditions and the efficiency of the estimators. While the performances of lifetime estimators can numerically be studied using Monte Carlo experiments [12][13][14][15][16], finding the right experimental conditions requires a reliable benchmark to compare the performances of different experimental setups. Such a benchmark is set by Fisher information theory and the Cramér-Rao inequality, which gives the lower bound on the variance of unbiased estimators [17]. In other words, the Cramér-Rao inequality allows one to calculate the best precision that can be achieved in the estimation of one or several parameters, taking into account the various constraints induced by an experiment. In the recent years, this gauge became a standard used to assess the limit of localisation precision in the context of single-molecule microscopy [18][19][20][21]. Other recent applications of this formalism include the investigation of the dynamics of single molecules on the millisecond time scale [22] and the comparison of different imaging modalities in fluorescence diffuse optical tomography [23,24]. Among the different techniques that can be implemented for lifetime measurements [11,25,26], TCSPC is commonly used to exploit low level light signals with picosecond resolution [27]. Using the Cramér-Rao inequality, a relation between the estimation precision and the number of collected photons was obtained in 1992 for a simplified TCSPC model [28]. Here, we perform an extensive Cramér-Rao analysis to unravel the dependence of the lifetime estimation precision on experimentally relevant parameters, in the general case of a biexponential distribution along with any non-uniform background signal and considering the finite IRF of the setup. In addition, we provide an open-source code which computes the Cramér-Rao bound for any set of experimental parameters. Within this framework, we thus provide a versatile tool which can be used for different purposes such as determining the shortest lifetime that can be probed with a TCSPC setup or achieving optimal contrast for FLIM-based applications.

II.1 Excited-state lifetime distribution
Let us consider that the measured decay histogram follows a bi-exponential distribution with an additional contribution due to background noise. Usually, this noise originates from the dark count of the detector, unfiltered excitation laser or luminescence of the substrate. Assuming that the noise follows a known probability density function (PDF) noted q b (t), the total signal can be modelled by using a set of 5 unknown parameters noted θ, namely, the decay rate of each component (Γ 1 and Γ 2 ), the average number of detections for each component (N 1 and N 2 ), and the average number of detections due to background noise (N b ). Furthermore, we consider that the excitation laser has a finite pulse duration, and that the jitter of the detection system induces a loss of precision over the photon detection time. These two effects can be accounted for by measuring the IRF of the system, which is described by a PDF noted q irf (t). Then, if the detected photon rate does not exceed the maximum counting speed of the detector, the expected number of events f i detected in the i-th bin of the decay histogram (and associated with delays in-between t i and t i+1 ) reads where T is the repetition period of the excitation laser. In this equation, only the first term of the sums (corresponding to l = 0) is significant if the fluorescence lifetimes associated with both exponential decays are shorter than the repetition period. For a given integration time, and assuming that the detections are independent, we can then model the distribution of detected events (including fluorescence photons and background noise) for each point of the decay histogram by a Poisson distribution of expectation f i . The PDF associated with the observation of X i events on a given point of the decay histogram is therefore expressed by

II.2 Cramér-Rao lower bound
In estimation theory, a well-known result is that the variance of any estimatorθ must satisfy the Cramér-Rao inequality [17], which reads where Var is the variance operator and I is the Fisher information matrix defined by where E is the expectation operator. The Fisher information matrix can be interpreted as a measure of the amount of information about the parameters θ contained in a given data set X: the more information about the parameters, the lower the bound on the variance of their estimators. In the case of TCSPC measurements, we can assume that the n data points of the decay histogram are independent. Moreover, since p i (X i ; θ) is a Poisson distribution of expectation f i , the variance of this distribution is also equal to f i . From Eq. (4), we obtain Mathematically, the magnitude of the off-diagonal elements determines the extent to which the Cramér-Rao lower bound on a given parameter is affected by the estimation of the other parameters. While these off-diagonal elements generally vanish in the context of single-molecule localisation [20,21], the cross-terms of the information matrix must be considered here. Indeed, the precision of lifetime estimations can be strongly influenced by a lack of information about other parameters. This is notably the case for bi-exponential decay histograms characterised by two decay rates of the same order of magnitude (Γ 1 ∼ Γ 2 ).
Dimensionless quantities for the parameters involved in the calculations can be obtained by performing the change of variable u = Γ 1 t. Hence, we define the normalised repetition period r = Γ 1 T , the normalised expected number of counts due to background noise β = N b /(rN 1 ), the ratio of the decay rates γ = Γ 2 /Γ 1 and the ratio of the expected number of detections η = N 2 /N 1 . Table 1 summarises the parameters used in the model. With these parameters, Eq. (3) reads where σ Γ 1 is the standard error on the decay rate estimates and F can be calculated by numerical inversion of the information matrix, the elements of which are reported in Supplementary Section 1. It must be noted that the Cramér-Rao bounds are the same for the relative standard error on the decay rate and lifetime estimators (see Supplementary Section 2). Equation (6) explicitly give the fundamental limit on the precision of decay rate (and lifetime) estimations. In these expressions, F is calculated by inverting the information matrix and describes the influence of the different parameters involved in the model on Parameters Dimensionless parameters First fluorescence decay N 1 and Γ 1 Second fluorescence decay the value of the Cramér-Rao bound. A low value of F indicates an experimental setup with a high sensitivity: the F-value is always greater than unity and equals unity when the shot noise limit is reached. Optimisation of a TCSPC setup is therefore achieved when the F-value reaches unity, indicating that the precision of lifetime estimation is limited by the number of detected fluorescence photons. For these reasons, the F-value is used as a figure of merit to quantify the performance of a lifetime imaging technique [29,30]. In the following sections, we will perform a parametric study of the F-value. To do so, we will consider as a reference situation the ideal case for which k = 500, r = 100, β = 0 andq irf is a Dirac delta function. With these parameters, the F-value is approximately of unity. Each parameter will then be individually varied, in order to highlight the influence of each parameter upon the F-value.

II.3 Mono-exponential case
Let us consider a signal following a mono-exponential distribution (η = 0) with a uniform background noise. The set of unknown parameters is θ = (N, Γ). Figure 1 shows the dependence of the F-value upon the number of counts due to background noise (Fig. 1a), the number of data points per lifetime (Fig. 1b), the number of fluorescence lifetimes per repetition period (Fig. 1c) and the standard deviation of the IRF (Fig. 1d) which is assumed to follow a inverse Gaussian distribution. All these parameters fully characterise the experimental conditions, and must be optimised in order to perform shot-noise limited estimations. An analytical expression of F was obtained by Köllner and Wolfrum [28], for the special case in which the IRF is modelled by a Dirac delta function and the number of counts due to background noise is β = 0: As expected, the results obtained from Eq. (7) and those obtained by numerically inverting the information matrix are the same under these conditions, as shown in Fig. 1b and Fig. 1c.
The results shown in Fig. 1 can be straightforwardly applied to identify the parameters limiting the precision of an experimental setup. For instance, if N = 400 fluorescence photons are detected from an emitter, it follows from Eq. (6) that a relative error on Γ of 6% can be reached for F ≤ 1.2. Considering each of the four situations depicted in Fig. 1 individually, such precision is achieved for a number of counts due to the background noise β ≤ 10 −2 (Fig. 1a), for a number of data points per lifetime k ≥ 0.5 (Fig. 1b), for a number of fluorescence lifetimes per repetition period r ≥ 4 (Fig. 1c), and for a standard deviation of the IRF Γσ irf ≤ 0.7 (Fig. 1d). When these four conditions on the parameters are simultaneously verified, one is ensured to achieve a relative error smaller than 9.7%.

II.4 Bi-exponential case
Let us now consider a signal following a bi-exponential distribution. If N 2 and Γ 2 can be precisely estimated from independent measurements, the set of unknown parameters is θ = (N 1 , Γ 1 ). Such situation can occur for instance if the second component originates from the luminescence of the substrate, because N 2 and Γ 2 can be estimated by performing a reference measurement without the emitter. In contrast, if the second component originates from the emitter itself, N 2 and Γ 2 are generally estimated from the data, and the set of unknown parameters to be considered becomes θ = ( In both cases, the precision of lifetime estimations depends on η = N 2 /N 1 and γ = Γ 2 /Γ 1 . Figure 2 shows the dependence of the F-value upon these parameters for both situations. When the parameters of the second component can be independently estimated (Fig. 2a), the estimation of Γ 1 is not perturbed by the presence of the second component if N 2 < N 1 or Γ 2 > Γ 1 , i.e. the lifetime of the second component is shorter than the lifetime to be estimated. If this latter condition is satisfied, Γ 1 can indeed be correctly estimated from the last points of the decay histogram, for which the contribution of the second component has vanished (see Supplementary Section 3). These arguments also hold when the parameters of the second component must be estimated from the data (Fig. 2b) but, in this case, Γ 1 and Γ 2 must be significantly different in order to enable a correct estimation. As an example, we can investigate the number of fluorescence photons required to obtain a relative error of 6% if the number of photons detected from each component is the same (N 2 /N 1 = 1) and for the two situations represented in Fig. 2c. To begin with, we consider that the parameters of the second component are independently estimated. While N 1 = 381 photons are required if Γ 2 = 2 Γ 1 , N 1 = 852 photons are needed if Γ 2 = 0.5 Γ 1 . This example illustrates that the situation Γ 2 > Γ 1 is favourable in order to precisely estimate the decay rate Γ 1 . As expected, the number of photons required to properly estimate Γ 1 drastically increases when the parameters of the second component must be estimated from the data. Indeed, N 1 = 5, 237 photons are required if Γ 2 = 2 Γ 1 and N 1 = 11, 450 photons are required if Γ 2 = 0.5 Γ 1 .

II.5 Numerical tests
In this section, we demonstrate the versatility of the approach by analysing two examples, one for the estimation of lifetimes on the order of several picoseconds, the other for the estimation of lifetimes on the order of the nanosecond. In both cases, we consider an exponential decay with known background noise, so that the parameters to be estimated are θ = (N, Γ). Each example takes into account an IRF that was experimentally measured on a TCSPC setup. In these two examples, we compare the Cramér-Rao bound on lifetime estimators to numerical results obtained from Monte Carlo experiments. To this end, we numerically generate a set of 10,000 decay histograms for each experimental condition that will be investigated. This is performed by using Eq. (1) to calculate the cumulative distribution function, from which decay histograms can be randomly generated based on the inversion principle [31]. The lifetime τ = 1/Γ is then estimated from each histogram by using a maximum-likelihood (ML) method. The estimation bias is not significant for these numerical experiments (see Supplementary Section 4). Moreover, the variance of ML estimators asymptotically approaches the Cramér-Rao bound for large sample statistics, which allows the root-mean square (RMS) deviation of the estimated lifetimes to be close to the fundamental limit for the experimental conditions that are investigated. Note that the implementation of the second example is provided in the open source code.

Precision of picosecond lifetime estimations
We consider an experimental setup designed to estimate picosecond lifetimes, with a board resolution of 1 ps, with a repetition rate of 80 MHz (T = 12.5 ns) and with an IRF characterised by a full with at half maximum (FWHM) of 38 ps. The expected number of fluorescence photons is set to N = 1, 000. Moreover, the background signal is assumed to be uniform, and the expected number of detections due to this background is set to N b = 500. We calculated the lower bound on the relative standard error σ τ /τ as a function of the lifetime for these experimental conditions (Fig. 3a, blue curve) as well as in the case the IRF is a Dirac delta function (Fig. 3a, green  curve). For τ ∼ 1 ns, the lower bound on σ τ /τ is slightly larger than the shot-noise limited bound due to the background noise. For an ideal IRF, the lower bound on σ τ /τ slightly decreases when τ decreases since the number of counts per lifetime due to background noise (noted β) decreases. In contrast, when the actual IRF is considered, the lower bound on σ τ /τ strongly increases for lifetime shorter than half the FWHM, which clearly highlights the relevance of taking into account the IRF for picosecond lifetime estimations. Note that, in this regime, the ML estimator becomes less efficient and slightly deviates from the Cramér-Rao bound.  Optimal contrast in fluorescence lifetime microscopy We consider a typical TC-SPC setup designed for nanosecond lifetime estimation, with a board resolution of 16 ps, a repetition rate of 80 MHz and an IRF characterised by a FWHM of 240 ps. The expected number of fluorescence photons is set to N = 1, 000. Moreover, we assume that a luminescence background signal is also detected, with the same intensity as the fluorescence signal of interest and a lifetime of 1 ns. In order to choose fluorescent emitters that will lead to the most contrasted FLIM images, we calculated the lower bound on σ τ /τ as a function of the fluorescence lifetime of the emitter under these experimental conditions (Fig. 3b, blue curve) as well as in the case of uniform background noise (Fig. 3b, green curve). While the precision of lifetime estimations is optimal for τ ∼ 400 ps for the uniform background, the precision of lifetime estimations is optimal for τ ∼ 2 ns for the exponential background. Indeed, the precision of lifetime estimations depends on γ (see Fig. 2a) and the time dependence of the background noise allows estimations that are more precise for γ > 1, that is, τ > 1 ns. In both situations, for lifetimes larger than 5 ns, the lower bound on σ τ /τ strongly increases as the number of fluorescence lifetimes per repetition period (noted r) decreases. In this regime, using a laser with a lower repetition rate must be considered in order to improve the precision of lifetime estimations. Moreover, for lifetimes smaller than 100 ps, the lower bound on σ τ /τ also strongly increases as the fluorescence lifetime becomes comparable to the IRF. In this regime, the precision of the estimations could be improved by using a detection system with a smaller jitter and an excitation laser with a shorter pulse width.

III Conclusion
In summary, we calculated the lower bound on the standard error on lifetime estimates depending on key experimental parameters. These results can be used as a benchmark for the evaluation of the precision of lifetime estimations. Moreover, they reveal the influence of different parameters upon the estimation precision, providing us with a powerful tool for the optimisation of a TCSPC setup as illustrated by two examples. We notably showed that a significant enhancement of the precision of lifetime estimations can be achieved by choosing the proper fluorescent emitter depending on the expected background noise, the IRF of the setup and the repetition rate of the excitation laser. Various dependent parameters such as the integration time, the power of the excitation laser or the measured spectral range can easily be studied by using the proposed formalism, which can also be extended to the study of time-gated photon counting techniques. We expect these results to be of great interest for current experimental challenges such as the reduction of the acquisition time in FLIM-based techniques, the characterisation of photonic antennas with high Purcell factors as well as fluorescence lifetime measurements at the single molecule level.

Supporting Information
This section includes details concerning: (1) Calculation of the information matrix. (2) Cramér-Rao bound on the lifetime estimator. (3) Partial and cumulated information. (4) Bias of maximum-likelihood estimations. The open-source code for the calculation of the Cramér-Rao bound will soon be available on the internet.
1 Calculation of the information matrix

General expression
Using the dimensionless parameters defined in the manuscript, the expectation of each data item reads (S1) In order to calculate the information matrix, let us define J I , J II , K I , K II and J B as follows: With these notations, Eq. (S1) reads Differentiating this expression by each parameters yields Let us recall the expression of the information matrix: are the parameters to be estimated. The elements of the information matrix are therefore expressed by Since Γ 1 is our reference, the Cramér-Rao inequality will be most conveniently expressed in terms of this parameter. By inverting the information matrix, we indeed obtain the following expression: where σ Γ 1 is the standard error on the decay rate estimates and F can be calculated by numerical inversion of the information matrix.

Limiting cases
Ideal IRF Whenever the IRF can be considered as a Dirac delta function, the coefficients defined by Eq. (S2) become +∞ l=0 e −lr , +∞ l=0 e −γlr , J B i remains unchanged, as it does not depend on the IRF. These expressions can be further simplified, by using the following properties of geometric series: We obtain (S10) Uniform background noise Whenever the background noise is uniform over the repetition period, the coefficient J B i simply becomes 2 Cramér-Rao bound on the lifetime estimator From Eq. (S7), it is straightforward to calculate the Cramér-Rao bound on the lifetime estimator. To do so, we define the excited-state lifetime τ 1 = 1/Γ 1 and we perform a transformation of parameter, as detailed in Appendix 3B of Ref. [17]. This reads whereτ 1 is the lifetime estimator. This expression simplifies to where σ τ 1 is the standard error on the lifetime estimates. This demonstrates that the Cramér-Rao lower bounds are the same for the relative standard error on the decay rate and lifetime estimators.

Partial and cumulated information
In general, elements of the information matrix are calculated as a sum over the n points of the decay histogram. From Eq. (S5), it can therefore be expressed in a formal way as follows: We define the partial information matrix I p as the information matrix calculated from the sum over p points of the decay histogram, where p is the number of parameters to be estimated. This reads We also define the cumulated information matrix I c as the information matrix calculated from the sum over the m first points of the decay histogram, with m ≤ n. This reads From these definitions, the partial information and the cumulated information can be used to compare the information carried by different sets of points of the data histogram about the decay rate Γ 1 to be estimated. Figure S1 (a) shows the F-value calculated from the partial information matrix as a function of j/k, considering a bi-exponential decay histogram (η = 1) with a large number of data point per lifetime (k = 500), a large number of lifetime per repetition period (r = 100), no background noise (β = 0) and an ideal IRF. The first points of the decay histogram carry more information for γ = 0.2 than for γ = 5. Indeed, in this latter case, the signal due to the second decay is concentrated on the first points of the decay histogram, making the estimation of Γ 1 more difficult. However, this signal vanishes faster and, indeed, we can see that the information contained in the last points of the histogram is larger for γ = 5 than for γ = 0.2. Figure S1 (b) shows the F-value calculated from the cumulated information matrix as a function of m/k. For m/k ∼ 2.3, the F-value becomes smaller for γ = 5 than for γ = 0.2, resulting in a better precision for γ = 5 than for γ = 0.2 whenever the whole histogram is considered.

Bias of maximum-likelihood estimations
The bias B τ of an estimatorτ describes whether the estimator can, on average, recover the true value of the parameter τ . It is defined by (S17) Figure S2 shows the relative bias B τ /τ of the maximum-likelihood estimator used in the two examples analysed in the manuscript. In both cases, we can see that the relative bias increases when τ decreases, which can be attributed to the influence of the IRF on the estimation process. However, the relative bias is always one order of magnitude smaller than the relative standard error (Fig. 3 of the manuscript). This justifies the relevance of using the Cramér-Rao bound as a benchmark for the estimation precision (the Cramér-Rao bound applies only to unbiased estimators).