Histogram clustering for rapid time-domain fluorescence lifetime image analysis

We propose a histogram clustering (HC) method to accelerate fluorescence lifetime imaging (FLIM) analysis in pixel-wise and global fitting modes. The proposed method’s principle was demonstrated, and the combinations of HC with traditional FLIM analysis were explained. We assessed HC methods with both simulated and experimental datasets. The results reveal that HC not only increases analysis speed (up to 106 times) but also enhances lifetime estimation accuracy. Fast lifetime analysis strategies were suggested with execution times around or below 30 μs per histograms on MATLAB R2016a, 64-bit with the Intel Celeron CPU (2950M @ 2GHz).

A fluorescence decay is usually modeled as a sum of exponential decay functions: where A is the amplitude, q p and τ p are the fraction and lifetime of the p th component, p = 1, . . . , P.
In vector forms, q = [q 1 , . . . , q P ] T and τ = [τ 1 , . . . , τ P ] T . In reality, the measured signal is a convolution of f (t) and the instrument response function (IRF) irf (t), where ϵ(t) is noise. FLIM analysis is equivalent to solving the inverse problem from Eq. (2) with the measured h(t) to obtain q and τ. FLIM experiments can be conducted either in time-or frequency-domain manners [8]. In time-domain approaches, samples are illuminated with ultrashort laser pulses. h(t) is measured using a time-correlated single-photon counting (TCSPC) system [16,17] with photomultiplier tubes, delay line anode detectors [18] or single-photon avalanche diodes [19] in scanning or widefield modes. h(t) can also be measured with time-gated cameras [20,21] and streak cameras [22,23]. There are also frequency-domain approaches [24,25], but we will focus on time-domain approaches in this report.
A fluorescence decay histogram measured by a TCSPC system can be: where M is the number of time-bins, and ∆t is the bin width (the TCSPC's temporal resolution). We can express Eq.

(3) in a vector form with
With h and irf already measured, A, q and τ can be extracted with a lifetime determination algorithm by solving a nonlinear minimization problem arg min ∥h −ĥ∥ 2 , whereĥ is the estimated histogram. The iterative convolution (IC) is commonly used with the least-squared method (LSM) [26,27] for solving the inverse problem, denoted as IC-LSM. Still, IC-LSM suffers from low photon efficiency and slow analysis. Several deconvolution approaches have been developed to enhance the analysis, such as the Laguerre expansion [28][29][30], the non-fitting and the global fitting [31,32] methods. The Laguerre expansion methods speed up deconvolution procedures by converting the nonlinear-fitting problem to a linear-fitting problem estimating a Laguerre basis set's expansion coefficients. The non-fitting methods, including the centre-of-mass method (CMM) [33][34][35], the integral extraction method (IEM) [36,37], the phasor method [38,39], or the rapid lifetime determination method [40,41], can provide rapid average lifetime analysis [42]. The global fitting methods can accelerate analysis by changing the estimation mode from the pixel-wise mode to a global fitting mode and using spatial lifetime invariances of fluorescent species in an image to reduce the degree of freedom significantly. There are two strategies, IC [31] and the variable projection (VP) method [32], for implementing global fitting.
However, the Laguerre expansion, the non-fitting and the global fitting methods are not fast enough for growing demands for real-time FLIM. This work presents a histogram clustering (HC) method for improving FLIM analysis in analysis speed and accuracy. Section 2 (Methods) summarizes the workflows for decay parameter image reconstructions with and without HC. We will then introduce and demonstrate the HC method. Besides the algorithms used in this work, HC can also accelerate other algorithms, such as the maximum likelihood method [43], Bayesian methods [44], and deep-learning methods [45]. In Section 3 (Results and Analysis), synthetic and experimental TCSPC datasets will be used to evaluate the HC method's performances. Suggestions of the fastest algorithms for different outcomes will be given.   Instead of estimating decay parameters individually for each pixel, GF-P assumes lifetimes τ are constants, and A and q vary across the image. t GF−P exe = t GF A . t GF A is the adopted algorithm's execution time for GF. Figure 2 shows the workflows where the HC method is embedded. Figure 2(a) shows the Cluster-Wise (CW) mode, which combines PW and HC; likewise, Fig. 2(b) shows the Global Fitting mode for all Clusters (GF-C), which combines GF-P and HC. In CW, N vp histograms are first sorted by HC, whose execution time is t HC , into N c classes with N c cluster-histogramsh (s) , s = 1, . . . , N c .h (s) is used to estimate decay parameters for Cluster s.

Modes for decay parameter image reconstructions
Then, the decay parameters are assigned to the corresponding cluster's pixels with a parameter assignment function, whose execution time is t PA . Therefore, t CW exe = t HC + N c × t PW A + t PA . In GF-C, N vp histograms are processed with HC first, and the output N c histograms are sent into an algorithm for GF. Decay parameters for all clusters are obtained and assigned to the pixels in corresponding clusters with the parameter assignment function. Therefore, t GF−C exe = t HC + t GF A + t PA .
The algorithms used in this work are reviewed in Supplement 1.

Histogram clustering
In reality, there are always many pixels within the field of view showing similar histogram profiles, and it's unnecessary to analyze them individually because it would be time-consuming. The idea of HC is to sort histograms with similar profiles and to divide them into N c clusters. If histograms have similar decay profiles, they are supposed to show similar decay parameters. Therefore, we can average similar decay profiles in a cluster into one profile to estimate decay parameters and then assign them to all pixels. With this arrangement, we only need to process N c instead of N vp histograms. HC significantly speedups FLIM analysis. For simplicity, we only discuss bi-exponential decays widely used in practice. Figure 3(a) shows an IRF and normalized signal profiles h(t) following a bi-exponential decay model, and Fig. 3(b) shows corresponding cumulative signals, , which is not sensitive to Poisson noise due to the integration. Signal decay parameters are also labelled in Fig. 3(a). If we choose an intensity bound, I bound , then each signal has a corresponding time delay, t Ib , to reach I bound , as shown in Fig. 3 It is straightforward for mono-exponential decays, f (t) = A exp (−t/τ), that t Ib has an approximately linear relationship with τ. Figure 4 shows t Ib curves with different IRFs which introduce time-shifts (assuming that IRFs for all histograms are the same in a scanning system). If a multichannel sensor is used, IRF alignments are required before using HC. However, it is less straightforward for bi-exponential decays. Thus, we used numerical methods to conduct analysis, as shown in Fig. 5, in which three cases were simulated to explain how the proposed concept works. Case A has q 1 = 0.5, 0.1 ≤ τ 1 ≤ 3 ns, and τ 2 = 3 ns as shown in Fig.  5(a); Case B has 0 ≤ q 1 ≤ 1, τ 1 = 0.5 ns, and τ 2 = 3 ns as shown in Fig. 5(b); Case C has q 1 = 0.5, τ 1 = 0.5 ns, and 0.5 ≤ τ 2 ≤ 10 ns as shown in Fig. 5(c). The IRF follows a Gaussian distribution with an FWHM of 0.5 ns. For Case A, t Ib is not monotonic with τ 1 , and the monotonic range and the slope are functions of I bound . As I bound increases, the slope increases with a smaller monotonic range. The profiles with τ 1 outside the range are wrongly sorted into a cluster with a larger τ 1 . The monotonic ranges for I bound = 0.2 and 0.6 are 0.4 ∼ 3 ns and 1 ∼ 3 ns, respectively. For Cases B and C, t Ib is monotonic (decreasing and increasing) with q 1 and τ 2 for all I bound , respectively. For the signals like Cases A ∼ C (which only have one variable), we can cluster the signals by t Ib with a proper I bound considering the monotonic range. For example, for Case A, if the shortest lifetime is around 0.5 ns, I bound = 0.2 is a proper choice; for Cases B and C, I bound can be set arbitrarily in 0.1 ∼ 1. We use I bound = 0.2 hereafter.
However, it is not realistic that the signals in a dataset have one variable and two constant decay parameters. For instance, in FRET-FLIM applications, donors without FRET have a constant lifetime and donors interacting with acceptors have shorter lifetimes due to FRET. Therefore, the short and long lifetimes, τ 1 and τ 2 , are donors' lifetimes with and without FRET, and q 1 is the portion of the donors undergoing FRET among all donors. q 1 and τ 1 are variables depending on FRET efficiency.
For FRET-FLIM datasets, such as Case D: q 1 = 0 ∼ 1, τ 1 = 0.5 ∼ 3 ns, and τ 2 = 3 ns, with two variables, it is not enough to divide the histograms only depending on t Ib , as histograms with different profiles would have the same t Ib and be wrongly divided into one cluster. Figure 6(a) shows the resulting clusters ( N 1 = 15) in different colors for Case D with M = 256 depending on t Ib . Figures 6(b) and (c) show the cumulative signals in Clusters 14 and 7 in red, respectively, and the averaged cumulative histogramsH for the clusters (green dash lines). When q 1 ∼ 0 (or q 1 ∼ 1) or τ 1 ∼ τ 2 (such as Clusters 14 and 15), the signals are nearly mono-exponential and have similar profiles, as shown in Fig. 6(b). However, the signals for other clusters (for example, Cluster 7) have the same t Ib , but the profiles after t Ib diverge. At t = t bound in Fig. 6(c), the cumulative intensity I tb in Cluster 7 is within [0.7, 0.9]. Therefore, we can further divide each cluster into N 2 sub-clusters depending on I tb .
Setting a larger N 2 can result in a higher clustering precision, which means histogram profiles in one cluster are more similar. Another way to increase clustering precision is interpolating h with M time-bins to h interp with M · N interp time-bins. N interp (≥ 1) is an interpolation factor, and h interp can be expressed as   The HC workflow is summarized in Fig. 8(a). There are three steps: 1) depending on t Ib , N vp histograms are divided into N 1 clusters, which can be adjusted by setting N interp ; 2) depending on I tb , histograms in each of the N 1 clusters are further divided into N 2 sub-clusters, and N c = N 1 ×N 2 clusters are finally produced; 3) N c histograms are generated by obtaining the averaged histogram in each cluster. To assess HC in terms of N interp and N 2 , we can define: whereh (s) m is the cluster histogram produced with HC for the cluster including Pixel s. Figure  8(b) shows the boxplot of χ 2(s) for Case D for various combinations of N interp and N 2 . Poisson noise is included in each signal with a total intensity of I = 10 4 photon counts. The higher N interp and N 2 are, the lower χ 2(s) becomes, meaning higher accuracy. We set N interp = 2 and N 2 = 4 for HC used in CW and GF-C modes in this work.

Results and analysis
Synthetic and experimental TCSPC datasets were used to assess the performances of HC. Table 1 summarizes different output types of algorithms: (1) the fitting method LE-LSM in PW and CW modes can produce q, τ, τ A , and τ I images; (2) the non-fitting methods LE-IEM and CMM in PW and CW modes can produce τ A and τ I images; (3) the global fitting methods IC and VP in GF-P and GF-C modes can produce q, τ A , and τ I images and constant τ. τ A and τ I are two types of average lifetimes, amplitude-and intensity-weighted average lifetimes, Using IEM to calculate τ A requires conducting the deconvolution first. The Laguerre expansion method is employed for deconvolution when IEM is used; therefore, we denote the whole process as LE-IEM.

Simulated data
The synthetic TCSPC dataset has an image size of 150 × 150 pixels and M = 256. The simulated signals are bi-exponential (P = 2). Figure 9(a) shows the log 10 (I i ) image consisting of three regions with integrated intensities of I 1 = 500, I 2 =1000, and I 3 = 10000, respectively. Possion noise is included in the dataset.  relative standard deviations of 10%; the τ 2 image has a mean value of 3 ns and a relative standard deviation of 10%. The execution times (t exe ) and the mean squared errors (MSE) of the results evaluated by different algorithms without and with HC are summarized in Table 2. MSE is defined as: whereâ represents the estimated a (a = q 1 , τ 1 , τ 2 , τ A , τ I ). The results for the fitting and non-fitting methods in PW and CW modes and the global fitting methods in GF-P and GF-C modes are illustrated and analyzed in the following sections.  Figure 10 showsq 1 ,τ 1 ,τ 2 ,τ A , andτ I images estimated with LE-LSM in (a1) -(a5) PW (without HC) and (b1) -(b5) CW (with HC) modes, respectively. The pixel brightness of each image represents the intensity. The parameter histograms (q 1 ,τ 1 ,τ 2 ,τ A , andτ I ) of different intensity regions (blue, red, and magenta lines for I 1 , I 2 , and I 3 respectively) and the true histogram (black dash line) are attached to each image. HC improves the estimated images, especially for Regions I 1 and I 2 , as the histograms are closer to the truth than those estimated without HC. MSEs are reduced from 0.019 to 0.011 for q 1 , from 0.173 ns 2 to 0.102 ns 2 for τ 1 , from 0.198 ns 2 to 0.104 ns 2 for τ 2 , from 0.110 ns 2 to 0.037 ns 2 for τ A , and from 0.027 ns 2 to 0.025 ns 2 for τ I . t exe is significantly reduced from 389.45 s to 3.36 s. Fig. 10.q 1 ,τ 1 ,τ 2 ,τ A , andτ I images with LE-LSM in (a1) -(a5) PW (without HC) and (b1) -(b5) CW (with HC) modes. Histograms of different intensity regions (blue, red, and magenta for I 1 , I 2 , and I 3 , respectively) and the true histogram (black dash line).  CMM is for estimatingτ I with the shortest t exe either in PW or CW, around 0.20 s, but it has a bias, as shown in Figs. 11(b) and 11(d). MSE(τ I ) is around 0.180 ns 2 . There is a way to correct the bias, as described in [46]. Figure 12 showsq 1 ,τ A , andτ I images estimated with (a1) -(a3) IC and (a4) -(a6) VP in GF-P (without HC) mode and with (b1) -(b3) IC and (b4) -(b6) VP in GF-C (with HC) mode. The estimated constants (τ 1 ,τ 2 ) are labelled in correspondingq 1 images. The estimations of q 1 in Region I 1 with IC in GF-P are mostly inaccurate, as shown in Fig. 12(a1) and (τ 1 ,τ 2 ) = (0.59, 2.74) ns. As a result,τ A andτ I are also not correct in Region I 1 , as shown in Figs. 12(a2) and (a3). In GF-C, IC performs better with a successfully estimated Region I 1 , a significantly reduced t exe from 724.82 s to 11.85 s, and a reduced MSE(q 1 ) from 0.100 to 0.017, a reduced MSE(τ A ) from 0.678 to 0.102, and a reduced MSE(τ I ) from 1.098 to 0.050, as shown in Figs. 12(b1) -(b3). Although VP has some invalid estimations withq 1 < 0 (pixels in white) when q 1 is small, as shown in Figs. 12(a4) and (b4), itsτ A andτ I images are accurately evaluated without invalid pixels, as shown in Figs. 12(a5), (a6), (b5), and (b6). Thus, VP in GF-C is a promising choice for fast average lifetime estimations for its short execution time (t exe = 0.31 s).

Global fitting methods IC and VP in GF-P and GF-C modes
In conclusion, HC not only accelerates analysis but also enhances accuracy (MSE). HC sorts histograms with similar profiles into a cluster and takes the average of histograms for lifetime determination, equivalent to increasing the number of photon counts and reducing noise. Therefore, the decay parameters estimated with the average cluster histogram by lifetime determination algorithms have higher accuracy than those of individual histograms. Although the decay parameters of the histograms in one cluster have a deviation from those estimated with the average cluster histogram, the results indicate that the error introduced by HC is smaller than that introduced by processing original histograms with a relatively lower photon count.

Experimental data
Mouse raw macrophage cells were routinely cultured in DMEM (Dulbecco's Modified Eagle Medium) supplemented with 10% FCS (Fetal Calf Serum) under 5% CO 2 at 37 o C. Cells were seeded on glass cover slips in 24-well plates and cultured overnight for bacterial infection. Bacteria engineered to express GFP (Green Fluorescent Protein) were harvested from an early exponential phase and added to the cells with an MOI (Multiplicity of Infection) = 100. Cells were washed with PBS (Phosphate-Buffered Saline) and stained for actin with phalloidin Alexa Flour 546 (Thermo Fisher Scientific). The scanning FLIM used in this work is LSM510 (Carl Zeiss), equipped with a TCSPC module (SPC-830, Becker & Hickl GmbH). The sample was excited by a tunable femtosecond Ti: Sapphire laser (Chameleon, Coherent) at 850 nm as a two-photon excitation source. The repetition rate is 80 MHz, and the pulse width is less than 200 fs. The emitted photons were collected through a 63× water-immersion objective lens (N.A. = 1.0) and a 500 ∼ 550 nm bandpass filter and transferred into a photomultiplier tube. Figure 13 shows the intensity image and t exe for all algorithms without or with HC for a TCSPC dataset. The estimations of three output types are shown as follows.   Histograms of (g)q 1 , (h)τ 1 , and (i)τ 2 in PW (blue) and CW (red) modes.
3.2.2. Type 2:q 1 image and constants (τ 1 ,τ 2 ) with IC and VP in GF-P and GF-C modes  Figure  15(e) shows histograms ofq 1 with IC (dash blue) and VP (dash red) in GF-P and IC (solid blue) and VP (solid red) in GF-C. The constants (τ 1 ,τ 2 ) of each approach are attached inq 1 images. Figure 16 showsτ A images (a) -(d) without and (e) -(h) with HC. Figure 16(i) shows the histograms ofτ A with LE-LSM and LE-IEM in PW and CW. Figure 16(j) shows the histograms ofτ A with IC and VP in GF-P and GF-C. Figure 17 showsτ I images (a) -(d) without and (e) - (h) with HC. Figure 17(i) shows the histograms ofτ I with LE-LSM and CMM in PW and CW. Figure 17(j) shows the histograms ofτ I with IC and VP in GF-P and GF-C. Like the conclusions drawn from simulations, LE-LSM in CW is the fastest for Type 1 with t exe = 5.87 s; VP in GF-C is the fastest for Type 2 with t exe = 0.41 s. For average lifetime images, VP in GF-C is the fastest for both τ A and τ I with t exe = 0.41 s, LE-IEM in CW is the second one for τ A with t exe = 0.94 s; meanwhile, CMM in CW is the fastest for τ I with t exe = 0.20 s.

Conclusion
We developed a histogram clustering (HC) method to accelerate FLIM analysis. HC can improve both the speed and the accuracy for FLIM analysis by sorting histograms with similar profiles in a dataset into several clusters and significantly reducing the number of histograms to be analyzed. The HC method implements clustering with two features of a histogram. Several commonly used lifetime determination algorithms' performances for producing decay parameter images without and with HC were compared using synthetic and experimental datasets. For different output types, the fastest FLIM analysis methods are suggested: 1) LE-LSM with HC for all lifetime component images with an execution time (t exe ) of 5.87 s, 106-fold shorter than t exe without HC; 2) VP with HC for constant lifetimes, q 1 , τ A , and τ I images with t exe = 0.41 s, 32-fold shorter than t exe without HC; 3) LE-IEM with HC as the second choice for τ A with t exe = 0.94 s, 78-fold shorter than t exe without HC, and CMM as the second choice for τ I with t exe = 0.2 s without or with HC (biased if the largest lifetime > T/4). The analysis was conducted in Matlab, and it can be translated to C or other environments to speed up the analysis. We believe the proposed HC method can benefit applications demanding real-time FLIM such as clinical diagnosis and fast screening.