Automated suppression of sample-related artifacts in Fluorescence Correlation Spectroscopy

Fluorescence Correlation Spectroscopy (FCS) in cells often suffers from artifacts caused by bright aggregates or vesicles, by depletion of fluorophores or by bleaching of a fluorescent background. The common practice of manually discarding distorted curves is time consuming and subjective. Here we demonstrate the feasibility of automated FCS data analysis with efficient rejection of corrupted parts of the signal. As test systems we use a solution of fluorescent molecules contaminated with bright fluorescent beads, as well as cells expressing a fluorescent protein (ICA512-EGFP) which partitions into bright secretory granules. This approach improves the accuracy of FCS measurements in biological samples, extends its applicability to especially challenging systems and greatly simplifies and accelerates the data analysis.

© 2010 Optical Society of America

OCIS codes: (180.0180) Microscopy; (170.0170) Medical optics and biotechnology; (170.1420) Biology; (170.1530) Cell analysis; (170.6280) Spectroscopy, fluorescence and luminescence; (170.1790) Confocal microscopy; (170.2520) Fluorescence microscopy.

Received 30 Mar 2010; revised 19 Apr 2010; accepted 21 Apr 2010; published 11 May 2010. 24 May 2010 / Vol. 18, No. 11 / Optics Express 11073.

References and links
1. E. L. Elson and D. Magde, "Fluorescence correlation spectroscopy. I. Conceptual basis and theory," Biopolymers 13(1), 1–27 (1974).
2. R. Rigler and E. Elson, Fluorescence Correlation Spectroscopy: Theory and Applications (Springer, 2001).
3. E. P. Petrov and P. Schwille, "State of the art and novel trends in fluorescence correlation spectroscopy," in Standardization in Fluorometry: State of the Art and Future Challenges (Springer, Berlin Heidelberg New York, 2007).
4. K. Bacia and P. Schwille, "A dynamic view of cellular processes by in vivo fluorescence auto- and cross-correlation spectroscopy," Methods 29(1), 74–85 (2003).
5. T. Dertinger, V. Pacheco, I. von der Hocht, R. Hartmann, I. Gregor, and J. Enderlein, "Two-focus fluorescence correlation spectroscopy: a new tool for accurate and absolute diffusion measurements," ChemPhysChem 8(3), 433–443 (2007).
6. S. Kim, K. Heinze, and P. Schwille, "Fluorescence correlation spectroscopy in living cells," Nat. Methods 4(11), 963–974 (2007).
7. K. Bacia, S. Kim, and P. Schwille, "Fluorescence cross-correlation spectroscopy in living cells," Nat. Methods 3(2), 83–89 (2006).
8. J. Ries and P. Schwille, "New concepts for fluorescence correlation spectroscopy on membranes," Phys. Chem. Chem. Phys. 10(24), 3487–3497 (2008).
9. S. R. Yu, M. Burkhardt, M. Nowak, J. Ries, Z. Petrásek, S. Scholpp, P. Schwille, and M. Brand, "Fgf8 morphogen gradient forms by a source-sink mechanism with freely diffusing molecules," Nature 461(7263), 533–536 (2009).
10. D. Magatti and F. Ferri, "Fast multi-tau real-time software correlator for dynamic light scattering," Appl. Opt. 40(24), 4011–4021 (2001).
11. A. Tcherniak, C. Reznik, S. Link, and C. F. Landes, "Fluorescence correlation spectroscopy: criteria for analysis in complex systems," Anal. Chem. 81(2), 746–754 (2009).
12. M. Asfari, D. Janjic, P. Meda, G. Li, P. A. Halban, and C. B. Wollheim, "Establishment of 2-mercaptoethanol-dependent differentiated insulin-secreting cell lines," Endocrinology 130(1), 167–178 (1992).
13. M. Trajkovski, H. Mziaut, A. Altkruger, J. Ouwendijk, K. P. Knoch, S. Muller, and M. Solimena, "Nuclear translocation of an ICA512 cytosolic fragment couples granule exocytosis and insulin expression in beta-cells," J. Cell Biol. 167(6), 1063–1074 (2004).
14. C. C. Guet, L. Bruneaux, T. L. Min, D. Siegal-Gaskins, I. Figueroa, T. Emonet, and P. Cluzel, "Minimally invasive determination of mRNA concentration in single living bacteria," Nucleic Acids Res. 36(12), e73 (2008).
15. G. Meacci, J. Ries, E. Fischer-Friedrich, N. Kahya, P. Schwille, and K. Kruse, "Mobility of Min-proteins in Escherichia coli measured by fluorescence correlation spectroscopy," Phys. Biol. 3(4), 255–263 (2006).


Introduction
In Fluorescence Correlation Spectroscopy (FCS), the fluorescence fluctuations caused by fluorophores diffusing through a small (≈ fL) detection volume are analyzed in terms of their auto-correlation function. It is a powerful technique to measure local concentrations, translational and rotational diffusion coefficients, binding and reaction kinetics and photodynamics in vitro as well as in vivo [1][2][3][4]. The use of FCS has been promoted by the introduction of commercial FCS systems, so that it can now be considered a well-established technique. This holds true especially for measurements in bulk solution, where FCS can reach remarkable accuracy [5]. On the other hand, measurements in cells [6,7], biological membranes [8] or whole organisms [9] often suffer from imperfections of the system. Aggregates of fluorescent molecules or their association with vesicles result in spikes in the fluorescence intensity. Photobleaching of the auto-fluorescent background or of immobilized molecules, depletion due to photobleaching, or a change in the local environment due to sample movements may lead to slow changes of the fluorescence signal over time. These superimpose on the fluctuations from single-molecule dynamics and result in distorted correlation curves [Figs. 1(a)-1(c)]. This complicates the determination of concentrations, diffusion coefficients and interactions of single biomolecules, the usual application of FCS.
The most common way to reduce the impact of the above-mentioned imperfections is [4]: (1) to take several short measurements instead of one long measurement, (2) to manually discard distorted correlation curves and (3) to fit the average of the remaining curves with a model function describing one additional mobile species. This additional component in the correlation function, sometimes combined with an overall offset, approximates the distorted part of the experimental correlation curve at larger lag times.
This approach is not optimal. Hand-selection of curves is often the most time-consuming step in FCS data analysis, and it is often ambiguous [Fig. 1(d)], with the danger of introducing a subjective bias. In addition, the distorted parts of the correlation curve often cannot be described well by only one additional component. As a result, during the fit the component describing the single-molecule dynamics may still contain features of the distortions. This can lead to a strong error and bias in the parameters of interest, such as diffusion coefficients and concentrations of single molecules [Figs. 1(b) and 1(c)]. Furthermore, the additional free fitting parameters can render the fit results ambiguous. Note that even in optimal in vivo systems, two-component diffusion or anomalous diffusion often has to be assumed; including a third component for distortions usually results in too many free parameters and indefinite fit results. Finally, large parts of the data are discarded, leading to additional noise on the correlation curves.
Here we describe a simple approach for automated FCS data analysis. The idea is to divide one long measurement into many short measurements, to automatically discard parts of the data leading to distorted curves and to calculate undistorted correlation curves from the remaining part. This automation has several advantages. It avoids the subjective hand-selection and leads to more objective, simpler and faster data analysis. The use of shorter time intervals allows the extraction of a larger portion of the measured data and extends the use of FCS to very difficult systems, e.g. with many bright aggregates. Finally, it enables completely automated FCS measurements in non-ideal systems, a prerequisite for in vivo high-throughput FCS screens.

Model functions
In FCS, the fluorescence intensity I(t) is recorded with a high temporal resolution. From this signal, the auto-correlation curve G(τ), which measures its self-similarity, can be calculated:

G(τ) = ⟨δI(t) δI(t + τ)⟩ / ⟨I(t)⟩².   (1)

Here ⟨·⟩ denotes the time average, δI(t) = I(t) − ⟨I(t)⟩, and τ is called the lag time. The calculation of the auto-correlation curve from the fluorescence intensity [Eq. (1)] can be performed efficiently on a quasi-logarithmic time scale with a 'multiple tau' correlation algorithm [10].
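As an illustration, Eq. (1) can be evaluated directly from a sampled intensity trace. The following Python sketch (function and variable names are ours, not from the original implementation) computes G(τ) at a quasi-logarithmic set of lags; a real multiple tau correlator [10] additionally coarsens the time resolution at large lags and is far more efficient.

```python
import numpy as np

def autocorrelation(intensity, lags):
    """Direct evaluation of Eq. (1): G(tau) = <dI(t) dI(t+tau)> / <I>^2,
    for integer lags given in units of the sampling interval."""
    I = np.asarray(intensity, dtype=float)
    mean = I.mean()
    dI = I - mean
    G = np.empty(len(lags))
    for k, lag in enumerate(lags):
        # time average over all pairs of samples separated by 'lag'
        G[k] = np.mean(dI[:len(I) - lag] * dI[lag:]) / mean ** 2
    return G

# quasi-logarithmic lag spacing, in the spirit of a multiple tau correlator
log_lags = np.unique(np.logspace(0, 3, 24).astype(int))
```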
To obtain the parameters of interest, the auto-correlation curve is fitted with a mathematical function which takes into account the sources of the intensity fluctuations. A commonly used model for pure Brownian diffusion through a three-dimensional Gaussian detection volume is [3]:

G(τ) = 1 / (N (1 + τ/τ_D) √(1 + τ/(S²τ_D))).   (2)

Here N = V_eff C is the number of particles in the detection volume V_eff = π^(3/2) S w_0³, w_0 is the 1/e² radius of the laser focus and the structure parameter S = w_z/w_0 measures the aspect ratio of the Gaussian detection volume. τ_D = w_0²/(4D) is the diffusion time and a measure for the diffusion coefficient D.
For two diffusing species with different diffusion times τ_D1 and τ_D2, taking into account triplet/blinking kinetics and assuming the same brightness of the molecules, the correlation function is:

G(τ) = (1/N) (1 + T e^(−τ/τ_t)/(1 − T)) [F/((1 + τ/τ_D1) √(1 + τ/(S²τ_D1))) + (1 − F)/((1 + τ/τ_D2) √(1 + τ/(S²τ_D2)))].   (3)

Here N is the total number of particles in the detection volume, F denotes the fraction of molecules with the diffusion coefficient D_1, T is the fraction of molecules in the dark state and τ_t is connected to the lifetime of the dark state [3]. Often this formula is also used if the molecular brightnesses of the two components are not equal (e.g. free fluorophores and bright vesicles); in this case N is no longer directly related to the concentrations.
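For reference, Eqs. (2) and (3) translate directly into fit model functions. A Python sketch (the function names and the default structure parameter S are illustrative choices, not values from the text):

```python
import numpy as np

def g_diff(tau, tau_d, S):
    """Diffusion factor for a 3D Gaussian detection volume."""
    return 1.0 / ((1.0 + tau / tau_d) * np.sqrt(1.0 + tau / (S ** 2 * tau_d)))

def G_one(tau, N, tau_d, S=5.0):
    """One-component model, Eq. (2)."""
    return g_diff(tau, tau_d, S) / N

def G_two_triplet(tau, N, F, tau_d1, tau_d2, T, tau_t, S=5.0):
    """Two-component model with triplet/blinking, Eq. (3);
    both species are assumed to have equal molecular brightness."""
    triplet = 1.0 + T / (1.0 - T) * np.exp(-tau / tau_t)
    diffusion = F * g_diff(tau, tau_d1, S) + (1.0 - F) * g_diff(tau, tau_d2, S)
    return triplet * diffusion / N
```

Such functions can be passed to any standard nonlinear least-squares fitter.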

Automated analysis of FCS data
The typical steps involved in hand-selection of distorted curves are: acquisition of several curves (e.g. 10 × 10 s); comparison of the curves; rejection of distorted curves, identified by a deviation from the majority of the other curves; and calculation of the average of the undistorted curves. Our approach attempts to formalize and automate these steps. Accordingly, the algorithm we propose consists of the following parts: calculation of correlation curves corresponding to short intervals of the fluorescence intensity I(t), ordering of the curves based on their deviation from the other curves, and averaging of the curves with the smallest deviation:

1. Division of the fluorescence intensity trace I(t) of length T_M into n = T_M/ΔT short intervals (time windows) I_k(t) of length ΔT. ΔT can be much smaller and n much larger than practical for hand-selection.

2. Calculation of n correlation curves G_k(τ_i) from the short intensity traces I_k(t) with a reduced time resolution using a multiple tau correlation algorithm. The choice of a minimal lag time τ_min ≈ 0.1τ_D1 has the advantage that calculation times are reduced and that the part at smaller τ, where the shot noise (random noise on the curve) dominates the correlation curve, is not considered in the following ordering step.

3. Ordering of the curves according to their deviation from the average:
(a) Make a list of all curves.
(b) Compare each curve G_k(τ_i) with the average of all other curves G_{j≠k}(τ_i) in the list. As a measure for the difference we use

dG_k = ⟨(G_k(τ_i) − ⟨G_{j≠k}(τ_i)⟩_{j≠k})²⟩_i,   (4)

where ⟨·⟩_{j≠k} denotes the average over all curves j ≠ k and ⟨·⟩_i the average over all lag times τ_i. At the end of step 3 all curves are sorted according to their quantitative deviation from the average.

4. Choose the maximum allowed deviation dG_max (how to choose dG_max is discussed in more detail below). After this step the irregular curves have been eliminated.

5. For all dG_m < dG_max calculate the corresponding correlation curves G_m(τ_i) with the full time resolution.
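The steps above can be condensed into a short sketch (Python; the low-resolution correlator and data layout are simplified assumptions, and step 5's full-resolution recalculation is only indicated by a comment):

```python
import numpy as np

def correlate(I, lags):
    """Low-resolution correlation curve of one time window (cf. Eq. (1))."""
    dI = I - I.mean()
    return np.array([np.mean(dI[:len(I) - l] * dI[l:]) for l in lags]) / I.mean() ** 2

def select_windows(intensity, n_windows, lags, dG_max):
    """Sketch of steps 1-5: split the trace, correlate each window, rank the
    curves by their deviation dG_k from the average of all others (Eq. (4)),
    and average those below the cutoff dG_max."""
    windows = np.array_split(np.asarray(intensity, dtype=float), n_windows)
    curves = np.array([correlate(w, lags) for w in windows])
    dG = np.empty(len(curves))
    for k in range(len(curves)):
        others = np.mean(np.delete(curves, k, axis=0), axis=0)
        dG[k] = np.mean((curves[k] - others) ** 2)  # Eq. (4)
    order = np.argsort(dG)                          # step 3: sorted curves
    keep = [k for k in order if dG[k] < dG_max]     # step 4: apply cutoff
    # step 5 would re-correlate the kept windows at full time resolution;
    # here we simply average their low-resolution curves
    average = curves[keep].mean(axis=0) if keep else None
    return keep, dG, average
```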
We implemented the algorithm based on the raw data of photon arrival times, but it can be implemented equally well if many short correlation curves are acquired, e.g. by using a hardware correlator.
A larger usable part of the data can be obtained by choosing overlapping time intervals, e.g. by using 1 s time intervals at 0.5 s spacing. Stronger oversampling (smaller spacing) further increases the usable part of the data. However, this comes at the expense of increased calculation times.
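A minimal helper for generating such (possibly overlapping) window positions, assuming times in seconds:

```python
def window_starts(total_time, window, spacing):
    """Start times of analysis windows of length 'window', placed every
    'spacing' time units; spacing < window gives overlapping windows."""
    starts = []
    t = 0.0
    while t + window <= total_time + 1e-9:
        starts.append(round(t, 9))
        t += spacing
    return starts
```

With 1 s windows at 0.5 s spacing, twice as many candidate curves are generated as with non-overlapping windows.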

Length of time interval ΔT
How long should the time interval ΔT be? A small ΔT increases the number of correlation curves and therefore the usable portion of the data. In addition, a residual change of the average intensity during ΔT will be reduced. However, when ΔT is too small, the shot noise becomes the dominating noise on the correlation curve and conceals the distortions; as a result, the sorting algorithm fails. Also, traces that are too short (ΔT ≲ 10⁵τ_D, where τ_D is the timescale of interest, e.g. the diffusion time of the single molecules) result in a systematic bias [11]. Therefore, a ΔT should be chosen which is large enough that this bias is avoided and for which the shot noise is not larger than the deviations due to distortions on the time scale evaluated during the sorting algorithm. Visual inspection of correlation curves from a typical measurement, calculated on different time intervals ΔT, will help to find an optimal value for this parameter.

Maximum difference dG max
The parameter dG max defines what distortions are still allowed. A large dG max results in better statistics and lower noise on the correlation curves at the expense of a larger influence of distortions. If dG max is so small that only a few curves are left, the average curve will be very noisy and a possible bias can occur [Figs. 2(e)-2(g)].
To determine the optimal dG_max we suggest plotting the fitted parameters of interest (e.g. τ_D, N) as a function of dG_max for a few curves of a dataset and determining the range in which these parameters are constant [Figs. 2(e)-2(g)]. Another option is to measure homogeneous control samples to determine the range of naturally occurring dG_m. We found that the optimal dG_max was usually about one order of magnitude above the minimum of dG_m [Fig. 2(h)]. It is a merit of this approach that the dependence of the parameters of interest (e.g. τ_D, N) on the exact choice of dG_max is very small.
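This procedure can be sketched as follows (Python; suggest_dG_max encodes the ≈ 10 × min(dG_m) rule of thumb from above, and the averaged curve for each candidate cutoff would then be fitted for the parameters of interest to locate the plateau):

```python
import numpy as np

def suggest_dG_max(dG, factor=10.0):
    """Cutoff about one order of magnitude above the smallest deviation."""
    return factor * np.min(dG)

def averages_vs_cutoff(curves, dG, cutoffs):
    """Average correlation curve for each candidate dG_max;
    None if no curve passes a given cutoff."""
    curves, dG = np.asarray(curves), np.asarray(dG)
    return {c: (curves[dG < c].mean(axis=0) if (dG < c).any() else None)
            for c in cutoffs}
```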

FCS on Streptavidin-Atto565 with fluorescent beads
As a well-defined test system to investigate the performance of the automated analysis algorithm we chose a 2 nM solution of Streptavidin-Atto565 contaminated with 100 nm fluorescent beads, which mimic bright vesicles or aggregates. The transit of the bright beads leads to spikes in the fluorescence intensity [Fig. 2(a)]. These spikes occurred so frequently that all curves calculated on 10 s parts of the intensity trace were affected; therefore, even hand-selection could not recover undistorted correlation curves in this case. Auto-correlation curves calculated on longer (10 s) or shorter (1 s) intervals showed severe distortions [Fig. 2(b)]. The two-component fits, although visually acceptable, resulted in diffusion times (τ_D1 = 0.09 ms for 10 s intervals and τ_D1 = 0.10 ms for 1 s intervals) very different from the control consisting of only Streptavidin-Atto565 without beads (τ_D = 0.185 ms). The automated selection algorithm described above resulted in a correlation curve hardly distinguishable from the control (τ_D = 0.185 ms).
Measurements on samples with varying amounts of fluorescent beads exhibited a large spread in the diffusion times of the uncorrected curves (τ_D = 0.76 ± 0.95 ms for 10 s intervals and τ_D = 0.16 ± 0.09 ms for 1 s intervals), whereas the automatically processed curves led to very reproducible parameter estimates (τ_D = 0.181 ± 0.007 ms, F = 0.953 ± 0.016). Even for the samples with a large amount of beads, where only 10% of the data could be used to construct the final correlation curve, the fit parameters were consistent (τ_D = 0.180 ms, F = 0.951).

FCS on ICA512-EGFP in Ins-1 cells
To demonstrate the automated selection algorithm on biological samples we chose an Ins-1 cell line [12] stably expressing ICA512-EGFP [13], a protein that partitions into secretory granules. These granules are bright, slowly moving entities that cause spikes in the fluorescence intensity [Fig. 2(c)]. In addition, the average intensity decreases slightly over time due to depletion of ICA512-EGFP and the bleaching of immobile fluorescent proteins and of the auto-fluorescent background. Here, too, the resulting correlation curve was severely distorted [Fig. 2(d)]. The automated selection algorithm resulted in a curve with greatly reduced distortions, although the comparison with the control (ICA512-EGFP in a cell showing only very few secretory granules) revealed a residual slow component. The reason could be dim granules diffusing through the periphery of the detection volume, which were not bright enough to cause distortions comparable to the residual noise on the curves. However, compared to the uncorrected curves the slow component was reduced from 75% (10 s intervals) and 57% (2 s intervals) to 15% and had a smaller impact on the parameter estimates.

Dependence of fit parameters on dG max
To investigate the influence of the exact choice of dG max on the parameter estimates we used the selection algorithm to order the curves calculated on short intervals of the fluorescence intensity, determined the average of a fraction of the curves with the smallest dG m and fitted this average with a two-component fit including triplet (Streptavidin-Atto565 with fluorescent beads and ICA512-EGFP with bright granules) or a one-component fit with fixed triplet (controls).
For the control samples, the parameter estimates hardly depended on the fraction of curves used [Figs. 2(e)-2(f)]. Only the use of a fraction that was too small, comprising just a handful of remaining curves, led to high noise on the average curve and less accurate parameter estimates. The parameter dG_m, a measure for the difference between the last curve used and the average of the other curves [Eq. (4)], spanned about one order of magnitude [Fig. 2(h)]. This can be considered the variation of dG_m in samples not influenced by distortions, suggesting a choice of dG_max ≈ 10 × min(dG_m).
For measurements on Streptavidin-Atto565 with fluorescent beads and on ICA512-EGFP with bright granules, dG_m spanned several orders of magnitude [Fig. 2(h)]. Due to the higher number of free fitting parameters (in addition to N and τ_D of the control case, now also the triplet fraction T, the fast fraction F and the diffusion time of the slow fraction τ_D2 were varied during the fit), their variation was in general increased. Apart from this noise, N and τ_D remained constant over a surprisingly large range of dG_max, even for values of dG_max significantly larger than the suggested ≈ 10 × min(dG_m), where the slow fraction became significant. Only inclusion of more than 70% of the curves led to a significant deviation of the parameters.

In contrast to the control, a systematic variation of τ_D was visible already much earlier, when more than 70% of the curves were discarded. This was mainly due to the triplet/blinking part: if fewer curves are considered, the higher noise on the curves results in a poorly defined triplet/blinking fraction T. Accordingly, in this range T showed strong deviations (data not shown). Fixing T during the fitting reduced these deviations; however, it also decreased the range over which τ_D remained unaltered. Due to the large number of fluorophores in vesicles or fluorescent beads, these entities do not exhibit significant triplet/blinking; if their contribution to the correlation curve becomes larger, the overall triplet/blinking fraction T is reduced.

Limitations
The suggested method will ease the data analysis in most cases where hand-selection is otherwise required. The density of bright events limits the length of the time intervals ΔT , since only unaffected parts of the data are used to construct the final curve. If the density of bright aggregates or vesicles is so high that every time interval is affected, this approach fails. On the other hand, a minimum length of the time windows is required to reduce the shot noise on the correlation curves sufficiently, such that distortions become detectable. If bright events are sparse, the time windows can be chosen longer and samples with lower molecular brightness and hence stronger noise on the correlation curves can be analyzed.
A strong change in the mean intensity (depletion due to photobleaching, bleaching of a background) leads to distortions even if curves are calculated using short time windows. In addition, the individual curves will exhibit different amplitudes and might be discarded by the algorithm since they deviate from the average correlation curve. In this case, a correction of the raw intensity trace [8] might be helpful. However, care must be taken that bright events are excluded from the intensity trace when calculating its smoothed approximation.
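One possible form of such a correction (a sketch of the general idea only, not the specific formula of ref. [8]; the window size and spike threshold are ad hoc choices): estimate the slow trend with bright events masked out, then rescale the trace to a constant slow mean.

```python
import numpy as np

def detrend_intensity(I, window, spike_percentile=99.0):
    """Remove a slow trend (depletion, background bleaching) from an
    intensity trace; bright events are excluded from the smoothed
    approximation, as cautioned in the text."""
    I = np.asarray(I, dtype=float)
    cutoff = np.percentile(I, spike_percentile)
    masked = np.where(I > cutoff, np.nan, I)        # exclude bright events
    # running mean over 'window' samples, ignoring the masked values
    kernel = np.ones(window)
    valid = ~np.isnan(masked)
    sums = np.convolve(np.nan_to_num(masked), kernel, mode='same')
    counts = np.convolve(valid.astype(float), kernel, mode='same')
    smooth = sums / np.maximum(counts, 1.0)
    return I * smooth.mean() / np.maximum(smooth, 1e-12)
```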
Due to the rejection of the parts of the data affected by bright aggregates, information about these slowly moving species is lost. In addition, the short time intervals lead to a cut-off of slow dynamics. Although of potential interest, this component is usually not accessible by FCS, since rare events require immensely long measurement times for statistical accuracy; otherwise, repeated measurements will exhibit a large spread. To recover some information about the bright events, additional methods such as counting the spikes or comparing the average fluorescence intensity of the selected parts with the overall intensity can be attempted.

Comparison to other approaches
Although hand-selection of undistorted curves is not optimal, there are only a few reports on automated FCS data analysis. This might be partially due to the complexity of previous approaches with a set of empirical parameters which had to be adapted to the experimental conditions. These methods rely on judging the quality of the fit and rejection of those curves with irregular residuals [14,15]. They require individual curves of reasonably low noise and therefore comparably long time windows ΔT . The influence of the exact choice of the parameters and the model on the outcome of the selection is not clear. A direct selection of curves prior to fitting might therefore be more suited to distinguish between different possible models.
A straightforward approach to reject distortions due to single bright events is the identification of peaks in the intensity. We tested an algorithm which rejects short intervals if the maximum of the intensity, sampled at a time resolution t_b, is larger than a cut-off I_max. The performance of this algorithm in rejecting distortions was only slightly below that of the sorting algorithm presented in this work, but it required a careful choice of t_b and I_max for every set of measurements. With the parameters t_b and I_max optimized for every curve, the peak-finding algorithm resulted in a fast fraction F a few percent lower than that obtained with the sorting algorithm; τ_D was similar within the fitting error. Apart from its dependence on empirical parameters, a drawback of the peak-finding algorithm is that it may fail to detect peripheral transits or low-brightness vesicles, may mistake shot noise in the intensity for an irregular event, and does not remove distortions due to instabilities or photobleaching.
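The peak-rejection alternative is easy to state in code (a sketch; here t_b is expressed as a number of samples per bin):

```python
import numpy as np

def reject_by_peaks(intensity, n_windows, bin_size, I_max):
    """Reject any window whose intensity, binned at resolution 'bin_size'
    samples (the t_b of the text), exceeds the cutoff I_max; return the
    indices of the accepted windows."""
    windows = np.array_split(np.asarray(intensity, dtype=float), n_windows)
    keep = []
    for k, w in enumerate(windows):
        nbins = len(w) // bin_size
        binned = w[:nbins * bin_size].reshape(nbins, bin_size).mean(axis=1)
        if binned.max() <= I_max:
            keep.append(k)
    return keep
```

Note that both bin_size and I_max must be tuned per measurement, which is exactly the drawback discussed above.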
The new sorting approach presented here performs well in rejecting distorted curves, requires only short time windows ΔT and can therefore use large parts of the data, and, most importantly, does not depend strongly on empirical parameters, which renders it easy to use.

Fluorescence Correlation Spectroscopy
All FCS measurements were performed on a Zeiss LSM 710 Confocor 3 system using a 40× NA 1.2 objective. For measurements on Streptavidin-Atto565 (Atto-Tec, Siegen, Germany) mixed with 100 nm fluorescent beads (FluoroSpheres 505/515, Molecular Probes, Eugene, OR), 6.4 μW of the 561 nm laser line and 3.2 μW of the 488 nm laser line were used. For measurements on ICA512-EGFP, 8.1 μW of the 488 nm laser line were used. The raw data of photon arrival times were stored and further processed.

Data analysis
Data analysis routines, as described in the section 'Automated analysis of FCS data', were implemented in MATLAB (Mathworks, Natick, MA), with some time-critical routines (binning of the photon arrival times, multiple tau correlation algorithm) written in C. FCS curves were fitted with a nonlinear least-squares fitting algorithm. The time required to automatically analyze a typical 100 s measurement is a few seconds, most of which is spent calculating the final correlation curve at high temporal resolution.

Conclusion
We presented a method to automatically analyze FCS data sets and to discard corrupted parts which otherwise lead to distortions of the correlation curves. This method replaces the time-consuming and subjective hand-selection of distorted curves, opening up a way to high-throughput FCS screens on biological samples. In addition, by evaluating short parts of the measurement, it extends the applicability of FCS to very noisy systems where even hand-selection does not result in undistorted curves.
The rejection of distorted curves is determined by essentially one parameter. Since the results depend only weakly on the choice of this parameter, it does not have to be adapted to the individual measurement. This renders the application of this method very easy with a low risk of introducing additional errors. Especially when implemented into the analysis software of commercial FCS instruments, this method will simplify FCS measurements in biological samples and further promote the application of this technique.