Ultra-fast, high-precision image analysis for localization-based super resolution microscopy

Localization-based super resolution microscopy holds superior performances in live cell imaging, but its widespread use is thus far mainly hindered by the slow image analysis speed. Here we show a powerful image analysis method based on the combination of the maximum likelihood algorithm and a Graphics Processing Unit (GPU). Results indicate that our method is fast enough for real-time processing of experimental images even from fast EMCCD cameras working at full frame rate without compromising localization precision or field of view. This newly developed method is also capable of revealing movements from the images immediately after data acquisition, which is of great benefit to live cell imaging. ©2010 Optical Society of America OCIS codes: (180.2520) Fluorescence microscopy; (100.6640) Superresolution. References and links 1. J. Lippincott-Schwartz, and G. H. Patterson, “Development and use of fluorescent protein markers in living cells,” Science 300(5616), 87–91 (2003). 2. D. J. Stephens, and V. J. Allan, “Light microscopy techniques for live cell imaging,” Science 300(5616), 82–86 (2003). 3. B. Huang, M. Bates, and X. W. Zhuang, “Super-resolution fluorescence microscopy,” Annu. Rev. Biochem. 78(1), 993–1016 (2009). 4. S. W. Hell, “Far-field optical nanoscopy,” Science 316(5828), 1153–1158 (2007). 5. H. Shroff, C. G. Galbraith, J. A. Galbraith, and E. Betzig, “Live-cell photoactivated localization microscopy of nanoscale adhesion dynamics,” Nat. Methods 5(5), 417–423 (2008). 6. T. J. Gould, V. V. Verkhusha, and S. T. Hess, “Imaging biological structures with fluorescence photoactivation localization microscopy,” Nat. Protoc. 4(3), 291–308 (2009). 7. P. N. Hedde, J. Fuchs, F. Oswald, J. Wiedenmann, and G. U. Nienhaus, “Online image analysis software for photoactivation localization microscopy,” Nat. Methods 6(10), 689–690 (2009). 8. S. B. Andersson, “Localization of a fluorescent source without numerical fitting,” Opt. Express 16(23), 18714– 18724 (2008). 9. B. A. Wilt, L. D. Burns, E. T. Wei Ho, K. K. Ghosh, E. A. Mukamel, and M. J. Schnitzer, “Advances in light microscopy for neuroscience,” Annu. Rev. Neurosci. 32(1), 435–506 (2009). 10. S. A. McKinney, C. S. Murphy, K. L. Hazelwood, M. W. Davidson, and L. L. Looger, “A bright and photostable photoconvertible fluorescent protein,” Nat. Methods 6(2), 131–133 (2009). 11. J. D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Kruger, A. E. Lefohn, and T. J. Purcell, “A survey of general-purpose computation on graphics hardware,” Comput. Graph. Forum 26(1), 80–113 (2007). 12. G. Quan, T. Sun, and Y. Deng, “Fast reconstruction method based on common unified device architecture (CUDA) for micro-CT,” J. Innovative Opt. Health Sci. 3(1), 39–43 (2010). 13. C. von Middendorff, A. Egner, C. Geisler, S. W. Hell, and A. Schönle, “Isotropic 3D Nanoscopy based on single emitter switching,” Opt. Express 16(25), 20774–20788 (2008). 14. R. J. Ober, S. Ram, and E. S. Ward, “Localization accuracy in single-molecule microscopy,” Biophys. J. 86(2), 1185–1200 (2004). 15. P. H. Calamai, and J. J. More, “Projected gradient methods for linearly constrained problems,” Math. Program. 39(1), 93–116 (1987). #126111 $15.00 USDReceived 29 Mar 2010; revised 13 May 2010; accepted 14 May 2010; published 20 May 2010 (C) 2010 OSA 24 May 2010 / Vol. 18, No. 11 / OPTICS EXPRESS 11867 16. D. Q. Mayne, and E. Polak, “Feasible directions algorithm for optimization problems with equality and inequality constrains,” Math. Program. 11(1), 67–80 (1976). 17. A. R. Small, “Theoretical limits on errors and acquisition rates in localizing switchable fluorophores,” Biophys. J. 96(2), L16–L18 (2009). 18. R. E. Thompson, D. R. Larson, and W. W. Webb, “Precise nanometer localization analysis for individual fluorescent probes,” Biophys. J. 82(5), 2775–2783 (2002). 19. http://www.andor.com/scientific_cameras/ixon. 20. S. T. Hess, T. P. K. Girirajan, and M. D. Mason, “Ultra-high resolution imaging by fluorescence photoactivation localization microscopy,” Biophys. J. 91(11), 4258–4272 (2006). 21. E. Betzig, G. H. Patterson, R. Sougrat, O. W. Lindwasser, S. Olenych, J. S. Bonifacino, M. W. Davidson, J. Lippincott-Schwartz, and H. F. Hess, “Imaging intracellular fluorescent proteins at nanometer resolution,” Science 313(5793), 1642–1645 (2006). 22. M. J. Rust, M. Bates, and X. W. Zhuang, “Sub-diffraction-limit imaging by stochastic optical reconstruction microscopy (STORM),” Nat. Methods 3(10), 793–796 (2006).


Introduction
Fluorescence microscopy offers a sensitive, noninvasive approach to probe cells. With the advent of fluorescence labeling technologies and advanced fluorescence imaging techniques, it is now possible to gain deep insights into dynamic processes in living cells with high spatial and temporal resolution [1,2]. In particular, innovative super resolution imaging techniques, which are mainly derived from fluorescence microscopy, have revolutionized biological imaging by providing the capability to visualize the distribution, transport, and interactions of individual molecules in living cells with unprecedented spatial resolution. Among the innovations, photoactivated localization microscopy (PALM) and stochastic optical reconstruction microscopy (STORM) rely on the detection and localization of single fluorescent probes to reconstruct images the resolution of which is not limited by far-field diffraction but by photon-counting statistics [3,4]. In live cell imaging, one has to make a trade-off among the sensitivity of detection, the imaging speed and the viability of the specimen [2]. Intrinsically wide-field imaging approaches such as PALM and STORM hold great potential for live cell imaging, since they offer simultaneously ultra-sensitive detection, maintain cell viability for extended time periods and image a large field of view at once.
Imaging speed becomes a key issue when multiple dynamic cellular processes are to be monitored. Localization-based super resolution microscopy is relatively slow [3,4]. For example, in live cell PALM, ~30 s are necessary at present to acquire images at a resolution of ~60 nm [5]. However, a complete image analysis, including localizing multiple molecule positions in each image frame and reconstructing the final image from a large number of image frames usually takes even longer than image acquisition [6]. As a result, images can only be visualized long after data acquisition, which is highly undesirable in tracking dynamic processes in living cells.
Recently, Nienhaus and associates introduced a fast image analysis method that relies on an algebraic algorithm called fluoroBancroft [7,8], where parallel data acquisition and analysis is possible for up to 100 molecules per frame with a camera frame time of 100 ms at only a small resolution loss. Further optimization of imaging acquisition speed is desirable because brighter probes and faster cameras [9,10] will become available in the future. These developments will lead to substantially larger numbers of molecules that need to be analyzed in parallel. Ultimately, live cell imaging will benefit most from real-time image analysis of a much larger number of fluorescent molecules without compromising either localization precision or field of view. Moreover, revealing dynamic information for large-scale live cell imaging will be possible immediately after image acquisition. Obviously, this goal is still beyond reach.
To meet the increasing need for fast image analysis in localization-based super resolution microscopy, we have developed a new method, termed MaLiang (after a traditional Chinese folktale "Ma Liang and his Magic Brush") for "maximum likelihood algorithm encoded on a Graphics Processing Unit (GPU)". This GPU-based maximum likelihood method exploits the fact that a GPU can compute locations of molecules in parallel [11,12], while the maximum likelihood algorithm guarantees a high localization precision [13]. Compared with the fastest algebraic fluorosBancroft (FB) method reported recently, our method has an ~8-fold speed gain in the full routine for analyzing experimental image frames. Simulation results indicate that this new method can analyze ~20,000 molecules per second and, thus, is capable of processing experimental images from fast EMCCD cameras in real-time. The capability of this method for discovering fast cellular process in living cell imaging is also discussed.

Description of the maximum likelihood algorithm
A variety of methods are applied in localizing single fluorescent molecules. Similar to Gaussian fitting (GF), the maximum likelihood algorithm is a high precision localization method, as described earlier [14]. This algorithm was combined with a generalized projection gradient method to locate single fluorescent molecule efficiently. Note that the point spread function is generally used to model the emission intensity pattern from individual fluorescent molecule. Theoretically, the intensity in pixel (i, j) is described by where (x 0 ,y 0 ) is the position of fluorescent molecule, and A and b denote the peak values of signal and background noise intensities, respectively. The observed signal, corresponding to Eq. (1) in pixel (i, j), is expressed by the symbol q i,j. We can compute the probability of I i,j = q i,j by the following equation , , , From Eq. (2), the spatial distribution of the light intensity is well described by a joint probability , , , The logarithm of Eq. (3) can be maximized by adjusting the parameters in Eq. (1). Note that the parameters should all be above zero. Based on the information described above, we can compute the position (x 0 , y 0 ) of the fluorescent molecule by solving the following optimization problem , , , The generalized projection gradient method [15,16] can be used to obtain the optimal values.

Description of the MaLiang method
The total process for localizing the positions of multiple molecules and reconstructing a final super resolution image is embedded in a combined computational framework of CPU and GPU. As shown in Fig. 1, three steps are carried out successively for the entire image analysis routine.

Step 1. De-noising
Original images were loaded into the memory of the CPU and then delivered to the GPU hardware. An optimized convolution template for de-noising in the GPU was modified with the following facts: an image filter separated into horizontal (row) and vertical (column) passes is optimal for boosting computing efficiency of the GPU [17], and an annular averaging filter efficiently eliminate the influence of different background intensity of fluorescent molecules in an image [7]. Therefore, in the convolution template for de-noising, the forms of the averaging and annular averaging filters were modified to have the same sum of two separate filters. The raw image was then convoluted with the two filters respectively, resulting in two corresponding images which were further summed up to give the final denoising image. Note that the dimension of the averaging filter should be adjusted to match the diameter of the corresponding Airy disk ad should be smaller (2 pixels) than that of the annular filter.

Step 2. Sub-region extraction
This step is used to extract sub-regions for further position localization. Due to a sparse distribution of fluorescent molecules in an image, extracting sub-regions can be performed efficiently on the CPU. For a noise-reduced image, the positions of the pixels where the values are local maxima above a given threshold value were extracted and stored. The threshold value is estimated to be more than 5 the standard deviation of the background noise intensity in the noise-reduced image. A sub-region with 5 x 5 pixels was extracted for the FB method, and a region with 7 x 7 pixels for the MaLiang and GF methods. The different sizes used here were based on the following considerations. For the FB method, it is necessary to separate the signal and noise. A larger sub-region makes such a separation more difficult, thus reducing the localization precision of the molecules. On the other side, MaLiang and GF methods are based on iterative optimization algorithms, therefore it is important to keep the data set that entirely includes the PSF of a single molecule.

Step 3. Localization
Values in each extracted sub-region are sent to the GPU and assigned a thread for further parallel computation. In the thread, the maximum likelihood algorithm was adopted to obtain precise sub-pixel localization of the fluorescent molecule. The number of extracted subregions that can be analyzed simultaneously in the GPU is governed by the number of threads, which is determined by the capacity of the GPU graphics card used. Using an NVIDIA GeForce 9800GT graphics card with 1.0 GB memory, 3,000 fluorescent molecules could be simultaneously analyzed and located.

Experimental and simulated data for image analysis
For a fair comparison between the fluoroBancroft method [7] and our newly developed GPUbased maximum likelihood algorithm, we used the same PALM image frames as those analyzed by the former method. As Gaussian fitting is more widely used by researchers in this field, we also include a comparison with this method. To verify our results from real PALM image data and to explore the limitations of our method in regard to the spatial and temporal resolution of localization, simulation studies based on the concept described by Ober et al [14] were also carried out. Parameters were modified to match the experimental images, where the radius of Airy disks equals 260 nm in object space. We note that spots with radii exceeding the average radius of the Airy discs by more than 3.5 × the standard deviation were regarded as overlapping spots and thus rejected.
In the simulations, the pixel size was set to 130 nm, so that the Airy disk occupies 16 pixels. In this case, when the photons and the background noise are ascertained, the localization precision could be optimized according to the theoretical analysis and the simulation by R. E. Thompson et al. [18] . The background noise was mostly considered as Poissonian noise and the SNR is defined as the ratio of I 0 to the standard deviation of I 0 + I b .
Here, I 0 is the peak signal above the background noise and I b is the intensity of background noise and was set to 10.

Hardware platform for image analysis
All programs were written in C under the Microsoft Visual Studio 2005 environment, using an Intel i5-750, 2.67 GHz personal computer with 4.0 GB memory. A fairly cheap (~$150 as of January 2010) NVIDIA GeForce 9800GT graphics card with 1.0 GB memory was used for GPU-based localization computation. Note that the execution times depend on the hardware platform.

Method for eliminating bad data points
In Figs. 5c and 5f, a simple linear regression method was applied to reject data points that markedly deviated from the mean contour line. Only points are included that deviate not more than four times the standard deviation from the mean contour line. The final images are built up by repeating this process several times. Note that only one linear structure exists in each of the final images.

Localization accuracy
Comparisons of the localization accuracy among Gaussian fitting (GF), fluoroBancroft (FB), and MaLiang (MLG) methods are presented in Fig. 2. The resolution differences among these methods (Figs. 2b-2d) are not easily visualized, indicating that further analysis is required. Based on an approach suggested by Nienhaus et al [7], where the half width of the distribution of fluorescent molecules around their mean contour line was used for quantitative comparisons, it is found that the localization precision of the MaLiang method is comparable to the GF method and significantly higher than the FB method (Fig. 2f). This finding is supported by simulation (Figs. 2g-2h). Apparently, the higher localization precision of MaLiang over FB comes from the intrinsic performance difference between fitting and algebraic algorithms. The results also indicate that the Poisson noise model used in the MaLiang method is suitable to describe the noise behavior of the real image frames. (f) The full width at half maximum (FWHM) of the distribution for all image frames (data sets 1-8). The frames were grouped into 8 consecutive data sets with 1000 frames each. The mean absolute error (g) and standard deviation (h) as a function of signal-to-noise (SNR) are also presented.

Computational speed of image analysis
Generally, the computation duty increases with the image acquisition rate and the molecules needed to be localized per frame. The computation duty between typical and fast EMCCD is presented in Table 1. Obviously, the fastest algebraic fluorosBancroft (FB) method reported recently is not capable of real-time processing experimental images from fast EMCCD working at full frame rate.
We evaluated the image analysis speed of the MaLiang method by performing an entire PALM image analysis routine with first 1000 experimental image frames ( Table 2). The total execution time with the MaLiang method is found to be 1.50 s, an 8-fold speed gain over the FB method (11.83 s) and a 22-fold gain over the GF method (34.02 s). We found that the program written in C runs about 2.5-fold faster than that in Matlab. ~112 500 (11000) a Specifications are adapted from the Andor iXon DU897 as a typical EMCCD camera and the Andor iXon DU860 as a fast EMCCD camera [19]. b Abbreviation for frames per second. c Theoretical number of molecules that need to be localized. The calculation was based on the analysis by Hess et al., who assumed that a good density of localized molecules per frame is ~1 µm −2 considering a trade-off between imaging speed and localization efficiency [6]. The number in parenthesis is estimated for heterogeneous biological structures in live cell imaging experiments, assuming that 10% of the image contains the majority of molecules [20]. .04 a For the simulations, the time consumption was determined from the average time required for localization in a simulated image frame with 512 x 512 pixels and 100 molecules, based on 30 simulated images. To estimate the analysis time for the experimental data, the first 1,000 experimental image frames were used.
The high gain in image analysis speed comes from a combination of three factors: GPUbased parallel localization, appropriate convolution template and optimized number of iterations. First of all, we realized that integrating a fitting-based method into an image analysis method would be essential if high localization precision is desired. The only problem is how to minimize the negative effects from the slow fitting process. GPU-based parallel localization is an ideal solution, since the GPU is capable of processing multiple molecules simultaneously, which is desired by localization-based super resolution microscopy. Simulation results show that it is easy to analyze an image frame with up to 100 molecules (Figs. 3a-3b) without noticeably increasing either localization or total computation time. Moreover, the appropriate form of the convolution template is also helpful for maximizing the speed of GPU-based localization (Section 2.2). Third, although localization precision generally increases with a larger number of iterations in the fitting-based method (maximum likelihood in our case), it is found that ten iteration steps are sufficient for achieving satisfactory localization precision, while >10 iterations can only slightly improve the precision (Fig. 4). Therefore, the number of iterations was set to ten for all of the following image analysis.
The speed gain in analyzing experimental data is verified by simulation data (Figs. 3c-3d and Table 2). We found that more than 30 simulated image frames (512 x 512 pixels) with 100 molecules in each frame could be analyzed within 0.11 second, corresponding to an imaging analysis capacity of ~20,000 molecules per second. The gain in the total computation time of the MaLiang method is 11 over the FB and 22 over the GF methods. The speed is fast enough for real-time processing experimental images from typical fast EMCCD cameras, while neither localization precision nor field of view is compromised.

Capacity for tracking dynamic processes
We checked the capability of our high-precision, high-speed image analysis method for discovering fast cellular process in living cell imaging. Sample drift in the PALM instrument was corrected according to Betzig et al. [21], using the bead marked with magenta circle in Fig. 5. The drift-based contribution to position errors in PALM data is significantly reduced after sample drift correction (Fig. 6). The standard deviation of the bead positions obtained from 8,000 imaging frames is 7.5 nm, so that under the present imaging conditions, intracellular structure motions larger than 20 nm should be resolvable [22]. The motion of the partial F-actin filament in Fig. 5b is found to be < 20 nm (Fig. 5d), indicating that this structure is almost stationary during the long image acquisition process. However, another Factin filament (Fig. 5e) was found to display > 50 nm motion (Fig. 5g), as detected between data sets 5 and 7. To quantify the sensitivity toward motion detection between the fast image analysis methods (MaLiang and FB), we calculated the Z-statistics from the positions of molecules for the two successive data sets in Fig. 5f. It is found that the MaLiang method has better performance in discovering such motions than the FB method (Fig. 7), which is advantageous for dynamic processes in live cell imaging. Fig. 5. Capability of detecting intracellular motions with the MaLiang method. The full image in (a) was overlaid from four PALM images reconstructed from data sets 1, 3, 5 and 7. The green region in (a) was enlarged to (b), and the latter was cleaned up (c) and further analyzed (d). Individual PALM images for data set 1 (blue), 3 (green), 5 (red) and 7 (black) are shown in (c), where data points heavily deviating from the mean contour lines were eliminated (see Section 2.5 for details). The projection of all molecules in (c) to an orthogonal direction of the mean contour line from data set 1 is shown in (d). Similarly, (e)-(f) are for the yellow region in (a). The bead marked with a magenta circle in (a) was used to correct sample drift. The scale bar is 3 µm for (a) and 500 nm for (b) and (e). Fig. 6. Sample drift correction. (a) Sample drift was determined by tracking a luminescent bead during the entire PALM image acquisition process. The smooth drifts were used in the sample drift correction, so that the influence from localization error to the localization precision of the PALM data could be minimized. Localization of the bead before (b) and after (c) drift correction indicates that the drift-based contribution to position errors in PALM data is significantly reduced. The standard deviation of the bead position in (c) is 7.5 nm. Fig. 7. Comparison of the motion detection sensitivity between the MaLiang (MLG) and fluoroBancroft (FB) methods. The motion detection sensitivity is characterized by Z-statistics (y-axis), where bigger value represents larger difference between two data sets. Movement between data set 1 and 3 is represented as motion 1, while motion 2 and 3 corresponding to movements from data set 3 to 5, and 5 to 7, respectively.

Summary
Our MaLiang method, based on the maximum likelihood algorithm and GPU computation, is capable of ultra-fast localization-based super resolution image analysis with high localization precision. Compared with the fast fluoroBancroft method reported recently, our method is about eight times faster in the full routine for analyzing experimental image frames without compromising either localization precision or field of view. The MaLiang method has been shown to have excellent performance in discovering motions in live cell imaging.