Optimal 3D single-molecule localization in real time using experimental point spread functions

We present a real-time fitter for 3D single-molecule localization microscopy using experimental point spread functions (PSFs) that achieves optimal 3D resolution on any microscope and is compatible with any PSF engineering approach. This allowed us to image cellular structures with a 3D resolution unprecedented for astigmatic PSFs. The fitter compensates for most optical aberrations and makes accurate 3D superresolution microscopy broadly accessible, even on standard microscopes without dedicated 3D optics.

As an alternative to simple Gaussian PSF models, fitting methods using experimentally acquired PSF have been developed that can in theory achieve a higher precision, such as PSF correlation 7 , phase retrieval 8,9 , or interpolated PSFs 3,10-13 . In practice however, at the moment these methods are limited in their usability due to either a) low accuracy and robustness, b) a challenging process to generate an accurate PSF model 14 , c) slow speeds preventing online fitting during data acquisition or d) lack of camera-specific noise models, limiting the use of increasingly popular sCMOS cameras. Additionally, non-intuitive interfaces, restrictive licenses, and dependencies on specific programming languages and libraries fundamentally complicate their use, especially for users without an expert programming background. Thus, simple Gaussian, and not experimental PSF models are still generally used in 3D SMLM. Particularly for the majority of labs that have microscopes without perfect optics, this leads to a resolution that is very much worse in z than in x and y such that little meaningful 3D information is obtained.
Here, we present a software that overcomes these limitations and makes experimental PSF fitting generally accessible and practically useable, and thereby enables 3D SMLM with optimal z-resolution on any microscope (Supplementary Software 1,2). It contains an intuitive tool to robustly calibrate the experimental PSF and a fitter for cubic spline (cspline) interpolated PSF models that reaches the necessary fitting speeds for real-time localization (>10 5 fits/second). It also achieves the highest possible localization precision, the Cramér-Rao lower bound (CRLB) on simulated ( Supplementary Fig. 2) and experimental ( Supplementary Fig. 3) data.
With this new fitter, we were able to resolve very fine structural details on biological structures (Fig. 1, Supplementary Fig. 4), which were previously accessible only by extremely complex interferometric microscopes ( Supplementary Fig. 5). We were able to resolve in 3D the hollow cylinder of immunolabeled microtubules both with DNA-PAINT 15 ( Fig. 1a-b) and dSTORM 16 (Supplementary Fig. 6) using the simple astigmatic 3D method. In comparison to the commonly used Gaussian fit 17 , our new fitter achieved a higher precision and avoided distortions (Fig. 1b, c). Furthermore, we could visualize the spherical geometry of clathrin-coated pits without distortions and found that almost all localizations were in the clathrin coat, highlighting the high localization accuracy (Fig 1d).
We achieved this high performance with our fitter by a systematic optimization in which we overcame previous bottlenecks in the use of experimental PSFs. Firstly, we optimized the precision by developing a robust implementation of maximum likelihood estimation (MLE) for spline-interpolated PSF models ( Supplementary Fig. 7). Compared to the simple Gaussian PSF models, our fitter reaches substantially higher localization accuracies on both simulated ( Supplementary Fig. 8) and experimental (Supplementary Fig. 9 and Fig. 1b, c) data. Moreover, our fitter avoids the systematic error of Gaussian PSF models in estimating the number of photons per localization 18 (Supplementary Fig. 10). We note that the quality of the PSF model is vital to avoid artifacts commonly observed when using experimental PSFs ( Supplementary Fig. 11). To avoid these artifacts, we developed a simple and userfriendly tool to robustly create accurate experimental PSF models from several bead stacks (Supplementary Software 1,2). Secondly, we optimized the speed of our fitter (Fig. 1e, Supplementary Fig. 12). High speeds are essential to enable fitting during data acquisition. This allows monitoring the image quality in real time, and to stop acquisitions as soon as a sufficient number of molecules have been localized 19 or if the sample is deemed unsatisfactory. By implementing our fitter on the Graphical Processing Unit (GPU), we obtained fitting speeds more than a hundred times faster than for the fastest previously available implementation 13 . This now enables online analysis, even of dense structures and large fields of view (Fig. 1e).
Thirdly, we extended our fitter to sCMOS cameras (Fig. 1e), which offer fast imaging speeds and large fields of view, making them exquisitely suitable for SMLM 20 . For this, we included an sCMOS-specific noise model, which was previously limited to simple Gaussian PSF models 20 , in our fitter for experimental PSFs. We then validated that our fitter avoids camera noise-induced localization errors ( Supplementary Fig. 13) and is fast enough for online analysis also for sCMOS cameras (Fig. 1e).
Taken together, our software retains the ease-of-use and accessibility of astigmatic 3D SMLM while approaching a 3D image quality that has to date only been achieved with highly complex 4Pi microscopes 21 .
However, astigmatism, as any other PSF engineering approach, requires dedicated 3D optics. Many users do not have access to these microscopes and are thereby limited to acquiring two-dimensional data. But even unmodified PSFs contain information on the z-position of the fluorophore 22 , which can be estimated from the PSF size, or by using the recently published photometry approach 18 . However, the z-resolution or axial range remain limited and these methods cannot distinguish fluorophores above and below the focus because of the high symmetry of the PSF.
Here we overcome these limitations and extract accurate and precise z-positions by fitting 2D SMLM data with an experimental model of the unmodified PSF. This allows us to exploit subtle differences between the upper and lower halves of the PSF to correctly localize the fluorophore. To this end, we developed a bi-directional fitting approach, in which we fit once with a z starting parameter above the focus and a second time below the focus and choose the solution with the maximum likelihood.
We achieved a z-resolution almost as good as that of astigmatic PSFs (Fig. 2a, b, Supplementary Fig. 4, Supplementary Fig. 8, Supplementary Fig. 14). We then directly compared our approach with existing methods based on PSF size (Fig. 2c) or photometry (TRABI) 18 (Fig. 2d). We found that only our software could resolve the nucleoplasmic and cytoplasmic rings of the nuclear pore complex, which are axially spaced apart by 53 nm 23 . While close to the focal plane the z resolution was slightly decreased, and 5% of missassignments lead to a faint mirror image (Fig. 2b), our fitter enabled high-resolution 3D imaging directly on standard microscopes without any 3D optics.
Besides astigmatism, a variety of sophisticated PSF engineering approaches have been developed (double helix 2 , self-bending 24 , tetrapod 4 , phase-ramp 3 etc.), which increase the depth of field beyond ~1 μm, but require complex data analysis. As our fitter is directly applicable to all those PSFs ( Supplementary Fig. 14), it will allow many more labs to exploit the advantages of advanced PSF engineering for large sample volumes. Moreover, for the first time it allows the use of advanced PSF engineering with sCMOS cameras while accurately accounting for the camera noise.
To summarize, we presented a fast, robust and precise single-molecule fitter for arbitrary PSF models. This allowed us to achieve a substantially improved 3D resolution and image quality using engineered astigmatic PSFs or unmodified PSFs from a standard microscope. As deformations of the PSF are included in the experimental PSF model, our fitter is robust with respect to field-independent aberrations, leading to a high accuracy even for objectives with a mediocre PSF or imperfect alignment of the microscope ( Supplementary Fig. 14, 15). The presented framework is not restricted to bead-stack based PSFs, but can be used in the same way to obtain and rapidly fit a spline-interpolation of an arbitrary analytical or phase retrieved PSF model, for which aberrations can be calculated and added computationally.
To enable the broad community to profit from these innovations, we developed an easy-touse fitting software that allows anybody to use the fitter directly on their own data. Additionally, we provide our CPU-based C-code and the GPU-based CUDA-code with extensive example code as open-source (Supplementary Software 1,2, github.com/jries/ fit3Dcspline.git). It can be easily incorporated in any programming language as it is not library dependent, and thus will greatly improve speed and accuracy of any existing singlemolecule fitting software.
With this, we hope that our software will transform 3D SMLM from an experts-only technique into a high-resolution imaging methodology that is broadly accessible.

Robust averaging of experimental bead stacks
Stacks of beads, immobilized on a coverslip, were acquired in a range of ±1000 nm with respect to the coverslip. A spacing in z between 10 nm and 50 nm works well. Beads in each stack were segmented in a maximum intensity projected image by maximum finding and thresholding. Sub regions around each bead location were cropped. Next, we aligned all beads with sub-pixel accuracy by 3D cross-correlation using a single bead as reference. To gain precision, another round of 3D alignment of the central part of each stack was performed using the average of the aligned bead stacks as the reference. We scaled up the central part of the cross-correlation by a factor of 20 by cubic spline interpolation and determined the x, y, and z shifts from the position of the maximum 26 . The bead stacks were shifted using cubic spline interpolation. Iteratively, bead stacks which showed a large dissimilarity from the average were identified based on the maximum value of the crosscorrelation and the mean square error and excluded from the average. To eliminate the background, the minimum value of the bead stack was subtracted and the amplitude was normalized by the total (summed up) intensity of the central slice. We further regularized the bead stack by smoothing it in the z-direction with a smoothing B-spline 27 . In the presence of field-dependent aberrations, systematic fitting errors can be corrected in a post-processing step based on a precise calibration 28 . In presence of strong field-dependent aberrations, we suggest to perform the calibration and fitting only locally on small sub-regions of near uniform aberrations.

Calculation of cspline-interpolated PSFs
Spline functions are piecewise polynomials for which high order derivatives are continuous at the knots, where the pieces connect. Cubic splines are the most commonly used splines, e.g. in computer graphics, geometric modeling, etc. Recently, this type of approximation theory has also been used for single molecule localization 10,12,13 . We implemented the cspline interpolation both in terms of cubic splines and cubic B-splines. A B-spline interpolation is generally less memory intensive since only one B-spline coefficient is needed in each spline interval. In comparison, (d + 1) n coefficients are required in each spline interval for spline polynomials, where d is the spline degree and n is the dimension.
However, our implementation of a 3D fit based on cubic splines is about 2.5 times faster than the cubic B-spline form due to the fact that cubic splines are more explicit and less calculations are needed to calculate spline values and derivatives. Therefore, the software used in this work is based on cubic splines with 64 coefficients in each voxel of the 3D PSF stack.
Similar to Ref. 13, the 3D PSF is described by a three dimensional cubic spline for voxel (i, j, k) as follows: where Δx and Δy is the pixel size of the PSF in the object space in x and y directions, respectively. Δz is the step size in the objective space in z direction. x i , y j and z k are the start positions of voxel (i, j, k) in x, y and z directions, respectively.
In order to calculate the cspline coefficients, the 3D PSF stack was firstly built by averaging the bead stacks from different fields of view by 3D cross correlation and by regularization, as described above. The spline coefficients were built based on the averaged and smoothed 3D PSF stack. As 64 cspline coefficients are required to describe each voxel, we up sampled (cubic spline interpolation) each voxel 3 times in x, y and z directions, respectively. The 64 up sampled coordinates (including boundary of neighboring voxels) were used to calculate the 64 cspline coefficients. The code to calculate the spline coefficients from bead stacks can be found in the Supplementary Software 1.

z-calibration of astigmatic Gaussian PSF models
Our PSF calibration tool also allows extracting z-positions using two widely used algorithms: a) calculate the z-positions directly from the calibrated σ x (z) and σ y (z) returned by the elliptical Gaussian fit; b) determine the z-positions by directly fitting the single molecules with the calibrated astigmatic Gaussian PSF model 29 .
For both calibrations, the bead stacks are fitted with an elliptical Gaussian PSF model and shifted in z according to their true z-positions where σ x (z) == σ y (z). The outliers are removed based on the root mean error of σ x (z) and σ y (z) with respect to the average curves.
For algorithm a), we calculate dσ 2 (z) = σ x (z) 2 − σ y (z) 2 and interpolate the functional relationship z(dσ) 2 by a smoothing cubic B-spline. This B-spline interpolation is then used to directly read out z from dσ 2 .
For algorithm b), σ x (z) and σ y (z) are fitted with a polynomial approximation for the astigmatic Gaussian model: The parameters σ 0x , A x , B x , σ 0y , A y , B y , γ and d are input parameters for the Gaussian fitter, which directly returns the z-coordinates of the fluorophores. We follow the formula in Ref. 25 to calculate the derivatives of the parameters. However, the iterative process was reimplemented using the Levenberg-Marquardt (L-M) algorithm.

Newton and Levenberg-Marquardt iterative schemes for MLE
Maximum likelihood estimation is the method of choice for fitting data with Poisson statistics 30 . The objective function for MLE is given by 31 : where μ k is the expected number of photons in pixel k from the model PSF function, x k is the measured number of photons. By minimizing χ mle 2 , we obtain the maximum likelihood for the Poisson process.
Methods for nonlinear optimization are usually iterative. For Newton iterative schemes, the search direction Δθ i of each iteration is given by 25 where θ i is the i -th free fit parameter. However, computing the second derivatives is often quite difficult and can be destabilizing when the model fits badly or is contaminated by outlier points 30 .
An alternative method is the L-M algorithm. The L-M algorithm is often used for least squares fitting as it is quick and robust. With relatively simple modifications 31 , the L-M algorithm has also been used to minimize χ mle 2 . In the L-M algorithm, the second derivatives term is neglected and only the first derivatives are used. In the L-M algorithm, the update Δθ i is given by where, H i,j is the Hessian matrix without the second partial derivatives term, defined as , λ is the damping factor, I is a diagonal matrix equal to the diagonal elements of the Hessian matrix.
This method is more robust since a damping factor is introduced and the second derivatives do not contribute. This damping factor is increased (multiplied by 10 in this work) if an iteration step does not decrease χ mle 2 or H i,j is not positive definite.

GPU Implementation
This GPU implementation of the iterative method follows the framework developed for fitting a Gaussian PSF model using a GPU 25 , however using the L-M algorithm. We note that a general framework for L-M fitting on the GPU has been recently published 32 . Unlike previous work for EMCCD and sCMOS noise models 20,25 , where the shared memory was used to store the molecule candidate data and readout noise map, we kept the data in the GPU global memory. Each thread is pointed to each molecule candidate and performs all the computations for each molecule candidate. No thread synchronization is required. 64 threads per block were used. The overall speed is about 1.9 times (small window size) to 47.9 times (large window size) faster ( Supplementary Fig. 12) than for the original code where shared memory was employed for the sCMOS noise model. We assume that this is due to the compiler optimization where more registers are used and the time for copying data from the global to the shared memory is saved. Both, the CPU based C-code and the GPU based CUDA-code were compiled using Microsoft Visual Studio 2010. The software was called via Matlab (Mathworks) mex files. It was run on a personal computer using an Intel(R) Core(TM) i7-5930 processor clocked at 3.50 GHz with 64 GB memory. An NVIDIA GeForce GTX 1070 graphics card with 8.0 GB memory was used for GPU based computation.

Simulation of realistic single molecule data
To simulate single molecule images using a realistic PSF model, we used the csplineinterpolated PSF model generated from experimental bead stacks as described above. We generated the single molecule image from this PSF model at a random 3D position, multiplied it with the number of photons/localization and added a constant background. Finally, we applied Poisson noise to the images. For simulations of sCMOS data, we added a pixel-dependent Gaussian noise. The code to simulate PSFs from the calculated cspline coefficients can be found in the Supplementary Software 1.

MLE fit using an sCMOS camera noise model
sCMOS cameras have become more and more attractive for localization microscopy due to their fast data acquisition even for large fields of view, low readout noise and relatively low price. However, their intrinsic pixel-dependent gain, offset and readout noise can create a localization bias, which has to be corrected when localizing the single molecule 20 . Gain g k and offset o k in pixel k can be taken into account when converting the camera image I k ADU in analog digital units (ADU) into photons: The readout noise, however, has to be taken into account during the fitting in the noise model and can be calculated from many dark camera images as the pixel-wise variance.
Here, we use the same model as proposed by Huang, et al. 20 which approximates the normal distributed readout noise (var k , in units of photoelectrons) with a Poisson distribution. By adding a pixel-dependent constant, var k , to the measured photoelectrons, one can expect the new value to approximate a Poisson distribution with a mean of u k + var k . Here, u k is the expected photon number in pixel k of the PSF model function. Therefore, in comparison to the conventional MLE fit for EMCCD data, only one more parameter, var k , is required for sCMOS data. Also var k is only kept in the global memory of the GPU. Compared to the EMCCD noise model, the speed performance of the algorithm was only reduced by less than 25% by additionally accounting for the pixel dependent readout noise ( Supplementary Fig.  12).

3D Fitting of SMLM data acquired with standard microscopes without 3D optics
As our code includes fitting with arbitrary PSFs, it is directly applicable on 2D data, acquired with an unmodified PSF in a standard microscope. A model for the unmodified PSF can be calculated directly from bead stacks, in an analogous way to engineered PSFs. However, the unmodified PSF has a high symmetry with respect to the focal plane (Supplementary Figure 1), making it difficult for an iterative fitting procedure to converge through the focal plane. To overcome this problem, we fit every localization twice: once with a starting parameter for the z-position 500 nm above the focal plane, and once with a starting parameter 500 nm below the focal plane. The maximum likelihood is then used to select the better fit. As a real PSF is not completely symmetric 33 , this breaks the degeneracy previously encountered when extracting z-positions in 2D data sets from only a single photometry or PSF size parameter 18 .
Due to the rather large size of the calibration bead (100 nm) and small inaccuracies during the averaging of many bead stacks, the cspline PSF model is slightly blurred compared to a single-molecule PSF. This had no apparent effect on 3D data, but in 2D data it lead to an accumulation of fitted localizations at the focal plane. To overcome this problem, we filtered the raw images with a Gaussian kernel (standard deviation σ < 0.5 pixels), thus applying the blur in the PSF model to the data. To find the right σ, we fitted a subset of the data with several values for σ = 0, 0.1, …, 0.5 and selected the σ value for which we found neither an accumulation nor a depletion of localizations around the focal plane.

Post processing
As the positions used above are all based on the objective positions, which differ from the true absolute positions due to refractive index mismatch, we further multiply the z-positions with a refractive index mismatch factor of 0.75 1 . Then, x, y, and z-positions were corrected for residual drift by a custom algorithm based on redundant cross-correlation. Localizations persistent in consecutive frames were grouped into one localization, and superresolution images were constructed with every localization rendered as a 2D elliptical Gaussian with a width proportional to the localization precision.
For measurements deep in the sample in combination with oil objectives, aberrations induced by the refractive index mismatch can be corrected for as described in Ref. 34.

Sample preparation of clathrin-coated pits in SK-MEL-2 cells
All samples were imaged on round 24 mm high precision glass coverslips No.

Sample preparation for imaging of the nuclear pore complex and microtubules
Wildtype U-2 OS and genome-edited U-2 OS cells that express Nup107-SNAP (as previously described in Ref. 37

Code availability
Source code for the software used in this manuscript is contained in Supplementary Software 1 and updated versions can be freely downloaded at github.com/jries/ fit3Dcspline.git.

Data availability statement
The datasets generated and analysed during the current study are available from the corresponding author upon reasonable request.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.  (a) Nup107-SNAP-AlexaFluor647 was imaged using dSTORM on a standard microscope without 3D optics. (b) Side-view reconstruction of the region denoted in (a). The nucleoplasmic and cytoplasmic rings of the nuclear pore complex, spaced 53 nm apart, can be easily resolved. The arrows denote nuclear pore complexes and their mirror images caused by misassignments. From the respective number of localizations, we estimated the fraction of misassignments to be ~5%. (c) Side view reconstruction of the same region using the size of the PSF S PSF from a fit with a symmetric Gaussian PSF model as a measure for the z-position. (d) Side view reconstruction of the same region using the photometry-based intensity ratio 18 P TRABI as a measure for the z-position. Corresponding localization precisions and profiles can be found in Supplementary Fig. 4. Scale bars: 1 μm.