Three dimensional single molecule localization using a phase retrieved pupil function.

Localization-based superresolution imaging is dependent on finding the positions of individual fluorophores in a sample by fitting the observed single-molecule intensity pattern to the microscope point spread function (PSF). For three-dimensional imaging, system-specific aberrations of the optical system can lead to inaccurate localizations when the PSF model does not account for these aberrations. Here we describe the use of phase-retrieved pupil functions to generate a more accurate PSF and therefore more accurate 3D localizations. The complex-valued pupil function contains information about the system-specific aberrations and can thus be used to generate the PSF for arbitrary defocus. Further, it can be modified to include depth dependent aberrations. We describe the phase retrieval process, the method for including depth dependent aberrations, and a fast fitting algorithm using graphics processing units. The superior localization accuracy of the pupil function generated PSF is demonstrated with dual focal plane 3D superresolution imaging of biological structures.


Introduction
Single molecule localization based superresolution operates on the principle of localizing the individual fluorescent molecules that label a sample [1][2][3][4][5]. In the image analysis process, the actual position of each fluorescent molecule is estimated with sub-diffraction precision. The resulting coordinate estimates are then combined to produce an image with given sufficient labeling density and localization precision [6], with sub-diffraction resolution. For 2D imaging (such as in total internal reflection mode), simple PSF models, (such as a 2D Gaussian model), have been found to be sufficient for imaging under typical conditions [7]. 3D imaging, such as that facilitated by dual focal plane [8,9] astigmatic imaging [10], and techniques that make use of more complex engineered PSFs [11,12] require a 3D PSF model, and therefore an accurate description of the 3D PSF is crucial to optimizing the localization accuracy in image processing. The 3D PSF model can be based on the experimental measurement of, for example, a subdiffraction sized fluorescent bead [8,13,14]. In this case, the system aberrations and fluorescence emission properties are inherently included. However, noise in the measurements can introduce inaccuracies into the model and, because the PSF can only be measured at a finite number of points, a continuous description of the PSF throughout 3D space requires interpolation between the discrete measurement points. The measured PSF is also difficult to modify in order to take into account additional aberrations, such as depth dependent spherical aberration caused by refractive index mismatch. Storage and use of a large calibration PSF stack is computationally cumbersome and can limit the speed of analysis -a very practical concern when data sets may contain images of more than 10 6 fluorophores.
Alternatively, the PSF model can be based purely on a theoretical model. One widely used theoretical model is the 3D Gaussian model [10,15]. The Gaussian model requires little computational overhead and therefore offers significant speed advantages in the localization algorithm. On the other hand, the Gaussian model only provides a good approximation to the 3D PSF within a limited spatial range near the focus. Furthermore, the Gaussian model also assumes that the optical system is free of aberrations, which in practice is rarely the case. More realistic theoretical PSF models are described by taking into account the vector property of light [7,16,17] and considering the fluorophore as a dipole emitter. The vectorial models provide a complete description of the PSF patterns from dipole emission and can be used to model fixed or freely rotating dipoles. However, these models exclude system specific aberrations that are important in practice.
To overcome the limitations of the PSF modeling approaches described above, a phase retrieval method for wide field microscopy was developed to retrieve the pupil function of the optical system [18,19]. The phase retrieval process makes use of images acquired with the optical system in question, and therefore inherently includes system specific aberrations. A realistic PSF computed from the retrieved pupil function has been used in 3D image deconvo-lution [19], and more recently, modified phase retrieval methods were investigated specifically for STED microscopy [20] and confocal microscopy [21].
A phase retrieval method was also recently described for use in 3D superresolution imaging using an engineered 'double-helix' PSF [22]. In that work, a phase retrieval process was used to model the PSF between the discrete measurement positions along the optical axis. The method, however, relies on local interpolation between z planes separated by about 100 nm, thus effectively using a different pupil function for each z plane and retaining the need to store and access experimentally acquired PSFs during localization.
In this paper, we describe a process for using phase retrieved pupil functions for 3D superresolution imaging with no need of measured PSFs during localization. We describe the phase retrieval process, PSF generation, and 3D single molecule localization. We also demonstrate how to account for depth-dependent spherical aberration induced by sample media refractive index mismatch and elucidate the role of supercritical angle fluorescence in phase retrieval and PSF generation. Finally, we demonstrate improved accuracy of the approach compared to unaberrated PSF models by performing superresolution using a dual focal plane geometry, imaging both immobilized beads and cellular structures.

Phase retrieval of the pupil function
According to scalar diffraction theory [23], the relationship between the pupil function and the point spread function is [19], I PSF (x, y, z) = P(k x , k y )e i2π(k x x+k y y) e i2πk z z dk x dk y 2 , where P(k x , k y ) is the pupil function in k space with a length k = n/λ , n is the refractive index of the immersion medium, and I PSF (x, y, z) is the 3D intensity PSF in Cartesian space. The defocus phase is described by the exp(i2πk z z) factor, where k z = (k 2 − k 2 x − k 2 y ) 1/2 . Our approach for phase retrieval and PSF generation is based on that described by Hanser et al. [19] where an iterative algorithm is used to retrieve P(k x , k y ) from a series of images of a small fluorescent bead collected with various amounts of known defocus. As in [19], we make the assumption that the fluorescent bead used for phase retrieval measurements contains a large collection of randomly oriented fluorophores and therefore the dipole nature of the emission of individual fluorophores results in a radial modification of the Optical Transfer Function (OTF). As in [19], we account for these effects with an OTF rescaling performed after calculating the integral in Eq. (1). OTF rescaling will be discussed further in a later section.
In this section, we describe the process of obtaining the pupil function from an experimentally measured PSF. The phase retrieval process consists of four steps: 1) PSF measurement; 2) Data preprocessing; 3) Phase retrieval; and 4) Expansion into Zernike polynomials. Each of these steps is described in the following subsections.

Data collection (PSF measurement)
Images of a single, isolated bead are collected and used for phase retrieval. The bead is approximately centered in the field of view and placed in focus. During data acquisition, the stage is moved through a range of ±1 µm about the focus position in 100 nm steps. At each axial position (which we take to be the z axis), 100 images are collected, each measuring 128 × 128 pixels. See section 5 for further details.

Data preprocessing
For each z position, we compute the average of the collected images, yielding a single image at each sampled z position. The resulting images are cropped to a size R 1 × R 1 , where R 1 ≈ 40 pixels, corresponding to 4 µm. This cropping preserves the core structure while eliminating those parts of the image that are indistinguishable from noise. The in-focus image is fitted to a 2D Gaussian intensity distribution and the position is used to define the center of the cropped frame for all images [ Fig. 1(a)].
After cropping, a mean background value is subtracted from each of the images. The mean background value is calculated for each image by computing the mean value of the pixels at each edge of the corresponding cropped image and choosing the smallest of the resulting four values. After the mean background has been subtracted, any resulting negative pixel values are set to zero, and a binary circular mask with a diameter of R 1 pixels is multiplied onto each image, effectively setting all pixel values outside the circle to zero [ Fig. 1(b)]. Finally, the size of each image is restored to the full 128 × 128 pixels by padding with zero-valued pixels [ Fig.  1(c)].
In this application of superresolution imaging, we use a dual focal plane setup (see section 5.1) and the preprocessing steps described above are applied to the recorded PSF images at both planes separately.

Phase retrieval process
The phase retrieval process is an iterative process that retrieves the pupil function of the optical system from a set of measured PSF images. The algorithm was originally proposed by Gerchberg and Saxton (Gerchberg-Saxton algorithm [24]), and further implemented in high numerical aperture microscope system by Hanser et al. in [18,19]. In our dual focal plane system, we apply a similar procedure to the measured PSF for each plane separately, thereby obtaining a pupil function for each focal plane. As detailed in [19], the phase retrieval algorithm takes as input the numerical aperture of the objective lens NA, the emission wavelength λ , the refractive index of the objective immersion medium n, the actual CCD pixel size ∆d, and the lateral magnification M. At each iteration of the procedure, the magnitude of the amplitude PSF is replaced by the square root of the measured intensity PSF. However, the magnitude of the generated PSF model never reaches zero at any point in space. This conflicts with our data preprocessing scheme for the measured PSF, in which negative pixel values and extended pixel values are set to zero. To prevent mis-convergence due to the relatively large number of zero-valued pixels following the data preprocessing described in the previous section, the zero-valued pixels in the extended PSF are set to the value of the corresponding pixels in the phase retrieved pupil function generated PSF (PR-PSF) from the previous iteration. This approach has the advantage of preserving the parts of the measured and processed PSF images [ Fig. 1(d)] that would otherwise be set to zero. Because the PR-PSF is still very different from the measured PSF during the first few iterations, this additional preprocessing step can cause the calculation to diverge if applied too early in the iterative scheme. We applied the modification only after a certain number of iterations i 0 . We then determined the optimal value of i 0 through a process described in section 2.5.

Zernike decomposition and compensation for x, y, z shifts
Zernike polynomials are a set of orthogonal polynomials that are widely used for describing aberrations in optical systems. We decompose the PR pupil function into Zernike polynomials, which has the following advantages: First, the fitted coefficients of Zernike polynomials indicate particular aberrations present in the optical system; and second, the diffraction integral can be written as a sum of one dimensional integrals, simplifying the PSF calculation (see Appendix A). We decompose both the magnitude and phase parts of the pupil function into the first 49 Zernike polynomials (the polynomials are ordered as in [25]).
The first few coefficients of the phase-fitted Zernike polynomials are used to estimate how well the measured PSF is centered on the cropped region and how well the calculated in-focus PSF image matches with the measured in-focus PSF.
To estimate the lateral shift of the PSF relative to the center of the cropped region, we calculate: where C 2 and C 3 are the 2nd and 3rd Zernike coefficients respectively. The axial shift was calculated by fitting the defocus phase 2πk z z shift to the 4th Zernike polynomial. The first-order Taylor expansion of the defocus phase can be expressed by the 4th Zernike polynomial plus a constant phase, so that the axial shift is found by minimizing the integral, where W 4 is the 4th Zernike polynomial and φ 0 is a constant phase. We found that the first order approximation is sufficient to minimizing defocus aberration, so that the 4th Zernike coefficient is less than 0.01. The lateral shift was minimized by shifting the original PSF images accordingly before cropping. The axial shift was minimized by redefining the z positions of the measured PSF. After correcting for x, y, z shift, the phase retrieval method described above was repeated, resulting in a Zernike expanded pupil function with lateral shift and defocus near zero.

Parameter optimization
Many of the parameters required by the phase retrieval process can be obtained by direct measurement or from product specifications; however, the optimal values of the parameters i 0 , R 1 , n and λ , as defined in the previous sections, may depend on the specifics of a particular sample and the experimental conditions. The optimal values of i 0 and R 1 are primarily affected by the spectra of the sample beads and the effective pixel size on the sample plane; n can be affected by the type and temperature of the immersion oil; and λ is primarily affected by the finite width of the beads' emission spectra and the used emission filters. Given that we ultimately desire minimum error in 3D localization, we optimize i 0 , R 1 , n and λ by minimizing the z localization error of the same bead data used in the phase retrieval process. The optimization is performed by the following steps: 1) Phase retrieval of the pupil function; 2) 3D localization of the bead data recorded at the z positions from -0.8 µm to 0.6 µm, in 0.2 µm intervals, and the bead data used in localization were preprocessed and averaged from the original data (section 5.4); 3) Calculation of the sum of square error (SSE) of the found z positions compared with the actual z positions, which is the set stage positions in z plus founded z shift (see section 2.4); 4) Update of the optimization parameters using a Markov chain Monte Carlo (MCMC) method [26], and return to step 1. The complete optimization typically requires about 30 iterations.

PSF generation
Under scalar diffraction theory, the PSF can be generated from Eq. (1), or when using Zernike polynomials, from Eq. (27) (see Appendix A). This PSF, directly calculated from the pupil function (PR-PSF), does not incorporate effects that originate from fluorophore dipole emission and index mismatch aberration. In this section, we describe modifications to the PR-PSF that address these issues as well as derive the aberration phase caused by index mismatch and clarify the role of supercritical angle fluorescence.

OTF rescaling accounts for dipole PSF broadening
Scalar diffraction theory does not account for fluorophore dipole emission, and results in a PSF with sharper structures than the measured PSF. Although the pupil function could be modified to correspond to a static dipole of arbitrary orientation [19], we proceed under the assumption that the fluorophore is freely rotating and appears as a collection of randomly oriented dipoles. In order to account for the effects of dipole emission, a 2D empirical function is applied to the PR-PSF, which operates like a Gaussian filter.
where F { f (k x , k y )} denotes a 2D Fourier transform, and symbol ⊗ denotes a convolution. G(k x , k y ) is the empirical rescaling function. This modified PR-PSF (mPR-PSF) is smoother than the PR-PSF and the full width of half maximum (FWHM) of the in-focus PSF is wider, which matches with the vectorial PSF model of a fluorophore with isotropic dipole emission [7].
mPR-PSF could be generated from PR-PSF that either comes from PR pupil function or Zernike fitted pupil function, however, as detailed in section 6.3.3 and Appendix A, we used the latter case for mPR-PSF calculation. G(k x , k y ) is a 2D Gaussian function fitted to the OTF ratio, which is defined as the measured 2D in-focus OTF divided by the retrieved 2D in-focus OTF. The fitting parameters are the magnitude I and the width of the Gaussian function in both the x and y axis, σ x and σ y . Typically, we find σ x = σ y . Because the covariance, σ xy , only results in a negligible term in F {G(k x , k y )}, we didn't include it in the Gaussian function.

Index mismatch aberration and supercritical angle fluorescence
Due to their high numerical aperture, oil immersion objectives are often used even when imaging cellular samples mounted in a water based medium. Imaging away from the cover glass will induce additional spherical aberration that is more pronounced as the distance from the cover glass increases. However, the resulting aberration can be accounted for with a depth-dependent modification of the pupil function.
The beads used for the phase retrieval process are adhered to the cover glass. Therefore, light coming from the bead would travel a negligible distance in the sample medium (reflection and transmission still occur at the interface) and consequently there would be no aberration caused by refractive index mismatch. Therefore, the phase retrieval result does not include such aberration. We adopt Gibson-Lanni's model [27] for estimating the aberrations induced by refractive index mismatch. We simplify this approach by using a two medium optical system consisting of the sample medium and the immersion medium, with refractive indices n 2 and n 1 , respectively, and assuming the refractive index of the cover glass and the objective lens are the same as that of the immersion medium. In Fig. 2(a), point P is the designed focal position of the objective lens, which is at the cover glass-sample interface, and t * g ,t * i are the designed thicknesses of the cover glass and immersion medium, respectively. In order to image a point source emitter inside the sample medium, the cover glass (dashed red lines) was moved closer to the objective lens, while the thickness of the cover glass remained unchanged (t g = t * g ). For a single emitter O (red dot, Fig. 2(a)) located at a distance z from the interface, the optical path difference of the corresponding ray coming from emitter O and designed focus P is, and the phase aberration at the pupil will be For high numerical aperture, for example NA = 1.45, and with n 2 = 1.33, n 1 = 1.52, the cos(θ 2 ) term could be imaginary, given that θ 1 satisfies the relation n 1 sin(θ 1 ) ≤ NA. This occurs when θ 1 is larger than the critical angle. Eq. (6) is still valid and the resulting imaginary phase will give rise to an exponential decay term in the z direction, which is physically the result of evanescent waves from near field dipole emission. If the fluorophore is very close to the cover glass (within about λ /2), the evanescent wave will penetrate into the cover glass and transmit at θ 1 above the critical angle, a phenomenon known as supercritical angle fluorescence (SAF) [28][29][30][31]. If θ 1 satisfies the relation n 1 sin(θ 1 ) ≤ NA, the SAF will be captured by the objective lens. Since our measured PSF for the phase retrieval process is taken from the sample with the beads adhered to the cover glass in aqueous solution, we assume that the transmission distribution at the cover glass-sample interface is recovered from the phase retrieval algorithm and also includes the transmission pertaining to the SAF effect. Because of the decay term, the numerical aperture will appear smaller when imaging depth increases, up to about 1.5 λ , where the numerical aperture converges to the refractive index of sample medium. No additional modification beyond the usage of the complex aberration phase in Eq. (6) is needed to correctly account for SAF effects. See Appendix B for the derivation of this result. located at an arbitrary z position. t * g and t * i are the designed thicknesses of the cover glass and the immersion medium respectively, while t g and t i are the corresponding thicknesses when the objective lens is moved away from the designed position. We assume the refractive index of the cover glass, the immersion medium, and the objective lens are the same and equal to n 1 , while n 2 is refractive index of sample medium. (b) In the paraxial range, the image of a fluorophore located at reference position z o is at the designed focus position P.
Whole-cell 3D imaging may require imaging depths up to tens of microns from the cover glass surface. Due to refractive index mismatch, the stage position that produces the best focus can be far from the actual depth of the emitter [32]. The quantity (t * i −t i ) is the actual movement of the stage in z, which is known and can be denoted as the stage position, z stage . For 3D emitter localization, it is convenient to introduce a reference position, z o , that is near the stage position and is defined as, and a relative z position, z , Figure 2(b) illustrates the physical meaning of the reference position, equivalent to the actual focal plane in [33]. With the fluorophore O located at the reference position, its paraxial image will be at the designed position P. After introducing the reference position, Eq. (6) can be rewritten in terms of z o and z : In Eq. (9), the third term can be interpreted as a relative defocus phase, while the first two terms represent the aberration phase introduced by the refractive index mismatch at a given stage position. Therefore, for accurate 3D whole-cell localization, ϕ aber should be added to the pupil function (see Appendix A), and the actual fitting parameter should be z .

Calculation of the PSF using Graphics Processing Units
In order to make our mPR-PSF based localization algorithm practical for localizing thousands or even millions of emitters, we implemented the PSF calculation on Graphics Processing Unit (GPU) hardware using NVIDIA's CUDA Architecture [34]. Our previous 2D localization algorithms have shown approximately two orders of magnitude speed improvement using GPU implementations as compared to CPU implementations [35,36]. In this case, the PSF generation itself is the predominant computational demand and only the PSF generation is implemented on the GPU. In the diffraction integral, we take advantage of the Zernike expanded pupil function to express the PR-PSF calculation as a one-dimensional integral (see Appendix A). The integral along the radial dimension is computed numerically using a quadrature method based on rectangle rules. The calculation is parallelized in CUDA such that one 'block' calculates a 2D PR-PSF with a particular defocus and one 'thread' calculates one pixel of the 2D PR-PSF. To make more efficient use of the GPU, many PR-PSF calculations are done in parallel, which in the localization algorithm corresponds to several individual fits, each needing 16 2D mPR-PSF images per iteration (See section 4). The OTF rescaling is performed in a second kernel call that performs a real space convolution of the PR-PSF to produce the mPR-PSF and truncates edge pixels to match the data region used in localization.

3D localization algorithm
Using the mPR-PSF obtained via the algorithm described in section 3.1, it is straightforward to model the signal from a single emitter (µ) for any position in space (x, y, z), any photon count (I), and any background level (bg): The localization algorithm described in this section is particular to a dual focal plane setup, but can be easily modified for other optical configurations. It is based on the maximum likelihood estimator (MLE) assuming a Poisson noise model [35][36][37] and finds the MLE using a Newton-Raphson method to iteratively update the fitting parameters. Under a Poisson process, the likelihood function for a dual focal plane method is, where N 1q and N 2q are the expected photon counts at the qth pixel in plane 1 and plane 2 respectively, while µ 1q , µ 2q are the expected photon counts of a single emitter model at qth pixel. The fitting parameter θ includes the x, y, z position of a single fluorophore in plane 1, the expected total photon counts, I 1 , for a single fluorophore in plane 1, and the background photon counts in plane 1 (bg 1 ) and plane 2 (bg 2 ). The corresponding values for plane 2 are easily computed once the values for plane 1 have been determined: The x 2 , y 2 position in plane 2 can be calculated from x 1 and y 1 by means of a linear transform matrix, and the z 2 position in plane 2 is related to z 1 by a known plane separation. The expected total photon count in plane 2 (I 2 ) is related to I 1 by a ratio factor I ratio . The background levels in plane 1 and plane 2, bg 1 and bg 2 , are fitted separately because they could come from auto-fluorescence or out-of-focus emission, and thus can vary independently.
To maximize this likelihood function, we use the Newton-Raphson method to find the zero of the objective function, where the symbol ∆ represents forward difference on discrete functions. The fitting parameter θ is updated through, Since the mPR-PSF can only be described by pixelized images, the derivatives of the log likelihood function, ln(L(θ )), are calculated numerically.
To ensure a proper convergence, the Newton-Raphson method requires an adequate initial guess of the parameters in question. Therefore, before the Newton-Raphson iterative calculation, we perform some basic calculations on the fitting data to ensure an appropriate initial guess of the parameters. The fitting subregions measure 16 × 16 pixels with an effective pixel size of about 106 nm on the sample plane. Each subregion usually contains a single fluorophore, located near the center. The initial x, y position of the fluorophore to be localized is calculated from the center of mass (COM) of a 5 × 5 pixel area at the center of the subregion in plane 1, which reduces the effect of fluorophores near the border of the subregion. The initial background value is estimated by calculating the mean value of the pixels at each border of the subregion and choosing the smallest of the four values. The background estimation is done for plane 1 and plane 2 separately. The initial intensity is estimated by summing over the subregion for plane 1 after subtracting the initial plane 1 background estimate. We computed the initial z localization estimate using a modified method based on [38]. We compute the center intensity ratio, Z est , which is the ratio of center intensities in plane 1 and plane 2, where the center intensity in each plane is calculated by summing over an 11 × 11 pixel region at the center of the subregion in each plane after subtracting the estimated background value. We found that this center intensity ratio is linear in z and can therefore be used as an initial z estimator. Using crimson bead data for Z est calibration, we obtained the following z dependence: where p 1 , p 2 are coefficients of linear fit. The z estimation for whole-cell imaging is more complicated. As introduced in section 3.2, with a given reference position z o , the actual fitting parameter in z is z . Thus, the z localization for whole-cell imaging consists of two steps: finding z for a given z o , and recovering the actual z position using Eq. (7) and Eq. (8).
The initial z estimation is the same as described above. However, different z o values will yield very different p 1 and p 2 values. Using results from simulated data, we found the linear relationships between z o and p 1 , p 2 : where the S i are coefficients of linear fits. For the dual focal plane method, the transform matrix that links the x, y position between the two planes can also affect the fitting result. The transform matrix consists of a series of linear transforms: scaling, translation, and rotation. In order to obtain the correct transform matrix, a set of bead data is taken by moving a single bead along x, y in equal increments across the full imaging area. Then the x, y positions of bead in both planes were found by performing a 2D Gaussian fitting of a 20 × 20 pixel subregion around the center of the bead image. Having obtained this set of coordinates in x, y, we were able to calculate the transform matrix using the fminsearch function in Matlab (MathWorks Inc.).

Dual focal plane setup
A schematic illustration of our dual focal plane setup is shown in Fig. 3. The emission beam was collimated through lens L1, and divided into two optical paths using a beam splitter (BS). The transmission path was a 4f configuration, with f=125 mm (L1, L2). A mirror, M2, was placed at the Fourier plane. In the reflection path, lens L3 (f=1000 mm) was placed about 90 mm before the imaging lens L2 in order to create a second focal plane at the CCD. A filter wheel was placed before L1 for the selection of different emission spectra. We used a bandpass filter (FF01-692/40-25, Semrock) for imaging bead samples and cell samples labeled with Alexa 647 fluorophores. The camera was an EMCCD camera (iXon 860, Andor Technologies PLC.), with a CCD size of 128 × 128 pixels and a pixel size of 16 µm (the actual pixel size is 24 µm, we used a 1.5 magnifier before L1, which resulted in a effective pixel size of 16 µm). The microscope was an inverted microscope (IX71, Olympus America Inc.). We used a dichroic mirror (650 nm, Semrock) and a TIRF objective (UAPON 150XOTIRF, Olympus America Inc.) with NA = 1.45 for recording data used in Fig. 4, 5, 6, 9, 10, while a TIRF objective (UAPON 100XOTIRF, Olympus America Inc.) with NA = 1.49 was used for obtaining data for Fig. 11 and Fig. 12. Our setup was slightly modified during the paper preparation: L1 and L2 had f=100 mm, M1 was removed, M2 was placed at 45 • relative to the incoming beam, the camera was replaced by an EMCCD camera (iXon 897, Andor, Andor Technologies PLC.), with a CCD size of 512 × 512 pixels and a pixel size of 16 µm. The new setup has a relative magnification close to 1 between both planes, so that the overall magnifications used through out the paper were not the same, as specified in each result.

Sample preparation
For the phase retrieval process, we used 20 nm diameter beads (FluoSpheres F-8782, crimson, Invitrogen) and 40 nm diameter beads (FluoSpheres F-8789, dark red, Invitrogen) diluted to 1 : 10 9 in deionized water. We added 5 µL of this diluted bead solution to 500 µL 1X PBS (phosphate-buffered saline) in a single well of a 8-well chambered coverslip (155411, LabTek, Nunc) and added 10 µL 1M NaCl to improve adherence of the beads to the coverslip surface. For measuring PSFs with refractive index mismatch aberration, we added dilute 40 nm diameter beads solution to 1X PBS medium with fixed RBL cells.
RBL cells in an 8-well chambered coverslip were also used for microtubule imaging. In this case, the labeling steps were : 1) cells were rinsed with 1X PBS at 37 • C and extracted with 0.1% Triton X-100, 3% BSA (bovine serum albumin) in 1X PBS for 10 s at room temperature (RT); 2) cells were fixed with 4% PFA (paraformaldehyde) in 1X PBS for 15 minutes at RT,

Data acquisition
The bead sample was excited with a 637 nm laser (laser diode, HL63133DG , Thorlabs, with home build collimation optics) at an excitation intensity of 0.1 kW/cm 2 . Data was acquired at a frame rate of 50 Hz, 100 frames per z position. The z = 0 µm position was set at the in-focus position of cover glass at plane 1, and data used in generating results in section 6 were acquired at z positions either from -1 µm to 1 µm, or from -2 µm to 2 µm, in increments of 100 nm by moving the sample with a piezo-driven nano-stage (Nano-LPS100, Mad City Labs Inc.) along z-axis. The cell sample was first excited with a 637 nm laser at low intensity (0.03 kW/cm 2 ), in order to find a region of interest. Then the excitation intensity was increased to 1.1 kW/cm 2 , and the sample was illuminated slightly out of TIRF for a few minutes, allowing most of the fluorophores to switch to the dark state. At this point, data acquisition was started at a frame rate of 50 Hz, with 2500 frames per z position. The whole-cell image data was taken from the cover glass to the top of the cell, with a z step size of 1 µm. During data acquisition, a 405 nm laser (DL-405-020 , CrystaLaser) was used to accelerate switching the fluorophores from the dark state to the active state. The laser excitation intensity was gradually adjusted from 0 to 6 W/cm 2 to ensure a proper density of excited fluorophores.

Data preprocessing for 3D localization
Our 3D localization algorithm relys on that the photon counts at each pixel are Poisson distributed over time (expected photon counts in section 4). Thus, a CCD offset was subtracted from original data, and the results were multiplied by a fitted gain factor, as detailed in [35,39]. After converting pixel values of original data into expected photon counts, we applied the method described in [36] to identify the single emitters in each frame. The method basically performed two filtering steps to reduce the Poisson noise and smooth out the data, then found the pixel coordinates of local maxima and set them as the centers of fitting regions. Each fitting region measured 16 × 16 pixels, corresponding to a size of 2.87 µm 2 at the sample plane. Then, all the fitting regions and the pixel coordinates of the corresponding upper left corner of each fitting region are fed into the 3D localization algorithm.
In the case of 3D localization of bead data, the pixel coordinates of each fitting region center were set to a constant, and each PSF was centered in the fitting region, by shifting the single frames or the averaged bead data in the x-y plane with the same amount used in the phase retrieval process. The averaged bead data were generated by averaging over 100 frames at each axial position.

Drift correction
We used drift correction for constructing a superresolution image from image data acquired from a sample with fluorescence labeled microtubules. The data was collected in 27 consecutive sequences, where each sequence consisted of 1000 frames. For each sequence, a 2D image was reconstructed from all the accepted fitting results, by replacing each fitting point with a circularly symmetric Gaussian, with a sigma four times smaller than the radius of the Airy pattern. A reference image was generated by averaging over all 27 reconstructed images. The shifts between all 27 reconstructed images and the reference image were calculated by using the function findshift in the DIPimage toolbox [40]. The findshift function utilizes the method of gradient-based shift estimation [41]. The x, y shifts were calculated by reconstructing the images in the x-y plane, while z shifts were obtained from the images reconstructed in the x-z plane.
6. Results and discussion 6.1. Phase retrieval result Figure 4 shows the pupil function obtained from the phase retrieval process and Zernike fitting, as detailed in section 2, and the mPR-PSF generated from the Zernike fitted pupil function with OTF rescaling (section 3.1). mPR-PSFs match well with measured PSFs in both planes of the dual focal plane setup, and are free of noise and artifacts. Fitting the pupil functions with Zernike modes, eliminated the noise visible in the PR pupil functions. The pupil functions for plane 2 are slightly smaller than for plane 1, due to a slightly larger magnification. We found that the optimization procedure (see section 2.5) yielded the following values for the unknown parameters in the phase retrieval process of the data shown in Fig. 4: i 0 = 7, R 1 = 44, n = 1.5132, and λ = 670 nm. However, the optimization parameters vary slightly for different measured PSF data sets under the same experimental conditions, so we typically choose the phase retrieval result that produces the best fitting accuracy (section 6.3.1) for 3D localization.  The data were acquired at z positions from -2 to 2 µm, with 100 nm step size. The images were generated by summing over the 100 frames taken at each z step. The displayed depths are the z positions found in the sample medium using the method presented in section 4. The mPR-PSF was retrieved from the measured PSF at the cover glass where the depth is 0 µm, and the aberration phase is given by Eq. (9). The S-PSF was generated from an unaberrated pupil function. Figure 5 shows different PSF models with refractive index mismatch aberration. Comparing the leftmost column to the center column in Fig. 5, it is clear that the mPR-PSF with aberration phase [Eq. (6)] matches the measured PSF quite well. The depth values listed in Fig. 5 indicate the bead's distance from the cover glass in the sample medium. These values were found by fitting the measured PSFs using the method described in section 4. The refractive index of the sample medium and immersion medium were 1.33 and 1.52 respectively. For a bead at the cover glass, the measured and mPR-PSF show small spherical aberration. This spherical aberration is partially canceled by the refractive index mismatch aberration at the depth of ∼ 1 µm, so that the axial dimension of the PSFs are smaller at a depth of 0.87 µm. The rightmost column was generated using the scalar PSF (S-PSF) model, which is calculated from the Fourier transform of an unaberrated pupil function. The scalar PSFs are sharper, as they do not include OTF rescaling (see section 3.1). A comparison of all three columns shows that the mPR-PSF matches the measured PSF well, and thus ensures better localization accuracy than the S-PSF.
In calculating both the mPR-PSF and the S-PSF, we used NA = 1.46, λ = 670 nm, a lateral magnification of M = 149.5 at both plane 1 and plane 2, and a CCD pixel size of ∆d = 16 µm.

Localization of beads and simulated data
6.3.1. Comparison of bead data localizations using various PSF models Figure 6 shows the 3D localization results for one set of bead data, where beads were imobilized on a cover slip and imaged repeatedly at various stage positions. This data was also used to generate the phase retrieval results shown in Fig. 4. Figure 6(a) compares z localization of bead data using various PSF models. The blue crosses and the superimposed red line show the fitting results obtained by using mPR-PSF (Fig. 4) as the PSF model for localization of the collected single frame data and the averaged data respectively, where the averaged data is the collected data after averaging over 100 frames at each z position. The green dashed line shows the fitting results obtained by using the S-PSF model with OTF rescaling (see section 3.1) (mS-PSF), while the brown dashed line shows the fitting results obtained by using only the S-PSF model. Both the green and brown dashed lines show the fitting results on averaged data. The plots show that our localization algorithm based on the mPR-PSF model performs well within a z range of ∼ 1.4 µm, compared to the S-PSF model, which produces large deviations [ Fig.  6(b-d)] in fitting accuracy. It is interesting to note that the S-PSF model plus OTF rescaling (mS-PSF) greatly improves the fitting accuracy, especially in z, which is also true for the PR-PSF model (not shown). This indicates that both the mS-PSF and mPR-PSF models are closer to the realistic PSF. The mPR-PSF model performs by far the best in fitting accuracy. Over a depth of ∼ 1.4 µm, the average z-position deviates by less than 10 nm from the set z-positions [ Fig. 6(b)], which is well below the typical axial localization precision achieved with single molecules; the localization precision in z and x, y are around 15 nm and 4 nm respectively [ Fig.  6(e,f)], under an expected photon count of around 4000 photons at plane 1. It is important to note that because the beads adhere to the cover glass, the mPR-PSF model is only correct for analyzing data with samples located at or near the cover glass when imaging samples with refractive index mismatch. Thus, when imaging samples far from the cover glass, the mPR-PSF needs to be modified to account for aberrations caused by refractive index mismatch (see section 3.2 and section 6.2) .

Theoretical localization precision
The localization accuracy can be estimated by calculating the Cramér-Rao lower bound (CRLB) [37,42], where F is the Fisher information matrix. For a discrete likelihood function, each element in F is given by Assuming there is no correlation between any two different pixels and under a Poisson process, we have Figure 7 shows the theoretical estimation error from the CRLB (solid line) of the 3D localization based on the mPR-PSF and the standard deviation (circles) of the found positions of simulated data using the same mPR-PSF model, which are in good agreement.  Figure 8 shows the speed of GPU-based localization for different numbers of Zernike coefficients and Newton-Raphson iterations. The plots were generated by fitting 1000 simulated single emitters at random x, y, z positions. We used the following procedure: First, localization was performed using only the first Zernike coefficient and 25 iterations to generate an initial guess for the second fit. We systematically varied the number of Zernike coefficients and the number of Newton-Raphson iterations for the second fit, and calculated the corresponding total fitting times and the averaged SSE (sum of square errors) over all localizations. The SSE is defined as, where N q and µ q are pixel values of the simulated data and the fitted model, respectively. The fitting time plots show that the total fitting time increases almost linearly with both the number of Zernike coefficients and Newton-Raphson iterations. From the SSE plots, we found that more than 25 Zernike coefficients ( Fig. 8(a), fixed at 15 iterations) or more than 5 iterations ( Fig. 8(b), fixed at 9 Zernike coefficients) do not decrease the SSE value significantly. We also noticed that the SSE drops significantly when using 9 Zernike coefficients while still being ∼ 2 times faster than when using 25 Zernike coefficients. We, therefore, chose 15 iterations and 9 Zernike coefficients as the preferred parameters for the second fit, which yielded a fitting time of 15.5 s for 1000 localizations. 6.4. 3D whole-cell reconstruction Figure 9 shows 3D whole-cell reconstructions using various PSF models. From the four images, it is obvious that using the mPR-PSF combined with the aberration phase [Eq. (6)] gives the best results [ Fig. 9(a)]: the z localization is continuous throughout the whole cell, and the number of accepted fits is greater than for the other three images [ Fig. 9(b-d)] under the same rejection threshold. It is important to note that, with refractive index mismatch aberration, the actual z position should always be positive, a negative z position has no physical meaning under supercritical angle fluorescence. We therefore rejected all the fits with negative z positions. The second best fitting result uses the mPR-PSF without the aberration phase. In this case, the z fitting is continuous up to 3 µm inside the sample medium: above this, the z fitting creates a separate band at each reference z position, z o [Eq. (7)], and the whole cell appears a bit stretched along the z axis. The seperate bands imply that mPR-PSF alone doesn't match well to the realistic PSF at depth greater than 3 µm. The bottom two images show the fitting results using the S-PSF model with no aberration: one [ Fig. 9(c)] takes the reference z position as defined in Eq. (7), and one [ Fig. 9(d)] assumes the reference z position is the same as the stage position.
For the S-PSF model, the z fitting is discontinuous throughout the whole cell, and the separate bands are much thinner compared to Fig. 9(b). This indicates that the S-PSF only matches to a realistic PSF in a small z range at each reference z position. Only about 7% of the total fits are accepted, compared to 60% in Fig. 9(a). Furthermore, when Eq. (7) is not taken into account, the cell is stretched to occupy the entire axial distance through which the stage was scanned, while the actual size of the cell is much smaller. These observations illustrate that a realistic PSF model is essential in 3D localization. Figure 10 shows the 3D localization of microtubules in a RBL cell using different PSF models. We see that the mPR-PSF model produces a relatively continuous and complete reconstruction of the labeled structures. In Fig. 10 Fig. 11(c) multiplied by the same decay term as in Fig. 11(b). In Fig. 11(a,c), with a single emitter located at the cover glass, both the theoretical and retrieved pupil functions have a bright ring near the edge due to supercritical angle fluorescence (SAF). However, Fig.  11(a) has greater contrast, and the ring is brighter towards the edge. This indicates a higher transmission loss in SAF during the experiment, so that the effect is weakened. In Fig. 11(b,d), with a single fluorophore at depth d = λ , both pupil functions become smaller due to the decay term introduced from index mismatch aberration (see section 3.2). For the theoretical model [ Fig. 11(b)], the intensity decreases towards the edge, while there is still a partial bright ring left in the Zernike fitted PR pupil function [ Fig. 11(d)]. This is probably caused by the noisy PR pupil function and the limitations of Zernike expansion in approximating effect of SAF. It is important to notice that here we used an objective lens with NA = 1.49, and a magnification of 100, while we, in Fig. 4, used an objective lens with NA = 1.46 and a magnification of 147. Comparing Fig. 11(c) with the Zernike fitted pupil function in Fig. 4, the SAF effect is more obvious in the former case. We assume that the phase retrieval process can retrieve the SAF effect, but under different optical conditions, the SAF effect may vary in strength. However, the following Figs. illustrate that a pupil function with a reduced SAF effect can also give acceptable fitting accuracy. Figure 12 shows the Zernike expansion of the pupil function to different Zernike orders n [25], where the number of Zernike coefficients is (n + 1) 2 . The SAF effect is more apparent with higher order, and beyond n = 7, the pupil functions are quite similar, which indicates that using n = 7 is sufficient to approximate the SAF effect. However, in 3D localization, using a large number of Zernike coefficients will significantly decrease the localization speed. We found that increasing Zernike order doesn't linearly increase fitting accuracy. Thus, it is better to use the lowest Zernike order that does not sacrifice fitting accuracy (see section 6.3.3).  Fig. 11. n is the order of Zernike coefficients, and each image shows the Zernike expansion up to nth order. The number of Zernike coefficients is (n + 1) 2 . In generating Zernike fitted pupil functions, we used λ = 670 nm, NA = 1.49, a lateral magnification of M = 100, and a CCD pixel size of ∆d = 16 µm. The refractive index of immersion medium is 1.52.

Conclusion
In this paper, we have demonstrated a new 3D localization method based on a modified phaseretrieved PSF, and successfully applied this method to 3D whole-cell imaging. The mPR-PSF is constructed from an experimentally obtained PSF, and therefore contains the aberrations of the optical system. Our PSF model is continuous and accurate in z over a range of ∼1.4 µm and allows for fast on-the-fly modeling of the realistic PSF, thereby eliminating the need to store and access a large experimental data set during localization. The demonstrated biological applications show that our localization algorithm can accurately localize a sufficient amount of fluorophores and generate a complete and accurate reconstruction of biological structures over large depths, something which is not possible with the tested simpler PSF models. We also showed that for 3D whole cell imaging, it is necessary to account for the aberration caused by refractive index mismatch. The results again illustrate that an accurate PSF model is essential for 3D localization. By implementing the PSF calculation on GPU, we were able to speed up the localization algorithm nearly 100 fold, thereby allowing to obtain 100,000 localizations in less than half an hour.