3D multifocus astigmatism and compressed sensing (3D MACS) based superresolution reconstruction

Abstract: Single molecule based superresolution techniques (STORM/PALM) achieve nanometer spatial resolution by integrating the temporal information of the switching dynamics of fluorophores (emitters). When the emitter density is low in each frame, individual emitters can be located with nanometer precision. However, as the emitter density rises and the signals overlap significantly, it becomes increasingly difficult to accurately locate individual emitters. This is particularly apparent in three-dimensional (3D) localization because of the large effective volume of the 3D point spread function (PSF). The inability to precisely locate emitters at high density limits the temporal resolution of localization-based superresolution techniques and significantly restricts their application in 3D live cell imaging. To address this problem, we developed a 3D high-density superresolution imaging platform that allows us to precisely locate the positions of emitters even when they overlap significantly in three-dimensional space. Our platform combines a multi-focus system with astigmatic optics and an ℓ1-Homotopy optimization procedure. To reduce the intrinsic bias introduced by the discrete formulation of compressed sensing, we introduced a debiasing step followed by a 3D weighted centroid procedure, which not only increases the localization accuracy but also increases the computation speed of image reconstruction. We implemented our algorithms on a graphics processing unit (GPU), which speeds up processing 10-fold compared with a central processing unit (CPU) implementation. We tested our method with both simulated data and experimental data of fluorescently labeled microtubules and were able to reconstruct a 3D microtubule image from 1000 frames (512 × 512) acquired within 20 seconds.


Introduction
Compared with conventional light microscopy, single molecule localization based superresolution techniques, such as STORM and (f)PALM [1][2][3], provide more than a 10-fold improvement in spatial resolution, with a typical resolution of 20 nm. In single molecule localization microscopy, a random subset of fluorophores (emitters) is activated, imaged, and localized to nanometer precision. This procedure is repeated, allowing different subsets of emitters to be switched on and localized. A final superresolution image is reconstructed from a large number of frames. However, due to the iterative nature of the procedure, single molecule localization microscopy has poor temporal resolution, which greatly limits its application to fast dynamic processes.
One way to improve the temporal resolution is to increase the density of activated emitters in each frame so the total number of frames required to locate the same number of emitters is significantly reduced. However, a high density of activated molecules causes the signals to overlap, making it much harder to locate the emitters precisely. For 2D high density STORM/PALM imaging, a number of algorithms have been developed to efficiently locate overlapping emitters without sacrificing the accuracy [4][5][6][7][8][9][10][11]. However, a challenge still remains for 3D high density STORM/PALM imaging.
Very recently, several algorithms have been developed for 3D STORM/PALM high-density imaging [12][13][14][15]. The challenge for 3D high-density localization algorithms arises from the fact that the 3D point spread function (PSF) is extended along the axial direction, so the PSFs of nearby emitters have more overlapping regions and greater similarity. All of the current 3D high-density localization algorithms are tied to matched optics, i.e., a matched 3D PSF. In [12], the authors used astigmatic optics to acquire the depth information and applied the DAOSTORM technique to extract emitter positions; in [13], biplane optics in combination with compressed sensing techniques was applied. In [15], the authors applied an engineered double-helix PSF. Additionally, in [14], a hybrid optical system, biplane with astigmatism, was applied.
In this work, we developed a multi-focus astigmatism compressed sensing based 3D imaging platform (3D MACS) that allows us to efficiently locate high-density emitters. We applied a multi-focus technique in combination with astigmatic optics to enhance the asymmetry of the PSF along the axial direction. Accordingly, we developed a new algorithm that takes advantage of compressed sensing but uses a much faster ℓ1-Homotopy optimization method. More importantly, we developed an ℓ2-norm based debiasing method, which not only reduces the grid-introduced bias, an intrinsic drawback of compressed sensing based methods, but also significantly increases the computation speed. With 3D MACS, we were able to reconstruct superresolution images at densities as high as 10 emitters/µm². By employing 3D MACS, we were able to reconstruct 3D microtubule images within a couple of seconds, which makes this method suitable for live cell imaging.

3D Multi-plane astigmatism imaging system
Our 3D multi-plane astigmatism STORM system (Fig. 1(a)) uses three electron multiplying CCD cameras (iXon Ultra 897, Andor Technologies PLC.) to image three different focus planes, with the EM gain of all cameras set to 255 and the distance between adjacent focus planes set to 180 nm. The three cameras are synchronized by an external hardware trigger. Three identical plano-convex cylindrical lenses with a focal length of 1000 mm (Thorlabs) are placed in the emission paths to introduce asymmetry into the 3D PSF. All instruments and acquisition are controlled by home-written LabVIEW programs (National Instruments Corp.).

3D astigmatic PSF and multi-plane alignment
To acquire the calibration curves for all three cameras, we attached 100 nm fluorescent beads (Bangs Laboratories Inc.) to a cover slip and imaged the beads at different z positions by moving the piezo stage along the axial direction. Figure 1(b) shows images of the beads with each camera focused on a different z plane. For each focus plane, the bead images are fitted to an asymmetric 2D Gaussian function (Eq. (1)) to get wx and wy:

f(x, y) = I exp(−2(x − x0)²/wx² − 2(y − y0)²/wy²) + b,    (1)
where I is the emitter intensity, b is the background, (x0, y0) is the center position of the molecule, and wx and wy are the widths of the PSF in the x and y directions, respectively. The dependence of wx and wy on the emitter axial position z0 (Fig. 1(c)) is fitted using Eq. (2),

w(z0) = w0 √(1 + ((z0 − c)/d)² + A ((z0 − c)/d)³ + B ((z0 − c)/d)⁴),    (2)

where w0 is the width of the PSF at focus, c is the offset from the focal plane, d is the focus depth, and A and B are used to correct for the non-linearity of the optical system. The fitted parameters are used to generate the simulated z-dependent PSF in the simulation.
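For illustration, the calibration fit can be sketched as follows. The defocus curve below is the standard astigmatic form matching the parameters named in the text (w0, c, d, A, B); all numerical values are placeholders, not our measured calibration, and the sketch is in Python rather than our actual LabVIEW/Matlab tooling:

```python
import numpy as np
from scipy.optimize import curve_fit

def psf_width(z, w0, c, d, A, B):
    """Standard astigmatic defocus curve: PSF width vs. axial position z.
    w0: width at focus, c: offset from the focal plane, d: focus depth,
    A, B: higher-order corrections for optical non-linearity."""
    u = (z - c) / d
    return w0 * np.sqrt(np.maximum(1.0 + u**2 + A * u**3 + B * u**4, 1e-12))

# Fit measured bead widths (e.g. w_x) against the known piezo z positions.
np.random.seed(0)
z = np.linspace(-400, 400, 81)                          # nm
w_meas = psf_width(z, 150.0, 120.0, 300.0, 0.1, 0.05)   # synthetic "truth"
w_meas += np.random.normal(0.0, 1.0, z.size)            # measurement noise
popt, _ = curve_fit(psf_width, z, w_meas, p0=[150, 100, 300, 0, 0])
# popt now holds the fitted (w0, c, d, A, B) used to build the z-dependent PSF.
```

The same fit is run independently for wx and wy on each camera, yielding one calibration curve pair per focal plane.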
To align the three cameras, we took images of fluorescent beads sparsely attached to a coverslip. For a typical 512 × 512 image, the average number of beads is about 50. We moved the piezo stage along the z direction in 100 steps with a step size of 10 nm. The total travel distance (1000 nm) covered the whole depth of the image (800 nm). The process was repeated 10 times and averaged. The centers, (x0, y0), of the fluorescent bead images are determined using Eq. (1). All the bead centers from the second and third cameras are paired with those on the first camera. A local weighted mean transformation [16] is applied to register the three camera images. The registration error, σ_r = √(σ_x² + σ_y²), for the second and third cameras, using the first camera as the reference, is within 6 nm (Fig. 2).
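A minimal sketch of the registration step is shown below. For brevity it uses a single global affine transform in place of the local weighted mean transform [16], and the bead coordinates are synthetic:

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine transform mapping src points onto dst points.
    src, dst: (N, 2) arrays of matched bead centers (e.g. camera 2 vs. camera 1)."""
    M = np.hstack([src, np.ones((src.shape[0], 1))])   # (N, 3) design matrix
    coeffs, *_ = np.linalg.lstsq(M, dst, rcond=None)   # (3, 2) transform
    return coeffs

def apply_affine(coeffs, pts):
    return np.hstack([pts, np.ones((pts.shape[0], 1))]) @ coeffs

# Synthetic check: recover a small rotation plus shift between two cameras.
rng = np.random.default_rng(1)
src = rng.uniform(0, 512, (50, 2))                     # ~50 beads on a 512x512 frame
theta = 0.002                                          # small relative rotation (rad)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
dst = src @ R.T + np.array([1.5, -0.8])                # camera-2 coordinates
coeffs = fit_affine(src, dst)
resid = np.sqrt(np.mean(np.sum((apply_affine(coeffs, src) - dst) ** 2, axis=1)))
```

A local weighted mean transform generalizes this by fitting low-order polynomials around each control point and blending them, which handles field-dependent distortion that a single affine cannot.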

3D ℓ1-Homotopy compressed sensing
For a signal x ∈ R^n that is itself sparse or can be made sparse after a transformation, a measurement system that acquires m linear measurements can be mathematically represented as

y = Ax + e,    (3)

where A is an m × n matrix, y ∈ R^m is the observed noisy signal, e is the measurement noise, and x ∈ R^n is the underlying signal. Compressed sensing (CS) theory states that, given that x is sparse or can be made sparse after a transformation, x can be precisely recovered from the observation y by solving the following ℓ1 optimization [17]:

minimize ‖x‖₁ subject to ‖Ax − y‖₂ ≤ ε,    (4)

where ε is a constant that controls the trade-off between fidelity to the data and the sparsity of the signal. A large ε enhances the sparsity of x and allows larger deviation from the acquired data, while a small ε results in higher fidelity to the acquired data and lower sparsity of x.
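As a minimal numerical illustration of this recovery principle, the sketch below solves the penalized (Lagrangian) form of the ℓ1 problem with plain iterative soft-thresholding rather than the L1H or interior point solvers the paper uses; the matrix, signal, and all constants are synthetic:

```python
import numpy as np

def ista(A, y, lam, n_iter=500):
    """Iterative soft-thresholding for min_x 0.5*||Ax - y||_2^2 + lam*||x||_1,
    the penalized (Lagrangian) form of the constrained l1 problem above."""
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = x - A.T @ (A @ x - y) / L        # gradient step on the quadratic term
        x = np.sign(x) * np.maximum(np.abs(x) - lam / L, 0.0)  # soft threshold
    return x

# Synthetic demonstration: m measurements of an n-dimensional k-sparse signal.
rng = np.random.default_rng(0)
m, n, k = 40, 200, 3
A = rng.normal(size=(m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = [3.0, -2.0, 4.0]
y = A @ x_true + 0.01 * rng.normal(size=m)   # noisy observation
x_hat = ista(A, y, lam=0.05)
```

Even though m ≪ n, the sparse signal is recovered; the penalty weight plays the same fidelity-versus-sparsity role as ε in Eq. (4).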
As first described in [8], an observed camera image y is a linear measurement of the underlying sparse emitters, which are located approximately on a discrete grid with spacing much smaller than the camera pixel size. Each camera frame is the convolution of the point spread function (PSF), in the matrix form of A, with x, where each entry of x represents the brightness of a molecule located at the corresponding grid point. In the case of 2D imaging, the PSF was approximated by a 2D Gaussian function [8].
In the case of the 3D single-focus astigmatic configuration, Eq. (1) was used to approximate the system PSF, with wx and wy as functions of the axial position z. Here, y is concatenated from the measured image to form a one-dimensional vector, and the true emitter distribution x is also a one-dimensional vector. In the case of 3D multi-focus astigmatic imaging, y is concatenated from the three camera images to form a one-dimensional vector. The true emitter distribution x is a one-dimensional vector concatenated from a 3D high-resolution grid, and the measurement matrix A is of size m × n, where m is the number of pixels in y and n is the number of voxels in x. The measurement matrix is determined by the system PSF. Typically m is much smaller than n, i.e., the number of measurements is much smaller than the dimension of the signal.
Several methods have been developed to reconstruct the original signal x through the ℓ1 minimization (Eq. (4)). The interior point method was used in [8,13], utilizing the implementation of the CVX package [18]. Another method is the ℓ1-Homotopy (L1H) algorithm used by Babcock et al. in [9], who showed that the L1H algorithm is roughly 10-250 fold faster than the CVX method without sacrificing localization accuracy [9]. We apply the L1H algorithm in our 3D image reconstruction to take advantage of its rapid reconstruction rate compared to other methods. Here, the choice of ε in Eq. (4) is critical. For simulated images, ε = 1.5 is used to account for the Poisson noise. For real data acquired by the cameras, an extra factor of √2 is applied due to the noise introduced by the gain registers, and hence ε = 2.1 is used [19].
To reduce the computational complexity of the large optimization problem of 3D image reconstruction, which grows rapidly with the image size, we divide the image into small blocks and perform the optimization on each block individually. Since the cylindrical lens introduces defocusing, the system PSF spans a larger area than in 2D STORM. Hence, we use a block size of 20 × 20 pixels with a 4-pixel overlap with neighboring blocks on each side. Each pixel is divided into 8 × 8 × 11 voxels, spanning from -0.4 µm to 0.4 µm in the z direction. Therefore, the measurement matrix A is of size (20×20×3, 20×20×8×8×11), which is (1200, 281600). Using floating-point numbers, A takes 1.35 GB of memory, which can easily fit into the memory of a modern computer. In our experiment, we used an Nvidia Tesla K40 GPU with 12 GB of global memory. The recovered x represents a 3D up-sampled grid, with the value at each entry representing the intensity of the emitter located at the corresponding voxel. Algorithm 1 shows the detailed implementation of 3D compressed sensing.
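The dimensions quoted above can be checked with a few lines of bookkeeping (no optimization is performed; tiling of the full 512 × 512 frame is omitted):

```python
# Block decomposition bookkeeping for the 3D MACS measurement matrix A.
block = 20          # camera pixels per block side (4-pixel overlap between blocks)
planes = 3          # focal planes (cameras)
up_xy = 8           # lateral up-sampling factor per pixel
up_z = 11           # axial slices spanning -0.4 um to 0.4 um

m = block * block * planes                   # measured pixels per block
n = block * block * up_xy * up_xy * up_z     # high-resolution voxels per block
mem_gb = m * n * 4 / 1e9                     # float32 storage for A, in GB

print(m, n, round(mem_gb, 2))                # 1200 281600 1.35
```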

Debiasing
The finite discrete grid formulation in CS introduces bias into the estimation of the emitter locations and reduces the resolution of the final reconstructed superresolution images [20]. Here we introduce a debiasing step followed by a 3D centroid calculation, which significantly reduces the bias and improves localization accuracy. After the CS step, we extract the support of the reconstructed x (denoted x_S) and the corresponding measurement matrix A_S. The support of a vector is the set of all non-zero elements of the vector. First, we dilate the support of x_S and then minimize the least-squares objective ‖y − A_S x_S‖₂² under a non-negativity constraint to obtain the best fit to the data over the dilated support [21]. In the dilation step, we use a total of 75 voxels: 5 along each of the x and y directions and 3 along the z direction. The physical size of the dilated volume is 45 nm × 45 nm × 240 nm. We used a larger physical length (240 nm) along z to compensate for the lower resolution in that direction. We also tested other dilation sizes and found that the performance is not sensitive to them. The 3D weighted centroids of the optimized x_S are returned as the estimated positions of the emitters. Algorithm 2 shows the detailed implementation of the 3D local debiasing.
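A simplified sketch of the debiasing step follows: it refits the CS solution on its support with non-negative least squares and returns the intensity-weighted 3D centroid. The support dilation described above is omitted for brevity, and the measurement matrix and grid below are synthetic, not a real PSF model:

```python
import numpy as np
from scipy.optimize import nnls

def debias_and_centroid(A, y, x_cs, grid, thresh=1e-6):
    """Refit the CS solution on its support with non-negative least squares,
    then return the intensity-weighted 3D centroid as the emitter position.
    A: (m, n) measurement matrix; x_cs: CS estimate; grid: (n, 3) voxel centers."""
    S = np.flatnonzero(x_cs > thresh)   # support of the CS solution
    x_s, _ = nnls(A[:, S], y)           # l2 refit under non-negativity
    w = x_s / x_s.sum()
    return grid[S].T @ w                # weighted centroid (x, y, z)

# Synthetic check: one emitter whose intensity is split over two voxels.
rng = np.random.default_rng(2)
m, n = 50, 120
A = np.abs(rng.normal(size=(m, n)))
grid = rng.uniform(0.0, 1.0, (n, 3))
x_true = np.zeros(n)
x_true[[10, 11]] = [2.0, 1.0]
y = A @ x_true                                    # noiseless measurement
x_cs = np.zeros(n)
x_cs[[10, 11]] = [1.5, 1.5]                       # biased CS estimate, same support
pos = debias_and_centroid(A, y, x_cs, grid)
expected = (2.0 * grid[10] + 1.0 * grid[11]) / 3.0
```

In this toy case the ℓ2 refit restores the true intensity split, so the weighted centroid lands between the two voxels at the correct sub-grid position.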

Single-focal SEA and Multi-focal SEA
To compare 3D SACS with single emitter fitting based methods, we implemented the SEA method developed for single focal plane reconstruction (SF-SEA) [1,22]. Briefly, wavelet-based de-noising followed by thresholding was used to find candidate emitters. The candidate emitters were then fitted with a 2D Gaussian function to get wx and wy, which were used to determine the z location using the calibration curve defined in Eq. (2). SF-SEA was not designed to work with the multi-focal system. To compare with 3D MACS, we therefore developed and optimized a SEA method for the multi-focal system (MF-SEA). Specifically, the following optimization was performed:
min_θ Σ_m Σ_{i,j} [y_m(i, j) − f_{θ,m}(i, j)]²,    (5)

where θ = (I, x0, y0, z0, b) and f_{θ,m}(i, j) is defined by Eq. (1). Here i and j are the pixel indices, m is the index of the focal plane, and y_m represents the image from focal plane m.
The essence of MF-SEA is that we optimize the cost function in Eq. (5) with all the focal planes taken into consideration simultaneously, instead of optimizing the cost function for each focal plane individually. Our current version of MF-SEA works only for isolated, non-overlapping emitters. Future work, such as including the Bayesian Information Criterion (BIC) for model selection, may improve the method to locate overlapping emitters in 3D.
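A minimal sketch of the joint multi-plane fit is shown below: a single astigmatic Gaussian, rendered on all three focal planes, is fitted simultaneously by least squares, mirroring Eq. (5). The width calibration, plane spacing, and all numbers are toy values, not our measured parameters:

```python
import numpy as np
from scipy.optimize import least_squares

def width(z, w0=1.3, c=0.0, d=4.0):
    """Toy defocus curve (in pixels); stands in for the Eq. (2) calibration."""
    return w0 * np.sqrt(1.0 + ((z - c) / d) ** 2)

def model(theta, xx, yy, plane_offsets):
    """f_{theta,m}(i, j): one astigmatic Gaussian rendered on every focal plane m.
    theta = (I, x0, y0, z0, b); each plane shifts the effective z by its focus."""
    I, x0, y0, z0, b = theta
    frames = []
    for dz in plane_offsets:
        wx = width(z0 - dz, c=-1.0)   # wx and wy curves cross at focus (astigmatism)
        wy = width(z0 - dz, c=+1.0)
        frames.append(I * np.exp(-((xx - x0) / wx) ** 2
                                 - ((yy - y0) / wy) ** 2) + b)
    return np.concatenate([f.ravel() for f in frames])

xx, yy = np.meshgrid(np.arange(15.0), np.arange(15.0))
offsets = np.array([-2.4, 0.0, 2.4])          # three focal planes (z in pixel units)
theta_true = np.array([100.0, 7.2, 6.8, 0.9, 2.0])
y_obs = model(theta_true, xx, yy, offsets)    # noiseless multi-plane observation

# All three planes enter one residual vector, so the fit is joint.
fit = least_squares(lambda t: model(t, xx, yy, offsets) - y_obs,
                    x0=[80.0, 7.0, 7.0, 0.0, 0.0])
```

Because the three planes share one parameter vector θ, the axial position z0 is constrained by all of them at once, which is exactly what distinguishes MF-SEA from running SF-SEA per plane.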
Following the optimization is a filtering step. We used the ratio of the residual to the original image to characterize the quality of the fit, and defined the metric β from this ratio to determine whether a candidate emitter is kept. A low β value represents higher fidelity to the data. We kept candidate emitters with β less than 1.5, consistent with the criterion used in 3D MACS and 3D SACS (Eq. (3)).

Cell culture and immuno-fluorescence staining
To prepare samples to test our system on experimental data, Dulbecco's modified Eagle's medium (DMEM, Invitrogen) is supplemented with 10% FBS, 100 U/mL penicillin and 100 µg/mL streptomycin and warmed to 37 °C before being added to cells. 250,000 HeLa cells are seeded onto a 35-mm, No. 1.5 glass window dish (MatTek Corp.) at 37 °C with 5% CO2. After culturing for 24 hours, cells are fixed at room temperature in a fixation buffer (3% paraformaldehyde and 0.1% glutaraldehyde in PBS) for 10 minutes, and then rapidly washed 3 times with PBS. Cells are blocked in blocking buffer (3% bovine serum albumin (BSA) and 0.2% Triton X-100) for 1 hour at room temperature. After blocking, anti-α-tubulin antibody (Sigma Aldrich) diluted in blocking buffer is applied to the cells for 1 hour at room temperature. Cells are washed 3 times, 5 minutes each, in PBS on a rocking platform and subsequently treated with an appropriate secondary antibody, in this instance anti-mouse IgG conjugated to AlexaFluor647, for 30 minutes. Antibodies are secured through a post-fixation step using the fixation buffer for 5 minutes. Imaging buffer (10% glucose, 50 mM Tris pH 8.5, 10 mM NaCl, 14 mg glucose oxidase, 50 µL of 20 mg/mL catalase, and 1× β-mercaptoethanol (Sigma Aldrich)) is used throughout the experiments.

Results and discussion
3D single-focus astigmatism compressed sensing (3D SACS) and multi-focus astigmatism compressed sensing (3D MACS) methods

To test our 3D SACS and 3D MACS methods, we simulated N molecules randomly distributed in a volume of 0.9 µm × 0.9 µm × 0.8 µm. These molecules were imaged either on one camera (3D SACS) or on three cameras (3D MACS) using the experimentally measured PSF model (Eqs. (1) and (2)). In all the simulations, the pixel size was 75 nm, matching our experimental pixel size. In the case of MACS, the distance between two consecutive focus planes was 180 nm. All the parameters were chosen to match the measured experimental values. An average of 1500 photons per emitter was used in the simulation. In the case of 3D MACS, the total number of photons was evenly distributed among the three cameras. Both Poisson noise and Gaussian noise were added to model photon noise and camera readout noise, respectively. By adjusting the number of emitters N per frame, we were able to control the emitter density. As shown in Fig. 3(a) and 3(b), at a low emitter density (3 emitters/µm²), both SACS and MACS correctly identified all the emitters with similar localization accuracy. However, when the density was increased (8 emitters/µm²), MACS performed markedly better. As shown in Fig. 3(c) and 3(d), MACS recovered all the emitters without any false positives, while SACS gave two false positive emitters. When the density was increased further, the difference between MACS and SACS became even more pronounced. Astigmatism induced by the cylindrical lens breaks the symmetry of the PSF, providing depth information. However, as noted in [13], the oval shape of the PSF caused by astigmatism makes it more difficult to distinguish overlapping emitters. For example, an off-focus emitter forms an elliptical spot on the camera, while two overlapping on-focus emitters can also form an elliptical image.
In this case, it is impossible to differentiate whether the observed spot is formed by an off-focus emitter or by two overlapping on-focus emitters unless extra information is provided.

[Fig. 3 caption: An ellipsoid represents the x, y and z location of each emitter; its center is determined by the x and y positions and its shape by the z position. Red represents the results from our calculation and white represents the actual locations from the simulation. When the emitter density increases (c, d), 3D MACS (lower right panel of (c)) correctly identified all the emitters, while SACS clearly showed ambiguity (lower panel of (d)) with two wrongly detected emitters (colored yellow).]

Since we apply three different focal planes in MACS, the two on-focus emitters will look different from the off-focus emitter on the other two focal planes. Therefore, we are able to eliminate the ambiguity and improve MACS performance over SACS, as shown in Fig. 3(c) and 3(d), by using the additional information provided by the extra cameras, at the cost of fewer photons per camera. In our study, we used a Gaussian PSF model to generate the measurement matrix A in Eq. (3). Due to the aberration induced by refractive index mismatch, particularly at deeper sites in the sample, the Gaussian PSF model can have limitations. The Gibson-Lanni PSF model [23,24] could be applied once the astigmatism induced by the cylindrical lens is taken into consideration. Adapting the Gibson-Lanni PSF model to our 3D MACS can potentially lead to an improvement in 3D resolution.
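The frame simulation described above can be sketched as follows. The defocus calibration constants and emitter positions below are toy values, and realistic EMCCD excess noise is not modeled:

```python
import numpy as np

rng = np.random.default_rng(3)

def render_frame(emitters, photons, nx=12, ny=12, px=75.0, dz_plane=0.0):
    """Render one focal plane: astigmatic Gaussian spots plus Poisson shot noise
    and Gaussian readout noise. Coordinates and widths are in nm."""
    x = (np.arange(nx) + 0.5) * px
    y = (np.arange(ny) + 0.5) * px
    xx, yy = np.meshgrid(x, y)
    img = np.zeros((ny, nx))
    for (x0, y0, z0) in emitters:
        z = z0 - dz_plane
        wx = 150.0 * np.sqrt(1.0 + ((z + 200.0) / 400.0) ** 2)  # toy calibration
        wy = 150.0 * np.sqrt(1.0 + ((z - 200.0) / 400.0) ** 2)
        g = np.exp(-2.0 * ((xx - x0) / wx) ** 2 - 2.0 * ((yy - y0) / wy) ** 2)
        img += photons * g / g.sum()          # each emitter contributes `photons`
    return rng.poisson(img) + rng.normal(0.0, 1.0, img.shape)

emitters = [(300.0, 400.0, -100.0), (600.0, 500.0, 150.0)]
planes = [render_frame(emitters, 500.0, dz_plane=dz) for dz in (-180.0, 0.0, 180.0)]
```

Each emitter appears with a different ellipticity on each of the three planes, which is the extra information MACS exploits to break the ambiguity discussed above.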

Debiasing improves localization accuracy and speeds up reconstruction
As noted in [8] and further discussed in [10,14], CS is based on a discrete formulation although the emitters are located over a continuum. This formulation intrinsically introduces bias and reduces the localization accuracy. In [8], the authors used geometric centroids from the reconstructed high-resolution images as the coordinates of the emitters, which partially resolved the bias issue. The bias introduced by the discrete grid becomes dominant as the grid size gets bigger. In our 3D MACS, the up-sampling factors in the x, y, and z directions are 8, 8, and 11, respectively, which gives rise to a 9 nm × 9 nm × 80 nm voxel. Due to the larger voxel length along the z direction, the bias is much bigger along the axial direction. Figures 4(a) and 4(b) show the localization error along the axial direction before and after the debiasing step, respectively. Before debiasing, a periodic bias is evident; it is introduced by the finite size of the voxel, as emitters close to the borders of a voxel tend to be assigned to the voxel edges. After debiasing, the periodic bias is eliminated, as shown in Fig. 4(b).
We also assessed how debiasing improves localization accuracy along the x and y directions. As shown in Fig. 4(c), debiasing provides limited improvement along these two directions, consistent with the observation in [8], because of the much smaller length scale (9 nm). In contrast, the improvement along the axial direction is apparent due to the larger voxel length along the z direction. As expected, decreasing the voxel length along z by increasing the up-sampling factor improves the localization accuracy (Fig. 4(d)). However, the improvement levels off when the up-sampling factor reaches about 10, at which point the voxel size along the z direction is small enough to minimize the bias. Debiasing not only improves the localization accuracy but also speeds up the reconstruction process. The most time-consuming part of the reconstruction is the ℓ1 optimization step, which depends heavily on the choice of ε. A lower value of ε increases the optimization fidelity, minimizing the error between the estimated and true emitter locations, but at the cost of a longer computation time. Conversely, a higher value of ε accelerates the computation at the cost of a larger localization error. We tested the effect of ε on the computation time and performance. As shown in Fig. 5(a), the computation time decreases significantly as ε increases, while the localization errors (Fig. 5(b)) increase slightly but remain within an acceptable range (<30 nm in the x, y and z directions). In this simulation, the emitter density was 8 emitters/µm². We also tested the false negative (FN) rate and false discovery rate (FDR, Fig. 5(c) and 5(d)). Both the FN rate and the FDR stay below 20% as ε increases from 1.5 to 2.5. All these results suggest that we can gain computational speed without sacrificing performance.

Performance evaluation of 3D SACS and 3D MACS
To demonstrate the performance of our 3D SACS and 3D MACS methods, we generated a series of simulated 20 × 20 camera pixel STORM movies across a range of emitter densities (0.1 emitter/µm² to 10 emitters/µm²) and analyzed them with SF-SEA, MF-SEA, 3D SACS, and 3D MACS. Figure 6(a) demonstrates that both 3D SACS and 3D MACS can identify emitters at densities as high as 10 emitters/µm² with acceptable lateral and axial localization accuracy. Neither SF-SEA nor MF-SEA could be used at emitter densities higher than 1 emitter/µm². In addition, 3D MACS shows superior performance in identifying high-density emitters and better x, y and z localization accuracy than 3D SACS. Figure 6(b) also shows that the false discovery rate of 3D MACS is much lower than that of 3D SACS. The false discovery rate is defined as the percentage of incorrectly detected emitters among all detected emitters. It is important to note that in Fig. 6(b), the false discovery rates of both SF-SEA and MF-SEA are lower than that of 3D MACS. This is because SEA eliminates any ambiguous detection, resulting in a low identified emitter density, as shown in Fig. 6(a). Figures 6(c) and 6(d) show the localization error in the x, y and z directions for 3D MACS, 3D SACS, SF-SEA and MF-SEA. 3D MACS achieves the lowest localization error, with 3D SACS performing slightly worse. We also studied the computation time as a function of emitter density using the 3D MACS and 3D SACS methods implemented on a GPU (Nvidia Tesla K40). Figure 7 shows that the computation time per block (20 × 20) increases as the emitter density increases. Due to the increased computational complexity, 3D MACS takes about double the time of 3D SACS. However, considering the improved localization accuracy and detection rates (Fig. 6), we generally recommend using 3D MACS rather than 3D SACS wherever practical.
In addition, the computation time can be further decreased by utilizing multiple GPUs in parallel, such as the Nvidia Maximus multi-GPU system (Nvidia, CA). To evaluate the performance of 3D MACS on structures with variable emitter densities, we simulated three different types of structures: a sphere, three crossed lines, and a 3D stellate (Fig. 8). The radius of the sphere is 60 nm, the minimum distance in the z direction between two of the crossed lines is 460 nm, and the distance between two consecutive branches of the stellate is 57 nm. In the simulations, the emitter density was 8 emitters/µm² in all cases, and 3000 frames were used. For MF-SEA, due to the low detection rate at high emitter density, the reconstructions of all the structures are unsatisfactory. In contrast, 3D MACS is able to reconstruct them clearly. For the sphere (Fig. 8(a)), the XY, XZ, and YZ cross sections clearly show that the spherical structure is resolved. The three crossed lines (Fig. 8(b)) simulate linear structures frequently observed in biological experiments, such as microtubules. The artificial 3D stellate structure (Fig. 8(c)) demonstrates the ability of 3D MACS to resolve gradually changing structures.

In this study we used either one or three focal planes; however, our method is applicable to other multifocal microscopy systems [15,25]. As the number of focal planes increases, the coverage of the sample depth increases. For example, Abrahamsson et al. [26] developed a system using a 9-plane setup to track molecules over a depth of 2.25-18 µm. Since the total number of photons from a fluorophore is limited, the number of photons collected at each focal plane decreases as the number of focal planes increases, which compromises the localization accuracy. Therefore, it is important to balance the number of focal planes and their spacing to achieve an optimal spatial resolution [27].

Figure 9(a) shows the reconstructed 3D microtubule image of a fixed HeLa cell obtained with our 3D MACS method from 1000 raw frames. The frame rate used in our experiment was 56 frames per second (fps), which means that we were able to acquire the image in less than 20 seconds. Even with only 1000 frames, we were able to reconstruct continuous 3D microtubule images. Figures 9(b) through 9(d) show a zoomed-in region, the averaged image, and one typical raw frame, respectively. Figure 9(e) shows the YZ cross section of the region selected in (b). The overlay of two microtubules is clearly seen in the cross section, and Fig. 9(f) shows the line profile along the z direction. We observe full widths at half maximum (FWHM) of 35 nm and 36 nm in the z direction, with a distance of 112 nm between the two microtubules.

Conclusion and future work
We have developed a multi-focus astigmatism imaging system for 3D superresolution imaging (3D MACS and 3D SACS). The designed optical system, in combination with the ℓ1-Homotopy procedure followed by ℓ2 debiasing, allows us to acquire 3D superresolution images in a much shorter time span at emitter densities as high as 10 emitters/µm². The introduced ℓ2 debiasing step and the 3D weighted centroid refinement allow us to relax the stringency of ε (Eq. (4)) to achieve a faster reconstruction speed. We also implemented the algorithm on a GPU to reduce the reconstruction time. Currently, our algorithm is implemented mostly in Matlab. By porting the algorithm completely to C/C++, we expect to further accelerate the reconstruction.