Parametric comparison between sparsity-based and deep learning-based image reconstruction of super-resolution fluorescence microscopy

Abstract: Sparsity-based and deep learning-based image reconstruction algorithms are two promising approaches to accelerate the image acquisition process for localization-based super-resolution microscopy, by allowing a higher density of fluorescing emitters to be imaged in a single frame. Despite their surging popularity, a comprehensive parametric study guiding the practical applications of sparsity-based and deep learning-based image reconstruction algorithms has yet to be conducted. In this study, we examined the performance of sparsity- and deep learning-based algorithms in reconstructing super-resolution images using simulated fluorescence microscopy images. The simulated images were synthesized with varying levels of sparsity and connectivity. We found that the deep learning-based VDSR recovers images faster, with a higher recall rate and better localization accuracy. The sparsity-based SPIDER recovers more zero pixels truthfully. We also compared the two algorithms using images acquired from a real super-resolution experiment, yielding results consistent with the evaluation using simulated images. We concluded that VDSR is preferable when accurate emitter localization is needed, while SPIDER is more suitable when evaluation of the number of emitters is critical.


Introduction
The past two decades have seen a revolutionary breakthrough in the resolution of fluorescence microscopy, which was restricted by the diffraction limit of visible light at approximately 250 nm. This breakthrough has led to answers to many important questions. For example, super-resolution microscopy revealed the detailed structure of focal adhesions on the nanometer scale [1]. Discoveries like this would not be possible without super-resolution microscopy. Among the super-resolution microscopy techniques developed over the years, single-molecule localization microscopy (SMLM) [2,3] is the most commonly used. To resolve one emitter from another located within the diffraction-limited distance, this method employs emitters that stochastically "blink" between the bright ("ON") and dark ("OFF") states. Multiple images of the same field are then acquired, each of which contains only a paucity of "ON" fluorophores. The accurate positions of these few "ON" emitters can then be determined by deconvolution in the frequency domain or 2D profile fitting in the spatial domain, provided the point spread function (PSF) of the emitter is known. The super-resolution image is obtained by projecting all the deconvolved or fitted images onto a single plane. However, because low emitter density is required, this approach dictates a time-consuming process of acquiring a large number of images, each containing few emitters. In general, the acquisition time ranges from minutes to hours. The long image acquisition time precludes dynamic cellular processes at the timescale of seconds or faster from being studied using super-resolution microscopy. Therefore, algorithms capable of resolving single emitters at higher density, compared to deconvolution/profile fitting based on the PSF, are highly desirable. Furthermore, algorithms capable of resolving single emitters at higher density might also be applied to enhance fluorescence images in general.
This application will be especially helpful for research scientists who are hindered by the lack of access to super-resolution imaging instruments due to the high cost or high skill level required to operate the instrument.
In this study, we examined the performance of two approaches alternative to deconvolution/profile fitting, one based on sparsity and the other on deep learning, by evaluating their capability of resolving single emitters at higher density. Both sparsity-based and deep learning-based algorithms have been used in reconstructing fluorescence microscopy images, including localization-based super-resolution images [4][5][6]. The problem of super-resolution image reconstruction is essentially ill-posed for images that contain a high density of emitters, as many high-resolution (HR) images are possible solutions that result in the same low-resolution (LR) image. To identify a likely solution among many possible candidates, sparsity-based approaches introduce constraints on sparsity, assuming the image with the lowest emitter density is the likeliest solution. To obtain the likely solution, deep learning-based approaches compute the product of operator matrices and the low-resolution image, where the values of individual elements, or weights of individual nodes, in the operator matrices are optimized during training. Training is the process during which numerous pairs of corresponding low- and high-resolution images are used to deduce the operator matrices. Recently, many deep learning- [7][8][9][10][11][12][13][14] and sparsity-based [4,5,[15][16][17][18] methods have been developed for SMLM. These recent methods have demonstrated potential advantages over conventional, single-molecule fitting methods. The premise of deep learning- and sparsity-based methods is that they might allow more emitters to exist in the "ON" state simultaneously, thereby significantly shortening the time required for imaging in super-resolution experiments, while maintaining resolution comparable to single-molecule fitting methods at low emitter densities. For example, Nehme et al. reported a deep learning algorithm in 2018 which could localize emitters at high density (9 µm⁻²) with an accuracy within 31 nm [7].
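To make the sparsity constraint concrete, the reconstruction can be written schematically as a penalized deconvolution problem. This is a generic sketch, not the exact formulation of SPIDER or GESPAR, each of which uses its own penalty and solver:

```latex
\hat{x} \;=\; \arg\min_{x \ge 0} \;\; \lVert A x - y \rVert_2^2 \;+\; \kappa \,\lVert x \rVert_0
```

Here y is the measured LR image, A models PSF blurring followed by down-sampling, x is a candidate HR emitter image, and the parameter κ trades data fidelity against the number of non-zero pixels, favoring the solution with the lowest emitter density.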
Similarly, a sparsity-based algorithm developed by Hugelier et al. in 2016 reached ∼50 nm accuracy in a high-density scenario [4]. Notably, deep learning-based methods have also proven to consume fewer computational resources, allowing even smartphones to gain super-resolution capabilities [13]. Yet, programs implementing these two new approaches have not been examined as comprehensively as those implementing the conventional single-molecule fitting approach. Notably, while the SMLM 2013/2016 challenges [19,20] offered extensive insight into the capabilities of many SMLM algorithms, no deep learning-based algorithm was evaluated in these reports. Moreover, the challenges did not examine a not-uncommon scenario in which the average emitter density over the whole field is not considered high, but all emitters are concentrated in a few local regions of the field due to the structural nature of the subcellular organization. In such a case, when all emitters in the "ON" state are simultaneously concentrated in a local region, the emitters appear to be connected against a vastly dark background (see Fig. 1, "connected emitter"). When most emitters are activated simultaneously, this problem can be viewed as a Single Image Super-Resolution (SISR) task. Over the years, many deep learning algorithms have been developed for SISR given their capability of efficiently extracting features that map LR images to HR images [21], yet the accuracy and efficiency of SISR algorithms in reconstructing SMLM images have not been examined or compared to other algorithms in any previous study. To fill this knowledge gap, we evaluated the performance of sparsity-based and deep learning-based algorithms while adjusting two variables in the image: emitter density and connectivity.
The emitter density, or sparsity, is highly correlated with the difficulty of accurately reconstructing high-resolution images, as the presence of more emitters in an image makes the precise localization of each individual emitter more difficult. The connectivity of emitters imposes further challenges on image reconstruction algorithms, because the overlapping PSFs of connected emitters result in high emitter density locally, even when the global emitter density of the image is relatively low. However, the effect of connectivity on the accuracy of computational super-resolution algorithms has rarely been explored. This gap might be a result of the stochastic photo-activation of emitters in typical SMLM experiments, which makes activated emitters rarely connected even though all emitters may exist in proximity to each other.
A deep learning-based method, Very Deep Super Resolution (VDSR), designed for SISR image reconstruction, and two sparsity-based methods, SParse Image DEconvolution and Reconstruction (SPIDER) and GrEedy Sparse PhAse Retrieval (GESPAR), were tested in this study. VDSR was chosen because it has been widely applied to enhance resolution in a variety of applications. While algorithms with better performance on specific tasks have been developed more recently, VDSR remains one of the best performing deep learning-based algorithms overall [6,21] and is likely the most accessible to scientists through MATLAB. It was previously demonstrated that both SPIDER and GESPAR could effectively enhance image resolution provided the true signals (i.e., non-zero pixels) in the images are sufficiently sparse. SPIDER [4] has shown superior performance over FALCON [15], the latter being the best performing compressed sensing (sparsity-constrained deconvolution) algorithm in the SMLM2016 challenge [20]. GESPAR [5] is another well-cited compressed sensing algorithm that was included in the study, despite ultimately showing less satisfactory performance than the other methods.

Simulated fluorescence microscopy images with various degrees of sparsity and pixel connectivity are generated for systematic evaluation
In order to compare the performance and precision of the algorithms in recovering higher-resolution images from lower-resolution ones, we synthesized ground truth images to simulate the scenario where every single fluorophore occupies a single pixel at a prescribed coordinate. These images were to be compared with the images recovered by different algorithms, for the purpose of evaluating the image recovery performance of the algorithms. The size of the ground truth images was 1024 by 1024 pixels. The sparsity level of the ground truth images, namely the percentage of non-zero pixels, ranged from 0.61% to 6.9% (0.61%, 0.86%, 1.2%, 1.7%, 2.4%, 3.4%, 4.9%, 6.9%). For each sparsity level, three types of images were synthesized: the non-adjacent single-emitter type, the single-emitter type, and the connected-emitter type. In the non-adjacent single-emitter images, each non-zero pixel was surrounded by eight neighboring pixels of zero intensity (see Fig. 1 for schematics). In the single-emitter images, non-zero pixels were placed randomly with no restrictions. In the connected-emitter images, each non-zero pixel was surrounded by at least one other non-zero pixel. In total, 24 categories of images were synthesized, each representing a prescribed combination of sparsity and pixel connectivity. For each category, 10 images were synthesized for testing. The coordinates of the non-zero pixels in the ground truth images were determined by random number generation (see Methods). Next, based on the ground truth images, images of lower resolution were generated to simulate images acquired by fluorescence microscopy. The ground truth images were first subjected to a Gaussian filter (σ = 1.0077), and the blurred images were down-sampled by a factor of 2. The resulting images, 512 by 512 pixels in size, were used as inputs to be processed by the sparsity- and deep learning-based methods.
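A minimal sketch of this synthesis pipeline, for the unrestricted single-emitter case, might look as follows. The exact placement rules for the other connectivity types, the PSF normalization, and the down-sampling scheme (here, summing 2×2 blocks) are assumptions; the paper's Methods define the actual procedure:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def make_ground_truth(size=1024, sparsity=0.0061, seed=0):
    """Synthesize a single-emitter ground truth image: non-zero pixels
    placed uniformly at random with no adjacency restriction."""
    rng = np.random.default_rng(seed)
    img = np.zeros((size, size))
    n = int(round(sparsity * size * size))          # number of non-zero pixels
    idx = rng.choice(size * size, n, replace=False)  # distinct pixel positions
    img.flat[idx] = 1.0
    return img

def to_low_resolution(gt, sigma=1.0077, factor=2):
    """Blur with a Gaussian PSF, then down-sample by summing factor x factor blocks."""
    blurred = gaussian_filter(gt, sigma)
    h, w = blurred.shape
    return blurred.reshape(h // factor, factor, w // factor, factor).sum(axis=(1, 3))

gt = make_ground_truth()          # 1024 x 1024 ground truth
lr = to_low_resolution(gt)        # 512 x 512 input for the reconstruction algorithms
```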
The recovered images were then examined against the ground truth images to evaluate the performance of the different methods. Five main metrics were used to assess the performance: computational time, recall, localization accuracy, true positive rate (TPR), and true negative rate (TNR) (Fig. 1(C), (D)). The computational time was defined as the mean time to recover an image on a personal computer with 8 GB RAM and a 2.60 GHz hexa-core processor. Since the deep learning-based VDSR requires a significant period of time for model training before it is capable of recovering images, we report the computational time for VDSR both including and excluding training time. Recall and localization accuracy were defined in a manner similar to those used by Hugelier et al. [2]. To calculate recall and localization accuracy, each recovered emitter was first paired to the closest ground truth emitter within the range of the PSF radius (Fig. 1(C)). The recall was then defined as the number of ground truth emitters paired with at least one recovered emitter divided by the total number of ground truth emitters in the image. The example shown in Fig. 1(C) would have a recall of 66.7%. The localization accuracy was defined as the mean distance between all recovered-ground truth emitter pairs. If multiple recovered emitters were paired to the same ground truth emitter, a virtual emitter at the geometric mean of these recovered emitters was used to calculate the localization accuracy. To calculate TPR and TNR, a pixel-by-pixel comparison was performed on the recovered and ground truth images (Fig. 1(D)). Pixels that were non-zero in both images were true emitters, pixels that were only non-zero in the recovered images were false emitters, and pixels that were only non-zero in the ground truth images were failures. The TPR was defined as the number of true emitters divided by the number of all emitters in the ground truth image. TPR is a metric similar to recall, but imposes a higher penalty if the coordinate of the recovered emitter is not identical to that of the ground truth emitter. The TNR was defined as the number of true negative pixels divided by the number of all zero pixels in the ground truth image. The example shown in Fig. 1(D) would have a TPR of 50% and a TNR of 95.7% (22/23). These metrics can be used selectively to guide the image reconstruction for specific measurements. For example, if one is to perform molecule counting [5,6] based on the reconstructed images, the method with a higher sum of recall rate and TNR should be used. If one is to measure the distance between two molecules tagged by different fluorophores but located in a macromolecular complex [1,7], then the method with better localization accuracy should be used.

Fig. 1. Ground truth images and corresponding low-resolution images for testing the reconstruction algorithms, with schematics for the performance evaluation metrics. (A) Representative ground truth images generated at 3 different sparsity levels with different pixel-connectivity conditions are shown. The red arrows in the non-adjacent single-emitter images indicate that two neighboring emitters are separated by at least one zero pixel. The red arrows in the single-emitter images indicate that two neighboring emitters are adjacent to each other and not separated by zero pixels. (B) Low-resolution images are generated by Gaussian blurring and down-sampling from the corresponding ground truth images shown in (A). (C) The performance metrics recall and localization accuracy are defined as shown in the schematics. Emitters 1 and 2 (orange) in the recovered image are both considered representative of ground truth emitter A (blue), and emitter 4 representative of ground truth emitter C, because they are within the range of localization accuracy tolerance from emitter A or C, respectively. To calculate the localization accuracy, a virtual emitter (green) is created, and the distance between the virtual emitter and emitter A is defined as the localization accuracy. We note that emitter 2 is not considered representative of emitter B, because emitter 2 is closer to ground truth emitter A, even though emitter 2 is also within the range of localization accuracy tolerance from emitter B. Emitter 3 in the recovered image is considered a false positive because it is not within the range of localization accuracy tolerance from any ground truth emitter. (D) The definitions of true emitter, false emitter, and failure are illustrated schematically. A true emitter is a recovered emitter overlapping with a ground truth emitter. A false emitter is a recovered emitter not overlapping with a ground truth emitter. A failure is a ground truth emitter that failed to be recovered. These metrics are more stringent than the ones illustrated in (C), because it was assumed that each true pixel could only be recovered once by an algorithm. The performance metrics true positive rate (TPR) and true negative rate (TNR) are calculated based on the percentage of true emitters recovered and true negative pixels recovered, respectively.
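These definitions can be sketched in code. The pairing rule below is our reading of the scheme in Fig. 1, and the "virtual emitter" is computed as the arithmetic mean of the paired coordinates (the text says "geometric mean"), so treat this as an illustrative approximation rather than the paper's exact implementation:

```python
import numpy as np
from scipy.spatial import cKDTree

def recall_and_accuracy(gt_xy, rec_xy, tol):
    """Recall and localization accuracy per the pairing scheme of Fig. 1(C).

    gt_xy, rec_xy : (N, 2) arrays of emitter coordinates (pixels)
    tol           : pairing tolerance, e.g. the PSF radius
    """
    dist, idx = cKDTree(gt_xy).query(rec_xy)  # nearest ground truth emitter per recovered one
    matched = {}                              # ground truth index -> paired recovered coords
    for r in range(len(rec_xy)):
        if dist[r] <= tol:
            matched.setdefault(int(idx[r]), []).append(rec_xy[r])
    recall = len(matched) / len(gt_xy)
    # collapse multiple pairings into one "virtual emitter" (coordinate mean)
    devs = [np.linalg.norm(np.mean(pts, axis=0) - gt_xy[g]) for g, pts in matched.items()]
    loc_acc = float(np.mean(devs)) if devs else float("nan")
    return recall, loc_acc

def tpr_tnr(gt_img, rec_img):
    """Pixel-by-pixel TPR and TNR as defined in Fig. 1(D)."""
    gt_on, rec_on = gt_img > 0, rec_img > 0
    tpr = (gt_on & rec_on).sum() / gt_on.sum()
    tnr = (~gt_on & ~rec_on).sum() / (~gt_on).sum()
    return tpr, tnr
```

On a toy 5-by-5 image with two ground truth emitters, one recovered exactly plus one false emitter, `tpr_tnr` reproduces the Fig. 1(D) numbers: TPR = 50% and TNR = 22/23 ≈ 95.7%.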

Deep learning-based image recovery is faster than sparsity-based recovery by at least three orders of magnitude
To provide a practical guide for selecting the optimal image processing method to enhance resolution, the time required to complete the processing with each method was recorded. If a method requires significantly longer time to complete the same task compared to other methods, this disadvantage should also be taken into account. The recovery of high-resolution images was performed with the same set of images in all 24 categories for each algorithm tested. For VDSR, an additional 240 images were synthesized for training: 10 images for each of the 24 combinations of sparsity level and connectivity constraint. 50 epochs of training were performed over a duration of 26,460 seconds (7.3 hours), reaching a final root-mean-square error (RMSE) of 1042.3. RMSE, defined as √(∑ᵢ₌₁ⁿ (yᵢ − ŷᵢ)²/n), where yᵢ and ŷᵢ represent the actual residual image and the network prediction, respectively, is a widely used metric to measure the "loss" of the true information by comparing the recovered image and the ground truth. We note that training after the 20th epoch yielded very little improvement in the RMSE of the model. The loss for the validation set followed the same trend, and no overfitting was observed. For SPIDER, the value of the sparsity parameter κ was optimized for all sparsity levels in recovering single-emitter images. For GESPAR, no optimization was required, as the recovery was conducted by exhaustive examination of all possible combinations of non-zero pixels present in the image. Because such an exhaustive examination requires a prohibitively large number of iterations, exceeding the memory capacity of the computer, each blurred and down-sampled image was divided into patches of 32 by 32 pixels for the recovery.
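The RMSE loss above is straightforward to compute. The following sketch operates on the residual image and its prediction as plain arrays; how the MATLAB VDSR implementation batches this internally is not specified here:

```python
import numpy as np

def rmse(prediction, target):
    """Root-mean-square error between the network prediction and the
    actual residual image: sqrt(mean((y_i - yhat_i)^2))."""
    diff = np.asarray(prediction, dtype=float) - np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```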
The time to recover all 240 low-resolution images (10 images in each of the 24 categories) was approximately 191 seconds (0.05 hours) for VDSR excluding training time and 515,370 seconds (143 hours) for SPIDER. Of the 240 images tested, GESPAR failed to recover 230 within hundreds of hours, a duration more than sufficient for the other methods. As a result, only 10 recovered images, from the 0.61% non-zero pixel density, non-adjacent single-emitter category, were produced. The time to process these 10 images for recovery was approximately 577,000 seconds (160 hours). Comparing the time required by each method, we concluded that VDSR, the deep learning-based method, is superior to SPIDER and GESPAR, the sparsity-based methods, in terms of processing time. On average, it requires 2,150 seconds for SPIDER, 57,700 seconds for GESPAR when successful, 0.796 seconds for VDSR (excluding training time), or 111 seconds for VDSR (including training time) to recover an image of 1024 by 1024 pixels. Excluding training time, VDSR is 2,700 times faster than SPIDER and 72,500 times faster than GESPAR. After accounting for the training time, VDSR is still 19 times faster than SPIDER and 520 times faster than GESPAR. We concluded that GESPAR is extremely time-consuming and thus impractical for the task of recovering images of the sizes commonly seen in fluorescence microscopy. As a result, we were only able to evaluate the performance of GESPAR against VDSR and SPIDER in the 0.61% non-zero pixel density, non-adjacent single-emitter category in the following sections.

VDSR recovers true signals from low resolution images with high recall efficiency
The rate of emitters in the ground truth image that are successfully recovered, or recall, is a commonly used parameter in evaluating the performance of image reconstruction algorithms. A high recall value indicates that a significant portion of the ground truth is faithfully represented by the recovered image. An emitter in the ground truth image is regarded as successfully recalled if it is paired to at least one emitter in the recovered image within the range of localization accuracy tolerance from the ground truth emitter (Fig. 1(C)). The recall was calculated as:

Recall = (number of ground truth emitters paired to recovered emitters) / (number of emitters in ground truth image)
The true positive rate (TPR) is defined as the number of successfully recovered emitters as a percentage of all ground truth emitters, using pixel-by-pixel comparison (Fig. 1(D)). TPR is a metric similar to recall but more stringent, because it incurs a penalty if the recovered emitter is not located at the exact coordinates of the ground truth emitter. TPR was calculated using the formula:

TPR = (number of true emitters in recovered image) / (number of emitters in ground truth image)
As the emitter density increases, it is expected that the algorithms become less effective, because pixels containing blurred emitters close to each other may appear as a single emitter. Indeed, both the deep learning-based VDSR and sparsity-based SPIDER demonstrated high recall and TPR at low densities, but their performance decreased monotonically as the emitter density increased (Figs. 2, 3(A), 3(C)).
The recall of SPIDER was 96.6% for the lowest density (0.61% non-zero pixels) non-adjacent single-emitter category and dropped to 49.5% for the highest density (6.9% non-zero pixels) single-emitter category. VDSR scored 99.9% and 69.9% on those two categories, respectively. In fact, VDSR performed consistently better than SPIDER on recall across all categories. For the connected-emitter images, SPIDER recalled at most 34.8% of the true emitters, while VDSR recalled at least 85.0%. The relatively low performance of SPIDER in recovering connected emitters, even at low density, indicates a constraint on its applications beyond sparsity. GESPAR scored 55.5% recall in the 0.61% density, non-adjacent single-emitter category, 41.4% lower than SPIDER and 44.4% lower than VDSR.
Similar to the results regarding recall, VDSR consistently scored higher TPR than SPIDER. In particular, VDSR scored 100% in TPR for the lowest density (0.61%), non-adjacent single-emitter category, whereas SPIDER scored 87.0%. For the highest density (6.9%), single-emitter category, where VDSR performed worst in TPR with a score of 45.1%, SPIDER scored merely 21.1% (Fig. 3(C)).

VDSR recovers true signals from low resolution images with higher localization accuracy
The localization accuracy, measuring the mean distance of recovered emitters to their corresponding ground truth emitters, estimates the similarity between the ground truth and recovered images. The localization accuracy was calculated as the mean deviation over all ground truth emitter-recovered emitter pairs:

Localization accuracy = (total deviation distance) / (number of paired emitters)

The emitters recovered by SPIDER and deemed true signals with tolerable location inaccuracy (Fig. 1(C)) deviated 10 to 71 nm from the true emitters in the non-adjacent single-emitter setting, while VDSR in the same setting had an accuracy of 0.1 to 50 nm (Fig. 3(B), Fig. S2). Both methods exhibited good localization accuracy at low non-zero pixel density. By comparison, the localization accuracy of VDSR was 30% to 99% smaller than that of SPIDER, depending on the category of the tested images. Once more, VDSR scored consistently better than SPIDER on localization accuracy in all the categories except the lowest density (0.61%), connected-emitter category. GESPAR, in the lowest density (0.61%), non-adjacent single-emitter category, had a localization accuracy of 75 nm, 116% less accurate than SPIDER and 58,000% less accurate than VDSR in the same category. We note that SPIDER did exhibit better localization accuracy than VDSR in the lowest density (0.61%), connected-emitter category. Despite having a localization accuracy of 4.6 nm (59% smaller than VDSR) in this category, SPIDER recalled only 34.8% of all true emitters (62.6% lower than VDSR). While the emitters SPIDER did recover were placed accurately on the true emitters, its low recall rate imposes a limitation in image recovery when accurate localization of all emitters is critical for data interpretation.

Fig. 3. (A) Recall is defined as the rate of recovered emitters within the range of localization accuracy tolerance from any ground truth emitter (see Fig. 1(C)). (B) The localization accuracy can be evaluated by the mean localization accuracy in recovered images. The localization accuracy indicates how far a recovered emitter deviates from its associated true emitter (see Fig. 1(C)). (C) The true positive rate of recovered emitters is defined as the rate of recovered emitters with coordinates identical to those of the corresponding ground truth emitters (see Fig. 1(D)). (D) The true negative rate is defined as the rate of zero pixels in the recovered image that are also of zero value in the corresponding ground truth image.

SPIDER recovers more true zero pixels at high emitter density
The true negative rate (TNR) signifies whether the algorithm recovers emitters where they should not be. Since the non-zero pixels do not exceed 7% of the total image area, most pixels in the ground truth and recovered images are zero-valued. As a result, the TNR was expected to be close to 1 in all categories. Both SPIDER and VDSR exhibited almost 100% TNR in the lowest density (0.61%) single-emitter and non-adjacent single-emitter categories. At the highest density, the TNR of SPIDER dropped to 96.5% for non-adjacent single-emitter and 97.7% for single-emitter images. VDSR had 91.5% and 97.0% TNR for the same categories (Fig. 3(D)). A closer look at the recovered images revealed that VDSR recovered twice as many emitters as there were in the ground truth images for the highest density, non-adjacent single-emitter category (Fig. S1A). The relatively lower TNR, arising from numerous false emitters, reveals a limitation of VDSR when evaluation of the number of emitters is critical. For the images containing connected emitters, the TNR of both SPIDER and VDSR remained above 99.7% across all density levels. GESPAR scored 99.1% TNR in the lowest density, non-adjacent single-emitter category, the worst among the three algorithms we tested. As a corollary, the false positive rate, defined as the percentage of recovered emitters not found within the range of localization accuracy tolerance from any ground truth emitter, was close to 0 for all categories (Fig. S1B). We note that SPIDER produced slightly more false positives than VDSR, suggesting that some SPIDER-recovered emitters deviated so far from the ground truth emitters that they could not be found within the range of localization accuracy tolerance of any ground truth emitter (Fig. 1(C)). This result is not surprising, especially when the ground truth emitters are adjacent: adjacent ground truth emitters may be eliminated by SPIDER in favor of emitters that are further apart, to promote sparsity, a term included in its penalty function [2].

Image reconstruction from a real super-resolution experiment
To evaluate the performance of SPIDER and VDSR in a more realistic context, we applied the algorithms to raw images of microtubules acquired using STORM [22]. The introduction of sparsity- or deep learning-based algorithms to super-resolution microscopy is motivated by the prospect of reducing the time required for image acquisition. In other words, if there exists an algorithm that permits recovering single emitters from a stack of raw images with higher emitter density, the typically minute- or hour-long image acquisition time may be reduced to seconds. To simulate this scenario, the raw image stack containing 9990 sequentially acquired frames was condensed into a new image stack containing 37 frames. Each frame of the new stack was synthesized by projecting 270 consecutive frames of the raw image stack onto the same plane. This condensed stack of images with 270-fold higher emitter density can be regarded as equivalent to a stack of images containing emitters whose photochemical kinetics favor the "ON" (bright) state [23,24]. Experimentally, emitters with a longer "ON" state can be realized by modulating the excitation and spin energy states of the fluorophores, or by changing the binding affinity between the DNA paint strand and its target strand [23,24]. In addition, we also projected all 9990 frames onto one single image to be tested, in order to evaluate the feasibility of reconstructing super-resolution images from single-shot images with emitter density nearly three orders of magnitude higher. The rationale for testing both the condensed stack and the projected image was analogous to the rationale by which we chose to test simulated images of both adjacent emitters and connected emitters in the previous section.
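The frame-condensation step can be sketched as follows. Summing intensities within each group of 270 frames is our assumption for the projection; a maximum-intensity projection would be an equally plausible reading:

```python
import numpy as np

def condense_stack(stack, group=270):
    """Project each run of `group` consecutive frames onto one frame,
    e.g. turning a 9990-frame stack into 37 frames of 270-fold higher
    emitter density. Trailing frames that do not fill a group are dropped."""
    n = (len(stack) // group) * group
    grouped = stack[:n].reshape(-1, group, *stack.shape[1:])
    return grouped.sum(axis=1)   # use grouped.max(axis=1) for a max projection
```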
The performance was then evaluated by comparing the images recovered by SPIDER and VDSR with those recovered by three well-established super-resolution algorithms, Octane, QuickPALM, and RapidSTORM [25][26][27], and by PeakFit, an algorithm under active development that has shown superior performance over other SMLM algorithms on 2D super-resolution datasets in the SMLM2016 challenge [20]. Four different algorithms were used to avoid introducing bias from any particular algorithm. The raw image stack containing 9990 frames was used for image recovery by Octane, QuickPALM, and RapidSTORM. The super-resolution renderings from these algorithms were provided online as part of the SMLM2013 results, and we applied PeakFit to the raw image stack to obtain the fourth super-resolution image. The condensed image stack and the projected single image, with 270-fold and 9990-fold higher emitter density respectively, were used for image recovery by SPIDER and VDSR.
Compared to the images reconstructed by Octane, QuickPALM, RapidSTORM, and PeakFit, we observed low recall and TPR in the image recovered by SPIDER from the single projected image of microtubules, consistent with the simulation results where connected emitters were present (Fig. 3). The recall, localization accuracy, and TPR of SPIDER were significantly higher in the image recovered from the condensed image stack (Fig. 4(B)). Nonetheless, the best performance from SPIDER still fell short of that of VDSR. On average, the recall and localization accuracy of VDSR were 31% and 20% better than those of SPIDER, respectively. The recall and TPR were higher in images recovered from the condensed stack than from the single projected image for both SPIDER and VDSR. The best localization accuracy and TNR, however, were accomplished by VDSR in the image recovered from the single projected image. Despite the superior performance of VDSR, we found that VDSR is not capable of recovering images whose pixel sizes differ from the one used for training. When the LR image size was increased by a factor of two during pre-processing using linear interpolation, changing the pixel size from 200 nm/pixel to 100 nm/pixel, and the preprocessed image was used for reconstruction, we observed highly inaccurate recall by VDSR, which recovered four times as many emitters as there were in the ground truth image. This indicates that a single VDSR network cannot be used to recover HR images with an arbitrary scale factor. On the other hand, SPIDER offers the flexibility of recovering images of various pixel sizes, in contrast to the intolerance of VDSR, which can only perform satisfactorily at the specific pixel size prescribed during training (Fig. S3). Our results imply that multiple VDSR networks need to be trained, depending on the imaging parameters, in order to recover super-resolved images.
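The pre-processing step described above, interpolating the LR image onto the HR grid before feeding it to the network, can be sketched as follows. The use of `scipy.ndimage.zoom` with `order=1` (bilinear) is an illustrative stand-in for whatever interpolation routine the actual pipeline used:

```python
import numpy as np
from scipy.ndimage import zoom

def upscale_for_vdsr(lr_image, factor=2):
    """Interpolate a low-resolution image onto the high-resolution grid.
    VDSR-style networks predict a residual on top of this interpolated image,
    which ties a trained network to the scale factor (and hence pixel size)
    seen during training."""
    return zoom(np.asarray(lr_image, dtype=float), factor, order=1)
```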

Discussion
In this work, we have conducted a parametric study to systematically compare the performance of sparsity-based and deep learning-based super-resolution image reconstruction algorithms. The performance was evaluated with both simulated and real microscopic images. The deep learning-based algorithm VDSR has shown superiority in processing time, emitter recall, localization accuracy, and TPR. Based on these metrics, VDSR is recommended when accurate localization of recovered emitters is critical and when computational resources are limited. Since the loss evaluated by RMSE for VDSR showed little improvement after the 20th epoch, we recommend training VDSR for 20 epochs instead of 50 to further reduce the training time. The sparsity-based algorithm SPIDER has a higher TNR and, with optimized parameters, never recovers more emitters than are actually present. These metrics make SPIDER the more desirable choice of the two when evaluating the number of emitters is critical. Another sparsity-based method, GESPAR, is significantly more time-consuming than SPIDER and VDSR, and may require far more computing resources if high-resolution images are to be recovered on a comparable timescale. These high demands on time or computing resources render GESPAR less favorable, given that the motivation for applying alternative reconstruction algorithms is to save time or to reduce operational cost. The parameters of each algorithm were optimized for all settings to avoid overtraining the neural network and to place all algorithms on equal footing. To the best of our knowledge, this is the first parametric study comparing the performance of different algorithms in recovering high-resolution microscopic images from lower-resolution ones. Both sparsity- and deep learning-based algorithms have demonstrated remarkable image reconstruction capabilities under certain conditions.
Our results provide a practical guide for establishing a customized framework of super-resolution imaging experiments with specific research objectives. For example, if faster image acquisition is desirable and it is critical to obtain accurate distance measurements between two species of fluorescently tagged molecules, as in mapping the kinetochore architecture [28] or focal adhesion organization [1], VDSR can be adopted for the task of image reconstruction. If counting the copy numbers of certain molecular species [29,30] in live cells is of critical importance, but the accuracy of their exact positions is secondary, the algorithm with the highest sum of recall and TNR at the prescribed emitter density should be used. Algorithms capable of accurately discerning connected emitters in the same image frame will be most suitable for recording fast cellular processes in subcellular structures consisting of densely packed proteins of interest, without resorting to time-consuming stochastic multi-frame acquisition. Prospective applications include integrin dynamics during focal adhesion turnover [31] and tubulin rearrangement during flagellum/cilium motion [32].
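The selection rule above can be made concrete with pixel-level definitions of recall and TNR. The sketch below is illustrative Python (the analyses in this study were performed in MATLAB); the pixel-wise matching and the `recall_tnr` helper are assumptions of this sketch, and the actual evaluation may instead match emitters within a spatial tolerance.

```python
import numpy as np

def recall_tnr(truth, recovered):
    """Pixel-level recall and TNR between binary emitter maps.

    NOTE: illustrative, assumed definitions; not the paper's exact
    matching procedure.
    """
    truth = truth.astype(bool)
    recovered = recovered.astype(bool)
    tp = np.sum(truth & recovered)    # emitters correctly recovered
    fn = np.sum(truth & ~recovered)   # emitters missed
    tn = np.sum(~truth & ~recovered)  # zero pixels kept zero
    fp = np.sum(~truth & recovered)   # spurious emitters
    return tp / (tp + fn), tn / (tn + fp)

# Toy example: two true emitters, only one recovered, no false positives.
truth = np.zeros((8, 8), dtype=int)
truth[2, 3] = truth[5, 5] = 1
recovered = np.zeros((8, 8), dtype=int)
recovered[2, 3] = 1
r, t = recall_tnr(truth, recovered)
print(r, t)  # 0.5 1.0
```

An algorithm maximizing the sum r + t at the working emitter density would be preferred for counting applications.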
The differential strengths of SPIDER and VDSR might stem from the different cost functions, also known as loss functions or penalties, implemented in the algorithms. The cost function implemented in SPIDER is C_SPIDER = ‖x − Cŷ‖² + λ‖ŷ‖₀, and the cost function used in VDSR during training is C_VDSR = ½‖y − ŷ‖², where x represents the input (low-resolution, PSF-blurred) image, y represents the corresponding ground truth image, ŷ represents the recovered image as an estimate of the ground truth, C represents the PSF, and λ is the sparsity penalty coefficient for SPIDER. The higher TNR score of SPIDER observed in most conditions can be attributed to the number of emitters (the ℓ₀ term) being taken into account during the iterative cost function minimization in SPIDER, but not during VDSR training. Including the number of emitters as part of the cost function of a deep learning-based algorithm might therefore further improve its TNR score.
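The two cost functions can be evaluated numerically as follows. This is an illustrative Python sketch (the study's code is MATLAB): the PSF operator C is approximated here by `scipy.ndimage.gaussian_filter`, and the function names, σ, and λ values are assumptions of this sketch.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def cost_spider(x, y_hat, lam, sigma=1.0):
    """C_SPIDER = ||x - C y_hat||^2 + lambda * ||y_hat||_0."""
    blurred = gaussian_filter(y_hat, sigma)        # C y_hat: PSF-blurred estimate
    data_term = np.sum((x - blurred) ** 2)
    sparsity_term = lam * np.count_nonzero(y_hat)  # l0 "norm" counts emitters
    return data_term + sparsity_term

def cost_vdsr(y, y_hat):
    """C_VDSR = (1/2) ||y - y_hat||^2."""
    return 0.5 * np.sum((y - y_hat) ** 2)

# A perfect single-emitter estimate: only the sparsity penalty remains.
truth = np.zeros((16, 16))
truth[8, 8] = 1.0
observed = gaussian_filter(truth, 1.0)             # noiseless observation
perfect = cost_spider(observed, truth, lam=0.1, sigma=1.0)
print(perfect)  # 0.1
```

The example shows how the ℓ₀ term penalizes every recovered emitter, which is precisely what VDSR's quadratic loss does not do.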
In addition, our study demonstrates that it may be feasible to obtain images with details beyond the diffraction limit computationally, given an appropriate experimental design that achieves favorable emitter density and connectivity conditions. We note that for VDSR, access to a super-resolution microscope might still be a prerequisite in order to establish a training dataset, should simulated training sets prove inadequate for achieving the desired accuracy [33]. However, the possibility of accomplishing super-resolution computationally, once training is concluded, might still appeal to researchers who cannot afford frequent use of costly super-resolution microscopes.
We note that variables such as scaling factors between HR and LR images, pixel sizes, and signal-to-noise ratios (SNR) were not explored, as they lie beyond the scope of this study, which primarily concerns connectivity. It is a well-known limitation that VDSR is not compatible with arbitrary scale factors in super-resolution image reconstruction; recently, other deep learning models have been developed to mitigate this problem [34]. We also note that the precision of emitter locations recovered by VDSR or SPIDER is limited by the pixel size of the reconstructed image. Methods based on deconvolution/fitting with Gaussian or PSF kernels can provide emitter coordinates with precision far exceeding the pixel size of images reconstructed by VDSR or SPIDER, provided the SNR is sufficiently high and the imaging system is accurately characterized [35]. For VDSR or SPIDER, increasing the precision of emitter locations requires enlarging the matrix size of the reconstructed images by increasing the numbers of rows and columns. Yet this practice does not guarantee better accuracy, as many possible emitter coordinates result in the same cost function value. It would also be relevant to investigate different noise levels in the LR images to be recovered. Conventionally, fast acquisition of super-resolution images using fluorophores with a short ON state requires a fast frame rate to keep the total acquisition time short. With only a few ON emitters in the field, however, a fast frame rate collects few photons per frame, reducing the SNR to the extent that noise subtraction becomes challenging. In this study we explore an alternative scenario in which a different class of fluorophores is used, which by the nature of their photochemistry stay in the "ON" state for a relatively long period of time. As a result, more ON emitters per frame can be recorded, reducing the number of frames required to survey the whole fluorophore population in the field. In this scenario, acquisition time can be shortened without adopting a fast frame rate that would decrease the SNR to the point where noise subtraction is unreliable. A more comprehensive study should consider these factors when comparing the performance of algorithms.
The premise of SMLM, by which each emitter is assumed to equally represent a fluorescence-emitting single molecule regardless of fluorescence intensity, establishes the localization accuracy of single emitters as an essential performance metric in this study. Intensity accuracy, which measures whether emitter intensity is faithfully recovered in the reconstructed images, is by contrast not a critical metric in the context of SMLM: while intensity accuracy can provide information about the fraction of time an emitter spends in the ON or OFF state during image acquisition, few known applications demand such information. Therefore, intensity accuracy was not evaluated in this work.

Dataset preparation
The ground truth images were generated by repeatedly filling non-zero pixels into a zero matrix of size 1024 by 1024. For single-emitter and non-adjacent single-emitter images, a pair of random integers (x, y) between 1 and 1024 was generated each time. The new coordinate was discarded if it was adjacent to (for non-adjacent single emitters) or overlapped with (for single emitters) an existing non-zero pixel; otherwise, the pixel it represented became a non-zero pixel. For connected-emitter images, two pairs of random integers (x1, y1) and (x2, y2) that were 10 pixels apart were generated each time, and all pixels between these coordinates became non-zero, ensuring that the non-zero pixels were adjacent to each other. The random generation process was repeated for each image until the sparsity level for its category was reached. The low-resolution images were generated by Gaussian-filtering the ground truth images (σ=1.0077), adding Gaussian noise (mean=10/255, variance=(5/255)²), and then down-sampling by a factor of 2. The size of the Gaussian filter was chosen such that the low-resolution images would be representative of fluorescence microscopy images taken with a 63× magnification objective (pixel size = 180.6 nm), 1.4 NA, at λ=600 nm. All images were 8-bit.
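The generation pipeline for single-emitter images can be sketched as follows. This is illustrative Python rather than the study's MATLAB code; the stride-2 down-sampling scheme, the fixed random seed, and the function names are assumptions of this sketch.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)

def make_single_emitter_image(n_emitters, size=1024):
    """Ground truth with randomly placed, non-overlapping single emitters."""
    img = np.zeros((size, size))
    placed = 0
    while placed < n_emitters:
        x, y = rng.integers(0, size, 2)
        if img[x, y] == 0:               # discard overlapping coordinates
            img[x, y] = 1.0
            placed += 1
    return img

def degrade(ground_truth, sigma=1.0077, noise_mean=10 / 255,
            noise_var=(5 / 255) ** 2):
    """Blur with the Gaussian PSF, add Gaussian noise, downsample by 2."""
    lr = gaussian_filter(ground_truth, sigma)
    lr = lr + rng.normal(noise_mean, np.sqrt(noise_var), lr.shape)
    return lr[::2, ::2]                  # assumed stride-2 down-sampling

gt = make_single_emitter_image(50)
lr = degrade(gt)
print(lr.shape)  # (512, 512)
```

For non-adjacent single emitters the rejection test would additionally check the eight neighboring pixels before accepting a coordinate.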

Recovery from low-resolution images
Briefly, the VDSR network consists of 20 2-D convolutional layers, each with 64 3 × 3 kernels and followed by a ReLU layer, except the last convolutional layer, which has a single 3 × 3 × 64 kernel to reconstruct the image. Images were zero-padded (width=1) to keep output sizes constant after each convolutional layer. The final layer is a regression layer computing the mean-squared error (MSE) between the residual image and the network output. The weights were randomly initialized with He's method [36] and the biases were initialized to 0. The network was trained using stochastic gradient descent with an initial learning rate of 0.1 and a momentum of 0.9. Gradient clipping was applied to the L2-norm with a threshold of 0.01.
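As a quick check of the padding scheme, the standard output-size formula for a convolutional layer confirms that 3 × 3 kernels with zero padding of width 1 and stride 1 preserve the spatial size through all 20 layers. The Python snippet below is illustrative only.

```python
def conv_out_size(n, kernel=3, pad=1, stride=1):
    """Spatial output size of a 2-D convolution layer."""
    return (n + 2 * pad - kernel) // stride + 1

# With 3x3 kernels and zero padding of width 1, size is preserved
# through all 20 convolutional layers of the network.
n = 41                      # training patch size used in this study
for _ in range(20):
    n = conv_out_size(n)
print(n)  # 41
```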
Prior to the performance evaluation, VDSR was trained on a separate training dataset generated with the same settings as described in the "Dataset preparation" sub-section. The training images were randomly cropped into patches of 41 by 41 pixels; 128 training patches (20.5% of the total image area) and 16 validation patches (2.6% of the total image area) were produced from each training image. The σ (standard deviation of the Gaussian PSF) and 'zoom' parameters in SPIDER were set to match the conditions under which the dataset was generated. The sparsity parameter κ in SPIDER was optimized over a number of trial runs, such that the algorithm recovered as many emitters as possible at the highest sparsity level without significantly compromising its performance at lower sparsity levels. The GESPAR parameter, the number of emitters to fit in a single patch, was determined automatically during recovery by dividing the total intensity in each image patch by the mean total intensity of a blurred emitter in a medium-sparsity image. A script was then used to automatically recover all low-resolution images with each algorithm. The recovery time of each algorithm was obtained from the timestamps on the recovered images.
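The quoted area fractions follow directly from the patch and image sizes; the short Python check below reproduces them (illustrative only; the original computations were performed in MATLAB).

```python
patch = 41 * 41                 # pixels per training patch
image = 1024 * 1024             # pixels per training image

train_fraction = 128 * patch / image   # 128 training patches per image
val_fraction = 16 * patch / image      # 16 validation patches per image
print(round(100 * train_fraction, 1))  # 20.5
print(round(100 * val_fraction, 1))    # 2.6
```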

Real super-resolution experiment
The super-resolution image set of microtubules was obtained from an open online super-resolution database [19], courtesy of Nicolas Olivier and Suliana Manley at Ecole Polytechnique Fédérale de Lausanne (EPFL). The images were acquired using a 100× objective with 1.46 NA at λ=635 nm. The pixel size was 100 nm/pixel, comparable to the settings used in our simulated low-resolution images.
The raw microtubule images were 128 by 128 pixels and were recovered using Octane, QuickPALM, RapidSTORM, and PeakFit. The images recovered by Octane, QuickPALM, and RapidSTORM were 1280 by 1280 pixels, and the image recovered by PeakFit was 1024 by 1024 pixels. The projected image and the condensed stack were downsampled from the raw images to 64 by 64 pixels, so that their image quality was consistent with that of the simulated images prepared for SPIDER and VDSR; the images recovered by SPIDER and VDSR were 128 by 128 pixels. To compare reconstruction accuracy across all algorithms, the images recovered by Octane, QuickPALM, RapidSTORM, and PeakFit were downsampled to 128 by 128 pixels, so that all recovered images were of the same size for performance comparison.
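One possible way to perform the size-matching step is block averaging, sketched below in Python. The paper does not specify the interpolation scheme used, so the `block_downsample` helper and the averaging choice are assumptions of this sketch; only the input and output sizes are taken from the text.

```python
import numpy as np

def block_downsample(img, factor):
    """Downsample an image by averaging non-overlapping factor x factor blocks."""
    h, w = img.shape
    return img.reshape(h // factor, factor,
                       w // factor, factor).mean(axis=(1, 3))

# 1280 -> 128 (Octane/QuickPALM/RapidSTORM) and 1024 -> 128 (PeakFit).
octane_like = np.ones((1280, 1280))
peakfit_like = np.ones((1024, 1024))
print(block_downsample(octane_like, 10).shape)   # (128, 128)
print(block_downsample(peakfit_like, 8).shape)   # (128, 128)
```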

Software
All computational tasks, including image generation, neural network training, and performance evaluation, were performed with MATLAB version R2020a (MathWorks) unless otherwise noted. Projection of microtubule images (Fig. 4(A)) was performed using ImageJ. Super-resolution localization and rendering of the microtubule images by PeakFit were performed using the ImageJ plugin GDSC SMLM, developed by A. Herbert at the University of Sussex. The code generated in this study is included as Code 1 in Supplement 1 [37].
Funding. National Institute of Biomedical Imaging and Bioengineering (R21EB029677).

Disclosures. The authors declare no conflicts of interest.
Data availability. Data underlying the results presented in this paper can be recreated from the code included as supplemental material.
Supplemental document. See Supplement 1 for supporting content.