Advancing biological super-resolution microscopy through deep learning: a brief review

Biological super-resolution microscopy is a new generation of imaging techniques that overcome the ~200 nm diffraction limit of conventional light microscopy in spatial resolution. By providing novel spatial or spatiotemporal information on biological processes at nanometer resolution with molecular specificity, it plays an increasingly important role in biomedical sciences. However, its technical constraints also require trade-offs to balance its spatial resolution, temporal resolution, and light exposure of samples. Recently, deep learning has achieved breakthrough performance in many image processing and computer vision tasks. It has also shown great promise in pushing the performance envelope of biological super-resolution microscopy. In this brief review, we survey recent advances in using deep learning to enhance the performance of biological super-resolution microscopy, focusing primarily on computational reconstruction of super-resolution images. Related key technical challenges are discussed. Despite the challenges, deep learning is expected to play an important role in the development of biological super-resolution microscopy. We conclude with an outlook into the future of this new research area.


Introduction to biological super-resolution microscopy
Fluorescence microscopy is a light microscopy technology that plays a critical role in life sciences by capturing spatial or spatiotemporal information of biological processes [1,2]. Its molecular specificity, low invasiveness, and multiplex capability make it a powerful tool for studying the structure and function of biological processes in space and time at the molecular level under physiological conditions. However, the spatial resolution of conventional fluorescence microscopy is limited by the diffraction of visible light to ~200 nm. Under this resolution limit, many important molecular-level details of biological processes are indistinguishable. Super-resolution microscopy overcomes this limit (Figure 1), routinely reaching spatial resolutions in the range of 20-70 nm [3][4][5], with some techniques reaching spatial resolutions of <10 nm in certain applications [6,7]. Depending on their modes of image formation, the wide variety of super-resolution microscopy techniques generally fall under two categories: patterned illumination and single molecule localization.
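As a rough check on the ~200 nm figure above, the lateral diffraction limit is commonly estimated with Abbe's formula. The formula and the example values below (emission wavelength of 550 nm, numerical aperture NA = 1.4) are not taken from this Review; they are standard textbook assumptions used only for illustration.

```latex
d = \frac{\lambda}{2\,\mathrm{NA}}
  = \frac{550\ \mathrm{nm}}{2 \times 1.4}
  \approx 196\ \mathrm{nm}
```

Under these assumed values, two point emitters closer than roughly 200 nm cannot be separated by a conventional objective, which is the limit that super-resolution techniques are designed to overcome.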
Super-resolution by patterned illumination was pioneered by stimulated emission depletion (STED) microscopy (Figure 1C), which uses an intense doughnut-shaped depletion laser beam to create an emission region smaller than the diffraction limit [8,9]. However, strong depletion illumination causes photobleaching and phototoxicity. A similar but more general photoswitching-based technique called reversible saturable optical linear fluorescence transitions (RESOLFT) was developed later, allowing substantially reduced depletion laser intensities [10]. Resolutions of these two techniques typically reach tens of nanometers. Under STED and RESOLFT, image acquisition requires point scanning of the field-of-view (FOV). In comparison, structured illumination microscopy (SIM) [11] is a widefield-based technique that uses patterned illumination to increase the spatial frequency that can be captured. It can use conventional fluorophores to image a large FOV on a millisecond timescale (Figure 1A) [12,13]. However, it can only reach ~100 nm in spatial resolution. SIM using nonlinear illumination and photoswitchable proteins (NL-SIM) can further improve the resolution to ~50 nm [14], but reaching a higher resolution remains challenging. Under STED and RESOLFT, super-resolution images are directly acquired by point scanning and photoswitching at defined spatial coordinates [3]. Under SIM and NL-SIM, super-resolution images are reconstructed through computational processing of acquired raw images.
Super-resolution by single molecule localization, often referred to as single molecule localization microscopy (SMLM), differentiates single switchable fluorophores within the diffraction limit by their blinking events. Stochastic optical reconstruction microscopy (STORM) (Figure 1E) [15] and photoactivated localization microscopy (PALM) [16] are two representatives. STORM uses switchable organic dyes while PALM uses photoactivatable proteins. SMLM can reach a spatial resolution of ~20-30 nm. However, blinking events under SMLM occur at random spatial coordinates. Typically, thousands of raw images or more need to be collected so that enough localized single molecules can be accumulated to faithfully reconstruct the real geometry and fluorescence signal distributions of samples. For this reason, SMLM techniques require long acquisition times, and their low temporal resolution severely limits their applications in live cell imaging [4].
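The dependence of SMLM on the number of accumulated localizations can be illustrated with a toy simulation. The sketch below is not a method from the cited works: the function names, the straight-filament sample, the 20 nm localization error, and the 20 nm rendering pixel are all hypothetical choices made for illustration. It bins simulated single molecule coordinates into a fine-grid histogram and reports how much of the filament is covered.

```python
import random


def simulate_localizations(n_frames, per_frame, fov_nm=2000.0, sigma_nm=20.0, seed=0):
    """Hypothetical acquisition: every frame localizes a few molecules that blink
    at random positions along a horizontal filament at y = fov_nm / 2.
    sigma_nm models the localization error of each single molecule."""
    rng = random.Random(seed)
    points = []
    for _ in range(n_frames):
        for _ in range(per_frame):
            x = rng.uniform(0.0, fov_nm)
            points.append((x + rng.gauss(0.0, sigma_nm),
                           fov_nm / 2 + rng.gauss(0.0, sigma_nm)))
    return points


def render(points, fov_nm=2000.0, pixel_nm=20.0):
    """Accumulate localizations into a fine-grid 2D histogram (the SMLM image)."""
    n = int(fov_nm // pixel_nm)
    image = [[0] * n for _ in range(n)]
    for x, y in points:
        col, row = int(x // pixel_nm), int(y // pixel_nm)
        if 0 <= row < n and 0 <= col < n:  # drop localizations outside the FOV
            image[row][col] += 1
    return image


def column_coverage(image):
    """Fraction of image columns that contain at least one localization."""
    n_cols = len(image[0])
    filled = sum(1 for c in range(n_cols) if any(row[c] for row in image))
    return filled / n_cols


if __name__ == "__main__":
    for n_frames in (10, 2000):
        img = render(simulate_localizations(n_frames, per_frame=5))
        print(n_frames, "frames -> filament coverage", column_coverage(img))
```

With only a few frames, the filament is sampled at scattered positions and large gaps remain; with thousands of frames, the histogram covers the structure continuously. This is the sense in which acquisition time is traded for faithful reconstruction.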
Super-resolution by patterned illumination and single molecule localization have complementary technical strengths. Recently, several techniques have been developed to combine these strengths by integrating the two strategies. In nanoscopy with minimal photon fluxes (MINFLUX) (Figure 1D), stochastic single molecule photoswitching is combined with patterned illumination-based localization to reach a resolution of ~1 nm [6]. In repetitive optical selective exposure (ROSE), multiple excitation illumination patterns are combined with stochastic single molecule photoswitching to reach a lateral resolution of ~5 nm [7] and an axial resolution of ~2 nm (Figure 1B) [17]. For these two techniques, enough single molecules must be localized for faithful reconstruction of super-resolution images, as for STORM and PALM.
The super-resolution microscopy techniques introduced so far require specialized optics, specialized fluorophores, or both. Different from these techniques, several computational super-resolution techniques overcome the diffraction limit by analyzing random fluctuations of single fluorophore signals. Super-resolution optical fluctuation imaging (SOFI) [18] and super-resolution radial fluctuations (SRRF) (Figure 1F) [19] are two representatives. They can be combined with other super-resolution techniques such as STORM and PALM or with conventional widefield and confocal microscopy. Reconstruction of super-resolution images using these techniques requires computational processing of acquired raw images. Overall, along with the representative super-resolution microscopy techniques introduced here, many variants have been developed over the past two decades. See, e.g., [20] for a case study. Comprehensive reviews of super-resolution microscopy techniques can be found in, e.g., [3][4][5].
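The idea of analyzing temporal fluctuations can be sketched in a few lines. The example below is a simplified illustration, not the published SOFI algorithm: it computes only the zero-lag second-order auto-cumulant, which equals the per-pixel variance over time, whereas practical implementations typically use time-lagged or cross-cumulants and higher orders. The function name and the toy image stack are hypothetical. For independently fluctuating emitters, this second-order image is weighted by the square of the point spread function, which for a Gaussian profile is narrower by a factor of about 1.4.

```python
from statistics import pvariance


def sofi_second_order(stack):
    """Per-pixel temporal variance of an image stack.

    stack[t][r][c] is the intensity of pixel (r, c) in frame t. The zero-lag
    second-order auto-cumulant used here equals the variance over time."""
    n_rows, n_cols = len(stack[0]), len(stack[0][0])
    return [[pvariance([frame[r][c] for frame in stack]) for c in range(n_cols)]
            for r in range(n_rows)]


# Toy stack: pixel (0, 0) blinks between 10 and 0, all other pixels are constant.
frames = [[[10, 5], [5, 5]], [[0, 5], [5, 5]]] * 2
sofi_image = sofi_second_order(frames)
```

In this toy stack, only the blinking pixel has a nonzero value in the output, while the constant background is suppressed; this is why fluctuation analysis highlights emitters without requiring specialized optics.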
The wide variety of super-resolution microscopy techniques differ in their image formation and acquisition. However, from a computational point of view, there are important commonalities in their image reconstruction, which is the end goal of all super-resolution microscopy techniques. For the representative super-resolution microscopy techniques introduced here, Table 1 summarizes and compares their principles and performance goals of image reconstruction. The focus of this Review is on deep learning [21] based image reconstruction techniques for super-resolution microscopy.

Introduction to deep learning for image processing and computer vision
Deep learning refers to a class of machine learning or artificial intelligence techniques that compute using artificial neural networks with many layers, often called deep neural networks (DNNs) [21]. Honored by the 2018 ACM Turing Award, it has revolutionized how we analyze and understand images and has been used with tremendous success in virtually all kinds of image processing and computer vision tasks, such as image classification [22], object detection [23,24], image segmentation [25], object tracking [26,27], image registration [28,29], image denoising [30], and image synthesis [31,32]. The most commonly used types of DNNs for such tasks are convolutional neural networks [21] and, for image sequences, recurrent neural networks [33]. Other types of neural networks such as graph neural networks [34,35] have also been used for various applications. Most of the deep learning-based image processing and computer vision techniques are developed for natural images.
The first step in solving an image processing or computer vision problem using deep learning typically is to decide on a learning strategy, such as supervised learning, semi-supervised learning, or unsupervised learning [36][37][38]. DNNs are trained with fully labeled data in supervised learning, partially labeled data in semi-supervised learning, and unlabeled data in unsupervised learning. The decision is usually based on the availability of labeled training data and the cost of producing new labeled training data. In addition, training with unlabeled data can help prevent overfitting [36][37][38].
The next step is to choose an existing DNN architecture or to develop a new or custom DNN architecture, also referred to as a model, for the best performance. Indeed, many DNN architectures have been developed over the past decade (Figure 2). In image classification, for example, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [39] has played a particularly important role in driving the development of new models and in starting the deep learning revolution. Representative models coming out of this competition include AlexNet [40], VGG [41], Inception (GoogLeNet) [42], and ResNet (Figure 2A) [43], to name a few. In object detection, image objects are located and classified by assigning rectangular bounding boxes. One-stage detectors refer to models that combine localization and classification into one step. Representative one-stage detectors include YOLO [44], SSD [45], RetinaNet [46], and CornerNet [47]. Two-stage detectors separate object localization and classification into two steps. Representative two-stage detectors include R-CNN [48], Fast R-CNN [49], Faster R-CNN (Figure 2C) [50], R-FCN [51], and Mask R-CNN [52]. In semantic image segmentation, individual pixels belonging to the same object are grouped and assigned the same semantic label. Representative models include the fully convolutional network (FCN) [53], SegNet [54], and U-Net (Figure 2B) [55]. More recent segmentation models use multiscale image features. Representative models include Pyramid Scene Parsing Network (PSPNet) [56], Adaptive Pyramid Context Network (APCNet) [57], Multi-Scale Context Intertwining (MSCI) [58], and High-Resolution Network (HRNet) [59]. Most recently, the Transformer architecture [60,61], originally developed for natural language processing, has found substantial success in image processing and computer vision tasks such as image classification [62,63].
The selection or development of a DNN model is usually accompanied by the selection or development of an application-oriented loss function [64,65]. Training of DNNs is essentially a process that optimizes their connection weights to minimize the loss function, which is called the cost function in optimization. Various optimization methods can be used for the training [66,67]. Configuration parameters used in training of DNNs, called hyperparameters, require tuning [68,69]. Overall, in assessing a deep learning technique, key components to be considered include its learning strategy, network architecture, loss function, and training data. Its optimization method and hyperparameters are key to its implementation.
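The roles of weights, loss function, optimization method, and hyperparameters can be seen in a deliberately tiny example. The sketch below is an illustration only: it replaces a DNN with a two-parameter linear model and uses plain gradient descent on a mean squared error loss. The function names, the toy data, and the hyperparameter values (learning rate and number of steps) are assumptions chosen for this example, not settings from any reviewed study.

```python
def mse_loss(w, b, xs, ys):
    """Mean squared error of the model y = w * x + b on the training data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, learning_rate=0.1, steps=2000):
    """Fit y = w * x + b by gradient descent on the MSE loss.

    learning_rate and steps are hyperparameters: they are not learned from
    the data and have to be chosen (tuned) by the user."""
    w, b = 0.0, 0.0
    n = len(xs)
    history = []
    for _ in range(steps):
        history.append(mse_loss(w, b, xs, ys))
        # Gradients of the loss with respect to the two weights.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Move the weights a small step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, history


# Toy training data generated from y = 2x + 1 (labels are known exactly).
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b, history = train(xs, ys)
```

Training a DNN follows the same loop in principle, with millions of weights, gradients computed by backpropagation, and more elaborate optimizers in place of this fixed-step update.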

Introduction to deep learning for processing fluorescence microscopy images
In addition to its tremendous success in processing natural images, deep learning has also found great success in processing fluorescence microscopy images [70][71][72][73]. From a user's perspective, deep learning techniques offer at least two important advantages over traditional image processing techniques. First, they offer superior performance. A striking example is the synthesis of realistic fluorescence microscopy images [74] using generative adversarial networks (GANs) [32]. It is not feasible for traditional image synthesis techniques to achieve the same level of fidelity. Second, deep learning techniques are more user-friendly. Once DNNs are trained properly, they can be used without parameter tuning. In contrast, parameter tuning is often essential for traditional image processing techniques.
Despite the great success of deep learning in processing both natural images and fluorescence microscopy images, it is important to note the differences between these two types of images. First, fluorescence microscopy images have simpler structures and semantics than natural images. For each color (i.e., wavelength) channel, the semantic label for each pixel is binary, either foreground or background, and the image background is composed of structure-free regions of noise. Second, fluorescence microscopy images have greater pixel depths and wider dynamic ranges than natural images. Third, fluorescence microscopy images have noise properties that differ substantially from those of natural images [75]. Fourth, blurring of image objects is common in fluorescence microscopy images because of limited depth-of-field. Overall, it is essential to consider these distinct properties in developing deep learning-based processing techniques for fluorescence microscopy images. It is also essential to consider the technical limitations of deep learning techniques, which are discussed in Section 3.

Organization and aim
This Review is organized as follows: Section 1 provides necessary background information on biological super-resolution microscopy and deep learning. Section 2 surveys representative works in using deep learning techniques to advance reconstruction of super-resolution microscopy images. Section 3 concludes with a discussion of key technical challenges and an outlook on how deep learning could shape the future of super-resolution microscopy. Overall, this Review aims to provide a concise, in-depth, and up-to-date survey of related works for researchers and practitioners interested in super-resolution microscopy and deep learning.

Deep learning for reconstruction of super-resolution microscopy images
Super-resolution microscopy techniques are defined by their capability to overcome the diffraction limit in spatial resolution. However, spatial resolution is not the only performance metric required by real-world applications. Other important performance metrics include temporal resolution, length of image acquisition, and area of image acquisition (i.e., FOV). In addition to diverse performance requirements, super-resolution microscopy techniques are subject to various constraints that define their performance envelopes. For example, SMLM techniques such as STORM and PALM are limited in their temporal resolutions. Samples are limited in the number of photons they can tolerate before photobleaching and photodamage [76]. Overall, for super-resolution microscopy techniques, trade-offs must be made between performance metrics within their performance envelopes to meet requirements of different applications.
Deep learning techniques provide a potentially transformative solution to enhance performance of super-resolution microscopy techniques and to push their performance envelopes [73]. Because image reconstruction is at the core of all super-resolution microscopy techniques, we focus on examining recent advances in using deep learning to advance reconstruction of super-resolution microscopy images.

Deep learning-based enhancement of spatial resolutions
Interestingly, the term "super-resolution" was first coined for natural images and is defined as overcoming resolution limits of optical imaging systems by image processing [77][78][79]. This definition certainly is also valid for biological super-resolution microscopy. However, unlike in biological super-resolution microscopy, what resolution qualifies as "super-resolution" for natural images is more loosely defined and is often judged by human perception. Research on this topic has a long history and dates back at least to the 1980s [79]. Conventional super-resolution techniques developed before the rise of deep learning techniques are examined in several reviews [77][78][79]. However, it is deep learning that has enabled transformative performance advances [80,81]. A wide variety of deep learning-based super-resolution techniques have been developed for natural images [80][81][82]. Among them, single-image super-resolution techniques that require just one input image have been extensively studied [82][83][84] and have recently been used for enhancing the resolution of fluorescence microscopy [85,86].
Overall, the implementation of deep learning-based super-resolution techniques for both natural images and fluorescence microscopy images follows the same supervised learning scheme that consists of two steps: training data preparation and model training. In training data preparation, paired and aligned low-resolution and high-resolution images of the same FOV are produced by experiments, computer simulation, or a mixture of both. In model training, the paired images are used to train DNNs to learn the mapping between the low-resolution image domain as the input domain and the high-resolution image domain as the output domain. After the models are trained, they are used to transform an input of low-resolution images into an output of high-resolution images. In this way, deep learning enables computational reconstruction of synthetic high-resolution images. This process is presumably more convenient and cost-effective than physical acquisition of real high-resolution images. Specifically for fluorescence microscopy images, deep learning-based super-resolution techniques provide a computational solution that is potentially transformative in enhancing spatial resolutions and overcoming the diffraction limit if properly implemented.
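The training data preparation step can be sketched for the simulation case. The example below is a minimal illustration under stated assumptions, not the procedure of any reviewed study: the low-resolution image is derived from a high-resolution one by blurring with a Gaussian approximation of the point spread function and averaging blocks of pixels. The function names, the blur width, and the downsampling factor are hypothetical, and noise is omitted for brevity, although a realistic simulation would add it.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized 1D Gaussian kernel with 2 * radius + 1 taps."""
    k = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]


def blur_rows(img, kernel):
    """Filter every row with the kernel, replicating edge pixels."""
    radius = len(kernel) // 2
    width = len(img[0])
    out = []
    for row in img:
        new_row = []
        for c in range(width):
            acc = 0.0
            for j, kv in enumerate(kernel):
                cc = min(max(c + j - radius, 0), width - 1)
                acc += kv * row[cc]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(img):
    return [list(col) for col in zip(*img)]


def blur(img, sigma):
    """Separable 2D Gaussian blur: filter rows, then columns."""
    kernel = gaussian_kernel(sigma, radius=int(3 * sigma) + 1)
    return transpose(blur_rows(transpose(blur_rows(img, kernel)), kernel))


def downsample(img, factor):
    """Average non-overlapping factor x factor blocks."""
    h, w = len(img) // factor, len(img[0]) // factor
    return [[sum(img[r * factor + i][c * factor + j]
                 for i in range(factor) for j in range(factor)) / factor ** 2
             for c in range(w)] for r in range(h)]


def make_pair(high_res, sigma_px=1.5, factor=2):
    """Return an aligned (low_res, high_res) training pair of the same FOV."""
    return downsample(blur(high_res, sigma_px), factor), high_res


# Toy high-resolution ground truth: a single point emitter in a 16 x 16 image.
high_res = [[0.0] * 16 for _ in range(16)]
high_res[8][8] = 1.0
low_res, target = make_pair(high_res)
```

Because both images of a pair derive from the same ground truth, they are aligned by construction. Experimentally acquired pairs need an explicit registration step to achieve the same alignment.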
To date, a significant number of studies have reported using deep learning to enhance spatial resolutions in image reconstruction for biological super-resolution microscopy. Overall, there are two application scenarios. Under the first scenario, deep learning is used directly to reconstruct super-resolution microscopy images of higher resolutions. For example, Nehme and colleagues have reported using DNNs to enhance the performance of single molecule localization for reconstruction of STORM images in 2D and 3D [87,88]. For 2D STORM, deep learning achieves higher localization accuracy than conventional point spread function (PSF) fitting under high fluorophore density and low signal-to-noise ratio (SNR), with real-time speed and no parameter tuning [88]. They extend their work to 3D STORM by combining PSF engineering with deep learning-based single molecule localization and PSF pattern recognition [87]. Indeed, deep learning is well suited for recognition of complex patterns of engineered PSFs and has achieved superior detection accuracy and speed in image reconstruction in several other studies, e.g., [89,90]. Under the second scenario, deep learning is used to computationally reconstruct a high-resolution image from a low-resolution image. For example, Wang and colleagues use a GAN to transform low-resolution images into high-resolution images across different modalities, such as from low-NA widefield to high-NA widefield, from confocal to STED, and from conventional TIRF to TIRF-SIM [86]. Fang and colleagues use a U-Net type model to enhance resolutions of electron microscopy images and fluorescence microscopy images [85]. Qiao and colleagues use two deep learning models for enhancing performance of SIM under low SNRs and long intervals of imaging [91].
Key components of reviewed studies under the two scenarios are summarized and compared in Table 2. Several observations can be made from the comparison. First, if paired training data is available, deep learning can be used to enhance resolution across different modalities, including widefield microscopy, confocal microscopy, SMLM, SIM, and transmission electron microscopy, demonstrating the versatility of the approach. The training data may be produced by a mixture of simulation and experiments or entirely by simulation. Overall, generalization capability and robustness of the proposed models are not thoroughly characterized in the reviewed studies. Second, performance metrics used for super-resolution microscopy images are similar to those used for natural images, such as PSNR (peak signal-to-noise ratio) and SSIM (structural similarity index). These metrics may not be well suited for fluorescence microscopy images because of their differences from natural images. Third, generation of artifacts has been reported in all the studies reviewed. Currently there is no systematic solution to this problem.
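For reference, PSNR can be computed in a few lines. The sketch below uses the standard definition, 10 log10(MAX^2 / MSE); the function name and the toy images are hypothetical. The peak value MAX has to be supplied by the user, which is one place where the greater pixel depths of fluorescence microscopy images matter: the same pixel errors give a very different PSNR on an 8-bit scale than on a 16-bit scale.

```python
import math


def psnr(reference, test, max_value):
    """Peak signal-to-noise ratio in dB between two equally sized 2D images.

    max_value is the peak intensity of the image format, for example 255 for
    8-bit data or 65535 for 16-bit data."""
    diffs = [(r - t) ** 2
             for row_r, row_t in zip(reference, test)
             for r, t in zip(row_r, row_t)]
    mse = sum(diffs) / len(diffs)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy example: one pixel differs by 10 gray levels.
reference = [[0, 255], [255, 0]]
reconstruction = [[0, 255], [255, 10]]
psnr_8bit = psnr(reference, reconstruction, max_value=255)
psnr_16bit = psnr(reference, reconstruction, max_value=65535)
```

Here psnr_16bit is much larger than psnr_8bit although the pixel errors are identical, so PSNR values are comparable across studies only when the peak value and the intensity normalization are reported.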

Deep learning-based enhancement of temporal resolutions
Because of the basic principle of their image formation, SMLM techniques require long image acquisition, which severely limits their applications in live cell imaging. Increasing fluorescence labeling density can accelerate SMLM, but it will also cause overlap of single fluorophore signals within the diffraction limit. Nehme and colleagues have shown that deep learning is capable of localizing single fluorophores at higher density [87,88]. Still, the allowed density has a limit. In addition to increasing labeling density, a variety of chemical and physical strategies have been proposed to overcome the limitation in temporal resolution. Brighter fluorophores, stronger lasers, and faster cameras have all been tried to reduce acquisition time [92][93][94]. However, fast acquisition may reduce image quality, while strong lasers may induce photobleaching and photodamage. Overall, none of these strategies address the fundamental constraint of SMLM, namely that large numbers of single molecules must be localized for faithful image reconstruction. To this end, several studies have tried to reduce the number of required localized single molecules using compressed sensing [95,96] and sparse support [97]. Recently, DNNs have demonstrated superior performance and great potential in reconstructing SMLM images from sparse data [98,99]. Ouyang and colleagues use the pix2pix GAN to generate high-resolution SMLM images from sparse images of localized single molecules [98]. In comparison, Gaire and colleagues use a much simpler residual learning architecture for reconstructing high-resolution SMLM images from sparse data for up to three color channels [99].
So far, we have focused on deep learning for reconstruction of single super-resolution images from sparse data. In live cell imaging, acquired videos have substantial spatial and temporal continuity. Deep learning has been demonstrated to extract information in videos to enhance spatial and temporal resolution of confocal and light-field microscopy [85,100]. However, specifically for SMLM, deep learning faces new challenges in assigning localized single fluorophores to moving structures (Figure 3).

Deep learning-based noise reduction
SNR is an important performance metric for fluorescence microscopy. Low SNR images are difficult to analyze. Increasing laser power is a direct way to improve the SNR but will also increase the likelihood of photobleaching and photodamage. Classical denoising algorithms have been used to enhance the SNR of fluorescence images under low laser power [101][102][103]. Deep learning-based methods have been shown to have superior performance over classical methods in recent studies [104]. So far, deep learning-based denoising has been used in image acquisition of several super-resolution microscopy modalities, including SIM [105], STED [106], and SMLM [107]. It is expected that deep learning-based denoising will be widely adopted in super-resolution microscopy to increase SNR and reduce photodamage of biological samples.
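The trade-off between laser power and SNR can be made concrete with a simple shot-noise model. The sketch below assumes that detection is shot-noise limited, so that the photon count in a pixel is Poisson distributed with mean N, and it ignores background and camera read noise. The symbol N and the subscripted SNR are introduced here for illustration only.

```latex
\mathrm{SNR}_{\mathrm{shot}}(N) = \frac{N}{\sqrt{N}} = \sqrt{N},
\qquad
\frac{\mathrm{SNR}_{\mathrm{shot}}(4N)}{\mathrm{SNR}_{\mathrm{shot}}(N)} = 2
```

Under this assumption, doubling the SNR by illumination alone requires about four times as many detected photons, which is why computational denoising is attractive when the photon budget is limited by photobleaching and photodamage.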

Other applications of deep learning
Multicolor super-resolution microscopy uses different fluorescent probes to reveal multiple molecular structures at the same time. However, multicolor imaging leads to overlap of emission spectra, which causes channel mixing and reduces the final resolution. Deep learning-based methods have been used in recent studies to separate mixed spectra during image acquisition [92,99,108]. These strategies have shown excellent performance in decreasing cross-color contamination, accelerating image acquisition, and increasing the final resolution of super-resolution microscopy. In addition, deep learning has found many other applications in fluorescence microscopy. So far, a common strategy for 3D fluorescence microscopy is to use specialized optics and engineered PSFs. A recent study has shown that a deep convolutional neural network (CNN) can extract information from a single 2D image to reconstruct a 3D field of the sample [109]. This may bring a new approach to 3D super-resolution microscopy.

Discussion and outlook
Today, models with millions of parameters are common in deep learning [110]. In a somewhat extreme case, the GPT-3 model for natural language processing contains 175 billion parameters [111]. The enormous numbers of parameters, which far exceed those of traditional image processing and computer vision algorithms, are one of the key factors that give DNNs the power to handle challenging image processing and computer vision tasks [112,113]. Indeed, as demonstrated by the works reviewed here, deep learning has great potential in pushing the performance envelope of super-resolution microscopy. In the meantime, however, it also faces critical technical challenges. First, minimizing artifacts in reconstruction of super-resolution images is a key challenge for deep learning-based performance enhancement techniques [114]. So far, although progress has been made [115], characterization and minimization of artifacts remain an open problem. Second, ensuring generalization of DNNs for super-resolution microscopy is also a key challenge. DNNs trained on images collected on selected microscopes under selected conditions may not perform well on images collected on other microscopes under other conditions [116]. Third, ensuring robustness of DNNs for super-resolution microscopy is another key challenge. Fluctuations in imaging conditions are common in fluorescence microscopy, especially in live cell imaging. Deep learning algorithms are data driven and are known to be sensitive to such fluctuations. In fact, if not trained properly, performance of DNNs can collapse under such fluctuations [117]. Lastly, ensuring interpretability of DNNs for super-resolution microscopy is also a key challenge in real-world applications [118]. Currently, deep learning lacks a rigorous theoretical foundation [119] for in-depth understanding of its basic properties such as generalization, robustness, and interpretability.
Generation of artifacts is a common issue in solving inverse problems such as image reconstruction [120]. Looking into the future, we expect that this issue will be gradually resolved through the synergy of multiple measures, including but not limited to rigorous experimental control, incorporation of realistic physical models, and improvement in design of DNN architectures and loss functions. The challenges of ensuring generalization, robustness, and interpretability of DNNs are universal for all deep learning applications, not just super-resolution microscopy. They will be gradually overcome by advances in the general theory and practice of deep learning as well as customized solutions for super-resolution microscopy. Overall, despite the challenges, we believe deep learning is set to play an indispensable and transformative role in advancing super-resolution microscopy as the next generation of light microscopy technology.

Fig. 2. Representative architectures of deep neural networks. (A) ResNet: a feedforward architecture with residual blocks, often used for image classification. (B) U-Net: an encoder-decoder architecture with skip connections at different scales, often used for image segmentation. (C) Faster R-CNN: a two-stage detector architecture, often used for object detection.

Fig. 3. A cartoon illustration of single molecule localization on a curvilinear structure in a fixed cell versus a live cell. In the fixed cell, localized single molecule coordinates are samples of the same static underlying structure. In the live cell, localized single molecule coordinates are samples of the varying underlying structure at different time points.

Table 1. Comparison of image reconstruction for representative super-resolution techniques
Columns: Modality; Principle of Image Reconstruction; Main Performance Goals for Image Reconstruction; Image Formation; Representative Techniques

Table 2. Representative studies using deep learning to enhance spatial and temporal resolution of super-resolution microscopy