Artefact removal in ground truth deficient fluctuations-based nanoscopy images using deep learning

Image denoising or artefact removal using deep learning is possible when a supervised training dataset is available, either acquired in real experiments or synthesized using known noise models. Neither condition can be fulfilled for nanoscopy (super-resolution optical microscopy) images that are generated from microscopy videos through statistical analysis techniques. Due to several physical constraints, a supervised dataset cannot be measured. Further, the non-linear spatio-temporal mixing of data and the valuable statistics of fluctuations from fluorescent molecules compete with the noise statistics. Therefore, noise or artefact models in nanoscopy images cannot be explicitly learned. Here, we propose a robust and versatile simulation-supervised training approach of deep learning auto-encoder architectures for the highly challenging nanoscopy images of sub-cellular structures inside biological samples. We show the proof of concept for one nanoscopy method and investigate the scope of generalizability across structures and nanoscopy algorithms not included during simulation-supervised training. We also investigate a variety of loss functions and learning models and discuss the limitations of existing performance metrics for nanoscopy images. We generate valuable insights for this highly challenging and unsolved problem in nanoscopy, and set the foundation for the application of deep learning to problems in nanoscopy for life sciences.


Introduction
Super-resolution optical microscopy, also called nanoscopy, breaks the resolution limit of optical microscopes. Among the wide variety of nanoscopy techniques [1][2][3][4][5], an interesting family of methods is rooted in the statistics of the intensity fluctuations seen in fluorescence microscopy due to the photokinetics of photon-emitter molecules [6][7][8][9][10][11][12][13]. We refer to these methods as fluctuations based nanoscopy methods (FNMs). FNMs take in a temporal image stack (10s to 100s of frames) of a fluorescently labeled sample and generate a nanoscopy image. They provide a good trade-off between resolution, acquisition time and instrument or fluorescent label customization. However, as reported in [14], artefacts appear to be an unavoidable feature of FNMs.
Common causes of artefacts in microscopy data are hot or dead pixels, camera or sample drift, and microscope aberrations. Such artefacts may be greatly reduced and even completely removed by changing or upgrading the hardware. Artefacts arising from dynamic samples, such as a population of diffusing particles, may be exploitable, as in fluorescence correlation spectroscopy [15]. However, the photon and electronic noise in the microscopy data cannot be completely eliminated unless sophisticated denoising approaches are used [16][17][18]. While such denoising solutions may be useful for microscopy data, they introduce non-linear computational distortion in the raw microscopy data. This distortion renders the denoised microscopy image stack unsuitable for FNMs (see Supplement 1), so artefact removal in the nanoscopy images generated by them should be done separately. Moreover, in FNMs, the high spatio-temporal density of intensity fluctuations and the base statistical technique introduce custom artefacts. This implies that artefact removal techniques need to be customized for each FNM individually for best performance.
Here, we demonstrate artefact removal for one such FNM, namely the multiple signal classification algorithm (MUSICAL) [8], although the concept is generalizable to any FNM. Two examples of reconstructions of MUSICAL in the presence and absence of noise are shown in Fig. 1. To quantify the noise we use the signal-to-background ratio or SBR (defined in section 2), which is 3 in these examples. An SBR of 3 indicates a conventionally large amount of noise, which may arise from a small exposure time and a small light dose delivered to the sample. Both these conditions are desirable for long-duration imaging and a reduced rate of photobleaching, and are crucial for imaging living samples under low photochemical toxicity. Therefore, acquiring images under these conditions and subsequently removing the artefacts can prove quite valuable in nanoscopy applications. Example noise-free and noisy data are provided in supplementary Fig. S1 (a,c).
Artefact removal in nanoscopy images can be considered a version of the denoising problem in the sense that artefacts are associated with the noise characteristics and have to be removed from the image, just as noise has to be removed in denoising. An important deviation must be noted though. Unlike general noise distributions, artefacts may not be completely stochastic in nature, since they encode the stochastic parameters of noise and photokinetics as well as the systematic distortion introduced by the microscope or algorithm. Nonetheless, for simplicity of reference, we use the terms noisy, noise-free, and denoising for artefact-ridden, artefact-free, and artefact removal respectively. Also, unless specified otherwise, these terms apply in the context of the processed nanoscopy images rather than the raw microscopy image stacks used for generating the nanoscopy images.
As explored in [19], there is indeed scope for customization of MUSICAL, but the non-linear treatment of the noise component of the raw image stack implies that denoising cannot be integrated within the physics-based mathematical framework of MUSICAL. Further, such customization may vary from case to case and may be sensitive to several algorithmic parameters. As noted before in [20][21][22], microscopy and nanoscopy data pose several challenges as compared to normal computer vision data because of the absence of color, texture, and edge features. While the color feature is still absent in nanoscopy images, the background noise may contribute a texture and the resolution enhancement may introduce sharp structures, although explicit edge features may not be present. The absence of edge features and sharp gradients rules out time-tested image processing and hand-crafted kernel approaches [23,24] for microscopy. Furthermore, such approaches are not expected to be versatile and are likely to be sensitive to several parameters.
In order to obtain an artefact removal approach that is robust and versatile across a wide variety of structures and microscope parameters, we consider deep learning as a suitable option. Our motivation is rooted in recent developments in deep learning for similar computer vision tasks on one hand [25,26] and applications of deep learning to challenging microscopy problems [27,28] on the other hand. Super-resolution with a lower frame requirement using deep learning has been shown in [29,30], and background estimation using a U-Net type deep learning network has been shown for single molecule localization microscopy [31]. Deep learning architectures known as autoencoders [32] are good candidates for artefact removal in nanoscopy data. In contrast to traditional convolutional neural networks which, given an image matrix as the input, output a one-dimensional vector at the end, autoencoders output a matrix with the same shape as the input. Furthermore, the small latent space of autoencoders (such as shown in block C1 of Fig. 2) is an efficient way of exploiting the sparsity and lack of feature variety which is characteristic of microscopy and nanoscopy images.

Proposed approach
The general approach for performing deep learning is to create a large supervised dataset containing pairs of input and expected output (referred to as the ground truth). In the case of an autoencoder for denoising, the pair comprises the noisy image and the noise-free image. Indeed, the expectation is that both these images are real microscopy data measured on the actual sample. However, this is impossible to achieve in the case of fluctuations based nanoscopy images for multiple reasons:
• No noise-free data: Microscopy images are inherently noisy due to shot noise arising from the stochastic nature of photons and electronics in the optical system. The low photon count in microscopy, as compared to real world images, implies that neither of the noise sources is negligibly small.
• High SBR data comes with photobleaching: A high dose of light can be used to increase SBR and thereby weakly emulate noise-free raw data. However, the sample experiences a higher rate of photobleaching under a higher dose of light. Therefore, the two sets of images taken with different light doses inevitably have differences between emitters' emissions.
• Impossible to replicate the fluctuation statistics: The emission of photons from fluorescent molecules is a stochastic process. Therefore, it is impossible to replicate the temporally precise series of emissions between the two sets of measurements.
• Versatility needs data from many instruments: To ensure versatile application, the training data needs measurements taken from microscopes with different optical parameters and cameras with different pixel sizes. However, such a diversity of microscopy setups is generally not available even in a collection of microscopy labs and facilities.
Therefore, we take a simulation-supervised approach similar to [33,34], where we use advanced simulation approaches to construct the supervised training dataset. Our approach is shown in Fig. 2. We first create a simulated training dataset as illustrated in blocks A and B of Fig. 2. For this, two sets of raw microscopy image stacks are simulated using precisely the same physical characteristics, with the exception that one raw microscopy image stack is noisy, since it is generated by passing the noise-free raw microscopy image stack through a noise simulator. Both raw microscopy image stacks are individually processed using MUSICAL to obtain the corresponding noisy and noise-free nanoscopy images. Since the simulated dataset is used for training, it is imperative for it to emulate the relevant aspects of reality as closely as possible while retaining enough diversity across the simulated conditions. Several thousands of such pairs are generated and used as the supervised training dataset for the autoencoder. Then, through a good choice of autoencoder architecture and loss functions, the autoencoder is trained for denoising the nanoscopy images. In the test phase or actual field use, a raw microscopy image stack obtained from a real microscopy experiment is processed through the MUSICAL algorithm to obtain a noisy nanoscopy image, which is then passed through the trained autoencoder to generate the corresponding noise-free nanoscopy image. We discuss the details of the various blocks in the subsequent sub-sections.
2.1. Raw microscopy data simulator (Blocks A1-A4 of Fig. 2)
A1: Sample simulator. The concept is that the shape and size hypotheses created by prior studies are used to simulate sample geometries. In this work, we consider three types of subcellular structures, namely actin filaments [35][36][37], mitochondria [38,39], and vesicles [40][41][42]. However, the setup is easily scalable to include other types of sub-cellular structures. First, the 3D geometries of the structures are simulated. Then, the positions of fluorescent molecules (called emitters for simplicity) are stochastically generated as labeling the structures.
For simulating an actin filament, a smooth 3D curve is created by selecting a certain number of spline control points and then fitting a spline through them. The number of control points is selected randomly from the range [3,6]. The maximum length allowed for a filament is 5 µm. The emitters are placed randomly along the length of the spline curve with a linear density of 100 emitters/µm. This is based on two assumptions. First, the periodicity of binding sites in actin is 5-7 nm. Second, the labeling efficiency is never 100%. Assuming 30-50% labeling efficiency, the selected emitter density is reasonable.
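The filament recipe above can be sketched as follows. This is only an illustrative sketch, not the simulator used in this work: the Catmull-Rom interpolation stands in for the unspecified spline, the control points are drawn from the sample volume stated for the simulations (x, y within ±2.5 µm, z within ±0.5 µm), and truncating the curve at 5 µm is an assumed way of enforcing the maximum length. The function names are hypothetical.

```python
import bisect
import math
import random


def catmull_rom(p0, p1, p2, p3, t):
    """Point on a uniform Catmull-Rom segment between p1 and p2, t in [0, 1]."""
    return tuple(
        0.5 * (2 * b + (c - a) * t
               + (2 * a - 5 * b + 4 * c - d) * t ** 2
               + (-a + 3 * b - 3 * c + d) * t ** 3)
        for a, b, c, d in zip(p0, p1, p2, p3)
    )


def simulate_filament(rng, density_per_um=100.0, max_length_um=5.0, steps=200):
    """Return (length in um, list of emitter positions in um) for one filament."""
    n_ctrl = rng.randint(3, 6)  # number of control points from the range [3, 6]
    ctrl = [(rng.uniform(-2.5, 2.5), rng.uniform(-2.5, 2.5), rng.uniform(-0.5, 0.5))
            for _ in range(n_ctrl)]
    padded = [ctrl[0]] + ctrl + [ctrl[-1]]  # repeat end points so the curve reaches them
    curve = []
    for i in range(n_ctrl - 1):
        for k in range(steps):
            curve.append(catmull_rom(padded[i], padded[i + 1],
                                     padded[i + 2], padded[i + 3], k / steps))
    curve.append(ctrl[-1])
    # Cumulative arc length along the sampled polyline.
    cum = [0.0]
    for a, b in zip(curve, curve[1:]):
        cum.append(cum[-1] + math.dist(a, b))
    length = min(cum[-1], max_length_um)  # assumption: truncate at the maximum length
    emitters = []
    for _ in range(round(density_per_um * length)):
        s = rng.uniform(0.0, length)  # random arc-length position
        j = min(bisect.bisect_right(cum, s) - 1, len(curve) - 2)
        seg = cum[j + 1] - cum[j]
        f = (s - cum[j]) / seg if seg > 0 else 0.0
        emitters.append(tuple(a + f * (b - a) for a, b in zip(curve[j], curve[j + 1])))
    return length, emitters
```

Calling `simulate_filament(random.Random(0))` returns one filament; the number of emitters follows directly from the linear density and the (possibly truncated) curve length.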
For simulating a single mitochondrion, a spline similar to that of an actin filament is first considered. Then, a curvilinear cylinder of radius 150 nm is fit over it by convolving a cylinder of the chosen radius and height 1 nm over the spline. The selected diameter of 300 nm is close to the diffraction limit of most microscopes, so the outer membrane label is not distinguishable in raw microscopy data with noise, but is expected to be reconstructed as an outer boundary by a nanoscopy method. Further, as seen in Fig. 1, under significant noise, the membrane boundary may not be explicitly reconstructed. So, we consider this radius as a borderline situation for the failure of MUSICAL under noise. However, other ranges of diameters may be included in the future. After constructing the geometry, the emitters are distributed randomly on the surface of the geometry with an emitter surface density of 500 emitters/µm². This emulates an outer membrane label of mitochondria. The emitter density is chosen heuristically based on expert input.
A vesicle is simulated as a sphere of radius randomly chosen from the range [25,500] nm. The emitters are distributed on its surface with an emitter density of 2000 emitters/µm², chosen heuristically. The surface labeling emulates the membrane of vesicles.
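As a rough illustration of the vesicle model, the sketch below draws a radius, converts the surface density into an emitter count through the sphere area 4πr², and samples uniform positions on the sphere. The uniform sampling via normalised Gaussian vectors, the centring at the origin, and the function name are assumptions made for this sketch. With the stated density, the count ranges from about 16 emitters at a radius of 25 nm to about 6283 at 500 nm.

```python
import math
import random


def simulate_vesicle(rng, density_per_um2=2000.0):
    """Return (radius in um, emitter positions in um) for one vesicle at the origin."""
    radius_um = rng.uniform(0.025, 0.5)      # radius from the range [25, 500] nm
    area_um2 = 4.0 * math.pi * radius_um ** 2
    n_emitters = round(density_per_um2 * area_um2)
    emitters = []
    for _ in range(n_emitters):
        # A normalised 3D Gaussian vector is uniformly distributed on the unit sphere.
        while True:
            v = [rng.gauss(0.0, 1.0) for _ in range(3)]
            norm = math.sqrt(sum(c * c for c in v))
            if norm > 0.0:
                break
        emitters.append(tuple(radius_um * c / norm for c in v))
    return radius_um, emitters
```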
There may be multiple instances of a structure in an image region; however, only one type of structure is expected in one fluorescent color channel. Therefore, we simulate multiple actin filaments, multiple mitochondria, or multiple vesicles in each example. The number of instances in a single image is chosen randomly from the range [3,10], [1,4], and [10,30] for actin filaments, mitochondria, and vesicles respectively. We impose some boundaries on the 3D space in which the sample may be present. These are x, y ∈ [−2.5, 2.5] µm and z ∈ [−500, 500] nm, where z = 0 represents the focal plane of the microscope.
A2: Photokinetics simulator. In reality, there are multiple distributions associated with the emission of photons when the fluorescent molecule is active, with the fluorescent molecule entering, dwelling in, and exiting the dark states, with photobleaching, etc. [43][44][45]. However, at image acquisition rates of milliseconds to seconds, the need for knowing and simulating the individual distributions is obviated, and simpler probability distributions can be used to represent the macro-behaviour of fluctuations in photon emissions arising from photokinetics. This simplification may not apply if specific dyes with long dark states are used, but this is neither a requirement of fluctuations based nanoscopy techniques nor the regime in which they provide a particular advantage over localization based methods [4,5].
Therefore, we use the simpler photokinetic model based on the implementation of [46]. In this model, a single emitter is characterized with a 2-state model. The states are simply called on, in which the molecule is producing photons, and off, in which no photons are emitted. The time the emitter stays in each state is modeled with an exponential distribution controlled by two parameters called τ_on and τ_off. These correspond to the mean time the emitter spends in each state. The emission rate of photons is considered constant and therefore, the number of photons emitted while the emitter is in the on state is just the rate multiplied by the total time. As a result, the duty cycle is τ_on/(τ_on + τ_off). All emitters are considered identical and therefore all of them in a sample have the same values of τ_on and τ_off. In order to emulate a range of photokinetic behaviour, we choose the values of τ_on and τ_off as integers taken from the ranges [1,5] and [1,20], respectively. It is of interest to observe that the pair (τ_on, τ_off) having the value (5,1) indicates extremely dense fluctuations, i.e. an extremely challenging condition for fluctuation based nanoscopy techniques, in which they do not provide significant resolution enhancement. On the other hand, the pair having the value (1, 20) is a conducive regime for such techniques.
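A minimal sketch of this two-state model is given below. It is not the implementation of [46]: the time unit (frames), the choice of the initial state according to the duty cycle, and the function names are assumptions made for illustration. Dwell times are drawn from exponential distributions with means τ_on and τ_off, and the signal per frame is the constant rate multiplied by the on-time within that frame.

```python
import random


def duty_cycle(tau_on, tau_off):
    """Fraction of time the emitter is expected to spend in the on state."""
    return tau_on / (tau_on + tau_off)


def simulate_emitter(rng, tau_on, tau_off, n_frames, rate=1.0):
    """Return the emitted signal per frame for one emitter (time measured in frames)."""
    photons = [0.0] * n_frames
    t = 0.0
    # Assumption: the initial state is drawn according to the duty cycle.
    is_on = rng.random() < duty_cycle(tau_on, tau_off)
    while t < n_frames:
        mean_dwell = tau_on if is_on else tau_off
        end = min(t + rng.expovariate(1.0 / mean_dwell), n_frames)
        if is_on:
            frame = int(t)
            while frame < end:
                # Overlap of the on-interval [t, end] with the frame [frame, frame + 1].
                overlap = min(end, frame + 1) - max(t, frame)
                photons[frame] += rate * overlap
                frame += 1
        t = end
        is_on = not is_on
    return photons
```

For the two extreme parameter pairs discussed above, `duty_cycle(5, 1)` is 5/6 (dense fluctuations) and `duty_cycle(1, 20)` is 1/21 (sparse fluctuations).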
A3: Microscope simulator. The imaging function of the microscope is simulated using the Gibson-Lanni model of the point spread function (PSF) [47]. It simulates the blurring introduced by the microscope as the light passes through the coverslip and microscope optics to the image region. We use a fast implementation of the Gibson-Lanni PSF reported in [48].
Among the various parameters needed for simulating the Gibson-Lanni PSF, the following were kept constant for the setup. The sample is assumed to be mounted on a glass surface and present in a water medium. A glass coverslip of 170 µm is assumed to be present between the sample and the microscope. For simulation purposes, the emission wavelength of the emitters is assumed to be 660 nm. In practice, the emission wavelength is a characteristic of the fluorescent dye chosen for the experiment for a particular type of structure and is generally in the range [488, 650] nm for visible range fluorescent dyes. However, the wavelength manifests itself in terms of the achievable resolution and the spread of the PSF. The same effect can be achieved by varying the NA of the microscope. Therefore, choosing a fixed wavelength but a sufficiently large span of NA allows us to simultaneously consider a variety of microscopes and dyes without loss of generality. The numerical aperture (NA) of the system is selected randomly from the range [1.2, 1.49]. The camera pixel size in terms of the sample dimensions is computed by dividing the actual hardware pixel size of the camera by the magnification of the microscope. We consider pixel sizes in sample dimensions directly and select candidate values most popularly encountered in high NA microscopy systems. Four different pixel sizes were considered for simulation (65, 80, 108 and 120 nm), each pixel size being used for exactly one quarter of the total number of samples simulated for each type of structure.
A4: Noise simulator. The noise simulation approach is taken from [8,20], and we refer the readers to the supplementary material of [19] for precise details. There are two main sources of noise. The first is the camera's electronic noise, which contributes a noisy background. The second is photon noise, which is based on the Poisson statistics of the arrival of photons at the expected location. Let the simulated microscopy image, scaled to span [0, 1], be denoted as I. Moreover, let the signal to background ratio be SBR and the measured background value in the camera with closed shutter be b. First, a microscopy image of the expected signal strength (such as observed in the microscopy data) and having a constant background b is simulated as Î = b(SBR − 1)I + b. Then, the noisy microscopy image Ĩ is generated such that each pixel in Ĩ is drawn from a Poisson distribution with mean equal to the corresponding pixel in Î. With the ∼ms exposure times used in FNMs, the electronic noise is significantly stronger than the photon noise. In such a situation, the signal to background ratio (SBR) is a practical measure of noise. The original article of MUSICAL reports super-resolution for SBR ≥ 3. Therefore, we simulate our dataset with the lowest SBR (i.e. highest level of noise) recommended for MUSICAL. Furthermore, we noted that a large variety of cameras have background noise in the range [50, 120] on a 16 bit intensity scale, depending upon the type of camera, the imaging speed, the cooling system, and other usage factors. We used a constant value b = 100 in our simulations.
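The two steps of the noise model can be sketched as below, assuming a frame stored as nested lists scaled to [0, 1]. This is an illustration rather than the simulator of [8,20]: the Poisson sampler uses Knuth's multiplication method (adequate for means of a few hundred counts, as here), and the function names are hypothetical. With b = 100 and SBR = 3, a background pixel (I = 0) has an expected value of 100 and the brightest pixel (I = 1) has an expected value of 300.

```python
import math
import random


def expected_image(image, sbr=3.0, b=100.0):
    """Scale a [0, 1] image to the expected counts: b * (SBR - 1) * I + b."""
    return [[b * (sbr - 1.0) * v + b for v in row] for row in image]


def poisson_sample(rng, mean):
    """Draw one Poisson variate using Knuth's multiplication method."""
    limit = math.exp(-mean)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def add_noise(image, rng, sbr=3.0, b=100.0):
    """Return a noisy frame whose pixels are Poisson draws around the expected counts."""
    return [[poisson_sample(rng, m) for m in row]
            for row in expected_image(image, sbr, b)]
```

For example, `add_noise([[0.0, 0.5, 1.0]], random.Random(0))` returns integer counts scattered around 100, 200, and 300.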
A total of 3000 pairs of noise-free and noisy image stacks were created, each stack containing 200 frames. Among them, 1000 pairs were simulated each for actin filaments, mitochondria, and vesicles. 75% of the pairs were used for training and the rest for testing. The selection was performed randomly.

MUSICAL (block B of Fig. 2)
For each pair of raw microscopy data, MUSICAL is applied independently on the noise-free and noisy raw microscopy image stacks to obtain one pair of training data. Here, we explain MUSICAL and the MUSICAL parameters.
MUSICAL achieves super-resolution by performing spatio-temporal analysis of the fluctuations in the measured image stack and exploiting the fact that the noise is stochastic while the fluctuations arising from photokinetics are modulated through the PSF in the microscopy images. MUSICAL decomposes the image stack using singular value decomposition or eigenvalue decomposition [8,49,50] into an orthogonal set of vectors called eigenimages, and eigenvalues uniquely associated with them. In particular, the eigenimages with high eigenvalues are associated with the actual emitters and are therefore strongly related to the PSF of the system. Specifically, eigenimages associated with the actual structure are expected to be linear combinations of the PSFs at emitter locations. These eigenimages (the ones with high eigenvalues) are grouped into one set, called the signal subspace, since they span the images measured in the stack. Notably, only a subset of all the eigenimages belongs to this set. The remaining ones are grouped into another set called the noise subspace. The key property exploited by MUSICAL is as follows. The signal and noise subspaces are orthogonal, and the signal subspace is given by the linear combinations of PSFs at the emitters. Therefore, the PSFs at emitter locations are also orthogonal to the noise subspace. As a result, a test point at an emitter location has a large projection in the signal subspace and a small projection in the noise subspace. On the other hand, if a test point is far from the actual structure, it has a small projection in the signal subspace and a large projection in the noise subspace. These two situations are combined in an 'indicator function' that takes the ratio of the projections in the signal and noise subspaces. As a result, its value is high for test points at emitter locations and low otherwise.
MUSICAL needs the following knowledge about the microscopy data: the emission wavelength of the fluorophores, the pixel size of the camera as scaled for sample dimensions, and the NA of the microscope. In addition, MUSICAL needs three algorithmic control parameters: (a) a threshold for assigning eigenimages to the signal and noise subspace, (b) a contrast parameter α, and (c) the level of subpixelation, which determines the fineness of the grid and the pixel size in the nanoscopy image. We used a recent work on automatic soft thresholding for the first parameter, which obviates the need for a user-specified threshold [19]. Further, α has been set to 4 following the recommendation of [8], and a subpixelation of 10 has been used since it gives a pixel size well below the smallest structures we have considered.
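For orientation, the indicator function described above can be written compactly as follows. The notation is introduced here for illustration and is an assumption, not taken from this text: g(r) denotes the PSF image of a test point r written as a vector, and S and N denote matrices whose columns are the eigenimages of the signal and noise subspaces. The precise definition, including the soft-threshold variant used in this work, is given in [8] and [19].

```latex
d_{\mathrm{PS}}(r) = \left\lVert S^{\mathsf{T}} g(r) \right\rVert_2, \qquad
d_{\mathrm{PN}}(r) = \left\lVert N^{\mathsf{T}} g(r) \right\rVert_2, \qquad
f(r) = \left( \frac{d_{\mathrm{PS}}(r)}{d_{\mathrm{PN}}(r)} \right)^{\alpha}
```

With α = 4, a test point whose signal-subspace projection is ten times its noise-subspace projection yields f = 10^4, whereas a point with equal projections yields f = 1, which illustrates why the dynamic range of MUSICAL images is large.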

Autoencoder architectures
We tried two different architectures for this task. The first one is the U-Net [51] model. It was designed specifically for biomedical images and is known for good performance in the medical imaging domain. The inspiration behind using U-Net is primarily the similarity of the application domain. The second model is the Feature Pyramid Network (FPN) [52]. Although FPN was originally designed for object detection tasks, many have successfully utilised the architecture for image-to-image tasks like semantic segmentation and instance segmentation [53,54]. The inspiration for this choice was to see if the impressive performance seen with an image-to-image task like segmentation can transfer to a denoising task like ours. The architectures are shown in Fig. 3. For each model, we considered two options for the convolutional layer architecture, namely R-34 and R-50, where R stands for residual network. This was done to explore both deep and deeper architectures. We found that FPN with the R-50 model often did not converge while training. So, we drop the discussion of this combination hereon.
We note that some changes in the input images and the architectures were made to accommodate the special case of the chosen nanoscopy algorithm, as described next. The simulated input images had 32-bit floating point pixel values. Both the input and output images were normalized using max normalization. Without the normalization, the neural network has to deal with an ill-defined problem, as the actual dynamic range of the data may be much smaller than 32 bit for the noisy nanoscopy image. This is a consequence of the indicator function through which MUSICAL produces the nanoscopy image. Further, learning the intensity span of the actual 32 bit image for the noise-free nanoscopy image for each case is more challenging than defining the intensity in the output image to be in the range [0, 1]. Therefore, the max normalization makes the input and output intensity ranges better defined and the mapping more learnable. At the same time, losing the actual intensity value in the output is not considered a problem, since neither the quality of the output image nor its interpretability is affected by it. This is because MUSICAL and several other FNMs are qualitative reconstruction techniques in the sense that the intensity values generated by them indicate the statistical significance of the presence of emitters but not values of physical quantities. The only exception to the best of our knowledge is balanced super-resolution optical fluctuation imaging (b-SOFI) [55].
It is arguable that the actual emitter distribution, or its equivalent image with a synthetic, hypothetically narrow point spread function, may be used as the ground truth. However, it is then not a denoising but a super-resolution problem. Further, the choice of the point spread function, its spread, etc. are a matter of subjective selection. It is also noted in [14] that the different FNMs contribute different advantages and enhance different features of FNM data. In this sense, it is of interest to retain the basic characteristics of the FNMs and suppress only the artefacts generated by them. Therefore, we use the nanoscopy images generated by the chosen algorithm on the noise-free data as the target images.
The selected architectures are then modified to fit the new output format of 32 bit floating point images with intensity range [0, 1]. To do so, a rectified linear unit (ReLU) activation layer is added at the output to force the lower limit of the output image to be zero. This layer is followed by a max normalization step to scale the output intensity values to [0,1].
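The effect of this output stage can be illustrated on a small image stored as nested lists. This is a plain Python sketch rather than the PyTorch layers used in the networks; the guard for an all-zero image and the function name are assumptions added for the example.

```python
def relu_max_normalise(image, eps=1e-12):
    """Clip negative values to zero, then divide by the maximum so values lie in [0, 1]."""
    clipped = [[max(0.0, v) for v in row] for row in image]
    peak = max(max(row) for row in clipped)
    if peak < eps:
        # Assumption: an all-zero image is returned unchanged to avoid division by zero.
        return clipped
    return [[v / peak for v in row] for row in clipped]
```

For instance, the image [[-1, 2], [4, 0]] is mapped to [[0.0, 0.5], [1.0, 0.0]].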
Choice of the Loss Function
The choice of the loss function determines the nature and quality of learning. Since nanoscopy image denoising for FNMs is new, we experimented with a variety of loss functions presented below. We use the following notations. The denoised image (the output of the autoencoder) is denoted as Î while the corresponding noise-free image (the target or ground truth for the autoencoder) is denoted by I. The pixel indices are specified by n and the total number of pixels is N. Therefore, the intensity in the denoised image for the nth pixel is denoted as Î_n and similarly for the noise-free image.
L1 loss: The pixel-wise mean absolute error between the output and the ground truth image is:
L_{L1}(\hat{I}, I) = \frac{1}{N} \sum_{n=1}^{N} \left| \hat{I}_n - I_n \right|
L2 loss: The pixel-wise mean squared error between the output and the ground truth image is:
L_{L2}(\hat{I}, I) = \frac{1}{N} \sum_{n=1}^{N} \left( \hat{I}_n - I_n \right)^2
SSIM loss: The structural similarity index (SSIM) compares two images in terms of their luminance l, contrast c, and structure s. The detailed expression and further insights into SSIM are available in [56,57]. It is notable that two images should be similar to each other in terms of the overall luminance, contrast and structure for the SSIM value to be large, while trends at the level of individual pixels are not considered too important. The SSIM values are limited to be between 0 and 1 using ReLU; thus we compute the SSIM loss function as L_SSIM(Î, I) = 1 − SSIM(Î, I).
MS-SSIM loss: For calculating MS-SSIM [58], the image pairs are iteratively scaled down by a factor of 2, M times. Let us denote Î_m and I_m as the denoised and noise-free images after the mth scale down. c(Î_m, I_m) and s(Î_m, I_m) are calculated for all values of m ∈ [1, M] while l(Î_M, I_M) is calculated only for the Mth scaled down version. Then, MS-SSIM is computed as:
\text{MS-SSIM}(\hat{I}, I) = \left[ l(\hat{I}_M, I_M) \right]^{\alpha_M} \prod_{m=1}^{M} \left[ c(\hat{I}_m, I_m) \right]^{\beta_m} \left[ s(\hat{I}_m, I_m) \right]^{\gamma_m}
where α_M, β_m, and γ_m are powers imparted to the luminance, contrast, and structure terms for the relevant scales. In the original article [58], their values are set to 1, and we have used the same. The MS-SSIM values are also limited to the range [0, 1] using ReLU. We therefore compute the MS-SSIM loss function as L_MS-SSIM(Î, I) = 1 − MS-SSIM(Î, I).
Perceptual or VGG loss: The perceptual loss [59] is calculated by comparing the high-level representations obtained by feeding the images to a pretrained benchmark convolutional network, such as VGG-16 [60]. The activations obtained from the 4th, 9th, 16th and 23rd layers (equivalently the 2nd, 4th, 7th and 10th convolution layers) of the VGG-16 model by passing the denoised and noise-free images as inputs are used for comparison. Let Â_l and A_l denote the activation maps obtained from the lth layer of VGG-16 for the denoised and the noise-free images, i.e. Î and I, respectively. The VGG loss then aggregates the discrepancy between Â_l and A_l over the selected layers l. We note that the above loss does not maximize perception [61], and may need the help of an adversarial loss for maximizing the perception [62].
Weighted combination: Apart from the loss functions described above, a few more loss functions were devised by using a weighted sum of two loss functions. Two such combinations are explored: a combination of the MS-SSIM and L1 loss functions (with β = 0.6), and a combination of the SSIM and L1 loss functions (with β = 0.4). The combination of MS-SSIM and L1 losses is inspired by [63], where it was found to be slightly superior to either one of the losses individually. The latter combination (SSIM and L1 loss) was inspired by the appreciable performance of the individual loss functions on the simulated dataset. The optimal weight parameter β was determined empirically.
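The pixel-wise losses and the weighted combinations can be sketched on flattened images as below. The form of the weighted sum is an assumption, since the text does not state which term carries β: here β multiplies the structural (SSIM or MS-SSIM) loss and 1 − β multiplies the L1 loss. The structural loss value is taken as an input rather than computed, and the function names are hypothetical.

```python
def l1_loss(denoised, target):
    """Pixel-wise mean absolute error between two flattened images."""
    return sum(abs(a - b) for a, b in zip(denoised, target)) / len(target)


def l2_loss(denoised, target):
    """Pixel-wise mean squared error between two flattened images."""
    return sum((a - b) ** 2 for a, b in zip(denoised, target)) / len(target)


def combined_loss(structural_loss, l1, beta):
    """Assumed weighted sum: beta * structural loss + (1 - beta) * L1 loss."""
    return beta * structural_loss + (1.0 - beta) * l1
```

Under this assumed convention, a structural loss of 0.2 and an L1 loss of 0.1 combined with β = 0.6 give approximately 0.16.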
Training setup: For training, the Adam [64] optimizer was used with a learning rate of 0.001. The models were trained for 60 epochs. The PyTorch library was used for deep learning.

Results
We validate our approach using both simulated and experimental data and analyse the results.

Denoising results on simulated validation dataset
A test set was created comprising 250 image pairs each of actin filaments, mitochondria, and vesicles using the raw microscopy data simulator discussed in section 2.1.

Quantitative comparison of different methods
We perform a quantitative comparison of different models and loss functions using the peak signal-to-noise ratio (PSNR), a prominent quantitative metric for gauging denoising performance. For simplicity, we refer to a combination of a loss function and a model as a method. We therefore compare 21 different denoising methods in Table 1 using PSNR. The test results are reported separately for the different structures to assess whether the geometry has a bearing on the achievable denoising. Table 1 shows that UNet (R-50) together with VGG performs the best for vesicles and mitochondria and the second best for actin filaments. The PSNR values were computed using noise-free MUSICAL images as the reference.

Qualitative analysis: Single-valued quantitative metrics such as PSNR are often unsuitable for representing image quality, specifically for low-contrast microscopy images. An illustration of this point is given in Fig. 4. The output produced by the method labeled M-3 is much cleaner, with the least background debris, while the one produced by FPN (R-34) | L2 has visible artefacts right along the edges of each of the strands. Despite this, the PSNR of the former is 37.48 dB while that of the latter is 40.02 dB. In contrast, the SSIM metric assigns the output of FPN (R-34) | L2 a lower score of 0.944 and that of M-3 a higher score of 0.95. This is closer to reality than the PSNR score. However, there will be other cases where SSIM is not a good indicator of quality. Therefore, we perform a qualitative analysis of artefact suppression. We consider three methods, labeled M-1, M-2, and M-3, for qualitative comparison.

The results for mitochondria are presented in Fig. 5. M-1 (Fig. 5(c)) and M-3 (Fig. 5(e)) restore the resolution and construct the boundary of the membrane. M-2 (Fig. 5(d)) also appears to perform well until the intensities along a line section (yellow line in Fig. 5(a)-(e)) are examined (Fig. 5(f)). There, M-2 shows a jittery intensity profile between the two peaks, which may be mistaken for resolving further small features. We show only one line section here, but we observed a similar effect along multiple other sections. Another observation is that all the methods seem to suppress out-of-focus structures (bottom-left tail in Fig. 5(a)-(e)), but not as effectively as the noise-free image (Fig. 5(b)). The contrast-stretched and over-saturated versions of the images (Fig. 5(g)-(k)) show that the out-of-focus structures are present in all the images, including the noise-free one, where, however, they have significantly lower intensity, as seen in Fig. 5(b). In this sense, the better optical sectioning supported by the noise-free image is still not achieved by the denoised images, although M-3 works the best in this respect. Lastly, Fig. 5(g)-(k) shows that M-2 and M-3 are significantly more effective at suppressing background debris artefacts. We made similar observations for actin filaments, i.e. M-3 produces the thinnest filaments, and M-2 and M-3 suppress the background debris well. Further, M-3 performs better in suppressing the out-of-focus structures. These results are not reported due to space constraints.
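As a minimal sketch of how a PSNR value against a noise-free reference can be computed, consider the following pure-Python example. The peak value (taken here as the maximum of the reference image), the function name `psnr`, and the toy images are assumptions made for illustration only; they are not taken from the reported experiments.

```python
import math


def psnr(reference, test, peak=None):
    """PSNR in dB between two equally sized 2D images given as nested lists."""
    ref = [v for row in reference for v in row]
    tst = [v for row in test for v in row]
    if len(ref) != len(tst):
        raise ValueError("images must have the same number of pixels")
    if peak is None:
        # Assumption: the peak signal is the maximum of the reference image.
        peak = max(ref)
    mse = sum((r - t) ** 2 for r, t in zip(ref, tst)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


# Toy noise-free reference: a short bright strand on a dark background.
reference = [[0.0, 0.0, 1.0, 0.0],
             [0.0, 1.0, 1.0, 0.0]]

# Case A: structure intact, but a faint uniform background is added.
faint_background = [[v + 0.1 for v in row] for row in reference]

# Case B: clean background, but one strand pixel is locally damaged.
damaged_structure = [[0.0, 0.0, 1.0, 0.0],
                     [0.0, 0.8, 1.0, 0.0]]

print(f"faint background:  {psnr(reference, faint_background):.2f} dB")
print(f"damaged structure: {psnr(reference, damaged_structure):.2f} dB")
```

In this toy case the locally damaged structure obtains the higher PSNR, because the error is averaged over all pixels. This mirrors, in a simplified way, why a single-valued metric can rank images differently from visual inspection.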
Fig. 1 indicated that the noisy nanoscopy image had background debris due to out-of-focus structures and reduced sharpness of the features. We show the denoising results for the same example in Fig. 6. The pseudocolor rendering and different contrasts in Fig. 6(a)-(b) help in observing these effects more clearly. The yellow line section is used to investigate both effects. The log-intensities along the yellow line sections in Fig. 6(a),(e-g) are shown in Fig. 6(d). M-1 and M-3 follow trends similar to each other and to the noise-free image, with a lower valley in the background region. M-2 generally follows the trend well in the high-intensity zones, but introduces peaks of small intensity in the background. Overall, M-2 and M-3 are more effective in suppressing the background, and M-1 and M-3 are better at improving the sharpness of the image. Generally, for simulated examples, M-3 presents the best qualitative results.

Testing nanoscopy images from other nanoscopy algorithms

Here, we consider whether our trained models can be directly applied to nanoscopy images generated by other FNMs. For the same vesicles example as shown in Fig. 1, we processed the noisy raw microscopy image stacks with three different methods, namely SOFI [6], bSOFI [55], and super-resolution radial fluctuations (SRRF) [7], to obtain noisy nanoscopy images. These are then processed using M-3 to generate denoised nanoscopy images. The results are presented in Fig. 7. M-3 does not denoise SOFI and bSOFI images well, but seems to perform well for SRRF. Whether it works well on a wider variety of SRRF data is still an open question. Therefore, we conclude that even if some transferability may be present across methods that generate similar types of features for certain structures (as seen here for SRRF on vesicles), such an assumption is not generally applicable across FNMs, and either fresh training or retraining on data created using the specific FNM should be undertaken. At the same time, we note that the concept of the proposed method is generalizable, but not the trained models themselves.

Denoising results on experimental data
We performed denoising experiments on real microscopy data of actin filaments (in vitro preformed), microtubules in fixed cells (thick fiber-like structures not included in our training data), liposomes (lab-fabricated, agarose-stabilized artificial small vesicles), and mitochondria in living cells. The results are presented in Figs. 8-11, respectively. The experimental details and a discussion of the results for each dataset are presented below.

In vitro preformed actin filaments, Fig. 8: This data is taken from the publicly available data of [8]. We use the first 500 frames. The relevant imaging parameters are total internal reflection microscopy with NA 1.49, pixel size 65 nm, and emission wavelength of 590 nm. The detailed protocol can be found in [8]. The denoising results for a sample of actin filaments are shown in Fig. 8. We choose two regions, shown as green and yellow boxes in Fig. 8(a), to consider regions with different local SBRs. The SBRs for the green and yellow boxes are 3.2 and 3.63, respectively, despite the peak intensity in the green box being significantly higher. This is because the density of structures in the green box is significantly larger than in the yellow box. All the denoising methods perform similarly, with minor differences. The portion at the top with a loop that appears saturated in the noisy nanoscopy image (Fig. 8(b)) shows a better-distributed intensity after denoising (Fig. 8(c)-(e)). This indicates a better contrast distribution close to junctions. The log-scale versions of the nanoscopy images for the green box (Fig. 8(d)-(g)) clearly indicate that the background region is suppressed well by all the methods. However, M-3 restores the continuity of some low-intensity strands, a feature that is missed by M-1 and M-2. For the sparser region (yellow box), the denoised results in Fig. 8(i)-(k) appear similar and are effective in restoring the visibility of the strands. When seen in the log scale (Fig. 8(l)-(o)), it is evident that M-3 is better at restoring continuity but M-1 is better at suppressing the background.
Microtubules in fixed cell, Fig. 9: We consider an example of microtubules in fixed cells taken from [65] as another challenging case. A microtubule is geometrically similar to actin filaments and mitochondria in the sense of tubularity, but is significantly different in terms of radius. The radii of microtubules are in the range 25-30 nm while those of actin filaments are in the range 5-7 nm. The detailed protocol of the considered example can be found in [65]. The first 500 frames of the second example of microtubules in fixed cell are used. This data is also publicly available. The relevant imaging parameters are an inverted epifluorescence system of 1.49 NA, 108 nm pixel size, and emission wavelength of 667 nm. The SBR of the selected region is ∼4. Since the sample and illumination are 3D, out-of-focus light is also a problem. The sample has a dense structure with a number of thin strands, not previously encountered in the simulated data. The results are shown in Fig. 9. We can see from Fig. 9(b)-(e) that M-1 to M-3 all manage to enhance the continuity of the strands while also suppressing the low-intensity, out-of-focus strands. Qualitatively, M-1 seems to perform the best, with good clarity in the individual strands. This example illustrates some potential for generalization of the models to untrained structures and structural densities, provided the structures are geometrically and optically similar to those simulated for the training dataset.
Liposomes stabilised in agarose, Fig. 10: This is one of the challenging samples, with structures having radii of 125±30 nm. This data is also taken from a publicly released dataset of [14]. The imaging parameters of relevance are an epifluorescence microscope of NA 1.42, pixel size 80 nm, and emission wavelength of 537 nm. The liposomes were lab fabricated with an average diameter of 250 nm. They emulate vesicles with membrane labels. They were stabilized in agarose, which is likely to contribute to the background through autofluorescence. The fragile nature of the liposome assembly also means that there may have been debris from liposomes that were disintegrating before the fixation in agarose. These sources of extra background cause the SBR at the bright spot seen in Fig. 10(a) to be ∼4.7 and at the second bright spot to be ∼3.2, despite the simplicity of the structures, the fixation, and the relatively favorable sparsity of the liposome distribution. Further, the resolution limit for the microscope parameters is approximately 190 nm for the noise-free case. Therefore, the liposomes are not large in comparison to the resolution limit. The denoising results are shown in Fig. 10(c)-(d), while Fig. 10(b) shows the noisy nanoscopy image. While the denoising or artefact suppression effect is not evident in the denoised images at first glance, the contrast is enhanced and liposomes other than the two clearly defined ones become visible. Further insight is obtained from the log-scaled images shown in Fig. 10(g)-(k), where the background suppression by M-1 to M-3 is easily noticeable.
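The quoted resolution limit of approximately 190 nm can be checked with a short calculation. The sketch below assumes the Abbe criterion, d = λ/(2·NA); the text does not state which criterion was used, but this one reproduces the quoted value for the liposome data. The function name `abbe_limit_nm` is hypothetical, and the wavelengths and numerical apertures are those stated for the datasets described so far.

```python
def abbe_limit_nm(wavelength_nm, numerical_aperture):
    """Diffraction limit in nm under the Abbe criterion, d = lambda / (2 NA)."""
    return wavelength_nm / (2.0 * numerical_aperture)


# (emission wavelength in nm, numerical aperture) as stated in the text.
datasets = {
    "actin filaments": (590.0, 1.49),
    "microtubules": (667.0, 1.49),
    "liposomes": (537.0, 1.42),
}

for name, (wavelength_nm, na) in datasets.items():
    print(f"{name}: {abbe_limit_nm(wavelength_nm, na):.1f} nm")

# Ratio of the average liposome diameter (250 nm) to the resolution limit.
liposome_ratio = 250.0 / abbe_limit_nm(537.0, 1.42)
print(f"liposome diameter / resolution limit: {liposome_ratio:.2f}")
```

Under this assumption the average liposome diameter is only about 1.3 times the resolution limit, consistent with the statement that the liposomes are not large in comparison to it.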
We note that the methods were trained for liposomes of radius distributed uniformly in the range [25,500]. Since in this case the brightness is proportional to the size, smaller objects will produce dimmer signals. This is particularly important for sub-diffraction structures, where the resolution-limited image will display intensity proportional to their sizes. Since MUSICAL introduces non-linearities in order to achieve super-resolution, it inherently reduces the contribution, and therefore the appearance, of dimmer objects. As a result, the training set implicitly adds a bias toward larger structures. However, the structures in the experimental data have a narrow size distribution around the resolution limit, which explains why the results seem different from the ones obtained for the simulations. Therefore, there is room for customization in which the training set contains a narrower distribution of the diameter.
Mitochondria in a living cell, Fig. 11: This data was measured in our laboratory on living cardiomyocytes, in which mitochondria were labeled using MitoTracker green dye, which is live-cell compatible. Two hundred frames were acquired at a frame rate of 40 frames per second but with an exposure time of 3 ms. The other relevant microscopy parameters are an epifluorescence microscope of NA 1.42, 80 nm pixel size, and emission wavelength of 520 nm. The SBR of the image stack at the brightest point shown in the green box (labeled k) in Fig. 11(a) is ∼3.8. The noisy and denoised nanoscopy images are shown in Fig. 11(b)-(e), and their log versions in Fig. 11(f)-(i). The log versions clearly indicate the removal of debris from the background, with the best removal contributed by M-1. To facilitate visualization, zoom-ins of three regions are shown in Fig. 11(j)-(l). The denoising properties are well exhibited in terms of sharpening the boundaries and suppressing intensities inside the mitochondrial boundaries. Further, the denoising works quite well over regions of different intensities, as noted especially in Fig. 11(l), which corresponds to a region of weak intensity. There, all three methods manage to improve the continuity of the left strand compared to the noisy MUSICAL reconstruction. All three methods perform equally well on this mitochondria example.

Discussion
Generalizability and scalability: Apart from the visually better results obtained on structures that the models were trained on, we observe that the models generally perform well on new, weakly related structures that they were not trained on. However, this generalizability is not observed for examples outside the training space, such as endoplasmic reticulum (Supplement 1) and randomly distributed debris emitters (Supplement 1). We also note a general restoration of the resolvability of structures and a reduction in the background debris. We deliberately train for an SBR value that is considered quite poor, in the hope that the models generalize to data with better SNR as well. This is clearly witnessed in our results on actual experimental microscopy data (their SBRs are reported in Supplement 1, together with a systematic study of SBR for a simulated example). It is an interesting deviation from deep learning based denoising, where sensitivity to noise levels is noted [66,67]. This is a testimony to the fact that artefacts introduced by FNMs have a certain structural aspect due to computational processing, as opposed to purely pixel-wise independent noise in raw image data. The results also verify that the unconventional approach of simulation-supervised deep learning works well for this problem and helps in circumventing the absence of ground truth. The random selection of the values of a variety of parameters ensures that a diversity of situations is included without introducing significant bias. Nonetheless, some of the quantities are at present fixed, either for simplicity of simulations or for limiting the size of the dataset (and thereby the time needed for creating it). In the future, the same dataset may be expanded to cover a wider variety of conditions, or more independent datasets may be created for exploring transfer learning across FNMs, structures, microscopes, and other sources of artefacts.
Models and loss functions: Coming to specific methods, we see M-1 performing really well on a variety of structures, including structures of varying thicknesses and even densely packed structures like microtubules. M-2 and M-3 lag slightly behind but still seem to work really well at suppressing the background debris. In summary, M-1 comes across as the most generalizable model, producing good results across a variety of structures with appreciable resolution improvement and significant background noise reduction. It is also the method that generally resulted in leading PSNR values in our test data. From our results, it appears that the VGG-based perceptual loss function used in M-1 provides good qualitative as well as quantitative performance. It is possibly due to the use of activation maps of abstract nature at multiple depths that the VGG loss function is able to learn a sophisticated artefact suppression model. On the other hand, we think that the combination of SSIM and L1, such as used in M-2, provides a good balance between perceptual quality and pixel-wise match.
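To illustrate how a combination of SSIM and L1 can balance a structural term against a pixel-wise term, the following is a minimal sketch under stated assumptions. It uses a simplified global SSIM computed over the whole image, whereas standard SSIM is evaluated over local windows, and the weighting factor `alpha` is a hypothetical parameter, since the weighting is not specified here. It is not the loss implementation used for M-2.

```python
def l1_loss(pred, target):
    """Mean absolute pixel-wise difference between two flattened images."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def global_ssim(pred, target, data_range=1.0):
    """Simplified SSIM using statistics of the whole image (single window)."""
    n = len(target)
    mu_p = sum(pred) / n
    mu_t = sum(target) / n
    var_p = sum((p - mu_p) ** 2 for p in pred) / n
    var_t = sum((t - mu_t) ** 2 for t in target) / n
    cov = sum((p - mu_p) * (t - mu_t) for p, t in zip(pred, target)) / n
    # Stabilizing constants with the commonly used values K1=0.01, K2=0.03.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    numerator = (2.0 * mu_p * mu_t + c1) * (2.0 * cov + c2)
    denominator = (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2)
    return numerator / denominator


def ssim_l1_loss(pred, target, alpha=0.5, data_range=1.0):
    """Weighted sum of (1 - SSIM) and L1; alpha is an assumed weight."""
    structural = 1.0 - global_ssim(pred, target, data_range)
    pixelwise = l1_loss(pred, target)
    return alpha * structural + (1.0 - alpha) * pixelwise


# Toy flattened images with intensities in [0, 1].
target = [0.0, 0.2, 1.0, 0.4]
noisy = [0.1, 0.2, 0.9, 0.5]
print(f"loss(noisy, target)  = {ssim_l1_loss(noisy, target):.4f}")
print(f"loss(target, target) = {ssim_l1_loss(target, target):.4f}")
```

The structural term penalizes changes in mean, variance, and correlation, while the L1 term penalizes per-pixel deviations, so the weight controls the trade-off between perceptual quality and pixel-wise match.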
Metrics and the value of quantitative analysis: The training procedure of deep learning methods needs loss functions and therefore inherently uses some form of quantitative indicator of denoising quality. Nonetheless, as exemplified through Table 1 and Fig. 4, a single-valued quantitative metric may fail to be an absolute hallmark of quality, especially for microscopy images in general and nanoscopy images in particular. It might be interesting in the future to design quality metrics customized for this field of science.

Conclusion
In this work, artefact removal for a selected fluctuations based nanoscopy method is reported. Artefacts in such nanoscopy methods are attributed to the noise, the photokinetics, as well as the computational treatment of data. The problem of ground truth absence is effectively dealt with using simulations that realistically mimic every aspect of the measurement. It is seen that autoencoder deep learning with a simulation-supervised training dataset is quite effective in suppressing artefacts arising from photokinetics, raw microscopy, and nanoscopy algorithm induced non-linear data distortions. Our approach is also observed to be generalizable across multiple different structures and nanoscopy algorithms not used during the training process and thus previously unseen by any of the models. Nonetheless, the dataset can be easily scaled to a wider variety of conditions, or transfer learning can be explored. In the future, we wish to add more versatility to the simulation-supervised training dataset and explore the design of suitable metrics for quality analysis of nanoscopy images.

Fig. 1. A side-by-side view of noisy (SBR 3) and ideal MUSICAL reconstructions (simulated) is presented. The top row shows an example of mitochondria while the bottom row shows an example of vesicles. In some cases, as shown in (c,d), artefacts can suppress the resolvability of features in addition to contributing background debris. In other cases, they may compromise the sharpness of certain structures (yellow arrows in g) and reduce optical sectioning by reconstructing out-of-focus structures (red arrows in g). Number of frames used to generate nanoscopy images: 200. The diffraction-limited image (mean image of all frames) is abbreviated as Diff. Lim. Scale bar 500 nm in [a,e] and 1 µm in [b-d, f-h]. Example noise-free and noisy data are provided in supplementary Fig. S1 (a,c).

Fig. 2. The proposed approach of artefact removal is illustrated here. The asterisk (*) shown in block C1 indicates the low-dimensional latent feature space of the autoencoder, suitable for representing feature-deficient microscopy and nanoscopy images. Relevant details of the labeled blocks appear in section 2.

Fig. 3. Block diagrams of the autoencoder architectures explored in this work.

Fig. 4. The results of denoising the noisy image (a) using the method with best PSNR (c) and the method M-3 (d) listed in section 3.1, and quality comparison with the noise-free image (b). Scale bar 1 µm.

Fig. 5. A qualitative comparison for mitochondria where artefact suppression restores resolution. In a-e, the contrast is adjusted manually for best visualization of resolution restoration. The intensities along the yellow line shown in (a-e) are plotted in f. g-k show saturated versions of a-e, where the out-of-focus regions and background debris are also visible. Scale bar 500 nm.

Fig. 6. Qualitative comparison for an example of a vesicles sample. (a,b) show the same noise-free image rendered with two contrast stretches. The contrast c1 in (a) is set so that the structure marked with the yellow triangle can be seen in both noisy and noise-free images. The contrast c2 in (b) is set so that the appearance and visual thickness of the bright spot marked with the red triangle appear similar to the noise-free image. The intensities along the line sections shown in (a,b-f) are compared in (g). Scale bar 500 nm.

Fig. 8. Artefact removal from nanoscopy image of in-vitro preformed actin filaments.The 2nd and 4th rows show results in log scale.Their contrast is adjusted such that the elliptic blob in the top portion of green ROI and the fork in the bottom right of the yellow ROI appear visually similar across the row.Scale bar 2 µm in a, 500 nm in b-o.

Fig. 9. Artefact removal from nanoscopy image of microtubules neither simulated nor used in training.Bottom row shows results in the log scale.Scale bar 1 µm.

Fig. 10. Results of artefact removal from nanoscopy result of liposomes. f-i show results in log scale. The contrast in log scale is adjusted such that the elliptic blob to the left of the color bar appears visually similar. Scale bar 500 nm.

Fig. 11.Results of artefact removal from nanoscopy result of mitochondria in living cells.(f-i) show results in log scale.Scale bar 2 µm.