Machine learning approach for single molecule localisation microscopy

Abstract: Single molecule localisation (SML) microscopy is a fundamental tool for biological discoveries; it provides sub-diffraction spatial resolution images by detecting and localizing "all" the fluorescent molecules labeling the structure of interest. For this reason, the effective resolution of SML microscopy strictly depends on the algorithm used to detect and localize the single molecules from the series of microscopy frames. To adapt to the different imaging conditions that can occur in an SML experiment, all current localisation algorithms require the microscopy user to choose several parameters. This choice is not always easy, and a wrong selection can lead to poor performance. Here we overcome this weakness with the use of machine learning. We propose a parameter-free pipeline for SML microscopy based on a support vector machine (SVM). This strategy requires a short supervised training, which consists in the user selecting a few fluorescent molecules (∼10-20) from the frames under analysis. The algorithm has been extensively tested on both synthetic and real acquisitions. Results are qualitatively and quantitatively consistent with the state of the art in SML microscopy and demonstrate that the introduction of machine learning can lead to a new class of algorithms that is competitive and conceived from the user's point of view.


Introduction
Single molecule localisation (SML) microscopy is an invaluable tool for biological discoveries [1,2]; however, its success in obtaining a super-resolved (or nanoscopy) image depends on the optimization of several parameters, on both the experimental and the image processing sides [3,4]. In SML microscopy (e.g. PALM [5] and STORM [6]) the fluorescent molecules are not all active together: they are activated at random in space and time, so as to gather a stack of low-resolution images, each containing a few well-separated molecules. A processing pipeline is then fundamental in order to obtain a super-resolved final image, circumventing the Abbe diffraction limit [7]. Generally, an SML algorithm achieves super-resolution in two main steps: a first phase of detection, or segmentation, where fluorescent molecules are localised at the pixel level, and a subsequent position refinement through fitting algorithms or through faster but less accurate techniques, such as centroids. These two phases are typically preceded by low-pass or band-pass filters in order to lower the contribution of spurious signals.
Despite such a consolidated pipeline, the detection step often relies on heuristics: e.g., the user is asked to select a threshold to discriminate active molecules based on image intensity, signal-to-noise ratio or the number of collected photons. A badly chosen threshold leads to one of two extremes: an under-sampling of the biological structure, which then cannot be recovered, or the presence of artefacts. The first effect has been experimentally demonstrated by [8] and is in agreement with the Nyquist sampling theorem [9]; the second is due to background mistaken for active molecules wherever the threshold is locally too low.
From a machine learning perspective, we can cast the detection as a binary classification problem that can be addressed with a plethora of algorithms not yet exploited in SML microscopy. Among those, we choose the support vector machine (SVM) [10] because it is able to obtain a spatial filter directly modelled on part of the acquired data, thus ideally being more robust to the acquisition variability that naturally characterizes these images. Moreover, to finally localise the molecules with the required accuracy, we suggest the use of a linear and recursive formulation of non-linear Least Squares fitting, similar to what was originally proposed in [11].
This work is based on the learning framework investigated in [12]. Here, the proof of concept is extended and the pipeline of [12] is embedded within a GUI in order to be usable by the community. The efficacy of the approach is proved on real datasets and the performance is compared with ThunderSTORM [13], the best proposal for SML low-density images according to [14]. Though ThunderSTORM requires the user to select some parameters, it also has a default setting for non-experts.
There are other works aiming at automating the parameter selection; they rely mainly on Bayesian frameworks (like [15], [16], [17]) or on statistical significance, as in SimpleSTORM [18]. The proposed approach is more intuitive and is not based on any assumption about the noise statistics: our hypothesis is that each frame can be divided into smaller patches, each one either containing an active molecule or not. We claim, and experimentally verify, that these two groups, or classes, are linearly separable and that the separation line can be computed directly on the first frames of each acquisition.
The paper is organized as follows: Section 2 presents the methodological tools we used and their implementation details; Section 3 shows results on synthetic and real datasets, comparing them with ThunderSTORM; conclusions are drawn in Section 4.

Support vector machines for detection of candidate regions
Support vector machines (SVM) [10] belong to the max-margin family of supervised machine learning and have demonstrated good performance in several applications. The idea behind our SVM-based algorithm is that activations and background are linearly separable, the separation between them being the hyperplane

$$\mathbf{w}^\top \mathbf{x} + b = 0, \qquad (1)$$

where $\mathbf{x}$ is a vectorized image patch lying on the hyperplane represented by the parameters $\mathbf{w}$ and $b$. The values of $\mathbf{w}$ and $b$ are found by solving the optimization problem

$$\min_{\mathbf{w},\,b} \; \frac{1}{2}\lVert\mathbf{w}\rVert^{2} + C \sum_{i=1}^{n} \psi\!\left(y_i(\mathbf{w}^\top \mathbf{x}_i + b)\right), \qquad (2)$$

where the first term of the objective function controls the margin, defined as the minimum distance between the hyperplane and the closest examples of the two classes (the support vectors), while the second term penalizes miss-classifications, $\psi$ being the hinge loss function. The optimization is performed over a training dataset made of representatives of both classes: each sample $\mathbf{x}_i$ has an associated label $y_i \in \{-1, +1\}$ that is used to minimize the miss-classification error. From Eq. (2) we can see that the optimization problem is convex, so the algorithms that solve it are particularly efficient [19]. In any case, the learning is performed off-line at the beginning of the acquisition session, with a very limited computational burden. For the SVM implementation we used liblinear [20], which implements the SVM with a linear kernel; the value of the constant C (see Eq. (2)) is chosen with a k-fold cross-validation step (k = 5) before the training. As detailed in [12], the best patch dimension is 7 × 7 and its representation $\mathbf{x}_i$ in Eq. (2) is obtained with a simple vectorisation, which means that patch intensities are used as features.
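As an illustration, the training stage described above can be sketched as follows. This is a minimal Python sketch using scikit-learn's liblinear-backed LinearSVC as a stand-in for the original liblinear/MATLAB implementation; the 7 × 7 patch size, the intensity features and the 5-fold selection of C follow the text, while the function name and the grid of C values are illustrative assumptions:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

def train_detector(patches, labels):
    """Train a linear SVM on vectorised 7x7 patches.

    patches : (n, 7, 7) array of raw pixel intensities
    labels  : (n,) array in {-1, +1} (+1 = active molecule)
    Returns the normalised 7x7 spatial filter and the bias b.
    """
    X = patches.reshape(len(patches), -1)  # vectorisation: intensities as features
    # choose C with 5-fold cross-validation, as in the paper (grid is illustrative)
    search = GridSearchCV(LinearSVC(), {"C": [0.01, 0.1, 1, 10, 100]}, cv=5)
    search.fit(X, labels)
    svm = search.best_estimator_
    # hyperplane coefficients reshaped into a 7x7 spatial filter,
    # normalised so the model is insensitive to absolute intensity
    w = svm.coef_.reshape(7, 7)
    return w / np.linalg.norm(w), float(svm.intercept_[0])
```

The returned filter is precisely the normalised hyperplane of Eq. (1), ready to be applied to new frames.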
By producing a spatial filter, the SVM algorithm encompasses both pre-processing and detection. At inference, when a new image is given, a linear transformation is applied to each frame. Such a transformation, formalised as a convolution of the image with a filter (the so-called SVM model), has the peculiarity of being learned from the dataset that has to be analysed. Thus, the framework provides a custom filter with the property of being discriminative given the possible sensor noise and aberrations of the instrument. Specifically, the filter is a 7 × 7 patch made of the hyperplane coefficients of Eq. (1), normalised so that $\lVert\mathbf{w}\rVert = 1$.
The normalisation is fundamental because the obtained model is then not affected by the absolute intensity of the frames used for training; this is crucial in order to remain consistent with the drop in background intensity over time that affects real experiments.
The SVM model computed over a given dataset is thus an ad hoc spatial filter that is used in the pipeline to detect the activated molecules. Its output is a binary value for the assigned class, 1 if a patch has a molecule and −1 otherwise. We consider as candidates for a subsequent localisation refinement only patches labelled with 1.
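A minimal sketch of this inference step, assuming a filter w and bias b obtained from training (Python, with SciPy standing in for the original MATLAB implementation): the SVM decision sign(⟨w, patch⟩ + b) is evaluated at every pixel via cross-correlation of the frame with the filter.

```python
import numpy as np
from scipy.ndimage import correlate

def detect_candidates(frame, w, b):
    """Apply the learned 7x7 SVM filter to a frame.

    The dot product <w, patch> at every pixel is a cross-correlation of the
    frame with the filter (the "convolution" of the text, kernel unflipped).
    Returns the (row, col) coordinates classified as +1, i.e. the candidate
    molecules passed on to the localisation refinement.
    """
    score = correlate(frame.astype(float), w, mode="nearest") + b
    return np.argwhere(score > 0)  # class +1 -> candidates
```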

Localisation at subpixel scale
Each emitting molecule is typically represented as a bivariate Gaussian function. This function is considered a good approximation of the point spread function of a microscope, so localising an emitting molecule amounts to estimating the Gaussian peak. Two approaches are generally proposed: Maximum Likelihood Estimation and Least Squares fitting. We refer to [3] for an extensive comparison of the two methods. Based on [3], [14] and [4], we opted for a Least Squares fitting, which does not need any camera noise model. However, the SVM-based detection approach proposed above is valid with any fitting method. We do not detach from the literature for the subpixel estimation but, rather than applying a non-linear Least Squares fitting of a Gaussian, we linearise a parametrized Gaussian as proposed in [11].
In particular, each molecule is represented by

$$G(x, y) = k \exp\!\left(-\frac{(x-\mu_x)^2}{2\sigma_x^2} - \frac{(y-\mu_y)^2}{2\sigma_y^2}\right), \qquad (3)$$

where $\sigma_x$, $\sigma_y$ and $\mu_x$, $\mu_y$ are the x, y standard deviations and mean values of the Gaussian, respectively, while $k$ is proportional to the number of photons. Taking the logarithm of Eq. (3) is a convenient step because the solution of the Least Squares problem can then be obtained from the inversion of a linear system in $\{A, B, C, D, E\}$, defined as in Eq. (4) and Eq. (5):

$$\ln G(x, y) = A + Bx + Cy + Dx^2 + Ey^2, \qquad (4)$$

$$A = \ln k - \frac{\mu_x^2}{2\sigma_x^2} - \frac{\mu_y^2}{2\sigma_y^2}, \quad B = \frac{\mu_x}{\sigma_x^2}, \quad C = \frac{\mu_y}{\sigma_y^2}, \quad D = -\frac{1}{2\sigma_x^2}, \quad E = -\frac{1}{2\sigma_y^2}. \qquad (5)$$

The quadratic function under consideration represents a bi-dimensional Gaussian if and only if the following constraints are met:

$$D < 0, \qquad E < 0. \qquad (6)$$

Vol. 9, No. 5 | 1 May 2018 | BIOMEDICAL OPTICS EXPRESS 1683

The implemented algorithm verifies that these conditions hold; if they do not, the patch is discarded. Each single patch is composed of the tuples $(x_i, y_i, I_i)$, with $x_i$, $y_i$ representing the position and $I_i$ the grey-scale intensity of the $i$-th pixel. All the elements in the tuple are positive integers: $I_i$ because a negative value has no physical meaning, $x_i$ and $y_i$ because they are indexes. For a patch with N pixels, our goal is to minimize

$$\sum_{i=1}^{N} \left(\hat{I}_i - (A + Bx_i + Cy_i + Dx_i^2 + Ey_i^2)\right)^2, \qquad (7)$$

where $\hat{I}_i$ is the logarithm of $I_i$, so that the sum of squared differences between the model and the data is minimum.
Computing the derivative of the residual sum of squares with respect to each parameter and setting it to zero, we obtain the linear system

$$M\,\theta = v, \qquad \theta = (A, B, C, D, E)^\top, \qquad (8)$$

where, collecting the basis functions of Eq. (4) into $\phi = (1, x, y, x^2, y^2)$,

$$M_{pq} = \sum_{i=1}^{N} \phi_p(x_i, y_i)\,\phi_q(x_i, y_i), \qquad v_p = \sum_{i=1}^{N} \phi_p(x_i, y_i)\,\hat{I}_i. \qquad (9)$$

A good strategy is to weigh the data, since in logarithmic scale a larger error is associated with small values of $I_i$. Our goal at each iteration is now to minimize the following expression:

$$\sum_{i=1}^{N} w_i \left(\hat{I}_i - (A + Bx_i + Cy_i + Dx_i^2 + Ey_i^2)\right)^2, \qquad (10)$$

where the weight $w_i$ is the squared predicted value of the model obtained at the previous iteration,

$$w_i = G_{k-1}(x_i, y_i)^2, \qquad (11)$$

with $k$ being the current iteration number; $G_0(x_i, y_i)$ is an initial guess, a Gaussian centered in the origin of the detected patch under analysis. This leads to the following linear system:

$$M_k\,\theta_k = v_k, \qquad M_{k,pq} = \sum_{i=1}^{N} w_i\,\phi_p\,\phi_q, \qquad v_{k,p} = \sum_{i=1}^{N} w_i\,\phi_p\,\hat{I}_i. \qquad (12)$$

The outcome of this step is a list of points for which the iterative procedure has converged. The number of iterations is fixed to 20, regardless of the dataset; a check on convergence allows the algorithm to discard the points for which this condition is not reached.
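The iterative weighted fit described above can be sketched as follows. This is a minimal Python sketch (standing in for the released MATLAB code): the 7 × 7 grid, the weights $w_i = G_{k-1}^2$ and the $D < 0$, $E < 0$ checks follow the text, while the width of the initial centred Gaussian guess is an assumption, and the fixed iteration count replaces the fuller convergence check of the pipeline:

```python
import numpy as np

def fit_gaussian(patch, n_iter=20):
    """Linearised weighted least-squares fit of a 2D Gaussian.

    patch : (h, w) array of positive intensities I_i.
    Returns (mu_x, mu_y, sigma_x, sigma_y, k), or None if the fitted
    quadratic violates the Gaussian constraints D < 0, E < 0.
    """
    y, x = np.mgrid[0:patch.shape[0], 0:patch.shape[1]]
    x = x.ravel().astype(float)
    y = y.ravel().astype(float)
    I = patch.ravel().astype(float)
    logI = np.log(I)  # I_hat: the log-intensities being fitted
    # basis functions phi = (1, x, y, x^2, y^2) of the log-linear model
    Phi = np.stack([np.ones_like(x), x, y, x**2, y**2], axis=1)
    # G_0: Gaussian centred on the patch (its width, 1.5 px, is an assumption)
    cy, cx = (np.array(patch.shape) - 1) / 2.0
    G = np.exp(-((x - cx)**2 + (y - cy)**2) / (2.0 * 1.5**2))
    for _ in range(n_iter):
        W = G**2  # weight = squared predicted value of the previous iterate
        M = Phi.T @ (W[:, None] * Phi)
        v = Phi.T @ (W * logI)
        A, B, C, D, E = np.linalg.solve(M, v)
        if D >= 0 or E >= 0:       # not a bi-dimensional Gaussian: discard
            return None
        G = np.exp(Phi @ np.array([A, B, C, D, E]))
    # invert Eq. (5) to recover the Gaussian parameters
    sx, sy = np.sqrt(-1 / (2 * D)), np.sqrt(-1 / (2 * E))
    mx, my = -B / (2 * D), -C / (2 * E)
    k = np.exp(A - B**2 / (4 * D) - C**2 / (4 * E))
    return mx, my, sx, sy, k
```

On a noiseless Gaussian patch the fit is exact, since the log-intensities are exactly quadratic in (x, y).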

2D localisation pipeline and GUI
Our proposal is sketched in Fig. 1, where it is compared with the schematic of the most common pipelines in the literature. The first green block, called Support Vector Machine, aims at substituting the red blocks of Pre-Processing and Detection. Localisation, instead, is a common operation, even if the suggested approach has some novelties with respect to what is proposed in the SML microscopy literature [4, 21]. The pipeline is embedded within a GUI that, together with the main functions like loading datasets or exporting results, guides the user through the training of the algorithm. The user is asked to select some active molecules, while examples of the background are selected automatically. A few selections are enough to obtain good performance, as detailed in Section 3.

Datasets
In order to evaluate the performance of our proposal, both synthetic and real datasets are used. While synthetic datasets ease a quantitative evaluation of the algorithm, real datasets are fundamental to check the behaviour in a real acquisition scenario.
Synthetic datasets are taken from the 2013 SML microscopy challenge described in [22]. They have various levels of difficulty, depending on the amount and kind of simulated noise, while the biological structure to be imaged is mainly a bundle of Tubulin proteins. Each dataset provides a collection of frames in uncompressed 16-bit TIFF format; Table 1 shows more details about them, with Tubulin II being the dataset with the highest auto-fluorescence noise.
The proposed pipeline is then tested also on real datasets, not just to be sure that the algorithm is able to face real acquisitions, but also to test whether the SVM, though supervised, is applicable to real conditions, in which the training dataset has to be created without ground truth.
The 2013 Challenge website also provides access to four experimental datasets, courtesy of the Laboratory of Experimental Biophysics at EPFL, Switzerland. We have used the acquisitions named TubulinsLS and TubulinAF647. The first images Tubulin filaments; the other is a fixed cell, stained with mouse anti-alpha-tubulin primary antibody and Alexa647 secondary antibody. Table 2 summarizes the acquisition conditions.

Table 1. Properties of the synthetic datasets in use; for detailed information on the acquisition settings refer to [22].

Evaluation metrics
In the SML evaluation protocol, a true positive (TP) means that a localisation is matched with a point of the Ground Truth (GT), which implies that it must be spatially close enough to it. Following [14], we select a radius equal to the full-width at half-maximum (FWHM) of the PSF, in our case around 250 nm. Localised molecules beyond this radius, left unpaired, are categorized as false positives (FP). Finally, GT molecules that are not associated with any localisation are called false negatives (FN). Detection rate and localisation accuracy are the performance indexes commonly used in the SML community, together with the Fourier Ring Correlation [23,24], which is useful for real datasets, when ground truth is not available. The Jaccard index (JAC) is used for monitoring the detection, while the root mean square error (RMSE) is used for the accuracy. Only the detections matched with the GT contribute to Eq. (13):

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} a_i^2}, \qquad (13)$$

where N = TP and $a_i$ is the accuracy, i.e. the distance between the $i$-th matched localisation and its GT point.
For the sake of completeness, we also present the results in terms of the Precision and Recall indexes, defined as follows: Jaccard J = TP/(TP + FP + FN), Precision p = TP/(TP + FP) and Recall r = TP/(TP + FN).
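These definitions translate directly into code. The sketch below (Python; illustrative names, and a simple greedy matching in place of the optimal assignment used by the challenge protocol) computes the indexes from two lists of (x, y) coordinates in nm:

```python
import numpy as np

def evaluate(loc, gt, radius=250.0):
    """Match localisations to ground-truth points within `radius` (nm) and
    compute Jaccard, Precision, Recall and RMSE over the matched pairs.
    Matching is greedy (closest pairs first); each GT point is used once."""
    loc, gt = np.asarray(loc, float), np.asarray(gt, float)
    d = np.linalg.norm(loc[:, None, :] - gt[None, :, :], axis=2)  # pairwise distances
    taken, errors = set(), []
    for i in np.argsort(d.min(axis=1)):      # localisations with a close GT first
        j = int(np.argmin(d[i]))
        if d[i, j] <= radius and j not in taken:
            taken.add(j)
            errors.append(d[i, j])
    tp = len(taken)
    fp, fn = len(loc) - tp, len(gt) - tp
    return {
        "jaccard": tp / (tp + fp + fn),
        "precision": tp / len(loc) if len(loc) else 0.0,
        "recall": tp / len(gt) if len(gt) else 0.0,
        "rmse": float(np.sqrt(np.mean(np.square(errors)))) if tp else float("inf"),
    }
```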

Synthetic data analysis
Preliminary results of the proposed method were presented in [12]. The evaluation was purely quantitative and was performed on the synthetic datasets of the 2013 SML microscopy Challenge [22]. The obtained results were compared with the best performing software of the Challenge, namely ThunderSTORM [13], revealing a good detection ability of our pipeline. Results were close to or at the level of ThunderSTORM, and our proposal stands out with better performance on two of the three datasets.
However, in order to obtain the spatial filter (SVM model) for the detection task, the classifier needs to be trained. To this end, in [12] each dataset was divided exactly into two non-overlapping stacks of frames: from the first, samples of the positive and negative classes were extracted based on the GT information, while the other frames were used as a test set to evaluate the performance of the pipeline and reconstruct the super-resolved image.
Such a way of building the training set is typical in classification problems, but it is not realistic for an SML microscopy acquisition, questioning the usability of the approach. Here we demonstrate that there is no need for a big amount of data, by explicitly characterizing the behavior of the pipeline as the cardinality of the training set changes. We embed the algorithm proposed in [12] in a wider framework, made of a GUI and a simpler yet effective way to create a training set. Besides being easier to use, this system allows a quick selection of a few positive samples and the generation of the training dataset from them.
A drawback of not having the GT information is the need to assume the highest local maximum within a patch as the pixel location of an activation, which is not always the case due to the presence of shot noise. This workaround has an impact on the detection ability if we compare the results shown in [12] with the column named SVM-half of Table 3. In particular, such a training set composition leads to a more selective identification of activations, as the Jaccard index reflects.

Table 3. Performance comparison among ThunderSTORM (TS), our SVM-based approach with half of the acquired stack used for the training (SVM-half) and the same SVM approach using only 20 activations for training (SVM-few).

Table 3 shows the performance of ThunderSTORM and of the SVM-based approach with two different sizes of the training set: SVM-half means that half of the images are used to build the SVM, while SVM-few refers to a manual selection, through the GUI, of 20 activations. Comparing the two kinds of training, the performance obtained with a few activations is in line with what we obtain using half of the dataset, looking at indexes like Accuracy and Precision, and is much better in terms of Jaccard and Recall. A possible explanation is that with too much data the algorithm tends to be driven by the activation patches that are more common. Instead, the reason for the drop in Precision on the TubulinII dataset is that this dataset, being the noisiest, has more variability, and the use of only 20 points penalizes the performance. We remark that the manual selection does not need to be precise, because the algorithm automatically shifts the selection to the closest pixel with a local intensity maximum. Seen from a different perspective, the user is only asked to select a few areas, within one or more frames, where positive activations can be found; the selection is then refined by the algorithm prior to being used.
For each newly selected pixel, the number of background samples also increases, so that the training set is equally divided into activations and background (i.e. it is balanced). Once the manual selection is completed, the SVM is trained and the processing can start on the remaining frames. Regarding the number of activations needed, Fig. 2 reports the performance of the whole pipeline when the SVM of the detection step is trained with an increasing number of positive samples, from 10 to 80 with a step of 10. On TubulinI, with the selection of a few points, the Jaccard measurement is not as favourable as ThunderSTORM's, the reason being a poor recall, while Precision and Accuracy improve with respect to the competitor; these results are in agreement with what we observe on all the analysed datasets.
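The construction of such a balanced training set, with the click-snapping refinement described above, can be sketched as follows. This is an illustrative Python sketch: the 7 × 7 patch size follows the text, while the snapping window, the background-sampling rule and all names are assumptions, and clicks near the frame border are not handled:

```python
import numpy as np

def build_training_set(frames, clicks, half=3, snap=2, rng=None):
    """Build a balanced training set from a few user-selected activations.

    clicks : list of (frame_index, row, col) rough selections; each is snapped
    to the brightest pixel within `snap` pixels of the click.  For every
    positive (7x7) patch, one background patch is drawn at random away from
    the selection, so the two classes stay balanced.
    """
    if rng is None:
        rng = np.random.default_rng()
    pos, neg = [], []
    for f, r, c in clicks:
        img = frames[f]
        # snap the (possibly imprecise) click to the closest local intensity maximum
        win = img[r - snap:r + snap + 1, c - snap:c + snap + 1]
        dr, dc = np.unravel_index(np.argmax(win), win.shape)
        r, c = r - snap + dr, c - snap + dc
        pos.append(img[r - half:r + half + 1, c - half:c + half + 1])
        # one background example per activation keeps the classes balanced
        while True:
            br = int(rng.integers(half, img.shape[0] - half))
            bc = int(rng.integers(half, img.shape[1] - half))
            if (br - r)**2 + (bc - c)**2 > (4 * half)**2:  # stay clear of the selection
                neg.append(img[br - half:br + half + 1, bc - half:bc + half + 1])
                break
    X = np.array(pos + neg).reshape(len(pos) + len(neg), -1)  # vectorised patches
    y = np.array([1] * len(pos) + [-1] * len(neg))
    return X, y
```

The resulting (X, y) pair feeds directly into the SVM training of Section 2.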
We can conclude that there is no need for the user to select a large amount of activations: decreasing their number, the performance remains stable down to 10-20 selections, below which there is a drop. The user can thus select a reasonable yet arbitrary amount of activations without impairing the processing outcome. In this context, it is important to highlight that, for a non-expert user, selecting a few activations is more intuitive than choosing arbitrary thresholds. Fig. 3 shows the super-resolved images obtained by our algorithm (second and third columns) and the ones reconstructed by ThunderSTORM. The images underline the strength and the limit of our pipeline: the reconstruction is less noisy than the ThunderSTORM outputs; however, not all the TPs are detected, resulting in a structure made of fewer molecules in some regions. This prudential approach is quantitatively measurable and clearly emerges from the behaviour of the Precision and Recall measurements.

Real data analysis
In the previous Section we have seen how it is possible to achieve performance in line with a state-of-the-art algorithm, such as ThunderSTORM, without the need for time-consuming training and without the definition of arbitrary thresholds, now replaced by an intuitive labelling procedure for the user.
Our pipeline makes no hypothesis on the statistical behaviour of the noise or the background: the detection task is based on the sole assumption that the two classes, activation and background, are linearly separable. In order to verify the benefits of the approach with real acquisitions, and thus real background, we ran the algorithm on several Tubulin datasets. Figure 5 presents a qualitative evaluation of our approach on the TubulinAF647 and TubulinsLS datasets [22], comparing it with ThunderSTORM. For ThunderSTORM we used the default set-up, while for our pipeline we manually selected 20 activations within the first frames. As is evident from the FRC (Fig. 6), the performance is exactly the same on TubulinAF647, while there is a loss of a few nm on TubulinsLS. However, the final super-resolved images present noticeably fewer artefacts than the ThunderSTORM reconstructions. This observation is valid on both datasets, as shown in Fig. 5, where a magnification underlines how ThunderSTORM detects spurious activations and some big spots that our algorithm more robustly discards. This latter difference also explains the drop in correlation at low spatial frequencies in the TubulinAF647 FRC curve for our method with respect to ThunderSTORM.

Conclusions
Within SML microscopy, processing after image acquisition is fundamental to obtain a super-resolved image. Unfortunately, state-of-the-art algorithms rely on thresholds heuristically chosen by the user to accomplish molecule detection.
We experimentally demonstrated that it is possible to formulate the detection step as a classification problem, resulting in the building of a custom filter for each dataset to analyse. Such a training step is not always necessary: filters are saved and can be reused, but we advise retraining when the acquisition conditions differ too much. In particular, the same filter can be used to analyse a long series of experiments where microscope and sample conditions do not change, which is a typical scenario in SML microscopy.
The SVM filter performs in line with the state of the art, but it is conceived to be threshold-free, learning the separation between molecules and background directly from the dataset. Supplying a few molecules to the algorithm also removes the need for a pre-processing step, reducing the introduction of artefacts and the computational load of the algorithm; the load is further reduced by adopting the linearisation of the LS fitting, which remains a very accurate tool.
As future work, we plan to test our algorithm on a broader range of samples (with different signal-to-noise/background ratios) and to extend our method to 3D by accounting for PSFs with distortions related to their position in depth. The detector has to be extended to encode such distortions, for instance by designing appropriate kernels. Moreover, our fitting algorithm can possibly be extended without strong modifications to the algorithmic structure, especially if the PSF model is still an exponential function. Another significant challenge in the context of SML microscopy, not currently addressed by our SVM approach, is the ability to work under high-density molecular conditions, i.e., low sparsity. A solution to this problem can still be sought in the field of machine learning, but within other approaches, such as neural networks. The complete MATLAB source code for our SVM-based localisation approach is available from Code 1 [25].