1 Introduction

Fluorescence microscopy imaging is an indispensable tool in numerous biological studies for visualizing (intra)cellular structures and dynamic processes. However, the quality of the acquired images is often rather poor, mainly due to low photon counts and the use of diffraction-limited optics. This is especially true in single-molecule imaging experiments, where the objects of interest (generally termed “particles”) are smaller than the optical resolution of the microscope and the fluorescence signal is weak. As a result, fluorescently labeled vesicles, peroxisomes, or other particles are typically rendered as blurred spots with Gaussian-like intensity profiles, severely corrupted by Poisson noise [1, 2]. Also when imaging larger objects, such as microtubules and actin filaments within cells, or the dendritic and axonal arbors of neurons, the effects of noise and blurring may adversely affect subsequent analyses.

The main sources of noise in fluorescence microscopy are the signal itself (photon shot noise) and the digital imaging electronics. The physics of both noise sources is well understood and the corresponding statistical distributions are known [3, 4]. The signal-dependent noise is described by a Poisson distribution, while the noise arising from the imaging system often follows a Gaussian distribution. In practice, especially in low-excitation or single-molecule fluorescence imaging, the signal-dependent noise dominates, and the Gaussian noise caused by the imaging device can be ignored. The latter source of noise can, moreover, be substantially reduced by cooling the detector/camera.
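To make the noise model concrete, the following minimal Python/NumPy sketch (with purely hypothetical parameter values, not taken from our experiments) simulates a pixel measurement as a Poisson-distributed photon count plus zero-mean Gaussian read noise; at low photon counts the Poisson term dominates the total variance, which is the regime assumed throughout this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_pixel(expected_photons, read_noise_sd=2.0, n_samples=100_000):
    """Simulate pixel values: Poisson shot noise plus Gaussian read noise."""
    photons = rng.poisson(expected_photons, size=n_samples)   # signal-dependent noise
    readout = rng.normal(0.0, read_noise_sd, size=n_samples)  # detector/readout noise
    return photons + readout

# Total variance ~ expected_photons + read_noise_sd**2, so the Poisson part
# dominates whenever the photon count exceeds the read-noise variance.
for mean_photons in (5, 50, 500):
    print(mean_photons, noisy_pixel(mean_photons).var())
```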

To reduce both noise and blurring, deconvolution methods are frequently used [5, 6]. Most existing deconvolution methods, however, are primarily designed for deblurring, involving careful modeling of the point-spread function (PSF) of the microscope optics. Although noise is often also reduced in the process, the underlying algorithms typically do not focus on this aspect and do not properly model the noise distributions, but assume that noise can be removed in a preprocessing step by applying specific image filters.

Here we approach the deconvolution problem in a novel and more fundamental way, by considering the blurring due to the PSF and the statistics of the noise as equally important aspects of one and the same “stochastic” inverse/reconstruction problem. Because of the Poissonian nature of photon emission, so-called Poisson point processes (PPPs) [7] are especially suitable for solving such problems. Similar reconstruction problems appeared in the field of positron emission tomography (PET), where they were tackled using the well-known Shepp-Vardi algorithm [8]. There, PPPs also played a major role in developing reconstruction algorithms that are still used in clinical PET imaging devices. We demonstrate how the Shepp-Vardi solution to the stochastic reconstruction problem can be adapted for microscopy imaging, and we show its performance for several biological applications.

Surprisingly, the final iterative update equations of our solution coincide with the update equation of the so-called probability hypothesis density (PHD) filter [9]. In computer vision and robotics, the PHD filter (which is derived in a very complex and rigorous way using the theory of random finite sets [10]) is a state-of-the-art method for multi-object tracking in image sequences. However, contrary to tracking applications, our solution (as an extension of the Shepp-Vardi algorithm) amounts to employing the PHD filter unconventionally: not on an image sequence but on a static image, and not for tracking but for iterative denoising/deconvolution. The power of our problem formulation and the proposed solution lies in the ability to fully model the acquisition process, to incorporate the noise statistics, and even to model events such as scattering of emitted photons and failure of the detector to produce photoelectrons.

This paper is organized as follows. In Sect. 2 we provide the problem statement, including necessary definitions. Next, in Sect. 3, we describe the classical Shepp-Vardi algorithm, which forms the basis for solving our reconstruction problem, and its relation to the PHD filter. Section 4 describes how the algorithm can be further improved by introducing prior knowledge about the smoothness of the reconstructed image. In Sect. 5 we present first examples of the proposed method in comparison with alternative solutions for several biological applications. Concluding remarks are made in Sect. 6.

2 Problem Statement

A point process is a random variable whose realizations are sets containing only points of some state space \(\mathcal {X}\) (usually the Euclidean space \(\mathbb {R}^m, m \ge 1\)). For a Poisson point process (PPP), the distribution of the points is fully characterized by a density (also called “intensity”) function \(\lambda (x) \ge 0\) over \(\mathcal {X}\) (Fig. 1). The number of points of a realization of the PPP falling in any subset \(\mathcal {R}\) of \(\mathcal {X}\) is Poisson distributed with mean \(N_\mathcal {R}=\int _{\mathcal {R}}\lambda (x)dx\). In our case, the space \(\mathcal {X}\) is the physical space in which the light-emitting fluorophores are located.
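For readers less familiar with PPPs, the following Python/NumPy sketch (the intensity function and all numbers are hypothetical) samples a PPP on a bounded region by thinning a homogeneous process; the number of accepted points in any region is then Poisson distributed with mean \(\int _{\mathcal {R}}\lambda (x)dx\).

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_ppp(intensity, lam_max, width, height):
    """Sample a PPP on [0,width]x[0,height] by thinning a homogeneous PPP of rate lam_max."""
    n = rng.poisson(lam_max * width * height)           # candidate point count
    pts = rng.uniform(0, (width, height), size=(n, 2))  # candidate locations
    keep = rng.uniform(size=n) < intensity(pts) / lam_max
    return pts[keep]

# Hypothetical intensity: 100 inside a 4x4 central square, 0 outside.
def lam(pts):
    inside = (np.abs(pts[:, 0] - 5) < 2) & (np.abs(pts[:, 1] - 5) < 2)
    return 100.0 * inside

points = sample_ppp(lam, lam_max=100.0, width=10, height=10)
print(len(points))  # Poisson distributed with mean 100 * 16 = 1600
```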

The points on the input space \(\mathcal {X}\) (emissions of photons) occur as a PPP with the intensity function \(\{\lambda (x): x \in \mathcal {X}\}\). By means of a microscope, the points of this process are observed with a detector/camera after a random translation (due to the PSF of the optical system) to the output space \(\mathcal {Y}\) (also Euclidean space \(\mathbb {R}^m, m \ge 1\)). According to the theory [11], points on the output space occur as a Poisson process with intensity function \(\{\mu (y): y \in \mathcal {Y}\}\) given by

$$\begin{aligned} \mu (y) = \int _{\mathcal {X}}P_d(x)p(y|x)\lambda (x)dx + \mu _0(y), \end{aligned}$$
(1)

where \(P_d(x)\) is the probability that a point (e.g. an emitted photon) will be translated according to the PSF \(p(y|x)\) from the input location \(x\) to the output space \(\mathcal {Y}\). The intensity \(\mu _0(y)\) models the observed clutter (background noise). The spurious emissions (e.g. caused by autofluorescence) are assumed to be uniformly distributed over \(\mathcal {X}\), with a constant intensity \(\lambda ^c\). In principle, \(\lambda ^c\) can be made spatially dependent in order to model more complex clutter (e.g. false light emissions originating within an inhomogeneous cell background). In any case, the clutter intensity is defined as \(\mu _0(y)=\int _{\mathcal {X}}P_d(x)p(y|x)\lambda ^cdx\).
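On a discrete grid, Eq. (1) reduces to a convolution of \(P_d(x)\lambda (x)\) with the PSF plus a clutter term. A minimal sketch, assuming a Gaussian PSF, a constant detection probability, and a constant clutter intensity (all values hypothetical), is:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def forward_intensity(lam, p_d=0.9, psf_sigma=1.0, lam_clutter=0.5):
    """Discrete version of Eq. (1): mu = PSF * (P_d * lam) + mu_0,
    with a Gaussian PSF, constant P_d, and constant clutter intensity."""
    mu_signal = gaussian_filter(p_d * lam, sigma=psf_sigma)
    mu_clutter = gaussian_filter(p_d * lam_clutter * np.ones_like(lam), sigma=psf_sigma)
    return mu_signal + mu_clutter

# Hypothetical ground truth: a bright 50x50 square in a 128x128 field.
lam_true = np.zeros((128, 128))
lam_true[39:89, 39:89] = 100.0
mu = forward_intensity(lam_true)
```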

Fig. 1. Example of (a) the density function \(\lambda (x)\), (b) the realization (yellow) of the PPP on \(\mathcal {X}\), obtained by sampling from \(\lambda (x)\), (c) the translated PPP (green) with superimposed clutter (cyan) on \(\mathcal {Y}\), and (d) the resulting binned measurements (an image), with the number of bins (pixels) \(N_{\tilde{y}}\). (Color figure online)

The observed microscopy image \(\mathcal {I}\) is the result of binning the points of the PPP on \(\mathcal {Y}\) into \(N_{\tilde{y}}\) bins, which constitute the pixels of the image. The pixels in \(\mathcal {I}\) are defined as disjoint bounded sets \(R_1,\ldots , R_{N_{\tilde{y}}}\), with \(\mathcal {\tilde{Y}}=\cup _{i=1}^{N_{\tilde{y}}}R_i\subset \mathbb {R}^2\). The number of points of the process that lie in \(R_i\) (the pixel value) is denoted as \(Y_i\). No record is kept of the location of the points within any set \(R_i\).
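The whole observation chain of Fig. 1 (emission, PSF translation, clutter, binning) can be simulated in a few lines; the sketch below is only illustrative, with a square-shaped \(\lambda (x)\), a Gaussian PSF of 1 pixel, and a hypothetical amount of uniform clutter.

```python
import numpy as np

rng = np.random.default_rng(2)

def bin_points(points, n_bins, extent):
    """Bin PPP points on Y into an n_bins x n_bins image; pixel value Y_i = point count."""
    img, _, _ = np.histogram2d(points[:, 0], points[:, 1],
                               bins=n_bins, range=[[0, extent], [0, extent]])
    return img

n_emitters = rng.poisson(100 * 50 * 50)                    # lambda = 100 over a 50x50 square
emitters = rng.uniform(39, 89, size=(n_emitters, 2))       # points of the input PPP on X
observed = emitters + rng.normal(0, 1.0, emitters.shape)   # random translation by the PSF
clutter = rng.uniform(0, 128, size=(5000, 2))              # uniform background clutter on Y
image = bin_points(np.vstack([observed, clutter]), n_bins=128, extent=128)
```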

The inverse problem is defined as the inference about the points in the input space on the basis of points observed in the output space. In general, there are two problems that complicate the estimation of \(\lambda (x)\) using the observed pixel values or observed points on \(\mathcal {Y}\). The first one is present even if so much data are available that the function \(\mu (y)\) can be regarded as known perfectly. In this case, the integral in (1) must be solved for \(\lambda (x)\) for a given kernel p(y|x). This is generally termed a deterministic inverse problem, and a deconvolution problem if the kernel p(y|x) depends only on the difference \(y-x\). The second problem is that the point-process data available are random, so estimating \(\lambda (x)\) from measurements on the output space \(\mathcal {Y}\) is a stochastic inverse problem.

Considering our problem statement, the log-likelihood of the probability density function \(p(Y_1,\ldots , Y_{N_{\tilde{y}}}, N_{\tilde{y}}|\lambda (x))\) of the binned data, which describes how likely it is to observe image \(\mathcal {I}\) given intensity \(\lambda (x)\), is given by [11, 12]

$$\begin{aligned} \mathcal {L}(\lambda ) = {-}\int _{\mathcal {Y}}\mu (y)dy+ \sum _{i=1}^{N_{\tilde{y}}} Y_i \ln {\left( \int _{R_i}\mu (y)dy\right) } - \sum _{i=1}^{N_{\tilde{y}}}\ln {(Y_i!)}. \end{aligned}$$
(2)

To solve the stochastic inverse problem, we seek the \(\lambda (x)\) that maximizes the log-likelihood \(\mathcal {L}(\lambda )\). Attempting to solve the problem directly using the calculus of variations, one can take functional derivatives (to find a local maximum) and obtain a system of nonlinear equations, which can be solved only in some special (but not practically useful) cases. Fortunately, this difficulty can be avoided by employing the expectation-maximization (EM) algorithm, which results in a recursion for the maximum-likelihood (ML) estimate of \(\lambda (x)\) without directly solving any nonlinear equations [11, 12]. For PET, the EM approach yields the well-known Shepp-Vardi algorithm [8].
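For reference, the (discretized) log-likelihood (2) can be evaluated directly; the sketch below assumes one grid cell per pixel (so that the integral over \(R_i\) reduces to the pixel value of \(\mu \)), a Gaussian PSF, \(P_d=1\), and drops the constant \(\ln (Y_i!)\) term, which does not affect the maximization.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def log_likelihood(lam, image, psf_sigma=1.0, mu_0=1e-6):
    """Discrete approximation of Eq. (2), up to the constant ln(Y_i!) term.
    lam and image are 2D arrays defined on the same grid (one cell per pixel)."""
    mu = gaussian_filter(lam, sigma=psf_sigma) + mu_0  # Eq. (1) with P_d = 1
    return -mu.sum() + (image * np.log(mu)).sum()
```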

3 Shepp-Vardi Algorithm and the PHD Filter

The Shepp-Vardi algorithm was originally proposed for PET and solves the problem of estimating the intensity function of the emission of short-lived radioisotopes. The algorithm was derived via the EM method, which at that time was not widely known [8]. In PET, radioisotope emissions are modeled as a non-homogeneous Poisson point process. The Shepp-Vardi algorithm produces the ML estimate of the intensity function of this process, which provides an image of the radioisotope spatial density. This density is directly proportional to the intensity \(\lambda (x)\) of radioisotope decay at the point \(x\). In PET reconstruction the assumptions with respect to (1) are \(P_d(x)=1\) and \(\mu _0(y)=0\). The EM recursion for iteratively estimating \(\lambda (x)\) is then given by [12]

$$\begin{aligned} \lambda ^{(k+1)}(x)=\lambda ^{(k)}(x)\sum _{i=1}^{N_{\tilde{y}}}Y_i \frac{p(Y_i|x)}{\int _{\mathcal {X}}p(Y_i|x')\lambda ^{(k)}(x')dx'}. \end{aligned}$$
(3)
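In the discrete, shift-invariant case with a symmetric Gaussian PSF, recursion (3) takes the familiar Richardson-Lucy form, in which convolution with the PSF implements both the forward and the back projection. A minimal sketch (flat initialization and hypothetical parameter values):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def shepp_vardi(image, n_iter=100, psf_sigma=1.0):
    """Discrete Shepp-Vardi (Richardson-Lucy) iteration, Eq. (3),
    for a shift-invariant, symmetric Gaussian PSF."""
    lam = np.full_like(image, image.mean(), dtype=float)     # flat initial estimate
    for _ in range(n_iter):
        mu = gaussian_filter(lam, sigma=psf_sigma)           # forward projection
        ratio = image / np.maximum(mu, 1e-12)                # Y_i / predicted count
        lam = lam * gaussian_filter(ratio, sigma=psf_sigma)  # back-projected correction
    return lam
```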

Following the derivation of the Shepp-Vardi algorithm, but taking \(P_d\) and \(\mu _0(y)\) into account, we similarly find

$$\begin{aligned} \lambda ^{(k+1)}(x)=\lambda ^{(k)}(x)\left\{ 1-P_d(x)+ \sum _{i=1}^{N_{\tilde{y}}}Y_i \frac{P_d(x)p(Y_i|x)}{\int _{\mathcal {X}}P_d(x')p(Y_i|x')\lambda ^{(k)}(x')dx' + \mu _0(Y_i)}\right\} . \end{aligned}$$
(4)

This recursion turns out to be identical to the estimate update in the well-known PHD filter [9, 13], which was developed for multi-target tracking (a completely different application). This surprising connection is established by proving that the first step of the Shepp-Vardi algorithm is essentially identical to the information update step of the PHD intensity filter. Thus, the recursion (4) actually describes a PHD filter, applied iteratively (and therefore referred to as iPHD in the sequel) to a single image in order to solve the stochastic inverse problem and produce the estimate of \(\lambda (x)\). It extends the Shepp-Vardi algorithm to a more general solution, capable of dealing with clutter, imperfect detection, and scattering of photons.
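Under the same discretization assumptions as above, and with a constant detection probability and constant clutter intensity (both values hypothetical), recursion (4) differs from the plain Shepp-Vardi iteration only in the terms involving \(P_d\) and \(\mu _0\):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def iphd(image, n_iter=100, psf_sigma=1.0, p_d=0.9, mu_0=0.5):
    """Discrete sketch of the iPHD recursion, Eq. (4), with constant
    detection probability p_d and constant clutter intensity mu_0."""
    lam = np.full_like(image, image.mean(), dtype=float)
    for _ in range(n_iter):
        mu = gaussian_filter(p_d * lam, sigma=psf_sigma) + mu_0  # predicted intensity per pixel
        ratio = image / np.maximum(mu, 1e-12)
        lam = lam * (1.0 - p_d + p_d * gaussian_filter(ratio, sigma=psf_sigma))
    return lam
```

Setting p_d = 1 and mu_0 = 0 recovers the plain Shepp-Vardi iteration, consistent with the no-clutter example in Sect. 5.1.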

4 Improving Stability by Regularization

The described stochastic reconstruction procedure has a fundamental noise instability which results in artifacts that are visually similar to the Gibbs phenomenon [11]. These artifacts are not caused by the EM method, but arise in any algorithm that estimates an infinite dimensional quantity (in our case \(\lambda (x)\)) from a finite number of measurements. In practice, several regularization procedures can be used, e.g. Grenander’s method of sieves, penalty functions, Markov random fields (extending penalty functions to multiple dimensions), or resolution kernels [11, 12]. Sophisticated combinations of these can be very effective in reducing not only noise but also edge artifacts. Since microscopic images from biological experiments typically do not contain structures with sharp edges, we concentrate here only on two basic noise suppression techniques.

Grenander’s method of sieves, which preserves the EM form of the intensity estimator, is one of the simplest techniques to suppress noise. The idea is to constrain the estimate \(\lambda (x)\) to be in a smooth subset, called a sieve, of the set of nonnegative functions. By allowing the size of the sieve to grow in an appropriate manner with the number of measurement points, it is hoped that the constrained estimate is consistent in the sense that it converges to the true intensity. The sieve restricts \(\lambda (x)\) to the collection of all functions of the form

$$\begin{aligned} \mathcal {S} \equiv \left\{ \lambda (x) = \int _{\mathcal {Z}}\kappa (x|z)\zeta (z)dz, \quad \text {for some}\quad \zeta (z)\right\} , \end{aligned}$$
(5)

where the kernel of the sieve, \(\kappa (x|z)\), is a specified pdf (e.g. an appropriately dimensioned Gaussian pdf) for each \(z\in \mathcal {Z}\), so that \(\int _\mathcal {X}\kappa (x|z)dx=1\), with \(\zeta (z)\ge 0\) and \(\int _{\mathcal {Z}}\zeta (z)dz<\infty \). For our applications, without loss of generality, we can assume that \(\mathcal {Z}=\mathcal {X}\). In effect, the integral (5) is a low-pass filter applied to the intensity \(\zeta (z)\). The basic idea is to compute the ML estimate \(\hat{\zeta }(z)\) from the data and subsequently obtain the ML intensity as \(\hat{\lambda }(x)=\int _{\mathcal {Z}}\kappa (x|z)\hat{\zeta }(z)dz \). The estimate \(\hat{\zeta }(z)\) is computed by the EM method, which is directly applicable to estimating \(\zeta (z)\) using a modified PSF \(g(y|z)=\int _{\mathcal {X}}p(y|x)\kappa (x|z)dx\). Iterating until convergence by means of

$$\begin{aligned} \zeta ^{(k+1)}(z)=\zeta ^{(k)}(z)\sum _{i=1}^{N_{\tilde{y}}} Y_i\frac{g(Y_i|z)}{\int _{\mathcal {X}}g(Y_i|z')\zeta ^{(k)}(z')dz'} \end{aligned}$$
(6)

gives the ML estimate \(\hat{\zeta }(z)\) from which \(\hat{\lambda }(x)\) can be computed [11, 12]. This method is also compatible with the PHD intensity filter.
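When both the PSF and the sieve kernel are Gaussian, the modified PSF \(g(y|z)\) is again a Gaussian whose variance is the sum of the two variances, which makes the sieve particularly easy to implement. A minimal sketch under these assumptions (kernel widths hypothetical):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def shepp_vardi_sieve(image, n_iter=100, psf_sigma=1.0, sieve_sigma=0.7):
    """Sieve-regularized Shepp-Vardi: run the EM recursion (6) with the modified
    PSF g = p * kappa (both Gaussian), then compute lambda_hat = kappa * zeta_hat."""
    g_sigma = np.hypot(psf_sigma, sieve_sigma)            # Gaussian convolved with Gaussian
    zeta = np.full_like(image, image.mean(), dtype=float)
    for _ in range(n_iter):
        mu = gaussian_filter(zeta, sigma=g_sigma)
        ratio = image / np.maximum(mu, 1e-12)
        zeta = zeta * gaussian_filter(ratio, sigma=g_sigma)
    return gaussian_filter(zeta, sigma=sieve_sigma)       # smoothed estimate lambda_hat
```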

Alternatively, the difficulty of maximizing \(\mathcal {L}(\lambda )\) without introducing artifacts can also be partially overcome by introducing a regularization (penalty) term \(\mathcal {E}(\lambda )\), which describes the prior knowledge about the smoothness of the intensity \(\lambda (x)\) using Markov random fields (MRFs). The ML estimation problem is then transformed into a maximum a posteriori (MAP) estimation problem. Specifically, the approach is to maximize the functional \(\mathcal {G}(\lambda )\) defined by

$$\begin{aligned} \mathcal {G}(\lambda ) = \mathcal {L}(\lambda ) - \mathcal {E}(\lambda ), \end{aligned}$$
(7)

where \(\mathcal {L}(\lambda )\) is the log-likelihood functional given by (2), and \(\mathcal {E}(\lambda )\) is the penalty that describes the prior knowledge about \(\lambda (x)\). For 2D images, the prior density of the MRF is defined on a discrete lattice \((i,j)\) that discretizes the space \(\mathcal {X}\), and is based on penalizing large gradients in \(\lambda (x)\). The EM algorithm can be combined with this prior to derive the MAP estimator by simply adding the prior to the maximization at each stage of the algorithm. In practice, this leads to a set of nonlinear difference equations, which can be solved by performing several Jacobi-like gradient iterations at each stage of the EM algorithm [11, 14].
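As an illustration of the penalized approach, the sketch below uses a quadratic MRF gradient penalty together with the one-step-late approximation, a common simplification that folds the penalty gradient into the EM update instead of running the inner Jacobi-like iterations described above; it is not the exact scheme of [11, 14], and the penalty weight is hypothetical.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, laplace

def shepp_vardi_map_osl(image, n_iter=100, psf_sigma=1.0, beta=0.05):
    """MAP estimation with a quadratic MRF smoothness penalty, using the
    one-step-late (OSL) approximation of the penalized EM update."""
    lam = np.full_like(image, image.mean(), dtype=float)
    for _ in range(n_iter):
        mu = gaussian_filter(lam, sigma=psf_sigma)
        ratio = image / np.maximum(mu, 1e-12)
        em_update = gaussian_filter(ratio, sigma=psf_sigma)
        penalty_grad = -laplace(lam)   # gradient of the quadratic smoothness penalty
        lam = lam * em_update / np.maximum(1.0 + beta * penalty_grad, 1e-12)
    return lam
```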

5 Experimental Results

5.1 Evaluation of Regularization

Before presenting examples of the described techniques for biological applications we first study the effects of regularization. We simulated 2D images of size \(128\times 128\) pixels using (1) and applying binning (Fig. 2(a), (g)). The intensity \(\lambda (x)\) had a constant value of, respectively, 100 (Fig. 2(a)) and 1 (Fig. 2(g)) within a square region of \(50\times 50\) pixels in the center of the image, and was equal to 0 outside the square. This corresponds to a signal-to-noise ratio (SNR) of, respectively, 10 and 1. The transition density \(p(y|x)\) was chosen as a normal density with a standard deviation of 1 pixel, and we used \(P_d(x)=1\).
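Because binning a PPP yields independent Poisson pixel counts with mean \(\int _{R_i}\mu (y)dy\), the synthetic images can equivalently be generated by drawing the binned counts directly. A sketch of this shortcut for the square phantom described above:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(3)

def synthetic_square(peak, size=128, square=50, psf_sigma=1.0):
    """Square phantom as in Sect. 5.1: constant intensity inside a centered square,
    blurred by a Gaussian PSF (P_d = 1, no clutter) and corrupted by Poisson noise."""
    lam = np.zeros((size, size))
    lo = (size - square) // 2
    lam[lo:lo + square, lo:lo + square] = peak
    mu = gaussian_filter(lam, sigma=psf_sigma)      # Eq. (1) on the pixel grid
    return rng.poisson(mu).astype(float)            # binned Poisson counts Y_i

image_snr10 = synthetic_square(peak=100.0)  # SNR = sqrt(100) = 10
image_snr1 = synthetic_square(peak=1.0)     # SNR = sqrt(1) = 1
```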

Fig. 2. Results of reconstructing a square-shaped density \(\lambda (x)\) from the realization of a PPP with SNR = 10 (top two rows) and SNR = 1 (bottom two rows). The panels show the original noisy image (a, g), the density \(\lambda (x)\) obtained by SVA (b, h), SVA regularized with sieves (c, i), SVA regularized with MRF (d, j), the result of HPS (e, k), and the intensity profiles along the indicated yellow regions (lines) of interest (f, l). For this example (with no clutter) the iPHD filter simplifies to SVA. (Color figure online)

The results of estimating the underlying intensity \(\lambda (x)\) using the Shepp-Vardi algorithm (SVA), SVA with sieves, SVA with MRF, and the state-of-the-art commercial deconvolution tool Huygens Professional Software (HPS; Scientific Volume Imaging B.V., Hilversum, The Netherlands, https://svi.nl/) are shown in Fig. 2. We observe that using regularization clearly produces smoother results than using no regularization, but there appears to be no striking difference between the sieves and MRF methods (see also Table 1). In practice, regularization with sieves is preferable because it avoids solving nonlinear equations (as in the case of MRF) and allows easy integration into the EM optimization. The HPS tool, which tries to solve the deterministic inverse problem, produces inferior results. It should be noted that in the case of stochastic reconstruction one should not expect to obtain an ideal constant-valued square as \(\lambda (x)\) and that some “artifacts” are unavoidable. The purpose of these experiments was merely to show how noisy typical estimates of the intensity function can be and how regularization techniques deal with this.

Table 1. SNR values for the synthetic images and the reconstructed intensity \(\lambda (x)\) obtained using the described methods

5.2 Evaluation for Biological Applications

We explored the potential of the proposed iPHD filter for deconvolution in various biological applications in comparison with SVA and HPS. As a first example we used images from the particle tracking challenge (http://bioimageanalysis.org/track/). The images show subresolution objects modeled as PSF-shaped bright spots on top of a noisy background [15]. The noise in the background was modeled using the Poisson distribution with mean equal to 10. The amplitudes of the spot intensities were chosen to obtain SNRs in the range from 2 to 7 [15]. The results of applying iPHD, SVA, and HPS are shown in Fig. 3. The iPHD filter clearly outperformed the other methods, producing more localized estimates of object positions than HPS, and fewer spurious local maxima in the background regions than SVA.

Fig. 3. Example of deconvolving an image from the particle tracking challenge. The image contains PSF-shaped objects embedded in noise (a). Results are shown of reconstructing \(\lambda (x)\) using HPS (b), SVA (c), and the proposed iPHD filter (d). The ability of the iPHD filter to better account for noise is clearly visible.

Fig. 4. Example of deconvolving an image from the single-molecule localization microscopy challenge. The image contains filamentous cellular structures (a). Results are shown of reconstructing \(\lambda (x)\) using HPS (b), SVA (c), and the proposed iPHD filter (d). The absence of background noise in the original image makes the iPHD filter perform comparably to SVA.

As a second example we used data from the single-molecule localization microscopy challenge (http://bigwww.epfl.ch/smlm/). The data contain sequences of images from single-molecule imaging experiments [16]. A typical image contains hundreds of PSF-shaped blurred spots, representing a subset of emitting fluorophores (at a specific time point), located on an underlying subcellular structure (e.g. a tubulin filament). In super-resolution microscopy, such images are used to accurately detect the positions of the fluorophores, which, taken together, yield a super-resolved image of the structure. For our experiments we combined the individual images of a sequence using maximum intensity projection to form a single image corresponding to the case in which all fluorophores would be emitting simultaneously. Using this single blurred image we studied the ability of iPHD, SVA, and HPS to obtain a better, deconvolved image of the biological structures. The results are shown in Fig. 4. In this case, with negligible noise in the background areas, iPHD and SVA performed comparably. The deconvolution results of HPS were found to be slightly worse.

As a final example we applied the described techniques to fluorescence microscopy images of neurons. The images were acquired with a confocal microscope and in this case the background areas contained low levels of noise. The results of deconvolving such images using iPHD, SVA, and HPS are shown in Fig. 5. The iPHD filter performed slightly better in the sense that it produced sharper image structures than HPS and less noise than SVA.

Fig. 5. Example of deconvolving a confocal fluorescence microscopy image showing dendritic structures. The panels show the original image (a) and the results of reconstructing \(\lambda (x)\) using HPS (b), SVA (c), and the proposed iPHD filter (d). It can be appreciated that the iPHD filter is better able to deal with background noise and produces more consistent and sharper results than SVA and HPS.

The iPHD filter was implemented in the Java programming language (Oracle Corporation, Redwood Shores, CA, USA) as a plugin for ImageJ (National Institutes of Health, Bethesda, MD, USA) [17, 18]. Executing 100 iterations of the described approach for an image of size \(128\times 128\) pixels on a single thread on a standard personal computer (Dual-Core AMD Opteron 2216, 2.4 GHz CPU, 8 GB RAM) takes approximately 30 s. The method is highly parallelizable and can be implemented for GPU (Graphics Processing Unit) hardware.

6 Discussion and Conclusion

In contrast with existing solutions for image deconvolution we have proposed to formulate the problem more fundamentally as a stochastic reconstruction problem. Using the theory of Poisson point processes, which perfectly suits our needs, we have shown how the solution to the stochastic inverse problem can be obtained as an iterative recursion. In doing so we have established a connection between the classical Shepp-Vardi algorithm and the more recently proposed PHD filter. The advantage of our proposed deconvolution method is that it provides a better theoretical framework for modeling the entire image acquisition process, including the noise sources. We have presented first examples of deconvolution using our method for various biological applications, demonstrating that it performs comparably to or better than even state-of-the-art commercial deconvolution software. The image data used in these examples contained realistic amounts of noise and/or blur, for which our method is especially useful. We note that for high-quality (low-noise) images the advantage of our method over other methods may be less pronounced. In future work we aim to extend the proposed solution to deal with more general classes of point processes and to incorporate other sources of noise (different distributions). We also aim to explore further improvements by incorporating prior information about the relevant object structures to be reconstructed (e.g. by modeling the appearance of microtubules as elongated structures). In terms of further application and experimental validation we plan to quantify the improvements potentially offered by our method in fluorophore localization for super-resolution microscopy.