A Bayesian approach to denoising of single-photon binary images

This paper discusses new methods for processing images in the photon-limited regime where the number of photons per pixel is binary. We present a new Bayesian denoising method for binary, single-photon images. Each pixel measurement is assumed to follow a Bernoulli distribution whose mean is related by a nonlinear function to the underlying intensity value to be recovered. Adopting a Bayesian approach, we assign the unknown intensity field a smoothness-promoting spatial (and potentially temporal) prior while enforcing the positivity of the intensity. A stochastic simulation method is then used to sample the resulting joint posterior distribution and estimate the unknown intensity, as well as the regularization parameters. We show that this new unsupervised denoising method can also be used to analyze images corrupted by Poisson noise. The proposed algorithm is compared to state-of-the-art denoising techniques dedicated to photon-limited images using synthetic and real single-photon measurements. The results presented illustrate the potential benefits of the proposed methodology for photon-limited imaging, in particular with non photon-number resolving detectors.

Poisson noise approximation remains valid across all of the pixels. This automatically and artificially reduces the (already low) probability of detection in the darker regions of the image, potentially unnecessarily, to preserve a tractable observation model. This approach is counterproductive as this decrease is usually compensated for by increasing T, the number of repetitions.
In this work, and in contrast with most denoising methods developed for photon-limited data, we focus on applications for which the phenomenon can only be observed once and for which we need to infer the intensity field for each individual image. Consequently, the observation model considered assumes the observed images are binary (i.e., either no photon or at least one photon detected). We also consider the case where the detectors are photon-number resolving (i.e., data corrupted by Poisson noise). Here, we focus primarily on binary images even if the proposed methodology can be applied to Poisson data denoising, i.e., for data recorded by photon-number resolving systems. As will be seen in Section V, we show that when using non photon-number resolving systems with relatively high detection probabilities (x ≈ 1, for which 1 − exp(−x) ≈ 63%), it is possible to obtain results similar to photon-number resolving systems. In other words, adopting the appropriate observation model and associated estimation strategy allows for much more efficient data acquisition as it becomes possible to improve the data quality without numerous repetitions (we consider a single detection in this work). However, when saturation is significant, i.e., when x ≫ 1, it becomes extremely challenging to accurately estimate the intensity field, in particular using a single frame. In this work, we limit ourselves to E [x] ≤ 1.
Adopting a classical Bayesian approach, we propose a flexible intensity prior model (see Section III) able to capture different sources of intensity fluctuations such as object movement and changes of the illumination conditions. Starting from the observation model of ideal detectors (Poisson likelihood), we present an alternative observation model accounting for detector limitations (Bernoulli likelihood). Both likelihoods are combined with the prior models and a stochastic simulation (Markov chain Monte Carlo) method is investigated to exploit the resulting posteriors. An important advantage of the proposed method is that it is fully unsupervised and does not require parameter tuning, as the parameters controlling the spatial and temporal regularizations are automatically adjusted during the sampling process.
The remainder of the paper is organized as follows. Section II presents the two observation models considered. The Bayesian models are detailed in Section III and the sampling strategy proposed to exploit the resulting posterior distributions is described in Section IV. Simulation results conducted using synthetic and real single-photon data are discussed in Sections V and VI. Conclusions and future work are finally reported in Section VII.

II. OBSERVATION MODELS
Consider a set of T intensity images X t of size N row × N col whose elements x i,j,t = [X t ] i,j are the unknown average numbers of photons reaching the detector array (composed of N row × N col detectors regularly spaced) over a given time period. The two observation models are detailed below.

A. Poisson likelihood
In the general case where each detector can potentially detect an infinite number of photons (within a given time period ∆t), it is widely acknowledged that the distribution of the photon counts y i,j,t in the pixel (i, j) of the tth image can be accurately modelled by a Poisson distribution, i.e.,

y i,j,t | x i,j,t ∼ P (η i,j x i,j,t ), (1)

where η i,j > 0 is an attenuation factor that stands for the detector sensitivity/efficiency and P (η i,j x i,j,t ) denotes the Poisson distribution with mean η i,j x i,j,t . In this work we assume that the coefficients {η i,j } i,j are known (they can usually be estimated during the system calibration) and do not change with time (in the case of video acquisition). Assuming independence between the detectors and the different noise realizations corrupting the images (in particular, we consider non-overlapping acquisition periods) yields the joint likelihood

f 0 (Y|N, X) = ∏ i,j,t f 0 (y i,j,t |η i,j x i,j,t ), (2)

where f 0 (·|η i,j x i,j,t ) denotes the Poisson distribution in (1) and N = [η i,j ] i,j gathers the detector efficiencies.
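As an illustration of this observation model, the sketch below (pure Python with hypothetical names; a practical implementation would use numpy) simulates one frame of counts y i,j ∼ P(η i,j x i,j):

```python
import math
import random

def simulate_poisson_counts(x, eta, rng=random.Random(0)):
    """Simulate one frame of counts y[i][j] ~ P(eta[i][j] * x[i][j]).

    x   : 2-D list of (hypothetical) underlying intensities.
    eta : 2-D list of known detector efficiencies.
    """
    counts = []
    for x_row, e_row in zip(x, eta):
        row = []
        for xi, e in zip(x_row, e_row):
            lam = e * xi
            # Knuth's method: adequate in the low-flux regime (small lam).
            threshold, k, p = math.exp(-lam), 0, 1.0
            while True:
                p *= rng.random()
                if p <= threshold:
                    break
                k += 1
            row.append(k)
        counts.append(row)
    return counts
```

Knuth's sampler is chosen here because its cost grows with the mean, which is harmless in the photon-starved regime (E [x] ≤ 1) considered in this paper.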

B. Bernoulli likelihood
Although the Poisson noise assumption is relevant for many imaging applications, SPDs can often only detect at most one photon per pixel within a clock period and need to be reset to potentially detect the next photons reaching the sensor. In the remainder of this paper, we assume that ∆t corresponds to this clock period (i.e., the smallest temporal sampling period), which also defines the temporal resolution of the imaging system when recording image sequences. In such cases, the detected photon counts, which become binary measurements within a period ∆t, satisfy

y i,j,t = min (ȳ i,j,t , 1), (3)

where ȳ i,j,t is the photon count that would be detected by an ideal detector (able to detect an infinite number of photons). Consequently, in this case the detected photon counts are distributed according to the following Bernoulli distribution

y i,j,t ∼ Ber (1 − exp (−η i,j x i,j,t )), (4)

whose mean is given by E [y i,j,t ] = 1 − exp (−η i,j x i,j,t ). In a similar fashion to the observation model described in Section II-A, assuming independence between the detectors and between noise realizations yields

f 1 (Y|N, X) = ∏ i,j,t f 1 (y i,j,t |η i,j x i,j,t ), (5)

where f 1 (·|η i,j x i,j,t ) denotes the Bernoulli distribution in (4).
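The saturating-detector model can be simulated directly, since y i,j,t = min(ȳ i,j,t , 1) is a Bernoulli variable with success probability 1 − exp(−η i,j x i,j,t ). A minimal sketch with hypothetical names:

```python
import math
import random

def simulate_binary_frame(x, eta, rng=random.Random(1)):
    """Simulate one frame recorded by a non photon-number resolving SPD.

    Each pixel returns y = 1 if at least one photon is detected within the
    clock period, i.e. y = min(ybar, 1) with ybar Poisson distributed, so
    y ~ Ber(1 - exp(-eta * x)).
    """
    return [[1 if rng.random() < 1.0 - math.exp(-e * xi) else 0
             for xi, e in zip(x_row, e_row)]
            for x_row, e_row in zip(x, eta)]
```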
It is important to mention here that although Poisson and Bernoulli distributions present different shapes and supports, we have

f 1 (0|η i,j x i,j,t ) = f 0 (0|η i,j x i,j,t ) = exp (−η i,j x i,j,t ), (6)

which will be useful during the description of the proposed estimation strategy.
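The key property that both observation models assign the same probability exp(−η i,j x i,j,t ) to the "no detection" event can be checked numerically; a minimal sketch:

```python
import math

def poisson_pmf(y, lam):
    """Poisson probability mass function with mean lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

def bernoulli_pmf(y, lam):
    """Saturating-detector pmf: P(y = 1) = 1 - exp(-lam)."""
    p1 = 1.0 - math.exp(-lam)
    return p1 if y == 1 else 1.0 - p1

# Both models assign the same probability exp(-lam) to "no detection".
for lam in (0.1, 0.5, 1.0, 3.0):
    assert abs(poisson_pmf(0, lam) - bernoulli_pmf(0, lam)) < 1e-12
```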

C. Faulty sensor and missing data
In addition to the potential of sensor saturation, we also consider the presence of faulty detectors and missing data within the array. As mentioned in the introduction, the proposed denoising framework exploits the spatial correlation between neighbouring pixels to regularize the denoising problem and the presence of spurious pixels (providing meaningless values) can drastically degrade the algorithm performance. To tackle this problem, we introduce an N row × N col × T binary mask H whose entries h i,j,t are 0 (resp. 1) when a detector is faulty or data is missing (resp. functioning correctly). This mask is assumed to be known and can potentially vary with time. When considering the presence of faulty detectors/missing data, the likelihood functions (2) and (5) become

f̃ m (Y|N, X, H) = ∏ i,j,t [f m (y i,j,t |η i,j x i,j,t )] h i,j,t [δ (y i,j,t )] 1−h i,j,t , (7)–(8)

with m = 0 in (7) and m = 1 in (8), respectively, where δ(·) denotes the Dirac delta function and where the observations of faulty pixels (whose positions are known) are arbitrarily set to 0 before applying the proposed method. These values are not used during the denoising process.
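To illustrate how the mask enters the likelihood, the sketch below evaluates the log of the masked Bernoulli likelihood (the m = 1 case) for one frame; pixels with h = 0 are simply skipped, matching the convention that their observations are set to 0 and ignored. Names are hypothetical:

```python
import math

def masked_log_likelihood_bernoulli(Y, X, eta, H):
    """Log of the masked Bernoulli likelihood for one frame.

    Pixels with h = 0 (faulty detector / missing data) are skipped: their
    observations carry no information about the intensity field.
    """
    ll = 0.0
    for y_row, x_row, e_row, h_row in zip(Y, X, eta, H):
        for y, x, e, h in zip(y_row, x_row, e_row, h_row):
            if h == 0:
                continue
            lam = e * x
            # log Ber(y | 1 - exp(-lam))
            ll += math.log(1.0 - math.exp(-lam)) if y == 1 else -lam
    return ll
```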
The proposed methodology does not assume a particular structure for H. Although the observation models in (7) and (8) could be used for restoration of sparsely and/or irregularly sampled images, we restrict ourselves to cases where the number of zero entries in H is small compared to the number of pixels/detectors. The very interesting and more challenging problem of sparsely sampled images constructed from sparse single-photon data is outwith the scope of this paper.
The next section describes the Bayesian model and associated estimation strategy proposed to solve the denoising problem considered here; that is, the estimation of the unknown and non-stationary intensity field X from the observed set of photon counts in Y.

A. Intensity field modelling
As in most ill-posed inverse problems which need to be regularized, the choice of the regularization or prior model considered for image restoration is crucial both in terms of quality of image recovery and the resulting computational complexity. In this work we investigate a Bayesian model coupled with an efficient simulation method which not only allows the estimation of denoised images but can also provide information about the denoising uncertainty via measures derived from the posterior distribution of interest. Consequently, we investigate an intensity prior model which allows the use of a simple simulation strategy to exploit the posterior distribution. As will be shown in Sections V and VI, the prior models presented in this section not only facilitate the estimation strategy but are also flexible enough to compete with standard regularizations used to denoise images corrupted by Poisson noise (e.g., total-variation [16], [17] or Gaussian MRFs [18]).
It is well known that gamma distributions are conjugate priors for the means of Poisson distributions, which makes them particularly attractive to denoise images corrupted by Poisson noise. Moreover, as will be further discussed in Section IV, gamma distributions remain conjugate priors when considering saturating sensors (i.e., assuming (4)), which is particularly convenient in simplifying the denoising problem when the data are Bernoulli distributed.
Due to the spatial organization of images, we expect the values of x i,j,t to vary smoothly from one pixel to another. Moreover, if an image sequence is considered, it might be relevant to capture the temporal correlation between successive images to improve the denoising performance, especially since the sampling period can be extremely short (of the order of nanoseconds or less). In order to model this behaviour, we consider an extended prior model such that the resulting prior for X is a hidden gamma-MRF (GMRF) [19]. In a similar fashion to [6], we introduce T auxiliary matrices U t of size (N row + 1) × (N col + 1) with elements u i,j,t ∈ R + and T + 1 additional auxiliary images W t of size N row × N col . We then define a tripartite conditional independence graph between X, U = {U t } and W = {W t } such that each x i,j,t of X t is connected to four (spatial) neighbors of U t and two temporal neighbors in W t and W t+1 . This 1st order neighbourhood structure is depicted in Fig. 1, where we notice that any given x i,j,t and x i+1,j,t are 2nd order neighbors via u i+1,j,t and u i+1,j+1,t . Similarly, x i,j,t and x i,j,t+1 are 2nd order neighbors via w i,j,t+1 . Following the general GMRF model proposed in [19] and specified here by the neighbourhood structure depicted in Fig. 1, we assign (X, U, W) a (constrained) GMRF prior and obtain the joint prior f (X, U, W|α, β) (see [19] for the GMRF formulation adopted here). This prior model explicitly depends on the value of the hyperparameters α > 0 and β > 0, which here act as regularization parameters that control the amount of spatial (α) and temporal (β) smoothness enforced by the GMRF.
For brevity, we assume that these parameters are fixed and constant across the image sequence in the remainder of this Section. However, following an empirical Bayesian approach, in the results presented in Sections V and VI, the value of (α, β) is adjusted automatically (either for each image or for all the images) during the early iterations of the sampler by maximum marginal likelihood estimation (the interested reader is invited to consult [6], [20] for details about the estimation of α (or (α, β))).
Exploiting the proposed neighbourhood structure yields the conditional prior distributions (9a)–(9c), where G X (·, ·) denotes a gamma distribution restricted to X (this distribution reduces to a standard gamma distribution when X = (0, +∞)) and IG (·, ·) denotes an inverse-gamma distribution. If a single image or independent images are considered, the GMRF-based prior f (X, U, W|α, β) can be simplified by removing the auxiliary variables W and by considering the prior model f (X, U|α), as in [6]. In that case, the neighborhood structure reduces to the red subgroup of Fig. 1 and we obtain the conditional distributions (10a)–(10b). In addition to their flexibility, one of the main motivations for considering GMRFs here is that they simplify the sampling strategy, and thus the inference process (using the conjugacy of (9a) and (7), or of (10a) and (8) (see Eq. (15))), while introducing spatial and temporal dependencies between the neighbouring intensities.

B. Joint posterior distributions
Now that we have defined the prior model for the unknown image or images to be recovered, we can derive the posterior distribution of (X, U) or (X, U, W) (depending on whether temporal correlation is considered), given the observations Y, the fixed model parameters/hyperparameters Φ = {H, N, α, β} and the observation model considered. Using Bayes' rule, we obtain

f (X, U|Y, Φ) ∝ f̃ m (Y|N, X, H) f (X, U|α), (11)

with m = 0 (Poisson noise) or m = 1 (Bernoulli realizations), for a single image or independent images, and

f (X, U, W|Y, Φ) ∝ f̃ m (Y|N, X, H) f (X, U, W|α, β), (12)

when considering temporally correlated images, where f̃ m (Y|N, X, H) is given either by (7) (ideal detector) or (8) (saturated detector). The next paragraph details the Markov chain Monte Carlo (MCMC) method proposed to sample the posteriors of interest (11) and (12) and subsequently estimate the unknown intensity field X.

(Fig. 1 caption: The red (resp. blue) sub-graph highlights the spatial (resp. temporal) neighborhood structure. For the pixels at the boundaries of the image/image sequence, we assume the images to be spatially cyclic (e.g., x 0,j,t = x N row ,j,t ) and set x i,j,0 = x i,j,T +1 = γ, ∀(i, j) ∈ V X . The temporal boundary condition γ is set arbitrarily to the empirical mean of the observed images, but the image sequence can also be assumed to be cyclic.)

IV. ESTIMATION STRATEGY
In this work we adopt a simulation-based strategy to approximate, for each model, the marginal posterior mean or minimum mean square error (MMSE) estimator of X, i.e.,

x̂ MMSE = E [X|Y, Φ], (13)

where the auxiliary variables U (and W when considering correlated frames) have been marginalized. Although marginalizing U and W analytically from (11) and (12) is possible, sampling directly from the resulting marginal posterior f (X|Y, Φ) is challenging due to the complexity of this non-standard and high-dimensional distribution. Fortunately, for the two observation models and the two prior models considered, (13) can be efficiently approximated to arbitrary accuracy by Monte Carlo integration. More precisely, it is possible to compute (13) by first using an MCMC computational method to generate samples asymptotically distributed according to (11) or (12), and subsequently using these samples to approximate the required marginal expectation.
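The Monte Carlo approximation of the MMSE estimator amounts to averaging the intensity samples generated after the burn-in period; a minimal sketch (hypothetical names, nested lists standing in for images):

```python
def mmse_estimate(samples, burn_in):
    """Monte Carlo approximation of the MMSE estimator: average the
    intensity samples generated after the burn-in period.

    samples : list of 2-D lists (one sampled image per MCMC iteration).
    """
    kept = samples[burn_in:]
    n = len(kept)
    est = [[0.0] * len(kept[0][0]) for _ in kept[0]]
    for s in kept:
        for i, row in enumerate(s):
            for j, v in enumerate(row):
                est[i][j] += v / n
    return est
```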
Here we propose a Gibbs/Metropolis-within-Gibbs sampler to simulate samples from the posteriors (11) and (12). The remainder of this section details the main steps of the proposed samplers, depending on the observation model considered. The main steps of the proposed PID-GMRF and BID-GMRF algorithms (for Poisson and Bernoulli image denoising using GMRFs) are summarized in Algo.

B. Sampling X
Similarly, it is easy to show that for a given realization of U (and W), the elements of X are a posteriori independent and can be updated simultaneously. If a given pixel (i, j, t) is faulty or does not contain meaningful data, i.e., h i,j,t = 0, its corresponding unknown intensity value appears neither in (7) nor in (8). Consequently, sampling such intensity values reduces to sampling from (10a) (single or independent images) or (9a) (correlated images).
Sampling from truncated gamma distributions can be done efficiently via rejection sampling by sampling from non-truncated gamma distributions, in particular when the non-truncated distribution is mainly concentrated in X .
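This rejection strategy can be sketched as follows (pure Python; note that `random.gammavariate` uses the shape/scale parameterization):

```python
import random

def sample_truncated_gamma(shape, scale, lo=0.0, hi=float("inf"),
                           rng=random.Random(2), max_tries=100000):
    """Rejection sampler for a gamma distribution restricted to (lo, hi):
    draw from the non-truncated Gamma(shape, scale) until the sample falls
    inside the interval. Efficient when most of the gamma mass lies there.
    """
    for _ in range(max_tries):
        x = rng.gammavariate(shape, scale)
        if lo < x < hi:
            return x
    raise RuntimeError("acceptance rate too low; use a dedicated sampler")
```

As stated above, this is only efficient when the non-truncated distribution is mainly concentrated in X; otherwise a dedicated truncated-gamma sampler should be preferred.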
Consider a valid pixel (i, j, t) following the observation model (1). Using the Poisson-gamma conjugacy, it is easy to obtain the conditional distribution of x i,j,t , given in (15) for a single image or independent images and in (16) for correlated images. These distributions, denoted as f 2D 0 (x i,j,t |y i,j,t , U t , Φ) and f 3D 0 (x i,j,t |y i,j,t , U t , W, Φ), respectively, can also be sampled from via rejection sampling. Consider now a valid pixel (i, j, t) following the observation model (4). We are interested in f 2D 1 (x i,j,t |y i,j,t , U t , Φ) and f 3D 1 (x i,j,t |y i,j,t , U t , W, Φ), the conditional distributions of x i,j,t using the 2D and 3D GMRFs, respectively. By recalling (6), it can be shown that f 2D 1 (x i,j,t |y i,j,t = 0, U t , Φ) and f 3D 1 (x i,j,t |y i,j,t = 0, U t , W, Φ) are truncated gamma distributions and that f 2D 1 (x i,j,t |y i,j,t = 1, U t , Φ) and f 3D 1 (x i,j,t |y i,j,t = 1, U t , W, Φ) are infinite mixtures of gamma distributions which are less trivial to sample from. To tackle this problem, we introduce a Metropolis-Hastings move to update x i,j,t under a Bernoulli observation assumption. More precisely, for a given valid pixel (i, j, t) at the kth iteration of the sampler, we use a so-called proposal distribution q(·) defined on X to generate a candidate x ∗ . This candidate is then accepted with probability µ = min(ρ, 1), where

ρ = [f 1 (y i,j,t |η i,j x ∗ ) f 0 (y i,j,t |η i,j x (k−1) i,j,t )] / [f 1 (y i,j,t |η i,j x (k−1) i,j,t ) f 0 (y i,j,t |η i,j x ∗ )],

using the proposals associated with the 2D and 3D GMRFs, respectively. Otherwise, we set x (k) i,j,t = x (k−1) i,j,t . Here, to avoid additional algorithmic complexity (e.g., tuning the variances of Gaussian random walks), we use as proposal distributions the conditional distributions obtained under the Poisson noise assumption, (15) or (16), depending on the scenario considered (independent/correlated images). With this choice of proposal, 1) when y i,j,t = 0, we obtain ρ = 1 and the Metropolis-Hastings step reduces to a Gibbs step, and 2) in practice we have observed that this choice leads to satisfactory acceptance rates (ρ > 0.6 for the pixels such that y i,j,t = 1) for all the results presented in Sections V and VI.
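The pixel-wise Metropolis-Hastings move can be sketched as follows. This is a simplified illustration, not the paper's exact sampler: the GMRF conditional of the pixel intensity is replaced by a generic Gamma(a, b) stand-in (a hypothetical simplification), so that the Poisson-likelihood conditional used as proposal is available in closed form by conjugacy. The acceptance ratio then only involves the two likelihoods, since the prior terms cancel between target and proposal:

```python
import math
import random

def mh_update_bernoulli_pixel(x_cur, y, eta, a, b, rng=random.Random(3)):
    """One Metropolis-Hastings move for a single valid pixel under the
    Bernoulli observation model.

    Simplifying assumption (hypothetical): the GMRF conditional is
    replaced by a generic Gamma(a, b) prior, so the Poisson-conditional
    proposal is conjugate: Gamma(a + y, 1 / (1/b + eta)).
    """
    post_scale = 1.0 / (1.0 / b + eta)
    if y == 0:
        # rho = 1: the move reduces to a Gibbs step.
        return rng.gammavariate(a, post_scale)
    x_star = rng.gammavariate(a + 1.0, post_scale)
    lam_s, lam_c = eta * x_star, eta * x_cur
    # rho = f1(1|lam*) f0(1|lam_cur) / (f1(1|lam_cur) f0(1|lam*)),
    # with f1(1|lam) = 1 - exp(-lam) and f0(1|lam) = lam * exp(-lam).
    rho = ((1.0 - math.exp(-lam_s)) * lam_c * math.exp(-lam_c)) / \
          ((1.0 - math.exp(-lam_c)) * lam_s * math.exp(-lam_s))
    return x_star if rng.random() < min(rho, 1.0) else x_cur
```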
In this Section, we considered two intensity prior models, depending on whether the T > 1 observed images are assumed to be temporally correlated or not. As the underlying intensity field is the same if the detectors saturate or not, a single prior model is considered when analysing Poisson or Bernoulli images. We then detailed a single sampling strategy to exploit the posterior distributions of the different scenarios. When the data are Bernoulli observations, accept/reject procedures are only required for the pixels where a detection occurs, i.e., y i,j,t = 1, which, in the case of low illumination images (E [y i,j,t ] << 1), represent a small fraction of the pixels. In the case of Poisson data, the proposed Metropolis-within-Gibbs sampler reduces to a standard, yet highly parallelizable, Gibbs sampler. The following Sections illustrate the potential benefits of the proposed method through a series of experiments conducted using synthetic and real single-photon images.

V. SIMULATIONS USING SYNTHETIC IMAGES
In this Section we investigate the performance of the proposed methods and the effect of the detector saturation on the intensity estimation performance through simulations conducted with synthetic images. First, we compare the performance of the proposed algorithms with existing methods when denoising a single image. Then we assess the benefits of the proposed 3D GMRF, when denoising a sequence of images.

A. Single image denoising
We evaluate the proposed methods in denoising the two test images depicted in Fig. 2.
The first image, of size 256 × 256 (circular pattern) and denoted I 1 , presents a piece-wise constant intensity profile while the second image I 2 , of size 512 × 512, presents smoother intensity variations. In all the experiments presented in this section, for fair comparisons to methods which cannot handle missing data, we assume that all pixels are observed. We have compared our methods with the following state-of-the-art methods. First, we considered a set of methods relying on the Poisson noise assumption. Precisely, we used SPIRAL-TV [17], which solves the same optimization problem as PIDAL [16]; that is,

x̂ t = argmin x t ∈X Σ i,j [η i,j x i,j,t − y i,j,t log (η i,j x i,j,t )] + λ TV TV(x t ), (20)

where TV (·) is the total variation regularization whose influence is controlled by λ TV ≥ 0.
We also applied the other regularizations proposed in [17] but SPIRAL-TV seems to provide the best and most robust results in this very sparse photon regime, which is why we only present the results obtained with this version of SPIRAL. We also implemented an alternative algorithm which solves the following problem

x̂ t = argmin x t ∈X Σ i,j [η i,j x i,j,t − y i,j,t log (η i,j x i,j,t )] + λ Lap ||Dx t || 2 2 , (21)

where x t is the vectorized version of X t and D is the N row N col × N row N col circulant convolution matrix of the Laplacian filter [18]. In contrast to the TV regularization, which promotes piece-wise constant intensity profiles, the penalization in (21) promotes smooth intensity variations. The problem (21) is solved using an ADMM scheme similar to that used in PIDAL; therefore, this method is referred to as "PIDAL-Lap". We also used the NL-PCA algorithm [23], which is a state-of-the-art unsupervised method for image denoising under a Poisson noise assumption (we used the parameter values recommended in [23] in all the experiments presented here).
To the best of our knowledge, there is no published algorithm to directly estimate the intensity profiles involved in (5). However, we implemented an ADMM-based algorithm based on the following convex problem

x̂ t = argmin x t ∈X Σ i,j [(1 − y i,j,t ) η i,j x i,j,t − y i,j,t log (1 − exp (−η i,j x i,j,t ))] + λ TV TV(x t ). (22)

This algorithm is referred to as "Ber-TV" in the remainder of this paper. Note that, as described in [24], it might be possible to consider other regularizations, for both the Poisson and Bernoulli observation models. However, an extensive comparison of regularizations, potentially using changes of variables and whose regularization parameters need to be carefully adjusted, is outwith the scope of this paper and we concentrate on the TV and Laplacian-based penalizations, whose effects are easily understood and which require the adjustment of a single parameter.
We measure the performance of the different algorithms in terms of normalized mean squared error (NMSE), defined by

NMSE = Σ i,j,t (x i,j,t − x̂ i,j,t ) 2 / Σ i,j,t x 2 i,j,t , (23)

where x i,j,t (resp. x̂ i,j,t ) is the actual (resp. estimated) intensity value of the pixel (i, j, t).
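The NMSE defined above reduces to a ratio of two sums; a minimal sketch:

```python
def nmse(x_true, x_hat):
    """Normalized mean squared error: sum((x - xhat)^2) / sum(x^2),
    invariant to a global rescaling of the intensity dynamic."""
    num = sum((a - b) ** 2
              for row_t, row_h in zip(x_true, x_hat)
              for a, b in zip(row_t, row_h))
    den = sum(a ** 2 for row_t in x_true for a in row_t)
    return num / den
```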
The lower the NMSE, the more similar the original and reconstructed images. Note that the NMSE does not depend on the intensity dynamic of the original image. We also evaluate how the reconstruction error varies across the image pixels around the NMSE using the standard deviation of the normalized squared errors. We have applied the proposed PID-GMRF and BID-GMRF algorithms with N MC = 2000 iterations (including N bi = 600 burn-in iterations) to the data corrupted by Poisson and Bernoulli noise.
We have also applied PID-GMRF to data corrupted by Bernoulli noise to simulate the performance of methods relying on a Poisson noise assumption when denoising data recorded by non photon-number resolving detectors. The PIDAL-TV, PIDAL-Lap and Ber-TV algorithms require the tuning of a regularization parameter (λ TV or λ Lap ). We adopted the methods proposed in [25] to automatically adjust these hyperparameters, but these methods tend to significantly overestimate the smoothness of the intensity field, due to the extreme sparsity of the observed images, and thus yield poor results, visually and in terms of NMSEs. Consequently, for the results presented here, these hyperparameters have been optimized in a supervised manner in order to minimize the NMSE, which however requires knowing the actual intensity image in advance. The proposed methods are able to capture the spatial correlation of piece-wise constant (I 1 ) and smoother (I 2 ) images and are visually more robust than Ber-TV (less prominent patch-like artifacts). NL-PCA provides similar images and is able to detect spatial structure in the data but underestimates the large intensities due to the model mismatch, yielding higher NMSEs when E [x i,j ] → 1 (see Table II).

B. Denoising of image sequences
We now illustrate the benefits of the proposed 3D GMRF model when denoising videos constructed from single-photon data. We consider a video composed of T = 141 successive frames. Note that the NMSEs increase at the very beginning and the very end of the sequence due to the GMRF boundary conditions considered. This bias can, however, be easily reduced if we further assume that the temporal sequence is cyclic. In order not to add unnecessary assumptions in the general case, we did not present this case (which can be addressed by changing the GMRF boundary conditions (see Fig. 1)).

VI. SIMULATIONS USING REAL DATA
We illustrate the benefits of the proposed denoising framework by denoising sparse images of a dynamic object recorded by a ghost-imaging system similar to those considered in [26], [27]. The system considered here uses correlated photons at 710 nm and the images were displayed on a spatial light modulator (SLM). We consider a set of 12 spatial patterns, i.e., 12 smiley faces gradually changing from a sad to a happy face. The images, of size 256 × 256, are recorded by an intensified camera with a CCD detector array (ICCD) triggered by a Perkin Elmer silicon SPAD (see [26], [27] for more details about the data acquisition and setup of the ghost-imaging instrument). Each face is observed over 300 seconds with a frame rate of 1 Hz, leading to 300 frames per face position. The ICCD acts here as a non photon-number resolving SPD, and thus provides binary images. The average intensity profile relates to the image of the faces formed from a polished silicon wafer onto which a microscopic gold test target was patterned. At the wavelength considered, the silicon is transparent whereas the gold layer is not. Consequently, the acquired images are darker in the regions where the gold target is present. The laser source is adjusted so that the average number of detected photons per pixel and per frame is significantly lower than 5% (E [x i,j,t ] = 2.3 × 10 −4 and x i,j,t < 2%, ∀(i, j, t)).
In other words, the probability of having more than one photon reaching a given pixel within a given 1 s frame is extremely low. In this extremely sparse photon-limited imaging regime, the distributions of the photon counts can be approximated by Poisson distributions. Fig. 7 depicts the accumulated photon counts obtained by summing the 300 images associated with each position.
To illustrate the benefits of the proposed denoising method, we denoise images that would have been obtained using exposure times of 25s, 50s, 100s and 300s. Such images are obtained by integrating non-overlapping groups of 25 up to 300 images. These images (approximately corrupted by Poisson noise) are then used to produce binary images associated with the presence/absence of detected photons within successive 25s, 50s, 100s or 300s periods. The top rows of Table IV show the average number of photons in the images to be enhanced.
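The integration and binarization procedure described above can be sketched as follows (hypothetical names; frames are binary 2-D lists):

```python
def rebin_to_binary(frames, group_size):
    """Integrate non-overlapping groups of binary frames into longer
    exposures, then binarize again (1 = at least one photon detected
    within the group at that pixel)."""
    n_rows, n_cols = len(frames[0]), len(frames[0][0])
    out = []
    for g in range(0, len(frames) - group_size + 1, group_size):
        group = frames[g:g + group_size]
        out.append([[1 if any(f[i][j] for f in group) else 0
                     for j in range(n_cols)]
                    for i in range(n_rows)])
    return out
```

For instance, grouping the 300 1 s frames with `group_size=25` yields the binary images associated with a 25 s exposure.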
In particular, less than 500 photons per frame are available when considering the shortest exposure, which corresponds to a per-pixel detection rate of less than 1%. The bottom rows of Table IV compare the reconstruction errors obtained for the different exposure times. These results show that it is possible to significantly reduce the number of binary frames and still obtain satisfactory intensity field estimates, which can be particularly useful to reduce the amount of data (divided by up to 300 here) to be stored, transmitted and/or processed.
These results also show that the proposed method can be used to enhance image sequences of dynamic scenes constructed from extremely sparse single-photon data.

VII. CONCLUSION
Here we have proposed a new Bayesian method for binary image denoising. The model considered assumed that each pixel measurement follows a Bernoulli distribution whose mean is related by a nonlinear function to the underlying intensity value to be recovered.
In contrast with classical Poisson noise models, this model is particularly adapted to data from saturating sensors. In particular, for intensities close to 1, it is possible to obtain an estimation accuracy close to that obtained using non-saturating sensors. For instance, the results of simulations conducted using real sparse single-photon measurements illustrated how one can reduce the amount of data (by reducing the frame rate here, but one could also adjust the laser source and reduce the overall acquisition time when possible) while still being able to estimate the intensity profile without a significant performance degradation.
Here we used a hidden gamma Markov random field to build a prior model. However, we noticed that dictionary learning techniques (such as NL-PCA) can significantly improve the denoising performance in the presence of sparse single-photon images. Including such considerations in future binary image denoising methods is clearly interesting. Moreover, the generalization of the proposed methodology for images following binomial distributions (e.g., sum of binary images) is currently under investigation.