Images from Bits: Non-Iterative Image Reconstruction for Quanta Image Sensors

A quanta image sensor (QIS) is a class of single-photon imaging devices that measure light intensity using oversampled binary observations. Because of the stochastic nature of the photon arrivals, data acquired by QIS is a massive stream of random binary bits. The goal of image reconstruction is to recover the underlying image from these bits. In this paper, we present a non-iterative image reconstruction algorithm for QIS. Unlike existing reconstruction methods that formulate the problem from an optimization perspective, the new algorithm directly recovers the images through a pair of nonlinear transformations and an off-the-shelf image denoising algorithm. By skipping the usual optimization procedure, we achieve orders of magnitude improvement in speed and even better image reconstruction quality. We validate the new algorithm on synthetic datasets, as well as real videos collected by one-bit single-photon avalanche diode (SPAD) cameras.


Quanta Image Sensor
Since the birth of charge coupled devices (CCD) in the late 1960s [1] and the complementary metal-oxide-semiconductor (CMOS) active pixel sensors in the early 1990s [2], the pixel pitch of digital image sensors has been continuously shrinking [3]. Shrinking the pixel pitch is intimately linked to the need of increasing image resolution, reducing power consumption and reducing the size and weight of cameras. However, as pixel pitch shrinks, the amount of photon flux detectable by each pixel drops, leading to reduced signal strength. In addition, the maximum number of photoelectrons that can be held in each pixel, known as the full-well capacity, also drops. Small full-well capacity causes reduced maximum signal-to-noise ratio and lowers the dynamic range of an image [4]. Therefore, pushing for smaller pixels, although feasible in the near future, will become a major technological hurdle to new image sensors.
A quanta image sensor (QIS) is a class of solid-state image sensors originally proposed by Eric Fossum as a candidate solution for sub-diffraction-limit pixels. The sensor was first named the digital film sensor [5] and later the quanta image sensor [6][7][8][9] (see [10] for a more comprehensive discussion of the history). A similar idea to QIS was developed a few years later by EPFL (École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland), known as the Gigavision camera [11][12][13].
In the past few years, research groups at the University of Edinburgh (Edinburgh, UK) [14][15][16], as well as EPFL [17,18] have made new progresses in QIS using binary single-photon avalanche diode (SPAD) cameras. In the industry, Rambus Inc. (Sunnyvale, CA, USA) is developing binary image sensors for high dynamic range imaging [19][20][21]. For the purpose of this paper, we shall not differentiate these sensors, but refer to them generally as the QIS, because their underlying mathematical principles are similar.
The working principle of QIS is as follows: In CCD and CMOS, one considers each pixel as a "bucket" that collects and integrates photoelectrons. The bucket is partitioned in QIS into thousands of nanoscale cells referred to as "jots". Each jot is capable of detecting a single photon to generate a binary response indicating whether the photon count is above or below a certain threshold q. If the photon count is above q, the sensor outputs a "1"; If the photon count is below q, the sensor outputs a "0". QIS has a very small full-well capacity because it is not designed to accumulate photons. Since the binary response is generated as soon as the number of photons exceeds the threshold, QIS can be operated at very high speed. For example, using single-photon avalanche diodes (SPAD), one can achieve 10k frames per second with a spatial resolution of 320 × 240 pixels [14] or even 156k frames per second with a spatial resolution of 512 × 128 pixels [17]. For a higher spatial resolution, Massondian et al. [9] reports a QIS operating at 1000 frames per second for a spatial resolution of 1376 × 768 pixels.
From a signal processing perspective, the challenge of QIS is the extremely lossy process using binary measurements to acquire the light intensity. In order to compensate for the loss, QIS over-samples the space by using a large number of jots and takes multiple exposures in time. This technique is similar to the classic approach in oversampled analog-to-digital conversions.

Scope and Contribution
The theme of this paper is about how to reconstruct images from the one-bit quantized measurements. Image reconstruction is a critical component of QIS, for without such an algorithm, we will not be able to form images. However, unlike classical Poisson image recovery problems where solutions are abundant [22][23][24][25][26][27], the one-bit quantization of QIS makes the problem uniquely challenging, and there is a limited number of existing methods [28][29][30][31]. Another challenge we have to overcome is the complexity of the algorithm, which has to be low enough that we can put them on cameras to minimize power consumption, memory consumption and runtime. Numerical optimization algorithms are generally not recommended if they are iterative and require intensive computation for every step.
The main contribution of this paper is a non-iterative image reconstruction algorithm for QIS data. We emphasize the non-iterative nature of the algorithm as it makes the algorithm different from existing methods. The new algorithm is based on a transform-denoise framework. The idea is to apply a variance stabilizing transform known as the Anscombe transform [32] to convert a sum of one-bit quantized Poisson random variables to binomial random variables with equal variances. When variance is stabilized, standard image denoising algorithms can be applied to smooth out the noise in the image. Transform-denoise is a single-pass algorithm with no iteration. Empirically, we find that the new algorithm achieves two orders of magnitude improvement in speed and provides an even better reconstruction result than existing iterative algorithms.
The rest of the paper is organized as follows. First, in Section 2, we present the imaging model of QIS. We discuss how the light intensity of the scene is over-sampled and what statistics do the one-bit quantized measurements have. Unlike existing works, which typically assume a quantization level q = 1, we make no assumption about q, except that it is a positive integer. This requires some discussion about the incomplete Gamma function. We also discuss why a simple summation over a local spatial-temporal volume is insufficient to reconstruct images. Section 3 presents the main algorithm. We discuss the concept of variance stabilizing transform, its derivation and its limitations. We also discuss how various image denoising algorithms can be plugged into the framework. In Section 4, we present experimental results. There are two sets of data we will discuss. One is a synthetic dataset in which we can objectively measure the reconstruction quality. The other one is a set of real videos captured by SPAD cameras. We compare the proposed algorithm with existing methods. Proofs of major theorems are given in the Appendix A.

QIS Imaging Model
In this section, we provide a quick overview of the QIS imaging model. A similar description of the model was previously discussed in [13,29]. For notational simplicity, we consider one-dimensional signals. The extension to the two-dimensional images is straightforward. Furthermore, we follow [5] by referring to sub-pixels of a QIS as "jots".
The mathematics of QIS is built upon two concepts: (1) a spatial oversampling process to model the acquisition by the sensor; and (2) a quantized Poisson process to model the photon arrivals. The block diagram of the model is summarized in Figure 1. The overall process can be written as s = αGc. The second part of the block diagram is to generate a binary random variable B m from Poisson random variable Y m . The example at the bottom shows the case where K = 3.

Oversampling Mechanism
We represent the light intensity of a scene using a digital signal c = [c 0 , c 1 , . . . , c N−1 ] T . To avoid the ambiguity of the scaling, we assume that c n is normalized so that c n ∈ [0, 1] for n = 0, 1, . . . , N − 1. A fixed and known constant α > 0 is multiplied to scale c n to the proper range.
QIS is a spatial oversampling device. For an N-element signal c, QIS uses M N number of jots to acquire c. Thus, every c n in the signal c is sampled by M/N jots. The ratio K def = M/N is called the spatial oversampling factor. To model the oversampling process, we follow [13] by considering an upsampling operator (↑ K) and a low-pass filter with filtering coefficients {g k }, as shown in Figure 1. Since upsampling and low-pass filtering are linear operations, the overall process can be compactly expressed using a matrix-vector multiplication where s = [s 0 , s 1 , . . . , s M−1 ] T is the light intensity arriving at those M jots, and the matrix G ∈ R M×N encapsulates the upsampling process and the linear filtering.
There are a variety of choices for the low-pass filter depending on which filter provides a better model of the light intensity. In this paper, we make the following assumption about the low-pass filter. Assumption 1. We assume that {g k } is a boxcar function, such that g k = 1/K for k = 0, . . . , K − 1. Consequently, the matrix G is where 1 K×1 is a K-by-one all one vector, I N×N is an N-by-N identity matrix and ⊗ denotes the Kronecker product.
Intuitively, what Assumption 1 does is to assume that the light intensity is piecewise constant. As will be discussed in Section 3, this is important for us to derive simple algorithms.

Quantized Poisson Observation
At the surface of the m-th jot, the light intensity s m generates y m photons according to the Poisson distribution where Y m denotes the Poisson random variable at the m-th jot and y m denotes the realization of Y m . The one-bit quantization of QIS is a truncation of y m with respect to a threshold q ∈ Z + . Precisely, the observed one-bit measurement at the m-th jot is Denoting B m the random variable of the one-bit measurement at the m-th jot, it follows that the probability of observing B m = 1 is and the probability of observing s k m e −sm k! . For general q, keeping track of the sum of exponentials in Equation (5) could be cumbersome. To make our notations simple, we adopt a useful function called the incomplete Gamma function [33], defined as follows. Definition 1. The incomplete Gamma function Ψ q : R + → [0, 1] for a fixed threshold q ∈ Z + is defined as where Γ(q) = (q − 1)! is the standard Gamma function evaluated at q.
The incomplete Gamma function is continuous in s, but discrete in q. For every fixed s, Ψ q (s) is the cumulative distribution of a Poisson random variable evaluated at q − 1. When q is fixed, Ψ q is the likelihood function of the random variable B m . In this paper, we will focus on the latter case where q is fixed so that Ψ q is a function of s. Since Ψ q is continuous, the derivative of Ψ q (s) is available: With the incomplete Gamma function, we can rewrite Equation (5) as Example 1. When q = 1, the incomplete Gamma function becomes Ψ q (s m ) = e −s m . In this case,

Image Reconstruction for QIS
We consider a multiple exposure setting. Assuming that the scene is stationary, the measurement we obtain is a sequence of binary bit maps {b m,t | m = 0, . . . , M − 1, and t = 0, . . . , T − 1}, where the first index m runs through the M jots spatially, and the second index t runs through the T frames in time. The image reconstruction task is to find the signal c = [c 0 , . . . , c N−1 ] T that best explains {b m,t }. Translating into a probabilistic framework, this can be formulated as maximizing the likelihood: where the constraint s = αGc follows from Equation (1). Taking the logarithm on the right-hand side of Equation (9), the maximization becomes The optimization in Equation (10) is known as the maximum likelihood estimation (MLE). As shown in [13], the objective function of Equation (10) is concave (for any q), and therefore, the global maximum exists and is unique. When G satisfies Assumption 1, the closed form solution is available, and we will show it in Section 3. However, for general G or if we include additional constraints to enforce the smoothness of the solution, e.g., using total variation [34] or sparsity [35], then numerical algorithms are required. Some existing methods include the gradient descent method [13], Newton method [28] and the alternating direction method of multipliers [29]. These algorithms are iterative, and the computation for each iteration is costly.

Non-Iterative Image Reconstruction
In this section, we present a non-iterative image reconstruction algorithm for QIS. There are three components behind the algorithm. The first is the closed-form solution of Equation (10), which is coherent to [13]. The second component is a variance stabilizing transform that transforms the sum of one-bit quantized Poisson random variables to a binomial random variable with equal variances. The transformation is known as the Anscombe transform, named after the English statistician Frank Anscombe (1918Anscombe ( -2001. The third idea is the application of an off-the-shelf image denoiser.

Component 1: Approximate MLE
Because of the piecewise constant property of Assumption 1, the sequence {b m,t } can be partitioned into N blocks where each block contains K × T binary bits. This leads to a decomposition of Equation (10) as where the subsequence B n,t def = {b Kn,t , . . . , b Kn+(K−1),t } denotes the n-th block of the t-th frame. Define be the sum of the bits (i.e., the number of one's) in B n,t . Then, Equation (11) becomes where L = KT. The maximum of Equation (13) is attained when each individual term in the sum attains its maximum. In this case, the closed form solution of Equation (13), for every n, can be derived as in Proposition Equation 1. (13) is

Proposition 1. The solution of Equation
where L = KT, K is the spatial oversampling factor and T is the number of exposures.

Proof. See Appendix A.
It would be instructive to illustrate Proposition 1 using a figure. Figure 2 shows the case when T = 1, i.e., a single exposure, and K = 16. The one-bit measurements are first averaged to compute the number of ones within a block of size K. Then, applying the inverse incomplete Gamma function Ψ −1 q (·) and a scaling constant K/α, we obtain the solution c n .  (13) is found by applying an inverse incomplete Gamma function Ψ −1 q (·) and a scaling factor K/α. Proposition 1 shows why a simple summation S n /L is inadequate to achieve the desired result, although such summation has been used in [18,36]. By only summing the number of ones, the resulting value S n /L is the empirical average of these one-bit measurements. Since the probability of drawing a one in QIS follows a quantized Poisson distribution and not a Bernoulli distribution, the nonlinearity due to the quantized Poisson distribution must be taken into account. A comparison of the ground truth image, the summation result and the MLE solution is shown in Figure 3.
As an immediate corollary of Proposition 1, we can simplify the inverse incomplete Gamma function when q = 1. This result is sometimes known as the exponential correction function [37]. Corollary 1. When q = 1, the MLE solution of the n-th pixel c n is where S n is the number of ones in the n-th block (See Equation (12)) and L = KT is the total number of pixels in the block.
Proof. The result follows from the fact that Ψ q (s) = e −s for q = 1. Thus, Figure 3. Image reconstruction using synthetic data. In this experiment, we generate one-bit measurements using a ground truth image (a) with α = 160, q = 5, K = 16, T = 1 (so L = 16). The result shown in (b) is obtained using the simple summation, whereas the result shown in (c) is obtained using the MLE solution. It can be seen that the simple summation has a mismatch in the tone compared to the ground truth.

Component 2: Anscombe Transform
The MLE solution c = [ c 0 , . . . , c N−1 ] T computed through Proposition 1 is noisy, as illustrated in Figure 3. The reason is that for a relatively small K and T, the randomness in the one-bit measurement has not yet been eliminated by the summation in S n . Therefore, in order to improve the image quality, additional steps must be taken to improve the smoothness of the image.
At first glance, this question seems easy because if one wants to mitigate the noise in c, then directly applying an image denoising algorithm D to c would be sufficient, e.g., Figure 4a. However, a short afterthought will suggest that such an approach is invalid for the following reason. For the majority of image denoising algorithms in the literature, the noise is assumed to be independently and identically distributed (i.i.d.) Gaussian. In other words, the variance of the noise should be spatially invariant. However, the resulting random variable in Figure 2 does not have this property.
Our proposed solution is to apply an image denoiser before the inverse incomplete Gamma function as shown in Figure 4b. Besides the order of denoising and the Gamma function, we also add a pair of nonlinear transforms T and T −1 before and after the denoiser D. The reasons for these two changes are based on the following observations. The proof of Observation 1 follows immediately from Assumption 1 that if G = (1/K)I N×N ⊗ 1 K×1 , we can divide the M jots into N groups each having K × T entries. Within the group, the one-bit measurements are all generated from the same pixel c n .
The consequence of Observation 1 is that for a sequence of i.i.d. Bernoulli random variables, the sum is a Binomial random variable. This is described in Observation 2.
for k = 0, . . . , K − 1 and t = 0, . . . , T − 1, then the sum S n defined in Equation (12) is a Binomial random variable with mean and variance: Observation 2 is a classic result in probability. The mean of the Bernoulli random variables is specified by the incomplete Gamma function Ψ q αc n K , which approaches one as K increases. Thus, for fixed T, the probability 1 − Ψ q αc n K → 0 as K → ∞. When this happens, the binomial random variable S n can be approximated by a Poisson random variable with mean L 1 − Ψ q αc n K [38]. However, as T also grows, the binomial random variable S n can be further approximated by a Gaussian random variable due to the central limit theorem. Therefore, for a reasonably large K and T, the resulting random variable S n is approximately Gaussian.
The variance of this approximated Gaussian is, however, not constant. The variance changes across different locations n because Var[S n ] is a function of c n . Therefore, if we want to apply a conventional image denoiser (which assumes i.i.d. Gaussian noise) to smooth S n , we must first make sure that the noise variance is spatially invariant. The technique used to accomplish this goal is called the variance stabilizing transform [39]. In this paper, we use a specific variance stabilizing transform known as the Anscombe transform [32]. Anscombe transform is best known in the image processing literature for Poisson denoising, where one transforms observed Poisson data to approximately Gaussian with equal variance [24]. For binomial random variables S n , the Anscombe transform and its property are given in Theorem 1.
Theorem 1 (Anscombe transform for binomial random variables). Let S n be a binomial random variable with parameters (L, p n ), where p n = 1 − Ψ q αc n K and L = KT. Define the Anscombe transform of S n as a function T : {0, . . . , L} → R, such that Then, the variance of Z n is Var[Z n ] = 1 4 + O(L −2 ) for all n.
Proof. The proof of Theorem 1 is given in the Appendix A. It is a simplified version of a technical report by Brown et al. [40]. The original paper by Anscombe [32] also contains a sketch of the proof. However, the sketch is rather brief, and we believe that a complete derivation would make this paper self-contained.
The implication of Theorem 1 is that regardless of the location n, the transformed random variable Z n has a constant variance 1 4 when L is large. Therefore, the noise variance is now location independent, and hence, a standard i.i.d. Gaussian denoiser can be used.

Example 2.
To provide readers a demonstration of the effectiveness of Theorem 1, we consider a checkerboard image of N = 64 pixels with intensity levels c 0 , . . . , c N−1 . The n-th pixel c n generates K = 100 binary quantized Poisson measurements {B Kn , . . . , B Kn+(K−1) } using α = 100, q = 1, T = 1 (so L = 100). From each of these K measurements, we sum to obtain a binomial random variable S n = ∑ K−1 k=0 B Kn+k . We then compute the variance of Var[S n ] and Var[T (S n )] using 10 4 independent Monte Carlo trials. The results are shown in Figure 5, where we observe that Var[S n ] varies with the location n, and Var[T (S n )] is nearly constant for all n.

Var[S n ]
Var[T (S n )]

Remark 1. The inverse Anscombe transform is
which we call the algebraic inverse. Another possible inverse of the Anscombe transform is the asymptotic unbiased inverse [32], defined as The performance of the unbiased inverse is typically better for low noise (large L), whereas the algebraic inverse is better for high noise (small L). This is consistent with the Poisson denoising literature. (e.g., [24]).
Example 3. Table 1 shows the peak signal to noise ratio (PSNR) values of the reconstructed images using the algebraic inverse and the asymptotic unbiased inverse. In this experiment, we consider 10 standard images commonly used in the image processing literature: Baboon, Barbara, Boat, Bridge, Couple, Hill, House, Lena, Man and Peppers. The sizes of the images are either 256 × 256 or 512 × 512. For each image, we set T = 1, q = 1 and α = K and vary K = {1, 4,9,16,25,36,49, 64}. The results in Table 1 indicate that T −1 unbias is consistently better than T −1 for K > 1, although the difference diminishes as K grows. Table 1. PSNR values using algebraic inverse T −1 and asymptotic unbiased inverse T −1 unbias . The results are averaged over 10 standard images. In this experiment, we set T = 1, q = 1 and α = K.

Component 3: Image Denoiser
An important feature of the proposed algorithm is that it can take any off-the-shelf image denoiser for the operator D. Here, by image denoiser, we meant an image denoising algorithm designed to remove i.i.d. Gaussian noise from an observed image.
Image denoising is an important research topic by its own. In the following, we provide a few popular image denoising algorithms.

•
Total variation denoising [34]: Total variation denoising was originally proposed by Rudin, Osher and Fatemi [34], although other researchers had proposed similar methods around the same time [41]. Total variation denoising formulates the denoising problem as an optimization problem with a total variation regularization. Total variation denoising can be performed very efficiently using the alternating direction method of multipliers (ADMM), e.g., [42][43][44]. • Bilateral filter [45]: The bilateral filter is a nonlinear filter that denoises the image using a weighted average operator. The weights in a bilateral filter are the Euclidean distance between the intensity values of two pixels, plus the spatial distance between the two pixels. A Gaussian kernel is typically employed for these distances to ensure proper decaying of the weights. Bilateral filters are extremely popular in computer graphics for applications, such as detail enhancement. Various fast implementations of bilateral filters are available, e.g., [46,47].

•
Non-local Means [48]: non-local means (NLM) was proposed by Buades et al. [48] and, also, an independent work of Awante and Whitaker [49]. Non-local means (NLM) is an extension of the bilateral filter where the Euclidean distance is computed from a small patch instead of a pixel. Experimentally, it has been widely agreed that such patch-based approaches are very effective for image denoising. Fast NLM implementations are now available [50][51][52]. • BM3D [53]: 3D block matching (BM3D) follows the same idea of non-local means by considering patches. However, instead of computing the weighted average, BM3D groups similar patches to form a 3D stack. By applying a 3D Fourier transform (or any other frequency domain transforms, e.g., discrete cosine transform), the commonality of the patches will demonstrate a group sparse behavior in the transformed domain. Thus, by applying a threshold in the transformed domain, one can remove the noise very effectively. BM3D is broadly regarded as a benchmark of today's image denoising algorithm.
The factors to consider in choosing an image denoiser are typically the complexity and quality. Low complexity algorithms, such as bilateral filter, are fast, but the denoising ability is limited. High end algorithms, such as BM3D and non-local means, produce very good images, but require much computation. The trade-off between complexity and performance is a choice of the user.
Readers at this point may perhaps ask a question: What will happen if we apply a denoiser after the MLE solution, like the one shown in the block diagram in Figure 4a? As we have explained in the Anscombe transform section, this will lead to a suboptimal result because the noise is not i.i.d. Gaussian. To illustrate the difference in terms of performance, we show in Figure 6 a comparison between applying image denoising using the two block diagrams shown in Figure 4. The denoiser we use in this experiment is BM3D. The metric we use to evaluate the performance is the peak signal to noise ratio (PSNR), which will be defined formally in Section 4. In short, a large PSNR value is equivalent to a low mean squared error comparing the estimated image and the ground truth image. The results are shown in Figure 6. Although denoising after the MLE (the conventional idea) generates some reasonable images, the PSNR values are indeed significantly lower than the proposed Anscombe approach. This is not surprising, because the denoiser tends to oversmooth the dark regions and undersmooth the bright regions due to the signal-dependent noise levels.  Figure 6. Comparison between image denoising after the MLE solution and using the proposed Anscombe transform. The denoiser we use in this experiment is 3D block matching (BM3D) [53]. The binary observations are generated using the configurations α = 160, q = 5, K = 16, T = 1. The values shown are the peak signal to noise ratio (PSNR).

Related Work in the Literature
The proposed algorithm belongs to a family of methods we call the transform-denoise methods. The idea of transform-denoise is similar to what we do here: transform the random variable using a variance stabilizing transform, then denoise using an off-the-shelf image denoiser. Among the existing transform-denoise methods, perhaps the most notable work is the one by Makitalo and Foi [24], where they considered the optimal inverse of the Anscombe transform for the case of Poisson-Gaussian random variables. A more recent work by the same research group [27] showed that it is possible to boost the denoising performance by applying the transform-denoise iteratively. We should also mention the work by Foi [54], which considered the modeling and transformation for clipped noisy images. The problem setting of that work is for conventional sensors. However, the underlying principle using the transform-denoise approach is similar to that of QIS.
The approximate MLE solution in Section 3.1 is based on the piecewise constant assumption (Assumption 1). Under this assumption, summing of the Bernoulli random variables can be thought of as performing a "binning" of the pixels. Binning is a common technique in restoring images from Poisson noise, especially when the signal-to-noise ratio is low [23,25,26]. Binning can also be applied together with transform-denoise, e.g., in [27], to achieve improved results. For QIS, the result of binning is different from that of the Poisson noise, for the sum of QIS bits leads to a binomial random variables, whereas the sum of Poisson noise leads to a Poisson random variable.

Experimental Results
In this section, we provide further experimental results to evaluate the proposed algorithm.

Synthetic Data
In this experiment, we consider 100 natural images downloaded from the Berkeley Segmentation database [55]. The input resolution of these images is 481 × 321 (or 321 × 481), and all images are converted to a gray-scale image with values in the range [0, 1]. To generate the synthetic data, for each image, we consider an oversampling factor of four along the horizontal and the vertical directions (so K = 16). The sensor gain is set as α = 16, and the threshold level is q = 1 to simulate a single photon sensor that triggers at one photon. We generate one-bit observations using a quantized Poisson statistics and use the proposed algorithm to reconstruct. As a comparison, we also test the MLE solution, i.e., a summation followed by the inverse incomplete Gamma transform (see Theorem 1), and an ADMM algorithm using a total variation regularization [29,31]. For the proposed algorithm, we use BM3D as the image denoiser.
We report two results in this experiment. First, we consider the peak signal to noise ratio (PSNR) as an evaluation metric. The PSNR of an estimated image c compared to the ground truth image c * is defined as where N is the number of pixels in c. Typically, higher PSNR values imply better image reconstruction quality. The PSNR values of these 100 images are shown in Figure 7. In this figure, we observe that the proposed algorithm is better than the MLE solution and the ADMM solution by 10.20 dB and 2.75 dB, respectively, which are very substantial amounts from a reconstruction perspective.  [55]. In this experiment, we fix q = 1, α = 16, and K = 16. The proposed algorithm uses BM3D [53] as the image denoiser.
Apart form the PSNR values, we also report the runtimes of the algorithms. The runtimes of the algorithms are recorded by running the methods on the same machine and the same platform, which is an Intel i7-6700 3.4-GHz desktop with Windows 7/MATLAB 2014. As shown in Figure 8, the runtime of the proposed algorithm is approximately two orders of magnitude (100×) faster than the ADMM algorithm. As for the influence of the oversampling factor K on the reconstruction quality, we show in Figure 9 the reconstructed results using K = 4, 16, 64 and the binary one-bit measurements. While it is clear from the figure that the reconstructed image improves as K increases, we observe that most of the visual content has been recovered even at K = 4.
One-bit measurement K = 4 K = 16 K = 64 Figure 9. Influence of the oversampling factor K on the image reconstruction quality. In this experiment, we set α = K, q = 1. T = 1.

Real Data
In this experiment, we consider two single-photon avalanche diode (SPAD) cameras for capturing high speed videos. The first camera is a CMOS SPAD-based image sensor developed by Dutton et al. [14][15][16]. This camera has a resolution of 320 × 240, with a frame rate of 10k frames per second. The second camera is the SwissSPAD camera developed by Burri et al. [17,18]. This camera has a resolution of 512 × 128, with frame rate of 156k frames per second. Both cameras capture one-bit measurements from a scene containing a stationary background with a rapidly moving foreground. Our experimental goal is to test if the proposed algorithm can resolve the spatial content with minimal trade-off in the temporal resolution.
There are several points of this experiment on which we should comment. First, since the spatial resolution of these two cameras is relatively small (as compared to the synthetic case), we do not assume any spatial oversampling, i.e., K = 1. Instead, we use T temporal frames to reconstruct one output image. To ensure smooth transitions across adjacent frames, we use a temporally-sliding window as we progress to the next output image. Second, since these real videos do not have a ground truth, we can only compare the quality of the resulting images visually.
We first look at the results of the SPAD camera by Dutton et al. [14][15][16]. Figure 10 shows several snapshots of the "fan" sequence and the "milk" sequence. To generate this result, we run the proposed algorithm using T = 16 frames in a sliding window mode. That is, we use Frames 1 to 16 to recover Frame 1 and Frames 2 to 17 to recover Frame 2, etc. The quantization level q is set as q = 1, and the sensor gain α is adjusted to produce the best visual quality. For the "fan" sequence, we set α = 4, and for the "milk" sequence, we set α = 0.75. As we can see from the figures, the proposed algorithm recovers most of the content from the scene, even revealing the textures of the milk in the scene.
As for the SwissSPAD camera, we consider a video sequence "oscilloscope" captured at a frame rate of 156k frames per second. The goal is to track the sinusoid shown on the oscilloscope's screen. We consider four values of T = 4, 16, 64 and 256 for the number of frames. The results are shown in Figure 11. As one may expect, when T = 256, the image quality improves because we are effectively summing 256 frames to reconstruct one output frame. However, since we are summing the 256 frames, the temporal resolution is severely distorted. In particular, the trace of the sinusoid signal disappears because of the strong averaging effect. When we reduce the number of frames to T = 16, we observe that the trace of the signal can be observed clearly. The same image obtained by the MLE (i.e., simple summation) is still highly noisy.
(a) "Fan" sequence (b) "Milk" sequence Figure 10. Image reconstruction of two real video sequences captured using a 320 × 240 single-photon avalanche diode (SPAD) camera running at 10k frames per second [14][15][16]. In this experiment, we use T = 16 frames to construct one output frame. In both columns, the left are the raw one-bit measurements, and the right are the recovered images using the proposed algorithm.
(a) Snapshot of the raw one-bit image (b) Temporal sum of K frames (c) Output of the proposed algorithm Figure 11. Image reconstruction of real video sequences captured using the 512 × 128 SwissSPAD camera running at 156k frames per second [17,18].

Conclusions
We present a new image reconstruction algorithm to recover images from one-bit quantized Poisson measurements. Different from existing algorithms that are mostly iterative, the new algorithm is non-iterative. The algorithm consists of three key components: (1) an approximation to the standard maximum likelihood estimation formulation that allows us to decouple the dependency of pixels; (2) a nonlinear transform known as the Anscombe transform that converts a sum of one-bit quantized Poisson random variables to a Gaussian random variable with equal variance; (3) an off-the-shelf image denoising algorithm that performs the smoothing. Experimental results confirm the performance of the proposed algorithm. The algorithm demonstrates two orders of magnitude improvement in speed compared to existing iterative methods and shows several dBs of improvement in terms of the peak signal to noise ratio (PSNR) as a metric of image quality.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: Proof. For notational simplicity, we drop the subscript n. Then, the optimization problem is Taking the first order derivative with respect to c and setting to zero yields (with the help of Property 1): Rearranging the terms leads to the desired result that Appendix A.2. Proof of Theorem 1 Proof. For notational simplicity, we drop the subscript n. Our goal is to show that if X∼Binomial(L, p), then the transformed variable: has a variance Var[T (X)] = 1 4 + O(L −2 ). To this end, we first consider the function Q, such that Q(X) = T (X) − L + 1 2 sin −1 √ p. There are two terms in this equation. The first term L + 1 2 can be expanded (using Taylor expansion) to its first second order as The arcsin function can be expanded to its second order as We next consider the standardized binomial random variable by defining Then, by Lemma A1, it follows that Therefore, Since the first four moments of Y are  . It holds that By expanding √ 1 + E 1 , we arrive at Multiplying both sides by 1 − p and substituting for E 1 yields The proof of the second equality can be done by expressing 1 − F in terms of Y as By expanding √ 1 + E 2 , we arrive at Multiplying both sides by √ p and substituting for E 2 yields 32(1 − p)L