Computational multi-depth single-photon imaging.

We present an imaging framework that is able to accurately reconstruct multiple depths at individual pixels from single-photon observations. Our active imaging method models the single-photon detection statistics from multiple reflectors within a pixel, and it also exploits the fact that a multi-depth profile at each pixel can be expressed as a sparse signal. We interpret the multi-depth reconstruction problem as a sparse deconvolution problem using single-photon observations, create a convex problem through discretization and relaxation, and use a modified iterative shrinkage-thresholding algorithm to efficiently solve for the optimal multi-depth solution. We experimentally demonstrate that the proposed framework is able to accurately reconstruct the depth features of an object that is behind a partially-reflecting scatterer and 4 m away from the imager with root mean-square error of 11 cm, using only 19 signal photon detections per pixel in the presence of moderate background light. In terms of root mean-square error, this is a factor of 4.2 improvement over the conventional method of Gaussian-mixture fitting for multi-depth recovery.


Introduction
The ability to acquire 3D structure of a scene is important in many applications, such as biometrics [1], terrestrial mapping [2], and gaming [3]. Time-of-flight methods for 3D acquisition illuminate the scene and analyze the backreflected light signal [4,5]. However, as illustrated in Fig. 1, scenes that include partially-reflective or partially-occluding objects have complex patterns of light being reflected at different depths even at a single pixel. For such scenes, one can analyze multiple light returns to fully reconstruct the multiple depths present in the field-of-view. This is known as the problem of multi-depth reconstruction from full-waveform measurements [6].
Conventional time-of-flight imaging sensors, such as amplitude-modulated continuous-wave (AMCW) modules, aim to reconstruct multi-depth profiles by the methods of transient imaging [7][8][9]. For low-power and long-range 3D imaging applications, a sensitive single-photon detector is used instead in the active imaging setup for low-light level operations [10][11][12][13]. Recent advances in single-photon imaging system design allowed experimental demonstration of low-flux time-of-flight imaging at long ranges (∼100 m) [14,15]. Given a pulsed illumination source, time histogramming methods of the photon detections from the backreflected pulse waveform have been used for pixelwise reconstruction of scene depths using single-photon imagers [16,17]. For example, the high sensitivity of time-correlated single-photon imaging systems was demonstrated when full photon histogram measurements were used to track pulses of light in flight [18].
For low-flux multi-depth imaging of scene reflectors in particular, one may choose to identify the peaks in the photon histogram by brute-force search over time bins. However, since this leads to a large processing time (polynomial in the number of time bins, with degree equal to the number of depths), fast algorithms using parametric deconvolution or finite-rate-of-innovation methods have been developed [19,20]. Assuming accurate waveform measurement, the compressive depth acquisition camera (CoDAC) [21] framework also exploits parametric deconvolution, but for estimating positions of extended planar facets rather than multiple depths per transverse location.
The previously described multi-depth imaging methods using single-photon detectors only give accurate results when the image acquisition time is long enough that the number of photon detections is sufficiently high to form an accurate histogram. The problem of recovering the multi-depth information is generally difficult in low-flux scenarios due to the low signal- to-noise ratio of observing only a few photon detections, and extraneous background light and detector dark counts. For moderate numbers of detected photons, a statistical approach has been used to estimate the multi-depth profile by learning a mixture of Gaussians (MoG) model that interprets the photon detection data as samples from a distribution of the full-waveform observation. In the mixture model, the mode of each mixture component corresponds to a depth value of a target. Learning of the mixture model is achieved either by using the expectationmaximization (EM) algorithm [22] or the Markov chain Monte Carlo (MCMC) method [23,24]. However, MoG-based multi-depth estimation involves a non-convex cost function and generates locally optimal solutions whose accuracy is generally poor when the number of detected photons is low. Thus, existing methods are limited in their ability to recover a scene's multidepth information in a photon-efficient manner.
In this paper, we demonstrate a full-waveform imaging framework that accurately recovers the multi-depth profile from single-photon observations. For example, as detailed in Section 4.2, our framework was successful in accurately reconstructing the depth features of a mannequin behind a partially-reflecting medium using only 19 signal photon detections with root meansquare (RMS) depth error of 11.4 cm. Compared to the conventional MoG-based method, which gave RMS depth error of 48.7 cm, this is an improvement by a factor of 4.2. Unlike previous works, we show that the multi-depth estimation problem from single-photon observations can be reformulated as a convex optimization problem by combining the statistics of photon detection data with sparsity of multi-depth profiles. By adapting the iterative shrinkage-thresholding algorithm (ISTA) [25,26] used for robust sparse signal pursuit to our single-photon imaging setup, we accurately solve for the global optimum of the convex optimization problem to obtain a multi-depth solution from a small number of photon detections. Figure 2 depicts the single-photon imaging setup we used to estimate the 3D structure of scene. A focused optical source, such as a laser, periodically illuminates a patch of the scene using the pulse waveform function s(t) starting at time 0. Let T r be the pulse repetition period in seconds, N be the total number of independent pulse illuminations made at the image pixel, and T p be the pulsewidth defined as the RMS duration of the pulse waveform. The single-photon detector, in conjunction with a time correlator, is used to time stamp individual photon detections that are generated by the backreflected waveform from the scene plus extraneous detections arising from background light and dark counts. The recorded time of a photon detection is relative to the time of the most recent pulse illumination. We raster scan the optical source over multiple pixels in the scene to recover a spatially-resolved depth profile.

Observation model at a single pixel
We first derive the relationship between the depths of multiple reflectors in the scene and our observed data of photon detections. To avoid unnecessary notation, we characterize the light transport and detection statistics for a single pixel; the same model applies at each pixel.
Let r(t) be the total optical flux that is incident on the single-photon detector. Through the linearity and time invariance of optical flux transport, we can write where h(t) is the impulse response of the scene, B is the constant background light flux in photons per second, and * represents convolution. Then, the rate function λ (t) that describes Recovered multi-depth profile from photon detections at a pixel (1) (2) indicator of reflectors (1) (2) extraneous response the photon detections at the single-photon detector is where η is the detector's quantum efficiency, d is the detector dark count rate, and h d (t) is the detector's response function. For simplicity, we assume a normalized detector response function so that h d (t) dt = 1. Substituting Eq. (1) into Eq. (2) then gives wheres = h d * s is the effective pulse waveform after accounting for the detector response. Note that because ηB + d is constant in t and h d (t) dt = 1, the detector response function does not appear in the second term. Hereafter, we assume that the scene is entirely within the maximum unambiguous range of the imager, so h(t) = 0 for t > T r . We also assume that T p T r , so we may avoid complicating the exposition by assuming (h * s)(t) = 0 for t ∈ [0, T r ).
A time-correlated single-photon detector records the time-of-detection of a photon within a timing resolution of ∆ seconds. We choose a pulse repetition period that is much longer than the timing resolution (∆ T r ) and assume that T r is divisible by ∆. Then, we see that m = T r /∆ is the number of time bins that may contain photon detections. In other words, m is the dimension of the photon count vector y, where y k ∈ N for k = 1, 2, . . . , m. (We use y k to indicate the scalar value at the kth index of the vector y.) Using the probabilistic theory of photon detections [27], after N pulse illuminations at an image pixel, the kth bin of the observed photon count histogram is distributed as We would like to reach an approximation in which the Poisson parameter of y k is given by the product of a known matrix and an unknown vector. We will approximate (h * s)(t) with a sampling period of ∆ , where we have ∆ = T r /n for some n ∈ Z + . We can approximate the first term in the Poisson parameter in Eq. (4) as where (a) follows from partitioning [0, T r ) into n subintervals and (b) from replacing h(y) and Note that the quality of the approximation (b) will depend on the size of ∆ . Finally, using the derived approximations, the observation model of Eq. (4) can be rewritten in a concise matrixvector form as where x is an n × 1 vector, S is an m × n matrix, and 1 m is an m × 1 vector of ones, and b = N∆(ηB + d).

Observation likelihood expressions
Our goal of multi-depth reconstruction is to accurately estimate x from y. Using Eq. (6), the photon count histogram vector y has the probability mass function We can thus write the negative log-likelihood of x as where . = indicates equality up to terms independent of x. By checking the positivesemidefiniteness of the Hessian matrix of the negative log-likelihood function, it is straightforward to prove that L (x; y, S, b) is convex in x.

Characteristics of the impulse response functions of natural scenes
It has been shown that the following K-reflector model is effective in describing the impulse response of a natural scene with multiple reflectors [8,23]: where a (i) and d (i) are respectively the reflectivity and depth values of the ith reflector at an image pixel, δ (·) denotes the delta function, c is the speed of light, and K is the number of reflectors. Let us choose the indexing rule so that d (1) < d (2) < · · · < d (K) . Then, we define the minimum separation of reflector depths as We observe that, assuming the K-reflector model and c∆ /2 < d s , exactly K elements of x are nonzero and those entries have values

Novel image reconstruction algorithm
The multi-depth profile can be estimated by minimizing the negative log-likelihood function, while including a signal support constraint that the number of nonzero elements in x must be equal to K. However, the 0 -norm constraint set, which describes the set of vectors with K nonzero elements, is a non-convex set. With no additional assumptions, exactly solving an optimization problem constrained to this set is computationally infeasible, since the problem is NP-hard [28]. In order to design a computationally feasible algorithm, we apply the convex relaxation whereby the 1 -norm serves as a proxy for the 0 -norm [29]. Our proposed imaging framework also includes a constraint on reflectivity values being nonnegative. Thus, we obtain the multi-depth profile estimatex OPT by solving the following 1 -penalized and constrained likelihood optimization problem: where τ > 0 is the variational parameter controlling the degree of penalizing the non-sparsity of the signal. Because L (x; y, S, b) and the 1 -norm are both convex functions in x and the nonnegative cone is a convex set, the minimization problem given in (OPT) is a convex optimization problem. ISTA is a celebrated algorithm for rapidly solving the 1 -penalized constrained likelihood optimization problem when the data is corrupted by additive white Gaussian noise. However, instead of using a Gaussian likelihood, (OPT) is derived based on the model for single-photon observations. Thus, we modify the first step of ISTA that takes a gradient descent in the leastsquares cost to one that takes a gradient descent in the negative log likelihood obtained from photon observations given in Eq. (9). We can compute the gradient of our negative log likelihood function as where we used (S T ) k to denote the kth column of S T and div(·, ·) represents elementwise division of the vector in the first argument by the vector in the second one. We then modify the second step of ISTA that performs a shrinkage-thresholding operation on the gradient-descent solution to include the nonnegativity constraint of scene reflectivities. Our extra nonnegativity constraint replaces the shrinkage-thresholding operation with a shrinkage-rectification operation. The shrinkage-thresholding used in ISTA and the shrinkage-rectification used in our algorithm are illustrated in Fig. 3. Our modified ISTA algorithm is given in Algorithm 1. After solving (OPT) using Algorithm 1, we apply post-processing onx OPT that sets small residual nonzero elements to zero and groups closely-neighboring nonzero elements into an average depth. The depth grouping step is to ensure maximum sparsity of the multi-depth profile, by using the assumption that a tight cluster of depth estimates originates from a single reflector in the scene. This end-to-end processing is summarized in Algorithm 2. To illustrate the role of the post-processing, Fig. 4 shows the intermediate SPISTA output and the final output of Algorithm 2 for one pixel of our experiment detailed in Section 4.2.2 (see pixel marked in experimental setup figure).
In summary, we have developed a low-flux multi-depth imaging framework that incorporates the statistics of single-photon detections with the sparsity of the multi-depth profile at a pixel to formulate a convex optimization problem in (OPT). This is unlike existing histogram-based Fig. 3. Illustration of a shrinkage-thresholding operation used as a step in ISTA (left) and the shrinkage-rectification operation used as a step in our SPISTA (right) that includes the nonnegativity constraint. Here the operations map scalar v to scalar z (variables only used for illustration purposes), with regularization parameter τ.
Algorithm 1 Single-photon iterative shrinkage-thresholding algorithm (SPISTA) low-flux methods, such as MoG, which solve a non-convex problem directly and do not guarantee high accuracy solutions due to local minima. Our framework modifies ISTA to include accurate photodetection statistics and the nonnegativity constraint to accurately solve (OPT). Our algorithm also employs post-processing to ensure filtering of residual signals and clustering depth estimates to maximize the level of sparsity of the final multi-depth estimate.

Performance of two-path recovery
Using simulations, we first study the multi-depth estimation performance for K = 2, motivated by the two-Dirac recovery scenario of second-order multipath interference from reflective surfaces [30] and looking through a transparent object [31]  of-flight imaging. We focus on comparing two algorithms: the MoG-based estimator using a greedy histogram-data-fitting strategy and our proposed imager using convex optimization. Let {d 1 , d 2 }, with d 1 < d 2 , be the set of true depths at a pixel. Also, let {d 1 ,d 2 }, withd 1 <d 2 , be the set of identified depths obtained using either the MoG method or our proposed framework. Then, we use the pulsewidth-normalized root mean-square error (NRMSE) to quantify the recovery performance of the two-path signal: such that if NRMSE is below 1, then the imager has achieved sub-pulsewidth depth accuracy. When more than two paths were estimated by the algorithm, two depth values with highest intensities were used for NRMSE computation. Figure 5 shows Monte Carlo simulated performance results of pixelwise two-path estimation using the MoG-based method and our method. The results are presented for low (b = 0. tion samples, we used the EM algorithm. Simulation parameters were set as the following: the number of detector time bins m = 100, the number of discretized depth bins n = 100, RMS pulsewidth T p = 0.3 ns, and bin width ∆ = 1 ns. The pulse shape was discrete Gaussian. The number of Monte Carlo trials for the simulation was 2000. For each Monte Carlo trial, two entries out of n were generated in a uniformly random fashion. (A two-path reflector profile was chosen from n-choose-2 combinations at random.) Both selected entries were set to 1, in order to simulate two reflectors with unit reflectivities. For our algorithm, we chose the regularization parameter τ = b, based on a heuristic that higher penalty is required for higher background levels. We chose the convergence parameter δ = 0.01, and the residual filtering parameter ε = 0.1. Also, our initializationx (0) was chosen to be y. We see that for both low and high background levels, our proposed framework uniformly outperforms the existing MoG method for various numbers of photons backreflected from the scene. For example, for both b = 0.1 and b = 0.5, the difference in RMSE between our framework and MoG is around 2 given 10 signal photon detections. This translates to RMS depth error reduction of 9 cm, since 1 NRMSE equals cT p /2 = 4.5 cm. Also, our method successfully achieves sub-pulsewidth depth accuracy (NRMSE less than 1) when the number of signal photons exceeds ∼30, while the MoG method fails to do so. In this simulation, we required an average of 85 SPISTA iterations per pixel. The average per pixel processing time of Algorithm 2 was measured to be ∼0.004 seconds. Our algorithm's processing time is short because the computational time of a SPISTA update is linear in the number of depth bins n, since the most costly operation in SPISTA is the size-n convolution, and the post-processing step only requires a linear search over n bins. The average per pixel processing time of the MoG method was measured to be ∼0.019 seconds. All processing was done using a laptop with Intel(R) Core(TM) i7-4500u CPU @ 1.80 GHz.

Resolvability
In the problem of multi-depth estimation, it is natural to ask how small the depth separation of two adjacent reflectors can be so that the proposed algorithm can still accurately resolve two reflectors instead of one. Figure 6 shows simulation results that describe how the number of cm between reflectors  Fig. 6 show the mean number of estimated reflectors. In Fig. 6, for all relative reflectivity settings, we observe that if the distance between two reflectors is too small (around 3cT p ), then our algorithm falsely recognizes two reflectors as a single reflector. Even when there is a large separation between reflectors (larger than 3cT p ), if the ratio between target reflectivities is too small (such as 1/8), then the number of reflectors is likely to be falsely recognized as 1 by our Algorithm 2.

Experimental results
We demonstrate the performance of the proposed multi-depth imager using an experimental single-photon imaging setup. A PicoQuant LDH series pulsed laser diode with center wavelength of 640 nm, pulsewidth T p = 270 ps, and repetition period T r = 100 ns was used as the illumination source. We observed that the laser spot size cast on a planar object at a 1 m distance was around 1 mm, implying that the beam solid angle was around 7.9 × 10 −7 sr. A Thorlabs GVS012 two-axis galvo was used to raster scan the scene with a field-of-view of 40 • × 40 • . A lensless Micro Photon Devices PDM series Geiger-mode avalanche photodiode detector with quantum efficiency η = 0.35, timing jitter less than 50 ps, and dark counts per second less than 2 × 10 4 was used for detection of photons. A PicoQuant model HydraHarp 400 time-correlator was used to record the detection times of individual photons. Figure 7 shows a photograph of our experimental single-photon imaging setup. We injected extraneous background light using an incandescent lamp. In this paper, we chose to use the raster-scanning setup simply due to availability of equipment; one can observe that our computational framework can be applied without modification when employing an imaging setup that includes a flood illumination source and an array of single-photon detectors.
For all experiments, we obtained the pulse waveform matrix S and background-light plus dark-count value b through a calibration step prior to the scene measurements. We obtained the pulse shape by projecting the laser light at the wall, which was a reflective plane 4 meters from the imager, and collecting a photon count histogram. We time-gated an interval of the photon count histogram at around 4 × 2/c seconds, such that the interval contained a clean representation of the pulse histogram. By creating a convolution matrix using the calibrated pulse waveform shape, we got a measurement of S from Eq. (5c). The value of b from Eq. (7) was obtained by taking a baseline measurement of the incandescent light with laser light not present in scene. Code and data are available in [32]. 4.2.1. Imaging through a partially-reflective object Figure 8 shows experimental results of imaging through a partially-reflective object, which is the multi-depth imaging scenario in Fig. 1(a), using the MoG-based and proposed multidepth estimation methods with an average of 46 photon detections at each pixel. We used a stack of plastic sheets enclosed in a plexiglass case as the partially-reflective object, with an average reflectivity of ∼50%. Through calibration we found that the probabilities of a photon coming from the scatterer, the scene behind the scatterer, and background light or dark counts were equal to 0.44, 0.42, and 0.14, respectively. These numbers were calibrated using a single histogram with 10000 photons gathered from all pixels, where each pixel contributed a single photon to the aggregate histogram. Thus, the number of photons that originated from the sceneof-interest behind the scatterer is calculated to be 46×0.42 ≈ 19. The raster-scanning resolution was set to be 100 × 100 in this experiment. We see that the existing MoG-based method fails to recover useful depth features of the mannequin, but our method successfully does so. For example, in the side view of the reconstructed depth, the result from our method differentiates the longitudinal locations of the face and the torso of the mannequin, unlike the result from the MoG method. We were able to form a ground truth depth map of the mannequin by using the log-matched filtering solution [27] on a larger dataset of 500 photons per pixel, after time-gating the photon arrivals at around 2.6 ns such that remaining photons only describe the mannequin scene placed at around 4 m. With respect to this ground truth, the conventional MoG solution gave 48.7 cm of RMS depth error and ours gave 11.4 cm. The RMS depth error was computed as the square root of the average of squared depth errors over all pixels. Our framework thus gave an improvement in reducing erroneous pixel depth by a factor of 4.2, compared to the MoG method, for the task of imaging a scene with partially-reflective object.
In this experiment of imaging through a partially-reflecting object, we required an average of 98 SPISTA iterations per pixel. The average per pixel processing time of Algorithm 2 was measured to be ∼0.036 seconds and its total processing time for the spatially-resolved multi-depth reconstruction was ∼6 minutes. The average per pixel processing time of the MoG method was measured to be ∼0.003 seconds and its total processing time was ∼30 seconds.

Imaging a partially-occluding object
For experimental validation of multi-depth estimation for scenes with partially-occluding objects, we used a photon detection dataset that was collected by Dheera Venkatraman for the first-photon imaging project [11] with the same raster-scanning setup. Our experimental scene consisted of a sunflower and the background wall as shown in Fig. 9. This experiment models the multi-depth imaging scenario in Fig. 1(b). Here we have a higher raster-scanning resolution of 300 × 300, such that many pixels are at the depth boundaries of the two reflective objects: the sunflower and the wall. There are multiple returns at such pixels, where the sunflower petals partially occlude the wall behind it. This artifact is also known as one of mixed pixels in the context of time-of-flight imaging [33]. In our data, the probabilities of a photon originating from the scene and from background light or dark counts are 0.8 and 0.2, respectively. These numbers were calibrated using a single histogram with 90000 photons gathered from all pixels, where each pixel contributed to a single photon to the aggregate histogram. Figure 10 shows how the proposed imager compares to the MoG estimator for the sunflower  Fig. 9. Single-photon imaging setup for estimating multi-depth from partial occlusions at depth boundary pixels. Sample data from 38 photon detections is shown below for the pixel (94, 230) where partial occlusions occur. We show experimental results of multi-depth recovery for this scene using the MoG-based and our methods. and wall scene. The mean number of photon detections over all pixels was measured to be 26 for this experiment. In the figure, we observe that our proposed multi-depth imager successfully distinguishes the sunflower's petals from its leaves and the wall behind it, even though there exist mixed-depth pixels at boundaries and high background light plus dark counts. This is most visible in the side view, where we see that the noisy depth estimates present in the MoG results are absent when using our method. Similar to the previous imaging experiment, we were able to form a ground truth depth map of the sunflower by using the log-matched filtering solution on a larger dataset of 500 photons per pixel, after time-gating the photon arrivals at around 1.6 ns such that remaining photons only describe the sunflower placed at around 2.5 m, and not the wall behind it. Then, we computed that while the conventional MoG solution gave 46.5 cm of RMS depth error, ours gave 4.3 cm. Our framework thus gave an improvement in RMS depth error by a factor of 10.8, compared to the MoG method, for the task of imaging a scene with partially-occluded object. In this experiment of imaging a partially-occluding object, we required an average of 7 SPISTA iterations per pixel. The average per pixel processing time of Algorithm 2 was measured to be ∼0.0014 seconds and its total processing time for the spatially-resolved multi-depth reconstruction was ∼2 minutes. The average per pixel processing time of the MoG method was measured to be ∼0.0057 seconds and its total processing time was ∼8.5 minutes.

Conclusion and future work
In this paper, we presented a robust framework for reconstruction of multi-depth profiles of a scene using a single-photon detector. Our novel imaging method accurately models the single- photon detection statistics from multiple reflectors in the scene, while exploiting the fact that multi-depth profiles can be expressed as sparse signals. Using our signal model the multi-depth estimation problem is a sparse deconvolution problem. We designed an algorithm inspired by ISTA that reaches the globally optimal solution of the convex relaxation of the sparse deconvolution problem with high computational efficiency, demonstrated by the sub-second per-pixel processing time in our simulations and experiments. Using both simulations and experiments for scenes including partially-reflecting and partially-occluding objects, we demonstrated that our imaging framework outperforms the existing MoG-based method for multi-depth estimation in the presence of background light and dark counts. Unlike the parameter-free MoG-based multi-depth imaging method, our framework introduces a number of free parameters, such as the regularization parameter. For our experiments, we used the heuristic of choosing the parameters based on the calibrated background level. In practice, when the signal-to-background ratio varies over multiple imaging experiments, one can employ cross-validation techniques to learn the scalar parameters from multiple experiments [34].
For future work, it is of interest to study how applying other post-processing methods, such as 3D point cloud filtering, can improve multi-depth recovery performance. Also, optoelectronic techniques, such as range gating and narrowband optical filtering, can be incorporated to reject background counts at the data acquisition level to have a more accurate multi-depth reconstruction.