Wavelength-time coding for multispectral 3D imaging using single-photon LiDAR

: Single-photon multispectral light detection and ranging (LiDAR) approaches have emerged as a route to color reconstruction and enhanced target identiﬁcation in photon-starved imaging scenarios. In this paper, we present a three-dimensional imaging system based on a time-of-ﬂight approach which is capable of simultaneous multispectral measurements using only one single-photon detector. Unlike other techniques, this approach does not require a wavelength router in the receiver channel. By observing multiple wavelengths at each spatial location, or per pixel (four discrete visible wavelengths are used in this work), we can obtain a single waveform with wavelength-to-time mapped peaks. The time-mapped peaks are created by the known chromatic group delay dispersion in the laser source’s optical ﬁber, resulting in temporal separations between these peaks being in the region of 200 to 1000 ps, in this case. A multispectral single waveform algorithm was proposed to ﬁt these multiple peaked LiDAR waveforms, and then reconstruct the color (spectral response) and depth proﬁles for the entire image. To the best of our knowledge, this is the ﬁrst dedicated computational method operating in the photon-starved regime capable of discriminating multiple peaks associated with diﬀerent wavelengths in a single pixel waveform and reconstructing spectral responses and depth.


Introduction
Multispectral imaging techniques are widely used in biological and medical sciences [1] and remote sensing [2] to produce data of (X, Y, I(λ)), where X and Y are two spatial dimensions, and I(λ) is wavelength-dependent intensity. When multispectral imaging is combined with light detection and ranging (LiDAR) techniques, it is possible to associate the spectrallydependent intensity I(λ) with a 3D coordinate (X, Y, Z) where Z denotes depth or range. Recently, time-of-flight LiDAR based multispectral imaging systems have been used effectively in geoscience-based surveying, mapping, and classification [3,4]. These systems also provide data for image processing and computer vision communities [5][6][7][8] and are used to achieve improved reconstruction performance and classification of target scenes by combining the spatial and spectral information contained in the 4-dimensional data cubes. Almost all LiDAR based multispectral imaging systems require a spatial form of wavelength routing in their receivers to demultiplex the multiple operational wavelengths [9][10][11]. These wavelength routers are typically constructed with optical volume gratings [9], fiber gratings [10] or a series of narrow-band optical filters [11]. In these systems multiple discrete detectors or detector arrays are required to simultaneously monitor the demultiplexed multiple spectral channels [9,11,12].
In this paper, we have used a wavelength-to-time mapping approach (as opposed to a spatial form of wavelength routing), previously used for spectrometry [13][14][15], to construct a multispectral raster scanning time-of-flight imaging system. Our imager employed a pulsed supercontinuum laser source combined with an acousto-optic tunable filter (AOTF) unit which was used to select a set of up to four discrete wavelengths from the supercontinuum source. The chromatic group delay dispersion in the supercontinuum laser fiber [16] meant that when the laser was triggered, the optical pulse corresponding to each of the wavelengths in the chosen set was emitted at a slightly different time. These inter-pulse separations were in the region of 200 to 1000 ps and this meant that the individual optical pulses (one for each of the wavelengths in the set) were easily distinguished by the time-resolved detection scheme used. This enabled the use of just one optimized single-photon detector in the receiver channel and did not require a spatial type of wavelength demultiplexing.
Similar to the multispectral canopy LiDAR system described in [9], the multispectral imager used in this work utilized the time-correlated single-photon counting (TCSPC) technique. This LiDAR technique offers shot-noise-limited detection and excellent depth resolution [17]. The TCSPC technique is well-suited to long-range depth imaging [18][19][20][21], as well as low photon flux imaging, where minimizing photo-structural damage [22] is necessary or when used with the low-signature targets [23,24]. Compared to the multispectral computational imagers which employ a single-pixel detector described in [25,26], our multispectral imager is capable of time-of-flight depth profiling at the single-photon level. Distinct wavelengths of interest are simultaneously observed at each pixel to generate a histogram of photon counts versus time, which presents a single waveform with wavelength-to-time mapped peaks. A number of multispectral imaging methods, such as those described in [6,27], acquire data for each operational wavelength sequentially for each pixel in the target, resulting in relatively long acquisition times. The method proposed in this paper performs multispectral imaging where all the selected operational wavelengths are emitted and the data acquired simultaneously (using a single detector), thereby reducing acquisition times (when compared to single-pixel detector systems) and simplifying the receiver hardware. Moreover, the proposed method can be less susceptible to changes in ambient (background) illumination and/or target movement, especially if used in conjunction with a single-photon detector array.

Imaging set-up
A schematic of our imaging system is shown in Fig. 1. A pulsed supercontinuum laser source (SuperK EXTREME EXW-12, NKT Photonics, Denmark) used in conjunction with an AOTF, which offered wavelength tunability, provided the active illumination for the measurements. The AOTF is capable of selecting a maximum of eight discrete wavelengths simultaneously. For the work presented in this paper, in order to have moderate time delays between neighboring wavelength channels, four wavelengths (i.e. λ = 473, 532, 589 and 640 nm) were selected using the AOTF. The photon data acquired using these visible wavelengths can be used for the RGB color (and an additional yellow channel) reconstruction of target scenes. In addition, these four wavelength channels were emitted with a suitable time delay between each channel such that two different imaging strategies could be attempted, as described below in Sections 3 and 4. The pulsed light at a repetition rate of 19.5 MHz was delivered via a 5 µm diameter core polarizationmaintaining photonic crystal fiber (FD7-PM, NKT Photonics, Demark) to a custom-designed monostatic transceiver unit. Figure 2 shows the optical spectrum of the light selected by the AOTF, which was measured at the output end of the fiber using an optical spectrum analyzer (OSA). The spectral widths at full width at half maximum (FWHM) were 3.3, 3.6, 4.2 and 5.9 nm for the four wavelengths at λ 1 = 473 nm, λ 2 = 532 nm, λ 3 = 589 nm, and λ 4 = 640 nm, respectively. The transceiver contains two galvanometer mirrors to scan in both X and Y (transverse plane). A more detailed description of the transceiver unit can be found in [28]. An objective lens with a 100 mm focal length was set to f/8 for all measurements presented in this study. In the receiver, a thin-junction silicon single-photon avalanche diode (Si-SPAD) was fiber-coupled using a 50 µm diameter core. The thin-junction Si-SPAD used was a PDM series photon counting detector module manufactured by Micro Photon Devices (MPD), Italy. The photon detection efficiencies of the Si-SPAD are approximately 40%, 47%, 45% and 38% for the four wavelengths at λ 1 = 473 nm, λ 2 = 532 nm, λ 3 = 589 nm, and λ 4 = 640 nm, respectively [29]. In order to temporally filter the back-reflections from the outgoing pulse, the Si-SPAD was electrically gated for a duration of 30 ns in synchronization with the approximate expected photon return time. The laser source provided a triggering signal for the TCSPC data acquisition module (HydraHarp 400, PicoQuant GmbH, Germany), whose stop trigger was generated by the Si-SPAD. A colorful plastic Lego minifigure, approximately 40 mm tall, was chosen as a single-layered target for measurements. These measurements were carried out at a stand-off distance of approximately 1.8 meters from the imager. The scan area of the target scene was approximately 35 mm × 45 mm in X and Y. Figure 3 shows an example of a one-pixel histogram of photon counts versus time measured with a long acquisition time (i.e. approximately 4 seconds) by our system, which presents a typical single waveform with the wavelength-to-time mapped peaks for the four wavelengths (i.e. λ 1 = 473 nm, λ 2 = 532 nm, λ 3 = 589 nm, and λ 4 = 640 nm).

Characterization of single waveforms with wavelength-to-time mapped peaks
In order to obtain calibration measurements for our image reconstruction algorithm (described in Section 4), single waveforms corresponding to each of the operating wavelengths were characterized. This was done by measuring an instrumental temporal response taken on a uniform, Lambertian target (i.e. a Spectralon panel SRT-99-050, Labsphere, USA) in dark-room conditions. Each waveform was constructed from data measured using a similar power level (i.e. a detection level of approximately 110 k photon counts per second). A TCSPC data acquisition module with a time bin width of 2 ps was used to obtain photon timing data. Single wavelength histograms of normalized photon counts versus time for each of the four individual wavelengths (i.e. λ = 473, 532, 589 and 640 nm) can be seen in Fig. 4(a), which are denoted as 1 wavelength ×4. Figure 4(b) shows a histogram containing a single waveform comprised of four wavelengths, denoted as 4 wavelengths ×1. In terms of timing jitter and peak position, the four peaks in the histogram are well-matched to the corresponding peaks in the single wavelength histograms. The timing jitter described in terms of FWHM for the four wavelengths is summarized in Table 1. Fig. 2. The optical spectrum of the light selected by the AOTF was measured at the output end of the polarization-maintaining photonic crystal fiber using an optical spectrum analyzer (OSA). The bandwidths, at full width at half maximum (FWHM), were 3.3, 3.6, 4.2 and 5.9 nm for λ 1 = 473 nm, λ 2 = 532 nm, λ 3 = 589 nm, and λ 4 = 640 nm, respectively. It was confirmed that there was no other light emission in the operational spectral range of the silicon single-photon avalanche diode (Si-SPAD) detector (i.e. 400-1000 nm). Due to the chromatic group delay dispersion, which occurs within the laser source's optical fiber, time delays between neighboring peaks are characterized and compared in Table 1. As seen in Fig. 4(a), shorter wavelengths are more delayed than longer wavelengths. In addition, the laser allows us to select subsets of the four operational wavelengths. For example, we can choose two of the wavelengths to scan the target and then perform the next scan using the other two wavelengths. Specifically, as shown in Fig. 4(c), we considered two individual waveforms with dual-wavelength observation, i.e. λ = 473 and 589 nm (in black), and λ = 532 and 640 nm (in purple), which are denoted as 2 wavelengths ×2. This second strategy illustrates the scalability of the proposed computational method demonstrating its ability to process multiple waveforms associated with different wavelength configurations. With the use of the calibration results shown in Figs. 4(b) and 4(c) as references, example raw pixel photon data associated with the two strategies was fitted by our proposed model described in Section 4. Figure 5 presents examples of measurements fitted by our algorithm and illustrates the ability of the method to satisfactorily recover range and spectral information from single waveforms composed of wavelength-to-time mapped peaks.   5. Examples of real LiDAR-based photon data fitted by our proposed algorithm: topthe simultaneous four-wavelength based raw pixel data (in blue) and its final fit estimation (in red); bottom -the raw pixel data at λ = 473 and 589 nm (in orange) and its final fit estimation (in green), the raw pixel data at λ = 532 and 640 nm (in blue) and its final fit estimation (in red).

Computational method
The pixel-wise cross-correlation method, as a standard point-wise maximum likelihood method, is conventionally used due to its simplicity of implementation to process full waveforms with single return peaks in the vast majority of our previous work [5,19,28]. However, it cannot effectively identify multiple peaks from data such as the single waveforms with wavelength-to-time mapped peaks presented in this paper. For this reason, this section introduces the observation statistical model associated with multispectral LiDAR (MSL) returns for a single-layered object which will be used for depth and reflectivity restoration for sparse single-photon LiDAR data. A multispectral single waveform (MSSW) algorithm is proposed. We consider a set of LiDAR waveforms Y of dimension N × M × T, where N represents the number of pixels associated with a regular spatial sampling grid (in the transverse plane) and M is the number of waveforms used to reconstruct the scene. Here M is set to M = 1 (4 simultaneous wavelengths) or M = 2 (two sets of two spectral bands acquired sequentially). Moreover, T is the number of temporal (corresponding to range) bins. Let y n,m = [Y m ] n,t = [y n,m,1 , . . . , y n,m,T ] T be the LiDAR waveform obtained in the nth pixel using the mth group of B spectral bands. In the remainder of this section, we assume that the same number of bands B is acquired in each of the M groups, but different numbers could be used in each waveform. The element y n,m,t is the photon count within the tth bin of the mth waveform considered. Let d n be the position of an object surface at a given range from the sensor, whose spectral signature (composed of L = B × M reflectivity parameters observed using M waveforms) is denoted as λ n = [λ n,1,1 , . . . , λ n, B,1 , λ n,2,1 , . . . , λ n,B, M ] T = [λ n,1 , . . . , λ n, M ] T . Assuming that the ambient sources and dark photon counts cannot be neglected but are temporally stationary, each photon count y n,m,t is assumed to be drawn from the following Poisson distribution where g m,i (·) is the photon impulse response associated with the ith wavelength of the mth observed waveform and t n is the characteristic time-of-flight of photons emitted by a pulsed laser source and reaching the detector after being reflected from a target at range d n (d n and t n are linearly related in free-space propagation). Moreover, the impulse responses g m,i (·) i,m are assumed to be known as they can be estimated with arbitrary precision during the imaging system calibration. Note that b n,m models the sum of the temporally constant ambient sources and dark count levels and this parameter is assumed to be unknown. The problem addressed here consists of recovering, for each pixel, the spectral response of the scene λ n and its range d n (or equivalently t n ) using the set of M waveforms, each of which being associated with B different wavelengths. Estimating these parameters is challenging for several reasons. First, the observation model (or likelihood function) (1) is highly multimodal (in particular with respect to t n ), which makes maximum likelihood estimation difficult using optimization methods. Second, the problem becomes even more severely ill-posed when the number of detected photons is extremely low and when the background levels are significant. For instance, if a single photon is detected, it becomes impossible to associate it with one of the B wavelengths or with the background without additional information. To address this ill-posed problem, we adopt a Bayesian approach. As discussed in the remainder of this section, the Bayesian framework allows us to incorporate prior information in order to regularize the problem and reduce estimation uncertainty. A Markov chain Monte Carlo algorithm is then used to efficiently exploit the Bayesian model of interest.

Bayesian model
Assuming that the detected photon counts/noise realizations, conditioned on their mean in all pixels, spectral bands and time bins are mutually independent, the joint likelihood function of the observed data can be expressed as where Y = {Y m } m=1,..., M , B = b n,m and t is a vector of length N which gathers the target ranges. We propose to account for the potential spatial correlation between the target distances in neighboring pixels of the scene. As will be shown in Section 5, this enables a robust range estimation, in particular when the number of detected photons in each pixel is extremely low. Each target position is considered as a discrete variable and we define a prior model f (t|c) for the range profile which promotes spatial correlation between target ranges in neighboring pixels. We consider a total-variation (TV) based prior model [30,31] which promotes piece-wise constant range profiles (see [5,27] for more details about the prior model f (t|c)). The amount of correlation between adjacent pixels is controlled by the regularization parameter c which is automatically during the classification process. Similarly, we used TV-based prior models to account for the spatial correlation between the different intensity parameters. Precisely, for the target reflectivity profile associated with the spectral band (i, m) and denoted Λ i,m = λ n,i,m n , we define a prior model f (Λ i,m |γ i,m ) whose hyperparameter γ i,m controls the amount of smoothness of the target reflectivity profile. The target intensities are assumed to be discrete and are allowed to take an arbitrarily fixed finite number of values. Here we chose 60 values, uniformly distributed in (0, 1). Assuming that the target intensity profiles are a priori mutually independent yields where γ = γ i,m i,m . A similar prior model f (b m |δ m ), controlled by the smoothness parameter δ m , is used for the background level b m = b n,m n associated with each waveform. The background profiles are assumed to be a priori mutually independent, leading to with δ = [δ 1 , . . . , δ M ] T . Note that the amount of spatial correlation induced by the resulting models f (t|c), f (Λ|γ) and f (B|δ) is controlled by the parameters Φ = (c, γ, δ) which are automatically adjusted (in a similar fashion to the regularization parameters in [32]). From the joint likelihood (2) and the prior models f (t|c), f (Λ|γ) and f (B|δ) we can derive the joint posterior distribution of t, Λ and B given the observed waveforms Y and the value of the regularization parameters Φ. Using Bayes' theorem, and assuming prior independence between t, Λ and B, the joint posterior distribution associated with the proposed Bayesian model is given by

Estimation strategy
The posterior distribution (5) models our complete knowledge about the unknowns given the observed data and the prior information available. To perform joint depth and reflectivity estimation from the MSL data, we use the following three Bayesian estimators: 1) the marginal maximum a posteriori (MMAP) estimator of the target reflectivity parametersλ n,i,m = argmax f (b n,m |Y, Φ). In order to approximate these estimators of interest, we adopt a simulation approach and generate samples according to the joint posterior (5). More precisely, we use a Metropolis-within-Gibbs sampler to generate sequentially the unknown parameters from their conditional distributions and the samples are then used to approximate the Bayesian estimators of interest (after having discarded the first samples associated with the burn-in period of the sampler). The proposed sampling scheme presented in Algo. 1 is very similar to that used in [27] and is thus not discussed in detail here.
Step 5 of the algorithm uses a black and white checkerboard structure to update the range parameters via a 2-step block Gibbs sampler. Due to the computational cost required to sample perfectly from f (Λ|Y, t, B, Φ) and f (B|Y, t, Λ, Φ (k−1) ), steps 6 and 7 are performed using block Metropolis-Hastings updates where candidates are proposed from the prior distributions f (Λ|γ) and f (B|δ). The candidates are then accepted using standard Metropolis-Hastings accept/reject procedures. Note that due to the structure of the TV-based priors, several updates can be performed in parallel (i.e., parameters of two pixels which are not direct neighbors can be updated at the same time), in a similar fashion to the range parameters. Moreover, the intensity parameters associated with different waveforms (M > 1) can also be updated independently, since the conditional distributions of interest can be factorized as a product over the M waveforms. For instance, One of the main advantages of Monte Carlo methods is that they often allow estimating the appropriate amount of regularization from data. Indeed, there are several Bayesian strategies for selecting the value of the regularization parameters (c, γ, δ) in a fully automatic manner. Here (steps 9-11) we use the empirical Bayes technique proposed in [32], where the value of (c, γ, δ) is calculated by maximum marginal likelihood estimation. The interested reader is invited to consult [5,27] for further details about the implementation of such samplers. Sample t (k) from f (t|Y, Λ (k−1) , B (k−1) , Φ (k−1) ).

8:
if k ≤ N bi then 9: Update c (k) using the method proposed in [32]. 10: Update γ (k) using the method proposed in [32]. 11: Update δ (k) using the method proposed in [32]. 12: else 13: end if 15: end for 16: Compute the MMAP estimators by selecting the values of the parameters which have been generated with the highest frequency during the last N iter − N bi iterations.

Reconstruction results
The proposed MSSW algorithm is validated using real LiDAR-based photon data. Multiple spectral bands are observed simultaneously at each pixel, but our approach still performs satisfactory multispectral imaging with high resolution. In our comparison, we investigated two imaging strategies: 1) 4 wavelengths ×1 (see Fig. 4(b)), where ×1 denotes one observation with 4 wavelengths simultaneously per pixel; 2) 2 wavelengths ×2 (see Fig. 4(c)), where ×2 denotes two individual observations with 2 wavelengths per pixel. As shown in Figs. 6 and 7, the two strategies yield visually similar depth and intensity images when reconstructed using the MSSW method at different detection levels of average photon counts per pixel (i.e. 100, 10, 5 and 1) in dark conditions (i.e. the ambient sources were extremely weak and thus negligible). We used the depth values of three different flat uniform surfaces on the legs and base of the target (see Fig. 8(a)) in order to quantitatively evaluate the surface-to-surface resolution at various detection levels. As shown in Figs. 8(b) and 8(c), two surfaces separated by as little as 2 mm in the direction of laser beam propagation can be identified using our algorithm for both imaging strategies in dark conditions, even in the sparse photon regime. The acquisition time for different levels of average photon counts per pixel for both strategies is compared in Table 2. In order to complete the four wavelength measurement set, only one observation per spatial location was required for the first imaging strategy, but the second strategy requires two individual dual-wavelength observations. Given a similar illumination level for each observation, as shown in Table 2, the sum of the acquisition times of the two measurements per pixel for the second strategy is similar to the per-pixel acquisition time of the first strategy. Therefore, the first strategy reduced the optical power by approximately half while achieving as good performance as the second strategy. However, a similar performance can be obtained through a trade-off between optical power and per-pixel acquisition time. For example, the acquisition time required for the first strategy is only approximately half that of the second when using the same level of optical power.
In order to investigate the effect of ambient light illumination on the image estimations, we used a desk lamp (with a 25 watt incandescent light bulb) to provide a higher ambient background contribution of approximately 20 times. This resulted in a variation of the background levels across the target scene as shown in Fig. 9. This figure also shows that the background estimations from the two strategies are similar. In a similar fashion to the work presented in [27], we use the depth and reflectivity absolute errors (DAE and RAE, respectively) to quantify the performance of the two imaging strategies in terms of depth and reflectivity estimations respectively. We calculate the cumulative density functions (CDFs) of the DAE, which are used to quantify the percentage of locked-on pixels within a certain DAE. Figure 10 shows similar ranging performance for both strategies. The RAEs depicted in Fig. 11(a) show that the two strategies perform similarly in terms of reflectivity estimation, although the first strategy performs slightly better. As expected, Figs. 10 and 11 shows that the estimation performance generally degrades in the presence of significant ambient illumination, which affects the reflectivity estimation more than the range estimation.    obtained using waveforms at 4 wavelengths ×1 (blue curves, strategy #1) and 2 wavelengths ×2 (red curves, strategy #2). These are shown for no ambient background (top row) and with background (bottom row). In each case, we have shown the results for 100, 10, 5 and 1 average "useful" photons per pixel. Fig. 11. Mean reflectivity absolute errors (RAEs) obtained using waveforms at 4 wavelengths ×1 (red curves, strategy #1) and 2 wavelengths ×2 (blue curves, strategy #2) as a function of the average "useful" per-pixel photon counts (photons originally emitted by the laser source, and not events associated with background). The dash lines represent ± one standard deviation intervals. (a) the graph corresponds to RAE obtained as a function of average photon count per pixel without ambient sources, and (b) the graph with ambient sources.

Conclusion and outlook
We have demonstrated a new approach to multispectral three-dimensional single-photon imaging using simultaneous multispectral illumination and demultiplexing. The scanning system employs just one single-photon detector and does not require the use of wavelength spatial routing in the receiver channel. The chromatic group delay dispersion in the pulsed source's optical fiber meant that wavelength demultiplexing was achieved by mapping the wavelength to its staggered time delay in the detector's multispectral waveform. This imaging approach ensures reconstruction of the spectral response at each spatial location, using only one sensor for simultaneous multispectral acquisition. This approach might be less prone to fluctuations of background level and target movement, e.g. in an outdoor environment, since the wavelengths are acquired simultaneously. It is also possible to reduce the total acquisition time of the multispectral measurement by reducing the number of observations that are required.
A new multispectral single waveform method was used to estimate reflectivity images and depth profiles with millimeter scale depth resolution and uncertainty of a single-layered target, at detection levels as few as one photon per pixel, on average. The bespoke MSSW method was also used to process single-photon data in the presence of external ambient sources to mimic daylight conditions. We found our imaging framework is reasonably robust to high ambient light contributions.
Similar performance in terms of depth and reflectivity estimations were found for the two imaging strategies, so the choice of the strategy depends mainly on the system and wavelengths of interest. As our imaging framework is scalable, we can choose the first imaging strategy for applications with higher priority of data measurement time, provided that the selected operational wavelengths are sufficiently separated; the second imaging strategy can be used for applications where the maximum number of wavelength channels that can be selected simultaneously is less than the number of wavelengths of interest.
In order to further reduce measurement time and minimize changes in observation conditions, future work could adapt high-efficiency SPAD arrays with TCSPC functionality capable of time bin sizes in the order of 10 ps [33][34][35][36]. This could make our imaging framework promising for rapid single-photon multispectral depth imaging in an outdoor environment (e.g. underwater depth imaging [37][38][39]). This time-gated approach of TCSPC helps distinguish the time-resolved signal from the background noise, which has a reasonably constant power spectral density. For imaging in daylight conditions, solar radiation is likely to be the predominant source of background noise, which generally leads to a reduced signal-to-noise ratio. A combination of a narrower time gating window, improved spatial filtering, and the use of spectral filtering on the receive channel will be required in order to adapt our system for effective daylight operation. For the system presented in this paper, the effective receive aperture projected onto the target was larger than the projected illumination aperture -this configuration improves the tolerance to beam wander and distortion effects caused by atmospheric turbulence. On the other hand, the use of a receive aperture which is smaller than the illumination aperture will improve the rejection of solar background, meaning that careful consideration must be given to this performance trade-off for daylight operation. The addition of spectral filtering for background noise rejection requires approaches capable of high efficiency light filtering at a number of discrete wavelengths. These challenges will be addressed in future work performed under daylight conditions. For applications using a larger number of wavelength channels, wavelength selection approaches capable of emitting a larger number of discrete lines will be required and additional dispersion approaches would be necessary to increase time delays between the wavelength channels.