Super-resolution single-photon imaging at 8.2 kilometers

Single-photon light detection and ranging (LiDAR), offering single-photon sensitivity and picosecond time resolution, has been widely adopted for active imaging applications. Long-range active imaging remains a great challenge, because the spatial resolution degrades significantly with the imaging range due to the diffraction limit of the optics, and only a few echo signal photons return, mixed with strong background noise. Here we propose and demonstrate a photon-efficient LiDAR approach that can achieve sub-Rayleigh resolution imaging over long ranges. This approach exploits fine sub-pixel scanning and a deconvolution algorithm tailored to this long-range application. Using this approach, we experimentally demonstrated active three-dimensional (3D) single-photon imaging by recognizing different postures of a mannequin model at a stand-off distance of 8.2 km in both daylight and night. The observed spatial (transversal) resolution is about 5.5 cm at 8.2 km, roughly a factor of two finer than the system's diffraction-limited resolution, and thus beats the optical system's Rayleigh criterion. The results are valuable for geosciences and target recognition over long ranges.


Introduction
Single-photon detection has become a well-established technique for the detection of weak optical signals. By exploiting this technique, single-photon light detection and ranging (LiDAR) offers excellent sensitivity, low noise and high time resolution [1]. In particular, it can be used to remotely acquire three-dimensional (3D) shapes through precise measurements of time-of-flight information, and has found applications in several areas, including geosciences, architecture, and defense.
Long-range, high-resolution active imaging is in high demand for widespread applications, such as object detection, recognition and identification. When the imaging distance reaches tens of kilometers or more, very few photons return to the detection system. Single-photon LiDAR, sensitive to echo signals as weak as a single photon, is an outstanding candidate. Tremendous efforts have been devoted to the development of single-photon LiDAR for long-range active imaging [2][3][4][5][6][7][8][9]. Single-photon 3D imaging at ranges up to 45 km has been reported lately [10]. Furthermore, computational imaging algorithms have seen remarkable progress in processing single-photon data efficiently [11]. High-quality 3D structure has been demonstrated in the laboratory environment by active imagers detecting only one photon per pixel (PPP), based on the approaches of first-photon imaging [12], pseudo-array [13,14], single-photon camera [15], unmixing signal/noise [16] and machine learning [17]. These algorithms have the potential to improve the imaging range and quality significantly.
In long-range imaging, an important feature is the spatial (transversal) resolution. For a standard imaging system, the resolution is normally described as the angle determined by the diffraction limit, i.e., by the aperture of the imaging system. This also applies to single-photon LiDAR. An optimized scheme in single-photon LiDAR is to match the field of view (FoV) of the detector to the point spread function (PSF) resulting from the transmission effect in the source-to-scene path (e.g., matching the diffraction limit and the aberration of the transmission system) [3,7,10]. Nonetheless, at ranges of kilometers, the spatial resolution degrades severely due to the divergence of the laser beam or the FoV of the receiver. Matching the FoV and the PSF alone cannot increase the imaging resolution. To improve the resolution, an alternative approach is fine sub-pixel scanning [18][19][20][21]. Fine sub-pixel scanning precisely shifts the imager by less than the pixel scale to capture a series of low-resolution images, and produces an image with higher resolution by combining these low-resolution images computationally. It can overcome the inherent spatial resolution limitation of the imaging system [22,23]. To obtain different looks at the same scene, some relative scene motion must exist between frames. In a LiDAR system, fine sub-pixel scanning can be realized by setting the inter-pixel scanning spacing much smaller than the size of the receiver FoV.
In this work, we demonstrate an effective super-resolution method to enhance the resolution of long-range images captured by single-photon LiDAR. Our method includes a sub-pixel scanning scheme in hardware and a matched 3D deconvolutional algorithm in software. We present the implementation of a high-efficiency, low-noise single-photon LiDAR system operating at the telecom wavelength of 1550 nm. The performance of our super-resolution method is verified by both numerical simulations and outdoor experiments. In simulations, we show that our proposed method is capable of achieving sub-Rayleigh resolution under light levels as low as ∼1 photon per pixel (PPP). In the outdoor experiments, we achieved super-resolution single-photon 3D imaging at ranges up to 8.2 km in an urban environment. The experimental results demonstrate a resolution of ∼5.5 cm, adequate to distinguish different postures of a mannequin model at a stand-off distance of 8.2 km, while conventional approaches fail to do so due to their limited resolution. The achieved resolution is about a factor of two finer than the system's resolution of ∼11.1 cm (or angular resolution of ∼13.5 µrad).

Photon-efficient sub-pixel scanning approach
The resolution of a standard imaging system is limited by its diffraction limit, which is mainly determined by the aperture of the system. In general, to balance the resolution and the detection efficiency, the field of view (FoV) of a single detection pixel is typically set to match this diffraction limit (Airy disk diameter). For single-photon LiDARs, an efficient way is to match the pixel size of the receiver with the point spread function (PSF) of the transmitter. In this case, the angular resolution ∆θ of a LiDAR system can be described by
$$\Delta\theta = 2.44\,\frac{\lambda}{D}, \tag{1}$$
where λ is the operating wavelength and D is the diameter of the circular aperture of the aberration-free lens; for a focal length f, this angle corresponds to an Airy disk of diameter 2.44λf/D in the focal plane. Here 2.44 is the coefficient of the Airy pattern. Taking our single-photon LiDAR system as an example, the operating wavelength is λ = 1550 nm and the diameter of the objective lens is D = 279 mm, so the resolution of the system is ∆θ = 13.5 µrad. At a distance of 8.2 km, this corresponds to a spatial resolution of ∼11.1 cm.
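As a quick numerical check of the figures quoted above, the following sketch evaluates the Airy-disk angular diameter 2.44λ/D and projects it onto the target plane with the small-angle approximation. The variable names are illustrative only; the input values are those stated in the text.

```python
# Worked check of the diffraction-limited resolution quoted in the text.
wavelength_m = 1550e-9   # operating wavelength (from the text)
aperture_m = 0.279       # telescope aperture diameter (from the text)
range_m = 8.2e3          # stand-off distance (from the text)

# Angular diameter of the Airy disk: 2.44 * lambda / D
delta_theta_rad = 2.44 * wavelength_m / aperture_m

# Small-angle projection of that angle onto the target plane
spot_m = delta_theta_rad * range_m

print(f"angular resolution: {delta_theta_rad * 1e6:.2f} urad")
print(f"spatial resolution at 8.2 km: {spot_m * 100:.1f} cm")
```

The computed angle is close to 13.56 µrad, which the text quotes as 13.5 µrad, and the projected spot agrees with the quoted ∼11.1 cm.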
In long-range imaging, as shown in Fig. 1(a), due to the divergence of the FoV, even a small FoV that matches the diffraction limit turns into a large patch projected on the target, which deteriorates the resolution. This becomes notable at imaging distances of kilometers. To overcome this challenge, we propose a sub-pixel scanning method customized for single-photon LiDAR. The sub-pixel scanning scheme was initially proposed for ordinary digital cameras [19,20], where the imager is shifted at the sub-pixel scale to capture a series of low-resolution images. An image with higher resolution can then be computed from those low-resolution images. The sub-pixel displacements between these low-resolution images constrain the frequency aliasing, which is the essence of super-resolution reconstruction [19,20]. The scheme has also been extended to single-pixel imaging [24] to improve the signal-to-background ratio (SBR) [21]. In our coaxial single-photon LiDAR design, we set the FoV of the receiver slightly smaller than the divergence angle of the transmitting laser. This setting, which balances the need for high SBR and high collection efficiency, has been widely adopted in long-range single-photon LiDAR systems [3,7,10]. To realize sub-pixel scanning, unlike the standard point-by-point scanning scheme [see Fig. 1(b)], we set the inter-pixel scanning spacing smaller than the size of the receiver FoV. As an example, Fig. 1(c) illustrates an inter-pixel spacing of 1/8 FoV. The inter-pixel shift is performed in both the x and y directions [see Fig. 1(d)]. After all pixels are scanned, a high-resolution image is computed from the combination of 8×8 frames of low-resolution images. Note that retrieving the sub-Rayleigh resolution information offered by the sub-pixel scanning requires a specific computational algorithm.
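The scanning geometry can be sketched as follows. This is a minimal illustration, not the authors' control code: the helper `scan_angles`, the 128-point grid and the placeholder array `hi_res` are assumptions chosen to mirror the 1/8 FoV example and the 22.3 µrad FoV quoted elsewhere in the paper. It shows how a finely scanned grid decomposes into 8×8 interleaved low-resolution frames, each sampled at a full-FoV step.

```python
import numpy as np

def scan_angles(n_points, fov_urad, spacing_fraction):
    """1D scan angles (urad) with a step of spacing_fraction * FoV."""
    step = fov_urad * spacing_fraction
    return np.arange(n_points) * step

fov_urad = 22.3                              # receiver FoV (from the text)
fine = scan_angles(128, fov_urad, 1 / 8)     # sub-pixel scan, step = FoV / 8
theta_x, theta_y = np.meshgrid(fine, fine, indexing="xy")

# Fraction of the FoV shared by two neighbouring scan positions along one axis
overlap = 1 - (fine[1] - fine[0]) / fov_urad

# Placeholder for the per-point measurements (hypothetical data container).
hi_res = np.zeros((128, 128))

# Each (i, j) offset selects one low-resolution frame sampled FoV-by-FoV.
low_res_frames = [hi_res[i::8, j::8] for i in range(8) for j in range(8)]

print(len(low_res_frames), low_res_frames[0].shape, round(overlap, 3))
```

With these numbers the 128×128 grid covers 16×16 receiver footprints, and neighbouring footprints overlap by 7/8 of the FoV along each axis, which is why the raw data are strongly blurred and need deconvolution.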
Another challenge in long-range conditions is the exceedingly weak echo signal. Recently, several photon-efficient algorithms were proposed to deal with such extremely low light levels [12][13][14][15][16]. However, most of them were only tested in the laboratory at short distances. In the long-range situation, the divergence of the laser beam leads to the problem of multiple returns for each pixel [10][25][26][27][28]. Moreover, previous algorithms cannot retrieve the super-resolution information from the sub-pixel scanning data. Here we propose a 3D deconvolutional algorithm to solve the problem of multiple returns and to compute the high-resolution image from sub-pixel scanning data with signal levels as low as ∼1 PPP (see below).

3D deconvolution algorithm
To work with the sub-pixel scanning approach, we develop an algorithm that performs super-resolution photon-efficient imaging over long ranges. The advantages of our algorithm can be summarized in two aspects.
1. We adopt a convolutional forward model that accounts for both the fine scanning process and the issue of "multiple returns per pixel".
2. We develop a 3D deconvolution method to retrieve the sub-pixel resolution information acquired from the fine scanning.

Forward model
In our single-photon LiDAR system, we use a periodically pulsed laser to illuminate the target scene in a raster-scanned manner. The waveform of a single pulse is denoted by w(t), with a full width at half maximum (FWHM) of T_p, and the repetition period is T_r. Now, suppose the laser beam and the receiver are aimed at a scanning angle (θ_x, θ_y). Considering the divergence of the laser beam and the aperture of the receiver, a spatial kernel g_xy is adopted to describe the spatial intensity distribution of the laser and the FoV of the receiver projected on the scene. Based on the theory of linear light transmission, the photon flux rate function F(t; θ_x, θ_y) for the scanning angle (θ_x, θ_y) can be given as
$$F(t;\theta_x,\theta_y) = \iint_{(\theta'_x,\theta'_y)\in\mathrm{FoV}} g_{xy}(\theta_x-\theta'_x,\,\theta_y-\theta'_y)\; r(\theta'_x,\theta'_y)\; g_t\!\left(t-\frac{2\,d(\theta'_x,\theta'_y)}{c}\right) d\theta'_x\, d\theta'_y + b, \tag{2}$$
for t ∈ [0, T_r), where (r(θ_x, θ_y), d(θ_x, θ_y)) is the (reflectivity, depth) pair for the patch (θ_x, θ_y) on the scene; c is the speed of light; b denotes the background noise; and the temporal kernel g_t describes the distribution of the system jitter. Here, we take the whole system jitter g_t into account rather than only the waveform of the laser pulse w(t).
Specifically, the divergence of our transceiver system determines the FoV and the size of g xy . In our experiment, g xy is set to be a standard 2D Gaussian distribution, and its FWHM is set to be the size of the FoV (22.3 µrad); g t is set to be a standard 1D Gaussian distribution, and its FWHM is set to be 1 ns, which is equal to the system's timing jitter.
In practice, the continuous equation above has a discrete representation. We construct a 3D matrix RD whose (i, j)th element is a vector with only one nonzero entry, describing the (reflectivity, depth) pair for the target scene. The value and the index of this entry represent the reflectivity and the depth of the scene, respectively. Also, let the spatiotemporal kernel g be the outer product of g_xy and g_t, and let B denote the background noise matrix. Combining this with the inhomogeneous Poisson photon-detection process [29], the photon histogram matrix Y can be written as
$$\mathbf{Y} \sim \mathrm{Poisson}\left(g * \mathbf{RD} + \mathbf{B}\right), \tag{3}$$
where * denotes the convolution operator.
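The discrete forward model can be illustrated with a short simulation. The sketch below is a toy example under stated assumptions, not the authors' code: the helper names (`gaussian_1d`, `build_rd`, `simulate_histogram`), the grid size, the kernel sizes and the constant background are all hypothetical. It builds RD with one nonzero entry per pixel, forms g as the outer product of a 2D spatial Gaussian and a 1D temporal Gaussian, and draws Poisson counts from the convolved rate.

```python
import numpy as np
from scipy.signal import fftconvolve

def gaussian_1d(size, fwhm):
    """Normalized 1D Gaussian on `size` samples; FWHM given in samples."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    x = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def build_rd(reflectivity, depth_bin, n_bins):
    """3D array with one nonzero entry per pixel: value = reflectivity, index = depth bin."""
    ny, nx = reflectivity.shape
    rd = np.zeros((ny, nx, n_bins))
    ii, jj = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
    rd[ii, jj, depth_bin] = reflectivity
    return rd

def simulate_histogram(rd, g_xy_1d, g_t, background, rng):
    """Draw a photon histogram Y ~ Poisson(g * RD + B) with a constant background."""
    g_xy = np.outer(g_xy_1d, g_xy_1d)            # separable 2D spatial kernel
    g = g_xy[:, :, None] * g_t[None, None, :]    # outer product gives the 3D kernel
    rate = fftconvolve(rd, g, mode="same") + background
    rate = np.clip(rate, 0.0, None)              # guard against FFT round-off
    return rng.poisson(rate), g

rng = np.random.default_rng(0)
refl = np.full((16, 16), 5.0)                    # toy reflectivity map
depth = np.full((16, 16), 20)                    # toy depth map (bin indices)
depth[:, 8:] = 30                                # a depth step across the scene
rd = build_rd(refl, depth, n_bins=64)
Y, g = simulate_histogram(rd, gaussian_1d(9, 8.0), gaussian_1d(11, 4.0), 0.01, rng)
print(Y.shape, g.shape)
```

Near the depth step, a single scan position collects photons from both surfaces, which is the "multiple returns per pixel" situation the convolutional model is meant to capture.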

Reconstruction algorithm
After taking the "multiple returns per pixel" into consideration in the forward model, we design a 3D deconvolutional algorithm to compute the sub-Rayleigh resolution information. To obtain a fine estimate of RD from the raw data Y acquired by the single-photon LiDAR, we treat this inverse problem as a single optimization problem and develop a deconvolutional convex optimization algorithm based on the forward model to solve it. Let L_RD(RD; Y, g, B) be the negative log-likelihood function of RD derived from Eq. (3). The regularized convex inverse problem can be described as
$$\underset{\mathbf{RD}}{\mathrm{minimize}}\;\; L_{RD}(\mathbf{RD};\mathbf{Y},g,\mathbf{B}) + \beta\,\mathrm{TV}(\mathbf{RD}) \quad \text{subject to}\;\; \mathbf{RD}_{i,j,k}\ge 0,\ \forall\, i,j,k. \tag{4}$$
Here, the constraint RD_{i,j,k} ≥ 0 comes from the non-negativity of the reflectivity, and the smoothing term is a total variation (TV) penalty weighted by β. It is worth mentioning that our reconstruction framework is not restricted to a particular choice of regularizer.
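To make the structure of this objective concrete, the sketch below evaluates a Poisson negative log-likelihood plus a TV penalty for a candidate RD. It is an illustration under assumptions, not the authors' solver: the function names are hypothetical, the TV term is taken to be the anisotropic (sum of absolute first differences) variant, additive constants of the likelihood are dropped, and no minimization is performed.

```python
import numpy as np
from scipy.signal import fftconvolve

def neg_log_likelihood(rd, y, g, b):
    """Poisson negative log-likelihood of counts y for rate g * rd + b (constants dropped)."""
    rate = np.clip(fftconvolve(rd, g, mode="same") + b, 1e-12, None)
    return float(np.sum(rate - y * np.log(rate)))

def total_variation(rd):
    """Anisotropic TV (assumed form): absolute first differences along all three axes."""
    return float(sum(np.abs(np.diff(rd, axis=a)).sum() for a in range(3)))

def objective(rd, y, g, b, beta):
    """Regularized objective: data term plus beta-weighted TV penalty."""
    return neg_log_likelihood(rd, y, g, b) + beta * total_variation(rd)

def project_nonnegative(rd):
    """Enforce the constraint RD[i, j, k] >= 0 by clipping."""
    return np.maximum(rd, 0.0)

# Toy check: with noiseless data, the true RD scores better than a scaled copy.
g = np.zeros((3, 3, 3))
g[1, 1, 1] = 1.0                              # identity kernel for the toy case
rd_true = np.full((4, 4, 8), 2.0)
y = fftconvolve(rd_true, g, mode="same") + 0.5
print(objective(rd_true, y, g, 0.5, beta=0.1) < objective(2 * rd_true, y, g, 0.5, beta=0.1))
```

An iterative solver would alternate a descent step on this objective with the non-negativity projection; the specific scheme used in the paper is described in the following paragraph.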
Our 3D deconvolutional program, which employs sequential quadratic approximations to the log-likelihood objective function L_RD, is modified from the SPIRAL-TAP solver [30]. One main modification is that our program operates in the 3D spatiotemporal domain to match the 3D convolutional operator g, while the original SPIRAL-TAP solver was applied to the 2D domain only. The core parameter in our 3D deconvolutional program is the spatiotemporal kernel g. To work with the sub-pixel scanning scheme, the spatial size of g needs to be adjusted according to the inter-pixel spacing. For example, if we set the inter-pixel spacing to 1/2 FoV, the spatial size of g is 3×3. Generally, if we set the inter-pixel spacing to 1/(2n) FoV, the spatial size of g should be (2n+1)×(2n+1), where n is a positive integer.
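The kernel-size rule stated above can be written as a small helper. The function name and its integer argument (the denominator 2n of the inter-pixel spacing) are illustrative choices, not part of the original program.

```python
def spatial_kernel_size(spacing_denominator: int) -> int:
    """Side length of the spatial part of g for an inter-pixel spacing of 1/(2n) FoV.

    `spacing_denominator` is 2n, so the returned side length is 2n + 1.
    """
    if spacing_denominator <= 0 or spacing_denominator % 2 != 0:
        raise ValueError("spacing must be 1/(2n) of the FoV with n a positive integer")
    return spacing_denominator + 1

for denom in (2, 4, 8, 14):
    size = spatial_kernel_size(denom)
    print(f"spacing 1/{denom} FoV -> spatial kernel {size}x{size}")
```

For the 1/8 FoV spacing used in the experiment, this rule gives a 9×9 spatial support.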

Numerical simulations
We provide numerical simulations to evaluate our sub-pixel scanning scheme. As shown in Fig. 2, we choose a resolution chart of size 120×128 as our target scene. This chart consists of 6 squares of different sizes, each of which contains three bars and two spaces. In the largest square, both the bars and spaces are 6 pixels wide; in the smallest square, the features are 1 pixel wide. For simplicity, we set the background noise B to zero, i.e., we only take the Poisson noise and the system jitter into consideration. We simulated FoVs of two different sizes, 5×5 and 15×15, to represent the FoV at short and long distance. In Fig. 2, the first column shows that the resolution of the image captured by the conventional FoV-by-FoV scanning method degrades as the FoV expands. In the second column, we sub-pixel scan the scene with the smallest scanning step (shifting the FoV pixel by pixel on the chart), and process the data with the conventional pixelwise maximum likelihood (ML) method. In the last column, we process the data with our 3D deconvolutional algorithm. Clearly, our algorithm substantially improves the resolution of the reconstructed image.
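For reference, a pixelwise baseline of the kind used in the second column can be sketched as below. This is an assumption-laden simplification rather than the exact ML estimator used in the paper: depth is taken from the peak of a matched filter with the temporal kernel g_t, and reflectivity from the total photon count, each pixel being processed independently. The function name and the toy histogram are hypothetical.

```python
import numpy as np
from scipy.ndimage import correlate1d

def pixelwise_estimate(y, g_t):
    """Per-pixel estimate: depth bin from a matched filter, reflectivity from total counts."""
    filtered = correlate1d(y.astype(float), g_t, axis=2, mode="constant")
    depth_bin = np.argmax(filtered, axis=2)
    reflectivity = y.sum(axis=2)
    return reflectivity, depth_bin

# Toy histogram: every pixel sees the same pulse centred on time bin 20.
pulse = np.array([1, 3, 6, 3, 1])
y = np.zeros((2, 2, 40), dtype=int)
y[:, :, 18:23] = pulse
refl_hat, depth_hat = pixelwise_estimate(y, pulse / pulse.sum())
print(depth_hat)
print(refl_hat)
```

Because every pixel is treated on its own, this baseline cannot undo the spatial blur introduced by overlapping footprints, which is the gap the 3D deconvolution addresses.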
To realize the high resolution in Fig. 2, a large number of signal photons is required, which is difficult to obtain in a practical long-range LiDAR system. In practice, only a few echo signal photons return, mixed with strong background noise. In Fig. 3, we simulate practical low-light conditions by setting the SBR to 0.2 (within a 100 ns time window, as for previous algorithms [12,15,16]) and the inter-pixel spacing to 1/8 FoV. We choose a typical scene from the Middlebury dataset [31]. We simulated the results with 10, 5 and 1 detected signal PPP, and compared the reconstructions with state-of-the-art photon-efficient algorithms [15,16,27]. From the results shown in Fig. 3, it is clear that our 3D deconvolutional method performs much better even under low light levels and practical SBR conditions.
Fig. 2. This simulation only takes the Poisson noise and system jitter into account. We simulated FoVs of two different sizes, 5×5 and 15×15, to represent the FoV at short and long distance. The first column shows the results without sub-pixel scanning. The second column shows the results with sub-pixel scanning and conventional pixelwise ML processing. The third column shows the results with sub-pixel scanning and our 3D deconvolutional processing. The step of the sub-pixel scanning is set to 1 pixel of the chart, i.e., the inter-pixel spacing is set to 1/4 FoV and 1/14 FoV for the 5×5 FoV and the 15×15 FoV, respectively. Three inferences can be drawn from the results. First, as the imaging distance increases, the FoV becomes larger and the resolution of the image becomes worse, as shown in the first column. Second, with the sub-pixel scanning scheme, the resolution improves, as shown in the second column. Third, our 3D deconvolutional algorithm substantially outperforms pixelwise ML.
Both the transmitter and receiver paths pass coaxially through the same 28× expander, which is the combination of the telescope (f = 2800 mm) and the eyepiece (f = 100 mm). With this configuration, the receiver's FoV is slightly smaller than the transmitter divergence. In addition, a standard camera (f = 700 mm) was paraxially mounted on the telescope to provide a convenient pointing and alignment aid for long distances.

Experimental setup
Our experimental setup is shown in Fig. 4, and the system parameters are summarized in Table 1. The optical transceiver system incorporates a commercial Cassegrain telescope with a 279 mm aperture. We assembled the optical components on a custom-built aluminum platform integrated with the telescope tube. The imaging system uses a fiber laser operating at a wavelength of 1550 nm, which generates 0.5 ns pulses at a repetition rate of 100 kHz. The 1550 nm operating wavelength offers eye safety, a reduced solar background and low atmospheric loss, and telecom-band components are readily available. Commercial telescopes are typically coated with a 1550 nm coating to minimize the loss and back-reflections from the internal surfaces of the telescope. This is important for a coaxial scanning LiDAR system in which amplified spontaneous emission (ASE) noise is present. In particular, we have adopted an acousto-optical modulator (AOM) at the light source to filter the ASE component in the time domain. The AOM acts as a high-speed switch: the emitted pulsed light passes through the AOM with a transmissivity of about 60%, and the ASE light following the pulse is blocked in every repetition period. The ASE noise is almost eliminated, and after the modulation, the maximum transmitted power is about 120 mW.
The transceiver system is coaxial, allowing the area illuminated by the beam and the field of view (FoV) to remain matched while scanning. Precise sub-pixel scanning is implemented by a closed-loop piezo tip-tilt platform along both the x and y axes. This coplanar dual-axis scanning scheme offers high-precision angle scanning. The arbitrary function generator (AFG) that provides the scanning signal is set to produce precise stepwise voltage changes. The laser beam exits a collimator and passes through a perforated mirror before being expanded and transmitted by the telescope with a divergence angle of about 35 µrad. The returned photons, reflected by the perforated mirror and passing through two wavelength filters (a 1500-nm long-pass filter and a 9-nm bandpass filter), are collected by a focusing lens into a fiber filter based on multimode fiber (1.3 nm bandpass). Finally, the photons are detected by an InGaAs/InP single-photon avalanche diode (SPAD) detector [32]. The sensitive area of the SPAD is a circle with a diameter of 25 µm. When working in free-running mode, the dark count is ∼4.5 counts/(s·µm²). In our experiment, the detector is enabled only 40% of the time in each detection cycle, and the dark count is about 880 counts/s. In the transmitter, we employ a single-mode fiber (SMF) and a fiber collimator with focal length f = 11 mm. In the receiver path, we use a longer coupling lens with f = 100 mm. We employ a fiber filter (FF) based on multimode fiber (MMF) with a core diameter of 62.5 µm before the detector. Both the transmitter and receiver paths pass coaxially through the same 28× expander, which is the combination of the telescope (f = 2800 mm) and the eyepiece (f = 100 mm). With this configuration, the receiver's FoV is slightly smaller than the divergence of the transmitter. The FoV of our transceiver system is set near the diffraction limit, at about 22.3 µrad.
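The quoted FoV and dark-count figures can be cross-checked with simple geometry. The sketch below assumes, as a first-order model not stated explicitly in the text, that the receiver FoV equals the fiber core diameter divided by the coupling-lens focal length and then reduced by the expander magnification, and that the gated dark count scales linearly with the detector's active area and duty cycle. Variable names are illustrative.

```python
import math

# Receiver FoV from the fiber core and the optics (first-order geometric model).
core_diameter_m = 62.5e-6     # MMF core diameter (from the text)
coupling_focal_m = 0.100      # receiver coupling lens focal length (from the text)
telescope_focal_m = 2.800     # telescope focal length (from the text)
eyepiece_focal_m = 0.100      # eyepiece focal length (from the text)

magnification = telescope_focal_m / eyepiece_focal_m
fov_rad = core_diameter_m / coupling_focal_m / magnification

# Gated dark-count rate from the per-area rate, active area and duty cycle.
spad_diameter_um = 25.0       # active-area diameter (from the text)
dark_density = 4.5            # counts / (s * um^2), free-running (from the text)
duty_cycle = 0.40             # fraction of time the detector is enabled (from the text)

active_area_um2 = math.pi * (spad_diameter_um / 2) ** 2
dark_counts_per_s = dark_density * active_area_um2 * duty_cycle

print(f"magnification: {magnification:.0f}x")
print(f"receiver FoV: {fov_rad * 1e6:.1f} urad")
print(f"gated dark counts: {dark_counts_per_s:.0f} counts/s")
```

Under these assumptions the model reproduces the 28× magnification and the ∼22.3 µrad FoV, and gives a gated dark-count rate close to the quoted 880 counts/s.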
This FoV is equivalent to a spatial resolution of ∼18 cm at a stand-off distance of 8.2 km. To beat the resolution limit, we perform inter-pixel scanning. We set the inter-pixel scan spacing to 1/8 of the receiver FoV (2.8 µrad). After scanning 128×128 points (pixels), a high-resolution data set containing 8×8 sub-scanning shifts is produced. As shown in Fig. 5(a), we tested and verified the capability of our super-resolution photon-efficient LiDAR system in outdoor experiments in an urban environment in Shanghai. Our aim is to recognize the postures of a mannequin model at a stand-off distance of 8.2 km in both daylight and night. Before data acquisition, a photograph of the target was taken with a visible-band astronomical camera in daylight [see Fig. 5(b)]; this image is degraded by the turbulence in Shanghai. Fig. 5(c) shows a visible-band image taken by a commercial camera from a nearby building a few meters away. In the experiment, our single-photon LiDAR system operates with a maximum laser power of 120 mW and an acquisition time of 10 ms per pixel. According to the calibration of the signal and background noise in the acquired data, the number of signal photons is ∼1 to 6 PPP and the SBR is ∼0.16 to 0.24 (defined within a 100-ns gate window, as in ref. [16]). We successfully captured fine depth maps of the mannequin postures, with a size of 128×128 pixels, over 8.2 km in both day and night. The whole system jitter is measured to be 1 ns, which is equivalent to a depth uncertainty of 15 cm.
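The scan parameters in this paragraph follow from a few lines of arithmetic, sketched below. All inputs are values quoted in the text (the 100 kHz repetition rate is taken from the setup description); the derived scan-step size on the target, pulses per pixel and total acquisition time are computed here for illustration and are not quoted by the authors.

```python
C = 299_792_458.0          # speed of light, m/s

range_m = 8.2e3            # stand-off distance
fov_rad = 22.3e-6          # receiver FoV
jitter_s = 1e-9            # whole-system timing jitter
dwell_s = 10e-3            # acquisition time per pixel
rep_rate_hz = 100e3        # laser repetition rate
n_points = 128 * 128       # scan points

footprint_m = fov_rad * range_m            # FoV projected on the target
step_rad = fov_rad / 8                     # inter-pixel spacing of 1/8 FoV
step_m = step_rad * range_m                # scan step projected on the target
depth_uncertainty_m = C * jitter_s / 2     # round trip halves the time-to-range factor
pulses_per_pixel = rep_rate_hz * dwell_s
total_time_s = n_points * dwell_s

print(f"footprint: {footprint_m * 100:.1f} cm, step: {step_rad * 1e6:.2f} urad "
      f"({step_m * 100:.2f} cm)")
print(f"depth uncertainty: {depth_uncertainty_m * 100:.1f} cm")
print(f"pulses per pixel: {pulses_per_pixel:.0f}, total scan time: {total_time_s:.1f} s")
```

The footprint and depth uncertainty agree with the ∼18 cm and 15 cm values quoted above, and the scan step on the target comes out at a little over 2 cm.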

Experiment results
The imaging results taken at night are shown in Fig. 5, including five different postures. The first column shows the ground-truth photos of the mannequin postures taken in the target room by a commercial camera. The second column shows the imaging results taken by the single-photon LiDAR without the sub-pixel scanning scheme. Essentially nothing can be outlined at such low resolution. The third column shows the imaging results taken by the single-photon LiDAR with the sub-pixel scanning scheme and pixelwise ML processing. With the finer scanning and the increased number of pixels, the resolution of the image improves, but most of the postures still cannot be distinguished. The last column shows our results computed from the sub-pixel scanning data with the 3D deconvolutional algorithm. This confirms that the sub-pixel scanning scheme, together with the proposed reconstruction framework, can substantially improve the imaging resolution even with signal levels as low as ∼1 PPP. Different postures of the human-size mannequin can be clearly recognized, and the head and arms can be seen in the depth map. More importantly, the arm of the mannequin model, which is ∼5.5 cm in size, is resolved. This resolution is about a factor of two finer than the system's resolution of ∼11.1 cm (or angular resolution of 13.5 µrad).
We also performed the experiment in daylight and captured some other postures of the mannequin model. These results are shown in Fig. 6. In contrast to the visible-band image in Fig. 5(b), our approach can successfully recognize different postures of the mannequin model over an 8.2-km urban path in daylight. These results demonstrate that our system can operate both day and night. Admittedly, due to atmospheric effects such as turbulence in daylight, it is difficult for our LiDAR system to reach the highest resolution determined by the scanning spacing. Nevertheless, our sub-pixel scanning scheme greatly enhances the spatial resolution of the image over long ranges.

Discussion
We have proposed a super-resolution method for long-range single-photon LiDAR, consisting of a sub-pixel scanning approach and a 3D deconvolutional algorithm. The superior performance of our method has been demonstrated numerically and experimentally. In the experiment, depth profiles of the postures of a human-size mannequin were clearly obtained at a stand-off distance of 8.2 km. We beat the diffraction limit of our LiDAR system and achieved sub-Rayleigh resolution over a range of kilometers. The high-resolution results for different mannequin postures demonstrate the effectiveness and practicability of our method. Additionally, the results captured in daylight and at night show our system's suitability for day-and-night applications. Overall, the high-resolution imaging under low light levels shows the potential for target recognition and identification over long ranges.