Super-Resolution Remote Imaging using Time Encoded Remote Apertures

Imaging of scenes using light or other wave phenomena is subject to the diffraction limit. The spatial profile of a wave propagating between a scene and the imaging system is distorted by diffraction, resulting in a loss of resolution that grows linearly with the traveled distance. We show here that it is possible to reconstruct sparse scenes from the temporal profile of the wave-front using only one spatial pixel or a spatial average. The temporal profile of the wave is not affected by diffraction, yielding an imaging method that can in theory achieve wavelength-scale resolution independent of the distance to the scene.


Previous attempts to beat the diffraction limit of resolution
The diffraction limit constrains the spatial resolution of any image acquired by a conventional imaging system with a finite aperture. A wave traveling from the scene to the imaging system is degraded by diffraction before it reaches the aperture, so it can no longer be used to reconstruct an image of the object at the optimal resolution. Any spatial wavefront is subject to this degradation, which causes the well-known linear drop in resolution with distance; this linear dependence of resolution on imaging distance is known as the Rayleigh criterion.
Over the last several years, many works in different fields have attempted to beat the diffraction limit. Astronomical imaging is an area where super-resolution methods are extensively studied. Most super-resolution methods used in astronomy combine image capturing techniques with image post-processing. In early years, co-addition methods were actively used: capturing multiple pictures in succession with a short exposure time and applying a shift-and-add algorithm [3] or a linear reconstruction scheme [7] to combine the images. Different regularization-based optimization methods were also developed: Orieux et al. [18] successfully applied quadratic-regularized optimization to "Spectral and Photometric Imaging Receiver" data. Jarret et al. [16] proposed a deconvolution technique, the maximum correlation method, combined with a re-sampling kernel to super-resolve "Wide-field Infrared Survey Explorer" data.

arXiv:2007.08667v1 [eess.IV] 16 Jul 2020

Figure 1: Comparison between a conventional and a TERA imaging system. a) Conventional imaging system. To infer the position of an object point within the focal plane, the imaging system needs to determine the difference in length between two rays. The figure shows a fundamental relation between distance, resolution, and aperture; the resolution of the system is determined by the Rayleigh criterion. b) TERA imaging system. A pulsed laser illuminates the scene; the light reflected back from the scene is detected by the SPAD. The collected time response contains information about the distance between the two targets. The resolution of the system is determined by the time resolution of the SPAD.
Several image post-processing methods exist in digital imaging applications to construct high-resolution images from low-resolution images. The nonuniform interpolation approach [4], frequency domain approach [24], regularized optimizations (deterministic [1,10,12,13] and stochastic [2,9,21,22]), iterative back-projection [14], and adaptive filtering [6] are widely used super-resolution imaging methods. These methods provide high-resolution image outputs, mitigating the effects of diffraction. However, all of them still suffer from the fundamental limit: the resolution of super-resolved images still depends on the distance and aperture of the imaging system. The further the region of interest, the lower the image resolution.
It is well known that a scattering medium around an object can be used to couple evanescent waves into the far field. This means that images of the embedded object with resolutions similar to the wavelength can be captured from large distances [19]. In existing approaches, this requires prior characterization of the scattering medium from the location of the object, making the method unsuitable for many practical applications. Here we show that in sparse scenes, scattering can be leveraged to determine the geometry of the entire scattering scene (object and medium) without prior information based on just a single pixel time response of the scene.

Our contribution
In this work, we propose a novel super-resolution imaging approach that reconstructs scenes from internally scattered light using their time response. This temporal signal is not subject to diffraction. We show that the resolution of our method is independent of distance and aperture size, and that we can reconstruct the scene up to Euclidean congruence. We demonstrate the capabilities of the approach using simulated and experimental data.

Figure 2: We consider the imaging system as part of the point cloud. p_0 is the location of the imaging system; p_1, p_2, ..., p_n are point objects in the scene. c) Dashed green lines are pings, and solid red lines are loop paths. If the measurement ensemble β contains four pings and six loops, then we say that the sub-graph is contained in the measurement ensemble. d) If "x" are known points and the measurement ensemble β contains one ping and three loops, then we say that the measurement allows for trilateration.

Fundamentals of imaging
In this section, we overview the fundamental process of obtaining an image of an object and define terms to avoid confusion. Consider an object emitting or reflecting light and an imaging system with an aperture of diameter D placed at a distance L from the object (Figure 1). To infer the position of an object point O_i within the scene, the imaging system redirects all rays from point O_i such that they constructively interfere at exactly one point in the focal plane. In other words, the imaging system evaluates the length of a light ray as encoded in its phase. In this case, the resolution r of the imaging system is determined by the Rayleigh diffraction limit:

r = 1.22 λL/D,

where λ is the wavelength of the emitted light and determines how accurately the imaging system can determine the length of each ray, L is the distance between the focal plane of the imaging system and the object, and D is the diameter of the aperture of the imaging system. Another way of imaging is to measure the time of flight of intensity fluctuations; we refer to these intensity fluctuations as second-order coherence. One can use a short-pulsed laser to illuminate the object and a lens-less time-of-flight detector to observe the back-scattered light and reconstruct an image. The resolution of such an image is again described by the Rayleigh criterion, except that the wavelength λ is replaced by cτ, the distance light travels within the time resolution τ of the time-of-flight detector. We call this the transient Rayleigh criterion.
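The two criteria can be compared numerically. The sketch below assumes the λ → cτ substitution described above; the standoff distance, aperture, and detector numbers are illustrative assumptions, not the paper's experimental values:

```python
C = 299792458.0  # speed of light [m/s]

def rayleigh_resolution(wavelength, distance, aperture):
    """Classical Rayleigh diffraction limit: r = 1.22 * lambda * L / D."""
    return 1.22 * wavelength * distance / aperture

def transient_rayleigh_resolution(time_res, distance, aperture):
    """Transient Rayleigh criterion: the wavelength is replaced by the
    distance light travels within the detector's time resolution, c * tau."""
    return 1.22 * C * time_res * distance / aperture

# Illustrative numbers: green light, 80 ps detector, 1 km standoff, 5 cm aperture.
r_optical = rayleigh_resolution(532e-9, 1000.0, 0.05)       # ~1.3 cm
r_transient = transient_rayleigh_resolution(80e-12, 1000.0, 0.05)
```

Both expressions still grow linearly with L, which is exactly the dependence the TERA approach avoids.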

TERA imaging overview
Consider a situation where one can detect not only directly back-scattered light (first bounce) but also light that bounced between the objects (second bounce). Such a signal can be obtained by illuminating the scene with a short-pulsed laser and detecting the returning light with a single time-of-flight detector. The temporal signal of this multi-bounce light directly encodes information about the distance between two object points in the scene. Figure 1b) shows a simplified example of how the information about the scene is encoded in the temporal signal. Suppose the scene contains two small point objects p_1 and p_2 with negligibly small diameter δ. The distances from p_1 and p_2 to the imaging system C are d_1 and d_2 respectively, and the distance between objects p_1 and p_2 is d_3. The scene is illuminated by a pulsed laser and the returning light is measured by a time-of-flight detector. The time-resolved measurement, or time response, of the scene is illustrated on the left side of Figure 1b). The first impulse is due to the light directly reflected from object p_1 and appears at t_1 = 2d_1/c, where c is the speed of light. Similarly, the second impulse, appearing at t_2 = 2d_2/c, corresponds to the light reflected from object p_2. The last impulse is due to the light that traveled along two paths: (C → p_1 → p_2 → C) and (C → p_2 → p_1 → C). These two paths have the same travel distance, so in the time response they both appear at t_3 = (d_1 + d_2 + d_3)/c. From these three values t_1, t_2, t_3 one can find d_1, d_2, and d_3. In other words, one can completely reconstruct the relative positions of objects p_1 and p_2 up to Euclidean congruence. Now, fix the distance d_3 and keep increasing the distances d_1 and d_2. At some point, the distance d_3 will fall below the Rayleigh diffraction limit of a conventional imaging system, i.e., one can no longer visually separate the two points p_1 and p_2.
However, for the proposed imaging system, increasing the distances d_1 and d_2 results only in a time shift of the entire signal. The differences between the three time tags t_1, t_2, t_3 are conserved, so one can still recover the scene (d_1, d_2, and d_3) up to Euclidean congruence.
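The two-point recovery and its invariance under a common shift of d_1 and d_2 can be sketched in a few lines of Python (the distances are made-up example values):

```python
C = 299792458.0  # speed of light [m/s]

def time_tags(d1, d2, d3):
    """Forward model for the two-point scene of Figure 1b."""
    return 2 * d1 / C, 2 * d2 / C, (d1 + d2 + d3) / C

def recover_two_point_scene(t1, t2, t3):
    """Invert t1 = 2*d1/c, t2 = 2*d2/c, t3 = (d1 + d2 + d3)/c."""
    d1 = C * t1 / 2
    d2 = C * t2 / 2
    d3 = C * t3 - d1 - d2
    return d1, d2, d3

# Two points 0.5 m apart, ~100 m away, then the same pair 1 km further:
# the tags all shift, but the recovered inter-object distance d3 is unchanged.
near = recover_two_point_scene(*time_tags(100.0, 100.2, 0.5))
far = recover_two_point_scene(*time_tags(1100.0, 1100.2, 0.5))
```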
Naturally, the arising question is whether it is possible to reconstruct the scene when the number of objects is larger. Is the reconstruction unique, i.e., is it possible that two or more configurations generate the same time response? Below, we show that the reconstruction of a point cloud with n objects is possible under some assumptions. In this section, we use the results of Gkioulekas et al. [8], which show that when the measured time response of the scene contains a sufficiently rich set of first and second bounces, we can reconstruct the point cloud up to Euclidean congruence. We follow the same notation as [8].
Now, consider a scenario where the scene consists of multiple point objects (Figure 2a). Let p_1, ..., p_n ∈ R^3 be n point objects in the scene and p_0 ∈ R^3 be the position of our imaging system (laser and detector are co-located); see Figure 2b. We call the graph S = (p_1, ..., p_n) a scene configuration and K = (p_0, p_1, ..., p_n) a total configuration. The scene is illuminated with a pulsed laser and the detector observes the signal returning from the scene. We define a light path α_k = [p_0, p_1, ..., p_z, p_0] (z ∈ N) as a finite sequence of points along which light has traveled. Note that any sequence α starts and ends at p_0, since our light source and detector are co-located. A first bounce, or ping, is a light path α_k = [p_0, p_i, p_0] for i = 1, ..., n. A second bounce, or loop, is a light path α_k = [p_0, p_i, p_j, p_0] for i ≠ j.
Let v_k be the length of α_k. Then our measurement contains the ensemble β = [v_1, v_2, ..., v_m], where each v_k is the length of a returned first or second bounce.
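For a concrete picture of β, the following sketch enumerates the path lengths of all pings and loops of a small configuration and then discards the labels, as the detector does (the coordinates are arbitrary example values):

```python
import itertools
import math

def measurement_ensemble(p0, points):
    """Unlabeled ensemble beta: lengths of all pings [p0, pi, p0] and all
    loops [p0, pi, pj, p0], sorted so the path labels are lost."""
    pings = [2 * math.dist(p0, p) for p in points]
    loops = [math.dist(p0, a) + math.dist(a, b) + math.dist(b, p0)
             for a, b in itertools.combinations(points, 2)]
    return sorted(pings + loops)

# n points yield n pings and n*(n-1)/2 loops: 3 + 3 entries here.
beta = measurement_ensemble((0, 0, 0), [(5, 0, 0), (0, 6, 0), (0, 0, 7)])
```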
Next, let K_5 be a sub-graph of K that has 5 vertices including p_0. If the measurement ensemble β contains all pings and triangles of K_5 that start and end at p_0, then we say K_5 is contained within β; see Figure 2c. In this example, the measurement ensemble β contains four pings (α = [p_0, p_i, p_0] for i = 1, ..., 4) and six loops (α = [p_0, p_i, p_j, p_0] for i, j ∈ {1, 2, 3, 4} and i ≠ j). Next, if the measurement ensemble β contains a ping (α = [p_0, p_4, p_0]) and three loops (α = [p_0, p_i, p_4, p_0] for i = 1, ..., 3), then we say that β allows for trilateration; see Figure 2d.
Finally, we can apply the theorem from Gkioulekas et al. [8], which says that if K = (p_0, p_1, ..., p_n) is an unknown configuration and β is a measurement ensemble of K that allows for trilateration, then one can find a trilateration-based process to reconstruct K up to Euclidean congruence. Readers can find a detailed mathematical proof of the statement in [8]. Note that there exist degenerate point configurations K for which unique reconstruction is not possible; however, these special configurations occur very rarely [8].
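The trilateration step itself is standard multilateration: once four points are placed, the ping and loop lengths yield the distances from a new point to the known ones, and its coordinates follow from linearizing the sphere equations. A minimal sketch (the anchor and target coordinates are arbitrary test values):

```python
import numpy as np

def trilaterate(anchors, dists):
    """Locate a point from its distances to >= 4 known, non-coplanar anchors
    by subtracting sphere equations pairwise (linear least squares)."""
    anchors = np.asarray(anchors, dtype=float)
    d = np.asarray(dists, dtype=float)
    a0, d0 = anchors[0], d[0]
    # |x - ai|^2 = di^2 minus |x - a0|^2 = d0^2 gives a linear system in x.
    A = 2.0 * (anchors[1:] - a0)
    b = (d0**2 - d[1:]**2
         + np.sum(anchors[1:]**2, axis=1) - np.sum(a0**2))
    return np.linalg.lstsq(A, b, rcond=None)[0]

anchors = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
target = np.array([0.3, 0.4, 0.5])
dists = [np.linalg.norm(target - np.asarray(a)) for a in anchors]
```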

Reconstruction algorithm
In this section, we design a trilateration-based reconstruction algorithm for our imaging system. There exist algorithms (TRIBOND [5] and LIGA [17]) that reconstruct a point cloud given a list of unassigned edge measurements. TRIBOND is a deterministic algorithm that addresses the unassigned distance geometry problem (uDGP). It has been successfully applied to reconstruct the structure of molecules and nanoparticles using edge distance lists extracted from X-ray or neutron diffraction data. However, we cannot apply the TRIBOND algorithm directly to our data, because our data contains not only first bounces but also second bounces, and these bounces are unlabeled; the algorithm requires modification. Gkioulekas et al. [8] showed that it is possible to reconstruct a point cloud using unlabeled edge measurements. Here we modify the TRIBOND algorithm [5] with the method described in [8] to process data acquired by our imaging system. The modified TRIBOND algorithm consists of two parts: core finding and adding a vertex.

Core finding
The first step of the buildup algorithm is to find the core. The core of the embedded point cloud is an over-constrained set of 5 points including the source; see Figure 2c. The core can be broken down into 3 pieces: the base triangle and two tetrahedra. The base triangle (Figure 2c, (p_0, p_1, p_3)) is constructed using two first bounces (Figure 2c, dashed green lines) and one second bounce (Figure 2c, solid red line). Each tetrahedron (Figure 2c, (p_0, p_1, p_2, p_3) and (p_0, p_1, p_4, p_3)) uses one first and two second bounces. Since the bounces are not labeled, one has to exhaustively search over all possible first and second bounce pairs to build the base triangle and tetrahedra. Finally, we loop through all remaining bounces to find one second bounce to test whether it fits the bridge bond (p_2, p_4) between the vertices of the two tetrahedra. If a correct second bounce is found and the bridge bond is satisfied, we have found a core structure and can move to "adding a vertex". If the bridge bond is not satisfied, we restart "core finding" and choose another base triangle and tetrahedra by exhaustive search. This process is repeated until the core structure is found.
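The innermost consistency test of this exhaustive search, deciding whether a candidate (ping, ping, loop) triple can embed as a valid base triangle, can be sketched as follows (the tolerance value is an arbitrary assumption):

```python
def base_triangle(ping_a, ping_b, loop_ab, tol=1e-9):
    """If ping_a = 2*da, ping_b = 2*db, and loop_ab = da + dab + db, the
    triple embeds as triangle (p0, pa, pb) iff dab > 0 and the triangle
    inequalities hold. Returns the side lengths (da, db, dab) or None."""
    da, db = ping_a / 2.0, ping_b / 2.0
    dab = loop_ab - da - db
    if dab <= tol:
        return None
    # each side must be shorter than the sum of the other two
    if (dab >= da + db - tol or da >= db + dab - tol
            or db >= da + dab - tol):
        return None
    return da, db, dab
```

For example, pings of 6 and 8 with a loop of 12 yield the 3-4-5 triangle, while a loop of 20 fails the triangle inequality and is rejected.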

Adding a vertex
After the core is found, the next step is to iteratively add vertices to the core. First, we choose a random tetrahedron from the current structure. The next step is to search over all possible combinations of four distances from the remaining distance pool. Three distances form one first and two second bounce distances; the remaining distance is used to test the bridge bond for the chosen rigid substructure. For instance, in Figure 2d, (p_0, p_1, p_2, p_3) is chosen as the rigid substructure and p_4 is the added point, using one first bounce [p_0, p_4, p_0] and two second bounces [p_0, p_1, p_4, p_0] and [p_0, p_2, p_4, p_0]. The second bounce [p_0, p_3, p_4, p_0] is used for the bridge bond check.
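The bridge bond test that accepts or rejects the fourth distance is a direct comparison against the loop length implied by the already-placed points; a sketch (the tolerance is an arbitrary assumption):

```python
import math

def bridge_bond_ok(p0, p3, p4, loop_length, tol=1e-6):
    """Check that the candidate second bounce [p0, p3, p4, p0] matches the
    loop length implied by the placed points p0, p3 and the new point p4."""
    predicted = math.dist(p0, p3) + math.dist(p3, p4) + math.dist(p4, p0)
    return abs(predicted - loop_length) < tol
```

With p_0 = (0,0,0), p_3 = (3,0,0), and p_4 = (3,4,0), the implied loop length is 3 + 4 + 5 = 12, so only a candidate distance near 12 passes.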

Simulations
Here we test the modified TRIBOND algorithm using simulated data. We generate simulated data using a simplified version of the transient light transport renderer [11,15]. The renderer has been successfully used in many time-of-flight applications [20,23]. Similar to the previous section, we consider the following scenario. Let p_1, ..., p_n ∈ R^3 be n point objects in the scene and p_0 ∈ R^3 be the position of the TERA imaging system. Let d_i be the distance from p_i to p_0 for i = 1, ..., n and d_ij be the distance from p_i to p_j for all i ≠ j; see Figure 4. The generated time response contains a mixture of first and second bounces. Some of the signals can be missing because of occlusion or overlap.

Algorithm 1: Modified TRIBOND
  input : Distance list β
  output: Reconstructed point cloud
  CoreFinding():
    while core is not found do
      Choose D=3 entries (random) from the distance list β, and test all possible two-first and one-second bounce pairs to construct a base triangle
      if a base triangle is found then
        Choose D=6 entries from the distance list β, and test all possible two-first and four-second bounce pairs to construct two tetrahedra with the base triangle
        if two tetrahedra are found then
          Choose D=1 entry from the distance list β, and do the bridge bond test
          if the bridge bond test is passed then
            Core is found
          end
        end
      end
    end
  AddVertex():
    while distance list β is not empty do
      Choose (random) four points (a tetrahedron) from the substructure including the origin point
      Choose D=3 entries from the distance list β, and test all possible one-first and two-second bounce pairs to construct a tetrahedron with one side lying on the previously chosen tetrahedron
      if a tetrahedron is found then
        Choose D=1 entry from the distance list β, and apply the bridge bond test
        if the bridge bond test is satisfied then
          Add the tetrahedron to the existing substructure and remove used entries from β
        end
      end
    end
  return point cloud

Figure 3 shows an example simulated time response.
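A toy stand-in for the transient renderer, which only bins path arrival times at the detector's temporal resolution and reads the peaks back out as a distance list, might look like this (uniform amplitudes and a noiseless histogram are simplifying assumptions; a real renderer adds radiometric falloff and noise):

```python
import numpy as np

C = 299792458.0  # speed of light [m/s]

def simulate_transient(path_lengths, bin_width=80e-12):
    """Toy transient: histogram of path arrival times t = v / c, binned at
    the detector's temporal resolution."""
    times = np.asarray(path_lengths, dtype=float) / C
    n_bins = int(np.ceil(times.max() / bin_width)) + 1
    hist = np.zeros(n_bins)
    np.add.at(hist, (times / bin_width).astype(int), 1.0)
    return hist

def peaks_to_distances(hist, bin_width=80e-12):
    """Turn non-empty bins back into a (quantized) path-length list."""
    return np.nonzero(hist)[0] * bin_width * C

# Path lengths come back quantized to c * bin_width (~2.4 cm at 80 ps).
hist = simulate_transient([3.0, 4.0, 7.5])
recovered = peaks_to_distances(hist)
```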
The scene contains n = 10 point objects and is located far away from the imaging system. The figure shows the entire time response and, separately, the first and second bounces. Blue triangles mark the locations of the peaks in the signal. We transform these peak locations into a distance list and use it as input to the modified TRIBOND algorithm. The right side of Figure 3 shows the result of the reconstruction. The reconstruction matches the original point-cloud configuration up to Euclidean congruence. In other words, all pair-wise distances were recovered; however, the rotation and translation of the point cloud remain unknown. Figure 7 shows a statistical analysis of scene recoverability using multiple circular patches of different sizes.
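Verifying a reconstruction "up to Euclidean congruence" as described above amounts to comparing sorted pairwise-distance lists, which are invariant to rotation, translation, and reflection. A sketch (note that equal distance spectra are a necessary condition; the rare homometric counterexamples correspond to the degenerate configurations mentioned earlier):

```python
import math
from itertools import combinations

def distance_spectrum(cloud):
    """Sorted list of all pairwise distances; invariant under rotation,
    translation, and reflection of the point cloud."""
    return sorted(math.dist(p, q) for p, q in combinations(cloud, 2))

def matches_up_to_congruence(cloud_a, cloud_b, tol=1e-6):
    """Practical reconstruction check: identical distance spectra."""
    da, db = distance_spectrum(cloud_a), distance_spectrum(cloud_b)
    return len(da) == len(db) and all(abs(x - y) < tol
                                      for x, y in zip(da, db))
```

A translated copy of a point cloud passes the check; a cloud with even one perturbed point fails it.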

Proof of concept Experiments
In this section, we demonstrate the performance of the algorithm using experimental data. The experimental setup is shown in Figure 4a. The imaging system contains an ultra-fast laser that emits pulses with 50 ps width at a 10 MHz repetition rate. An example of the laser illumination area is shown in Figure 4b. The wavelength of the laser is set to 532 nm. The returning light is collected by a single-photon avalanche diode (SPAD) with a 20 micron diameter active area. At a wavelength of 532 nm, it has a photon detection efficiency of 30%. The total effective temporal impulse response of our imaging system is 80 ps. Note that we do not use any lenses in front of the SPAD. See Figure 5a-c. The first column shows the actual scene picture. The second column is a simulated picture of the scene using a camera that has an aperture size of 20 microns. The third column is a reconstruction of the scene using the acquired data. Finally, the fourth column shows the actual data. In the reconstruction, one can see that the two patches are clearly separated. Next, we test the method using targets (stars, plane, circular patch) with different shapes and materials (wooden star); see Figure 5d,e. The shape of the peak is slightly different from the circular patches, but one can still find three peaks in the data. Lastly, Figure 6 shows the reconstruction of 3 circular patches. In the data, there are three first bounce and three second bounce peaks.

Figure 7: The number of spheres varies from 5 to 20 and the sphere diameter from 1 to 8 centimeters. The plot is generated from simulated data using realistic detector and photon noise parameters and assuming a 10 W illumination pulse from the laser.

Conclusion
In this paper, we present a novel imaging method that does not depend on the fundamental diffraction limit. The method combines hardware and algorithms to break the resolution limit under certain conditions on the scene (a point-cloud assumption). In conventional imaging systems, the fundamental diffraction limit is governed by the size of the aperture, the wavelength, and the distance to the target. With fixed wavelength and distance, our proposed method does not depend on the size of the aperture but instead on the time resolution of the imaging system. The method is robust to changes in surface materials. Currently, the main limitation is that the method is only applicable to point-cloud scenes; such sparse scenes commonly occur in aerial or space imaging scenarios. The paper gives the base theory and introduces a method to use multiply scattered light (second bounces). As a next step, we are exploring the possibility of using multiply scattered light for continuous surfaces, as well as using machine learning to classify objects below the resolution limit using multiply scattered light.