Fast back-projection for non-line of sight reconstruction

Recent works have demonstrated non-line of sight (NLOS) reconstruction by using the time-resolved signal from multiply scattered light. These works combine ultrafast imaging systems with computation, which back-projects the recorded space-time signal to build a probabilistic map of the hidden geometry. Unfortunately, this computation is slow, becoming a bottleneck as the imaging technology improves. In this work, we propose a new back-projection technique for NLOS reconstruction, which is up to a thousand times faster than previous work, with negligible quality loss.


Introduction
One of the core applications of time-resolved imaging (see e.g. [1,2] for recent surveys on the field) is the capability to robustly capture depth from a scene, by tracking the time of arrival of photons. Geometry reconstruction techniques have significantly benefited from this, but in recent years the applicability of time-resolved imaging has gone beyond directly visible geometry, to include non-line of sight (NLOS) imaging [3]. This technique allows reconstructing occluded objects by analyzing multiply scattered light, even in the presence of turbid media [4][5][6].
The first prototype of this technology was demonstrated with a femtosecond laser and a streak camera [7]. Further work explored alternative hardware setups, including correlation-based time-of-flight cameras [8], single photon avalanche diodes (SPAD) [9], laser-gated sensors [10], or even common DSLR cameras [11]. These setups are cheaper and more portable, although at the cost of sacrificing time resolution.
Despite all these available hardware options, the reconstruction step is still a bottleneck, limiting the applicability of this technology in the wild. Different approaches have been proposed for reconstruction, either by solving a non-convex optimization on three-dimensional geometry [11,12] or a depth map [8], or by back-projecting the space-time captured image on a voxelized geometry representation [3,13]. In both cases, the large amount of data being processed, together with the complexity of the reconstruction algorithms, imposes computation times on the order of several hours.
Back-projection reconstruction builds a 3D probability map based on the recorded time-resolved image, encoding the time of arrival of photons reflected by the hidden geometry. It exploits the correlation between the time of arrival of a photon and the total distance traveled, back-projecting this distance to the reconstruction volume. This approach has several advantages over the alternative methods: First, it avoids the need to solve a non-convex optimization problem, which may fall in local minima. Second, the memory cost is significantly lower. And third, it is relatively robust to noisy captures, which is particularly important when aiming for low capture times. However, current back-projection methods are very inefficient and redundant, which results in poor scalability with the resolution of the voxelized scene. This imposes a hard limit on the quality of the reconstruction, regardless of the imaging technology used.
In this work we propose a new back-projection reconstruction method that yields a speed-up factor of three orders of magnitude over previous NLOS reconstruction approaches, thus addressing the main pending issue limiting the applicability of recent approaches. We build on the key observation that the manifold of probable points that might reflect light towards an observed point at a certain time forms an ellipsoid with foci at the light source and the observed point, and with focal distance given by the time of flight of photons (Figure 1). This allows us to build the probability map as the intersection of the ellipsoids of multiple space-time measurements. Posing the problem in this ellipsoid space allows us to avoid redundancy, which reduces the theoretical complexity of the algorithm to just the number of space-time measurements.
More importantly, formulating the problem this way is equivalent to performing a voxelization of the ellipsoids' geometry, which is very well suited for modern GPUs.
This combination of lower computational complexity and an efficient hardware-accelerated implementation allows us to increase efficiency further: our technique computes the probability maps of large datasets in the order of seconds with a minimal increase in error, while also allowing significantly higher-resolution reconstructions at a negligible added cost. We demonstrate these capabilities by evaluating our method on both synthetic and real captured data, and comparing with existing approaches. In addition, we make our code publicly available at http://giga.cps.unizar.es/~ajarabo/download/Fast_BackProjection.zip.

Fig. 1: Overview of our method. Left: Illustration of our reconstruction setup. A laser pulse is emitted towards a diffuse wall, creating a virtual point light s illuminating the occluded scene. The reflection of the occluded geometry travels back to the diffuse wall, which is imaged by the camera. The total propagation time from a hidden surface point x forms an ellipsoid with focal points at s and p. Right: The intersection of several of these ellipsoids defines a probability map for the occluded geometry, from which the reconstruction is performed with a speed-up factor of three orders of magnitude over previous approaches.

Back-projection for NLOS Reconstruction
The goal of NLOS reconstruction methods is to recover an unknown hidden scene X ∈ R³ from the measurements on visible known geometry P ∈ R³. Such P is defined by a bijective map ψ : R² → R³ from pixel coordinates p̄ imaged on the sensor. The scene is illuminated by a set of laser shots directed towards the visible geometry, hitting at positions S; an image is captured for each shot. Figure 1 shows an example of our setup, where s ∈ S is a virtual point light source created by the laser's reflection, and p ∈ P is the projection of pixel p̄ onto the known geometry P as ψ(p̄) = p, with inverse mapping ψ⁻¹(p) = p̄. It can be seen how the total propagation time from a hidden surface point x forms an ellipsoid with focal points at s and p.
In our particular context, the captured signal I is a time-resolved image indexed by the spatial and temporal domains, measuring the light's time of arrival at each pixel. Physically, the signal formation model is

I(p̄, t) = ∫_X L_o(x → p, t − τ(x → p)) G(x, p) V(x, p) dx,   (1)

where I(p̄, t) is the captured signal for pixel p̄ at time instant t, L_o(x → p, t − τ(x → p)) is the reflected radiance at x towards p, and τ(x → p) = ‖x − p‖ c⁻¹, with c the speed of light in the medium. G and V are respectively the geometric attenuation and the binary visibility function between x and p.
Assuming that all recorded radiance at x is due to third-bounce reflection, we can remove the visibility term V. We also consider that all captured illumination comes from the virtual point light at s. With that in place, we transform Equation (1) into

I_s(p̄, t) = ∫_X f(s → x → p) G(s → x → p) δ(t − τ(s → x → p)) dx,   (2)

where f represents the BRDF of the material at x, G(s → x → p) is the geometric attenuation of the full light path, and τ(s → x → p) = (‖s − x‖ + ‖x − p‖) c⁻¹. Equations (1) and (2) assume that the time coordinate starts when s emits light, and that the sensor is placed at p; accounting for the actual setup adds the offsets t_s and t_p, the times of flight from the laser to the virtual light s, and from the camera to the imaged surface point p, respectively. However, since these additional terms do not affect the shape of the ellipsoids, we omit them in the rest of the paper for clarity (although we do take them into account in our calculations).
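The time of flight τ(s → x → p) is what ties a space-time measurement to the candidate hidden points. As a minimal sketch (Python/NumPy; function and variable names are ours, not from the reference implementation, and we assume a normalized speed of light c = 1):

```python
import numpy as np

def tau(s, x, p, c=1.0):
    """Total time of flight for the path s -> x -> p.

    s: virtual point light on the wall, p: imaged wall point,
    x: candidate hidden-surface point (all 3-vectors).
    """
    s, x, p = (np.asarray(v, float) for v in (s, x, p))
    return (np.linalg.norm(x - s) + np.linalg.norm(x - p)) / c

# All hidden points sharing the same tau lie on an ellipsoid with foci s, p:
t1 = tau([0, 0, 0], [1, 1, 0], [2, 0, 0])   # a point above the focal axis
t2 = tau([0, 0, 0], [1, -1, 0], [2, 0, 0])  # its mirror, same ellipsoid
```

Both calls return the same value, illustrating why a single measurement cannot disambiguate among the points of its ellipsoid.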

NLOS by back-projection
Taking into account the signal formation model (Equation (2)), we would like to recover the unknown geometry X from a set of spatio-temporal measurements I_s(p̄, t).
Back-projection methods [3,9] aim to recover a discrete approximation of the unknown scene X. Intuitively, only points x ∈ X where there is an object will scatter light back towards P. However, since we only know the spatial and temporal domains of the signal, given a measurement I_s(p̄, t) there is an infinite number of candidate points x ∈ X that might have reflected light towards ψ(p̄) = p at instant t (see Figure 1). This means that the scene cannot be recovered from a single measurement, so instead multiple measurements need to be taken.
In essence, the main idea is to build a probabilistic model where, assuming Lambertian reflectances, the probability of a point x being part of the occluded geometry is modeled as

p(x) ∝ Σ_s Σ_p̄ Σ_t I_s(p̄, t) G(s → x → p)⁻¹ δ(t − τ(s → x → p)),   (4)

where the signal I_s(p̄, t) is corrected by an estimate of G(s → x → p), and δ is the delta function centered at zero. This probability map is later used to reconstruct the geometry, by using some operator over the voxelized representation, typically a three-dimensional Laplacian filter [3].
In practice, the most straightforward way of computing the probability map consists of evaluating Equation (4) for each unknown point x ∈ X. This is unfortunately very expensive; the computational cost grows linearly with the number of pixels P, lights S, and voxels X, thus yielding O(P × S × X). In the following, we introduce our novel formulation, which significantly reduces the theoretical complexity of the computations, and allows for a very efficient implementation on commodity hardware.
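The brute-force evaluation just described can be sketched as a triple loop, which makes the O(P × S × X) cost explicit. This is an illustrative stand-in, not the reference implementation; the data layout I[s, p̄, t] and all names are assumptions:

```python
import numpy as np

def backproject_naive(I, S, P, voxels, t_res, c=1.0):
    """Traditional back-projection: for every voxel x, light s and wall
    point p, accumulate the signal bin matching the round-trip time.

    I[si, pi, ti] : time-resolved signal per light, pixel and time bin
    S, P          : (Ns, 3) light positions and (Np, 3) wall points
    voxels        : (Nx, 3) centers of the reconstruction grid
    t_res         : duration of one time bin
    """
    prob = np.zeros(len(voxels))
    for xi, x in enumerate(voxels):          # |X| iterations ...
        for si, s in enumerate(S):           # ... times |S| ...
            d_s = np.linalg.norm(x - s)
            for pi, p in enumerate(P):       # ... times |P|
                t = (d_s + np.linalg.norm(x - p)) / c
                ti = int(round(t / t_res))
                if 0 <= ti < I.shape[2]:
                    prob[xi] += I[si, pi, ti]
    return prob
```

Note that every voxel is visited for every measurement, even where the signal is empty; this is the redundancy the ellipsoid formulation removes.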

Our Method
Our method builds on the observation that the set of points that can potentially contribute to I_s(p̄, t) from a given laser shot hitting at s is defined by the ellipsoid E(s, p, t) with focal points at s and p, and focal distance t · c (see Figure 1). This means that the more ellipsoids intersecting at a point x, the higher the probability p(x) of having occluded geometry at x, since more light arriving at pixel p̄ can have potentially been reflected at x.
Following this observation, we pose Equation (4) as an intersection of ellipsoids E(s, p, t) with a voxelized representation of the scene, as

p(x) ∝ Σ_s Σ_p̄ I_s(p̄, t) isect(x, E(s, p, t)),   (5)

where t = τ(s → x → p), and isect(x, E(s, p, t)) is a binary function returning 1 if the ellipsoid E(s, p, t) intersects voxel x, and 0 otherwise. Note that here we are not correcting for the geometric attenuation as in Equation (4); we opt for this approach to avoid possible numerical singularities when G(s → x → p) → 0, and to keep the maximum values bounded (which is important in our implementation to avoid overflow, see Appendix A). Given that the probability map is later processed by the Laplacian filter to reconstruct the geometry, and the geometry term has almost no effect between neighboring voxels, we find that this does not affect the final result.
Our new formulation would be slower than Equation (4) if computed naively, due to the need to calculate the intersection between each ellipsoid and each point x. However, the intersection operator allows us to compute p(x) by testing the ellipsoid-voxel intersection directly: instead of evaluating each voxel x against the captured data, we now simply project the captured data into the voxelized scene representation.
Posing the problem this way has two main benefits: On the one hand, it significantly reduces the complexity of the required computations by only testing locations with signal information, resulting in a theoretical complexity order of O(P × S × T), with T the temporal resolution. Note that this complexity is independent of the voxelization resolution, and that since the captured signal is in general sparse, in practice the complexity is even lower. On the other hand, our new formulation is equivalent to performing a voxelization of the full set of ellipsoids defined by the combination of tuples (p, s, t) (Figure 1, right). Voxelization is a well-studied problem in computer graphics, and can be efficiently performed in commodity hardware [14]. In the following we describe the details of our implementation.
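As a CPU illustration of this measurement-driven direction (the actual method rasterizes tessellated ellipsoids on the GPU, described next), each non-empty bin (s, p̄, t) can be splatted onto the voxels its ellipsoid passes through. All names here are ours, and the per-voxel shell test is a brute-force stand-in for hardware voxelization:

```python
import numpy as np
from itertools import product

def splat_ellipsoids(measurements, grid_res, grid_min, voxel_size, c=1.0):
    """For each measurement (s, p, t, value) with value > 0, increment
    every voxel whose center lies near the ellipsoid |x-s| + |x-p| = t*c.
    """
    vol = np.zeros((grid_res,) * 3)
    lo = np.asarray(grid_min, float)
    tol = 0.5 * voxel_size * np.sqrt(3)  # half voxel diagonal as tolerance
    for s, p, t, val in measurements:
        if val <= 0:                     # sparsity: empty bins cost nothing
            continue
        for i, j, k in product(range(grid_res), repeat=3):
            x = lo + (np.array([i, j, k]) + 0.5) * voxel_size
            d = np.linalg.norm(x - s) + np.linalg.norm(x - p)
            if abs(d - t * c) < tol:     # ellipsoid shell crosses this voxel
                vol[i, j, k] += 1
    return vol
```

The outer loop runs once per measurement, so the work is driven by the captured data rather than by the voxel grid, mirroring the complexity argument above.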

Fast Back-projection
In order to perform the voxelization of the ellipsoids we rely on hardware-based voxelization [15], although other custom GPU-based voxelization methods could be used (e.g. [14,16]).
To achieve this, we need to overcome two main problems: i) we need to create a large number of ellipsoids, which can be very expensive and memory-consuming; and ii) hardware rasterization does not work with parametric surfaces beyond triangles, so we need to tessellate the ellipsoids before rasterization, which aggravates the cost and memory problem.
We address the first point by taking advantage of instanced rendering, which is standard in most modern GPUs, and allows re-rendering the same primitive while applying a different linear transformation to each instance. For this, we create a base sphere, which is later transformed into the target ellipsoid i by scaling, rotating and translating it, using a standard linear sphere-to-ellipsoid transformation matrix M_i.
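A possible construction of M_i (our own sketch; the axis conventions are an assumption): for foci s and p and focal-distance sum t·c, the unit sphere is scaled by the semi-axes of the resulting prolate spheroid, rotated so its major axis aligns with the focal axis, and translated to the midpoint between the foci:

```python
import numpy as np

def ellipsoid_transform(s, p, t, c=1.0):
    """4x4 matrix mapping the unit sphere onto the ellipsoid E(s, p, t)."""
    s, p = np.asarray(s, float), np.asarray(p, float)
    d = np.linalg.norm(p - s)                    # distance between foci
    a = t * c / 2.0                              # semi-major axis
    b = np.sqrt(max(a * a - (d / 2) ** 2, 0.0))  # semi-minor axes
    u = (p - s) / d                              # focal axis direction
    # complete {u, v, w} into an orthonormal basis
    tmp = np.array([0., 0., 1.]) if abs(u[2]) < 0.9 else np.array([1., 0., 0.])
    v = np.cross(u, tmp); v /= np.linalg.norm(v)
    w = np.cross(u, v)
    M = np.eye(4)
    M[:3, :3] = np.stack([u, v, w], axis=1) @ np.diag([a, b, b])  # rotate*scale
    M[:3, 3] = (s + p) / 2.0                     # translate to the center
    return M
```

Any unit vector q mapped through M lands on the ellipsoid, i.e. ‖x − s‖ + ‖x − p‖ = t·c for x = (M [q, 1])[:3], which is the property the voxelized instances rely on.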
Before rendering, we apply recursive geodesic tessellation to the base sphere. Ideally, we would like all triangles to be smaller than the voxel size, so that high curvatures are accurately handled. However, since the transformation required for each ellipsoid might vary, the final size of each rendered triangle is not known in advance; this means that we cannot set a single tessellation level for all ellipsoids. We instead precompute a set of spheres O with different tessellation levels o, and dynamically choose which level o will be used as

o = min { o ∈ O : α_o · max(Eig(M_i)) ≤ ε },   (6)

where α_o is the approximation error per triangle at tessellation level o, Eig(M_i) are the eigenvalues of the transformation matrix M_i, and ε is an error threshold, which we set to the voxel size. We offer additional implementation details in the Appendix.
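The selection of Equation (6) can be sketched as picking the coarsest precomputed level whose per-triangle error, scaled by the largest stretch induced by M_i, stays below the threshold. `select_level` and its arguments are illustrative names, and we use the largest singular value of the linear block as the stretch factor, an assumption of this sketch:

```python
import numpy as np

def select_level(alphas, M, eps):
    """Pick the coarsest tessellation level satisfying the error bound.

    alphas : per-triangle approximation error for each level (decreasing)
    M      : 4x4 sphere-to-ellipsoid transform of this instance
    eps    : error threshold (the voxel size in our setting)
    """
    # largest singular value of the linear part = worst-case edge stretch
    stretch = np.max(np.linalg.svd(M[:3, :3], compute_uv=False))
    for o, alpha in enumerate(alphas):
        if alpha * stretch <= eps:
            return o                  # coarsest level meeting the bound
    return len(alphas) - 1            # fall back to the finest level
```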

Cost Analysis
The computational cost of our method is linear with the number of ellipsoids, O(P × S × T), which is independent of the resolution of the reconstruction space. Voxelizing each ellipsoid has a cost linear with the number of triangles per ellipsoid T_Δ, times the cost of each triangle, which is approximately proportional to the number of voxels X_Δ containing the triangle. Thus, the cost for each ellipsoid is O(E) = O(T_Δ × X_Δ). As we will show later, setting ε (Equation (6)) so that X_Δ ≈ 1 gives the best trade-off between quality and cost. With this cost per ellipsoid, the total order of our technique is O(P × S × T × T_Δ × X_Δ), which results in a speed-up with respect to traditional back-projection of O(X / (T × T_Δ × X_Δ)). In the following section we demonstrate empirically the performance of our method.

Fig. 2: Reconstruction of a mannequin (see bottom right) captured with a streak camera and a femtosecond laser [7], reconstructed using traditional back-projection (left, in blue) and our method (right, in orange), which is two orders of magnitude faster while yielding similar quality. The inset on the top-right shows the same reconstructed object under a different camera angle (inset from [3]).

Results
We evaluate our method using datasets from three different sources (captured with femtophotography, captured with a SPAD, and generated with transient rendering), each showing variable ranges of complexity, as well as different levels of signal quality. We compare our method against traditional back-projection [3], using Velten et al.'s optimized implementation. All our tests have been performed on an Intel i5-6500 @3.2GHz with 8 GB of RAM, equipped with an Nvidia GTX 1060 GPU.
Figure 2 shows the reconstruction of a hidden mannequin captured using a streak camera and a femtosecond laser [7]. A voxel grid of resolution X = 162³ is reconstructed from a set of S = 59 spatio-temporal measurements with different lighting positions s. Each measurement has a spatial resolution of P = 336 pixels, and a temporal resolution of T = 512 frames. Although our method is mathematically equivalent to traditional back-projection (see Section 3), reconstructing this scene takes less than 20 seconds with our method, compared to more than half an hour with traditional back-projection. This represents a speed-up of 96.7x.
Figure 3 shows a similar reconstruction using recent data obtained with a single-photon avalanche diode (SPAD) [9]. The SPAD is a more affordable transient imaging device, at the cost of lower-quality measurements (i.e. lower spatio-temporal resolution and higher levels of noise). A voxel grid of resolution X = 150³ is reconstructed from a set of S = 185 temporal measurements with different lighting positions s. Each measurement has a single pixel (P = 1), and a temporal resolution of T = 14000 frames. Note that this is a difficult scenario for our method, given the high levels of noise in the captured signal, as well as the large temporal resolution; however, our method takes only 1.8s to reconstruct the scene, compared to 14.8s for traditional back-projection.

Analysis
We compare the performance of both our method and traditional back-projection [3] varying the three parameters involved in the reconstruction: the resolution of the output voxelized space X, and the input spatial and temporal resolutions P and T. We use a synthetic scene (Figure 4), rendered with the time-resolved rendering framework by Jarabo et al. [17]. The scene is similar to scenes captured in previous works [8,9]. We opt for a synthetic scene to eliminate errors that depend on the camera and the capture setup, and to obtain a clean ground-truth solution.

Fig. 3: Reconstruction of the scene captured using a SPAD [9] using traditional back-projection (left), and our method (right). The rightmost images show the capture setup (insets from [9]). We have used a different color to differentiate each object (blue: T; pink: big patch; green: small patch; yellow: noise from the camera). The quality of the reconstruction is almost identical. Note that the original dataset had a higher amount of camera noise, which we have filtered before reconstruction (the insets show the reconstruction with each method for the original unfiltered data). Our method takes only 1.8s to reconstruct the scene, compared to 14.8s for traditional back-projection.

Fig. 4: Left: Comparison between the reconstruction quality computed with our method (top) and traditional back-projection (bottom), for an increasing number of voxels X. Right: MSE with respect to the ground truth. A comparison of the cost of both reconstructions can be found in Figure 5, left.

X     | MSE (Ours) | MSE (Traditional)
16³   | 0.0103     | 0.0107
32³   | 0.0022     | 0.0020
64³   | 0.0006     | 0.0006
128³  | 0.0003     | 0.0002
256³  | 0.0002     | 0.0002
Figure 5 shows a comparison of the cost of traditional back-projection and our method, varying one parameter and fixing the other two. In all cases our method is significantly more efficient for all practical resolutions. In particular, our method shows much better scalability than the previous work with respect to the number of reconstructed voxels X, with speed-ups of up to 10000x for large resolutions, while having similar convergence with respect to the number of input pixels P. On the other hand, our method scales linearly with the number of frames T, while the cost of traditional back-projection remains constant with respect to this parameter; however, note that even for large temporal resolutions (e.g. 10⁴ frames) our method is still two orders of magnitude faster. In terms of error with respect to the ground-truth solution, our algorithm scales similarly to traditional back-projection. Figure 4 shows the result for a varying reconstruction resolution X, including the numerical error with respect to the ground truth. Thus, although the cost is significantly lower with our algorithm, the error introduced is always comparable to the previous work.
Fig. 5: Cost comparison between our method (solid orange) and traditional back-projection (dashed blue) for the synthetic scene shown in Figure 4. Each graph varies one reconstruction parameter while fixing the other two, namely the number of voxels X (left), input spatial resolution P (middle), and input temporal resolution T (right). We fix the parameters to a reconstruction space of X = 256³ voxels, an input spatial resolution of P = 128 pixels, and a temporal resolution of T = 1024 frames. For all reconstructions we use 128 measurements with different virtual light positions s.
An important additional parameter of our method is the quality of the ellipsoids' tessellation (Equation (6)). This determines the number of triangles used to represent the ellipsoid during the voxelization, and has an impact on both the cost and the quality of the reconstruction. Figure 6 shows the result with a varying number of triangles, compared with the traditional method for a reconstructed space of X = 256³ voxels. For tessellation levels leading to triangles of approximately the voxel size, our reconstruction achieves a quality similar to traditional back-projection, more than a thousand times faster. This is further illustrated in Figure 7: we can see that our reconstruction quality is bounded by the reconstruction resolution, and at a certain point increasing the number of triangles does not improve the result, while the cost increases linearly. In our tests we found that setting the triangle size to roughly the voxel size leads to a sweet spot in terms of reconstruction quality and cost.

Discussion and Limitations
Our method takes advantage of the relatively sparse signal of time-resolved data to scale below the theoretical computational order (Section 3.2), by ignoring ellipsoids E(s, p, t) with I_s(p̄, t) ≈ 0. Thus, our worst-case scenario occurs for non-sparse signals. While this is not problematic in surface-based NLOS reconstruction, it becomes dominant in the presence of participating media. In these cases, our technique still performs significantly faster than traditional back-projection: for an input resolution of P = 128² and T = 1024, and a voxel resolution of X = 256³, traditional back-projection takes 475.78s, as opposed to 12.51s for our method in the worst-case scenario.
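Exploiting this sparsity simply means enumerating only the non-empty space-time bins before generating ellipsoids; a minimal sketch (names are ours):

```python
import numpy as np

def nonzero_measurements(I, threshold=0.0):
    """Return the (light, pixel, time) indices of bins carrying signal.

    Bins with I <= threshold generate no ellipsoid at all, so for the
    typically sparse surface-NLOS captures the number of ellipsoids falls
    well below the worst-case |P| x |S| x T.
    """
    si, pi, ti = np.nonzero(I > threshold)
    return list(zip(si, pi, ti))
```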
Our algorithm shares some of the limitations of previous back-projection-based NLOS methods. In particular, our probabilistic model (Equation (4)) assumes Lambertian surfaces, and ignores the effect of albedo. Moreover, Equation (4) assumes that all light arriving at the sensor is due to third-bounce reflection. While this is a sensible assumption given the relative intensity between bounces, it might incur errors in areas where higher-order scattering is dominant, such as concavities.
In addition, while our technique scales very well with respect to the reconstruction resolution, the maximum reconstruction resolution is limited by the number of ellipsoids that can be generated from the input. If the input resolution is too low for the reconstruction voxelization, then gaps due to insufficient ellipsoid coverage might appear. This can be alleviated by simply upsampling the input signal so that more ellipsoids can be generated, at the cost of increased computation in our algorithm; in this case the result would be similar to upsampling a lower-resolution reconstructed voxelized space. This imposes a maximum bound on the resolution that can be achieved from a given input capture resolution.

Fig. 6: Reconstruction results of our method with increasing ellipsoid tessellation quality (168, 656 and 2592 triangles per ellipsoid, respectively). The rightmost image shows the reconstruction result using traditional back-projection (13167.44s). Our final reconstruction is comparable to traditional back-projection, computed with a speed-up of 1300x.

Fig. 7: Left: MSE of our method with respect to a ground-truth reconstruction for different voxel resolutions X, and for an increasing number of triangles per ellipsoid. The minimum error is bounded by the reconstruction resolution, and increasing the number of triangles past a certain point does not improve the result. Right: Reconstruction time as a function of the number of triangles per ellipsoid (reconstruction resolution of X = 256³). In practice we found that a near-optimal result can be obtained with a tessellation level yielding triangles of roughly the voxel size.

Conclusions
Current methods for NLOS reconstruction are still too slow for practical performance, becoming a key bottleneck of the process. In this work, we have presented a new method for NLOS reconstruction based on back-projection. Our work builds on the observation that the light incoming at a given pixel at instant t can have been reflected only from the points defined by an ellipsoid with foci at the observed point and the light source, allowing us to define the probability of the NLOS geometry as the intersection of several of these ellipsoids, and exploits it by posing NLOS reconstruction as a voxelization problem. This allows us to reduce the computational cost significantly, achieving speed-ups of up to three orders of magnitude with an efficient GPU implementation. We hope our work will help current and future NLOS reconstruction techniques, enabling their use in practical real-world setups. With that aim, we have made our code public at http://giga.cps.unizar.es/~ajarabo/download/Fast_BackProjection.zip.