Pose Estimation Using Time-resolved Inversion of Diffuse Light References and Links

We present a novel approach for evaluation of position and orientation of geometric shapes from scattered time-resolved data. Traditionally , imaging systems treat scattering as unwanted and are designed to mitigate the effects. Instead, we show here that scattering can be exploited by implementing a system based on a femtosecond laser and a streak camera. The result is accurate estimation of object pose, which is a fundamental tool in analysis of complex scenarios and plays an important role in our understanding of physical phenomena. Here, we experimentally show that for a given geometry, a single incident illumination point yields enough information for pose estimation and tracking after multiple scattering events. Our technique can be used for single-shot imaging behind walls or through turbid media. Visualization of a lost painting by Vincent van Gogh using synchrotron radiation based x-ray fluorescence elemental mapping, " Anal. Image transmission through an opaque material, " Nat. Correction of atmospheric distortion with an image-sharpening telescope, " J. Coherence-based imaging through turbid media by use of degenerate four-wave mixing in thin liquid-crystal films and photorefractives, " Appl. Two-dimensional imaging through diffusing media using 150-fs gated electronic holography techniques, " Opt. Transillumination imaging performance: a time-of-flight imaging system, " Med.ing with nature: A universal analog compressive imager using a multiply scattering medium, " arXiv:1309.0425 (2013). Recovering three dimensional shape around a corner using ultra-fast time-of-flight imaging, " Nat. Estimating wide-angle, spatially varying reflectance using time-resolved inversion of backscattered light, " J. Rotation-invariant target recognition in ladar range imagery using model matching approach, " Opt. 3D pose estimation of ground rigid target based on ladar range image, " Appl. Distortion-tolerant 3D recognition of occluded objects using computational integral imaging, " Opt. Fluorescence lifetime based tomography for turbid media, " Opt. reconstruction of absorption and scattering images by multichannel measurement of purely temporal data, " Opt. Single-molecule localization super-resolution microscopy: deeper and faster, " Microsc.


Space Time
Fig. 1.Light propagation (left) and synthesized streak camera image (right).Coherent light (A) hits the diffuser in one position, scattered behind the diffuser (B) hits the object, scattered once again from many positions (C), hits the diffuser a second time, and reaches the streak camera (D) which measures one spatial line over time, generating a 2D image.Window: 1ns × (≈)10 cm.oped to mitigate scattering effects.Usually, for strong scattering or volumetric imaging, these methods require multiple measurements, raster scanning, or time scanning.The consequent increase in acquisition time is often impractical for dynamic scenes of interest.
On the other hand, scattering is a natural "information mixer," with each scattered wave containing information about the entire scene [18].In this respect, scattering can be exploited (rather than mitigated) for object estimation using far fewer measurements than needed in existing techniques.This intuition is especially useful when only recovery of low-dimensional data is required.In such cases, the problem is simplified to estimation of several unknown parameters, such as position and orientation, and has many applications in pose estimation, 3D tracking, and target recognition.However, relatively little work has studied pose estimation through turbid layers.
Here, we demonstrate a computational method for estimating the six degrees of freedom (DOF), the 3D location and the orientation, of rigid objects hidden behind a diffuse layer.The method relies on non-invasive, time-resolved sensing of scattered light.We extend previous time-resolved imaging methods using diffuse reflections [19,20] or scattering [21] using a streak camera.Coupling prior information about the object geometry with the inherent mixing that scattering provides, we show that the acquisition ultimately requires only a single illumination point, rather than a raster scan.Our work here extends the technique by recognizing that a full image reconstruction is often unnecessary, i.e., that recovery of fewer unknowns (e.g., location and orientation) in a low-dimensional data system is sufficient for many applications.This allows for (1) a simplified image acquisition without raster scanning and (2) a different numerical optimization algorithm for robust recovery.
This scenario is applicable in several areas where geometry either is captured through other modalities or is known a priori.Therefore, one-shot localization has potential in locating individuals in hazardous environments (such as natural disasters or war zones), detecting objects remotely [22][23][24][25][26], tracking tumors or organs over time with with less radiation exposure, or improving inversion in time-resolved diffuse optical tomography or fluorescence tomography [27,28].
Schematically, the method is shown in Fig. diffuse layer, which scatters light (B) toward an object (C).Subsequently, light is scattered from the object back toward the diffuser, where it is then imaged (D) onto a time-resolved detector.
The orientation and location of the object determines the path length of the scattered light and hence the intensity time profile of each sensor pixel.Therefore, as the object moves and rotates, the time-resolved scattered light image will change accordingly.The question we answer is the following: given a single illumination point, is it possible to recover the orientation of the object and track its motion through space?By considering the six-dimensional space consisting of 3D translations and 3D rotations, our problem thus changes from object reconstruction to optimization over the space of possible transformations.The rest of the paper is organized as follows.Section 2 describes the forward light transport model, and Section 3 outlines the associated optimization method.The numerical algorithm is detailed in Section 4, follows by a presentation of results in Section 5 .Features and extensions of the method are discussed in Section 6. Section 7 concludes the paper.

Forward light model
We model scattering and propagation in the geometric approximation, assuming no occlusions, with the geometric factors shown in Fig. 2 and outlined in Table 1.A pulsed laser (L) is focused onto a thin diffuser (D) to a point d l , scatters towards a given object point w and back towards the diffuser at point d c .A one-dimensional, time-resolved sesnor observing the diffuser will record an image I l (d c ,t).For each laser position d l , we have [21] , where c is the speed of light, and g(d l , w, d c ) is a physical factor that depends on only the system geometry and the diffuser properties.f a (w) is the albedo of the object point w and I 0 is the the laser intensity.Angles (α, β , γ, δ , η, and ζ ) and distances (r l and r d ) are defined by the coordinates w, d l , and d c , as shown in Fig. 2. N(•) = exp (• − µ) 2 /σ 2 is the Gaussian scattering profile of the diffuser, for scattering angle and mean σ and µ, respectively.The delta function restricts all recorded light propagating along the path d l → w → d c to arrive at the same instant.Note here that, although we treat scattering in a single layer, multiple scattering primarily affects the signal-to-noise (SNR), rather than the time resolution, for the parameter space considered here [21].

Pose estimation
For a given experimental geometry, we model a rigid object M by a set of |M| uniformly spaced points w ∈ M ⊂ R 3 .The recorded image is determined by the location and orientation of M, which can be parametrized by six unknowns, Θ: where t * is the relative translation in the * direction, and θ , φ , ψ are the three Euler angles, measured about the center of mass C = 1 |M| ∑ w∈M w.We define the repositioning function L Θ : where R is the rotation matrix constructed from the Euler angles, and t = (t x ,t y ,t z ) is the translation vector.Similarly, Because not all points w are illuminated, we define the frontal projection π l (M) ⊆ M as the points in space that are illuminated directly by diffuse light from laser position d l .Every point p ∈ π l (M) generates a hyperbolic curve in the streak image, and their sum is the total expected image, I l (d c ,t).We define the image of a point w generated from laser position l as S l (w).
Our goal is to recover the pose that best agrees with the measured data.Hence, given a set of streak images I l and an a prior on the geometry M we search for the unknown pose Θ where ρ(A, B) is the "stress," a measure of agreement between two quantities A and B. Here, we used the L 1 metric between images, i.e., ρ(A, B) = ||A − B|| 1 .

Optimization scheme
Equation ( 5) resembles the Iterative Closest Points (ICP) paradigm [29,30], where rigid motion is estimated in alternating steps of correspondence search and energy minimization to find the optimal transformation of an estimated point set into a reference one.However, our system is not amenable to the ICP algorithm because of the added complexity due to the time dimension: each object point generates a space-time hyperbola (S l (w) above) [19].Thus, overlapping space-time curves and noise pose new challenges than those encountered in ICP.Consequently, we do not minimize the spatial location between matching points, but instead compare their nonlinear space-time functions.
To solve the Eq. ( 5), we use a stochastic gradient descent numerical optimization approach [31], in which each parameter is minimized separately.For each of the six parameters, we synthesize, according to our forward model, the expected streak images for two optional positions: one above and one below the current value.The step size is dynamic and is reduced for each subsequent iteration.In each step, 13 images, two options per unknown plus the identity, are synthesized, and the best option that reduces the stress is chosen.The procedure was terminated when the stress reached a threshold, or when the change in parameters was infinitesimal.In each iteration, π l should be re-evaluated, but in our experiments there was no need due to the narrow geometric structure of the objects in usage.The procedure is summarized in Alg. 1.

Algorithm 1: Localization algorithm
Input: Scene geomety.( A model M / Laser point locations L / inner parameters of the diffuser, camera, and laser ) Output: Θ = t x ,t y ,t z , θ , φ , ψ .Position and orientation of the model in space.

Pose estimation
We performed 2352 synthetic experiments for different spatial positions and orientations, and we evaluated the success rate and convergence quality.For an object, we chose three characters, "M," "I," and "T," all lying in the same plane.See Fig. 1.For this object we generated 7 different spatial translations and rotated each of the Euler angles 7 times in incremenets of 20 • , for a total of 49 poses.We evaluated each location and orientation using a single laser position, sets of three positions (per row and per column), and all 9 positions together, as can be seen in Table 2.The set of laser positions determines the summation over l in Eq. ( 5).We repeated each experiment three times with different values of SNR (100, 75, 50 using awgn MATLAB code), which corresponds to both spatial and temporal intensity noise.Table 2 shows the mean norm error between the correct data and the calculated one, after removing outliers that converged to local minima far (5 • , 1 cm in the X direction (depth), and 2 cm in YZ plane) from the real shift.Each column in the table represents a different translation, and each row a different orientation.Each cell contains the averaged mean error of 48 instances of the corresponding parameter estimate (the 16 different sets of laser positions times the 3

Laboratory experiments with streak camera
The experimental setup is photographed in Fig. 4 and sketched in Fig. 5.A mode-locked Titanium:Sapphire laser emits pulses of duration 50 fs at a repetition rate of 80 MHz with average power of 1 W. The pulse train is focused and directed via two computer-controlled galvo mirrors onto a 10 × 10 cm 2 holographic diffuser (Edmund Optics, 65-886), which has a scattering angle of approximately σ = 8.8 • .The objects behind the diffuser are Lambertian.The front side of the diffuser is imaged onto streak camera (Hamamatsu C5680), which records a horizontal line view of the diffuser with a time resolution of 2 ps.The galvo mirrors relocate the incident laser to different spatial positions, and a streak image is recorded for each one, though for most experiments, we use only one laser position.To improve SNR, the exposure time for each streak image is 10 s (so that each image averaged 10 s × 80 MHz = 8 • 10 8 pulses).The estimated pose is compared to the object's ground truth pose, which is measured with a FARO digitizer arm.We measure three points on the base of the shape and calculate the Euler angles and positions accordingly.The object volume was approximately 10 × 5 × 1 cm 3 (height width depth) and was located 18 cm from the diffuse layer.The diffuser itself was placed about 0.5 m from the streak camera.The space-time (x-t) pixel resolution of the streak camera is 672 × 512.
In addition to the galvo-controlled signal beam, a beamsplitter was used to create a separate calibration beam, which was focused onto the diffuser in the field of view of the camera.This allows a continual reference measurement to correct for any timing or intensity noise in the laser itself.Because a first-principles calculation of the optimal point source location is beyond the scope of the manuscript, we measure streak images from multiple incident laser spots.But, we show that only using one in the algorithm allows for successful pose estimation.We performed two separate experiments using a human-like figure.The first was dedicated to estimating angular changes and the second for estimating translations.For the angle estimation, we searched for 3 different angles rotating around the vertical axis, orthogonal to the table.We used 3 different laser positions and evaluated the angle with 7 different sets of measurements (3 individual laser positions, all pairs of laser positions, and all three laser positions).We compared our results with the object's ground truth measurement, which has a precision of approximately 3 mm.We rotated the object clock-wise by 16 and 32 degrees and tried to recover those angles.
For translation measurements, we used a similar procedure for 3 different locations of the object.We moved the object by 25 mm, both in-plane and out-of-plane.We searched for only the unknown parameters (one rotating angle for the first experiment, and three translation vectors (one for each location) for the second.
A summary of estimation of the angular parameter is shown in in Table 3.We used the angle measured in one orientation as a reference (R) and measured the angles of the second  and third orientations (A and B, respectively) relative to it.The reference orientation provides a calibration measurement to compensate our light transport model for small changes (≈ 1 • ).
The bottom row of laser positions (1,4,7 in Fig. 5) provided the best results, less than a 1 • error for the first comparison and less than a 2 • error for the second.This is within the error bounds of our true data measurements using the robotic arm.The top positions gave the worst results but still less than a 3 • error.Options 4 through 7 in the table show different variations of laser positions and error values.Because we do not know a priori which location of laser spot will provide the best results, we conclude that using all three is the best approach.Figure 6 shows the acquired streak images and steps 1, 5, 10, and the last step (from right to left) in the optimization procedure.In the first row, we see the convergence for position A, and in the second row for position B, both using only laser position 1 (see Fig. 5).We recorded 50 sample points of the object with the robotic arm.Because it is a coarse, noisy version of the model, it yields slight differences between the synthesized images and the measured ones. ,456) ,456) The translation results are shown in Table 3. Once more, we used one location as a reference R, and measured the other two (C and D) accordingly.The depth (X axis) is calculated with high accuracy, below 1 mm, which is within the error bound of our ground truth data.Movement parallel to the diffuser is harder to evaluate and differs for each laser position.With all three positions used, we achieved approximately 2 mm accuracy along the Y axis (orthogonal to the table) and between 0.5 cm and 1.5 cm along the Z axis (parallel to the table).In Fig. 7, we present the corresponding convergence images.The first row corresponds to position C, and the second row to position D, both given laser position 1 (see Fig. 5).

Discussion
Generally, our pose estimation method yields good results for a single incident laser position, though more positions provide angular diversity and improves the result.Interestingly, stronger/multiple scattering is a benefit, because the increasingly scattered light provides wider and more uniform illumination compared to that from a diffuse layer.Nevertheless, a firstprinciples derivation of the minimum number of streak images must be performed to understand the role of noise in the optimization problem.The generalization to thick turbid media is also possible, provided that time resolution of detector can overcome any resulting temporal blurring.For biological samples of moderate thickness, the streak camera resolution will suffice [21].This can be useful for dynamic scenes, such as tracking the flow of blood cells [32], and can be integrated with super-resolution fluorescence microscopy techniques in thick media [33], assuming the temporal profile still holds enough information.
Additionally, because no raster scanning is needed, the acquisition time is fast.Although we average over a pulse train for improved SNR, the method is amenable for a single incident pulse if, for example, the laser source is amplified, or the camera sensitivity is increased.Coupled with scaled versions of the method [35,37], this technique is thus readily applicable for longrange imaging after appropriate calibration [36].Interestingly, recovery is unique up to certain ambiguous symmetries.In Fig. 3 we see degragated accuracy near the symmetric axes (every 90 degrees).However, the symmetry can be easily broken by adding a second illumination source.Moreover, note that for tracking, we utilize prior information on the pose (calculated from the previous step) for convergence to the correct solution (cf. the second column from the left versus the second column from the right in Fig. 3 ).

Conclusions
In conclusion, the spatial location and angular orientation of objects hidden behind turbid layers can be estimated given a known prior of the shape and a single illumination point, with accuracy below 3 degrees, about 1mm in depth, and several millimeters in-plane, for the provided setup.Unlike previous work in the time-resolved inversion, we consider here a small number of incident illumination sources coupled to a mathematical optimization technique to retrieve the low-dimensional structure such as location and orientation of objects.In practice, even a single illumination point provides enough information, up to certain symmetries, for successful recovery.The method relies on time-resolved sensing and treats scattering as a benefit, rather than as a hindrance, and it can be integrated with other time-resolved methods of motion detection in cluttered environments.Future work includes: improving robustness to handle larger displacements, extending the method for integration in remote sensing applications, and considering volumetric scattering for biological applications.

Fig. 2 .
Fig. 2. Left: Pattern of light ray traveling from the laser L, through a diffuser D at point d l , striking an object at point w, bouncing towards the diffuser at point d c , and captured by the camera.Right: Spatial orientation of objects for the three Euler's angles.

2
Set sT = 30.Set sA = 30π 180 .(Chosen arbitrarily) 3 Calculate π l (L Θ k (M) for every l ∈ L by choosing only the visible points with relation to laser position l.

Fig. 4 .
Fig. 4. The human model behind the diffuser.A photograph of the experiment from the model's point of view.

Fig. 5 .
Fig. 5. Scene setup for a human-like model.We used several laser spots for different positions and orientations of the model in space.Bottom row: raw images captured by the streak camera (positions 1,2 and 3 from right to left).The top-left bright spot is a physical reference beam used for spatio-temporal alignment.Window: 1ns × (≈)10 cm.The source is split into two beams with a beamsplitter.The small focused spot in the upper-left corner of each measurement is the calibration beam, used to normalize the global intensity of the data and correct for any timing jitter.The second beam scatters through the diffuser toward the object, and is scanned across the diffuser with a galvo mirror to generate multiple streak images.

Fig. 7 .
Fig. 7. Convergence of the algorithm for laser position 1 (see Fig. 5).First row: translation C. Second row: translation D. From right to left: initial iteration, 5th iteration, 10th iteration, final generated model, and real captured image.A threshold was added to remove noise.Window: 1ns × (≈)10 cm.

Table 2 .
Depth (X), in planar (YZ), and angular convergence for the synthesized MIT banner given 2352 experiments.Each column in the table represent a different translation, and each row a different orientation.In each cell, we averaged the mean error of 48 evaluated distances or angles (16 different laser spots locations times 3 different SNR levels), and the percentage of experiments that converged to a minima near the correct solution (right member of each cell).Depth (X) can be recovered with high accuracy, and in-plane translations (YZ) can be recovered in most circumstances.Angular changes which have strong spatial shift (ψ for the MIT banner) are found with high accuracy, while other rotations are harder to evaluate.

Table 3 .
Translation (in mm) and angular (in degrees) of real experiments.Distances and angles are measured with reference to one spatial setup (R).While not mandatory, it provides an additional alignment.In the top table, we see that one laser position is sufficient for angle evaluation to within 1-2 • , and from the tables below, we learn that depth (X) can evaluated with high accuracy, with in-plane movements (YZ) are harder to find.