Adaptive pulsed laser line extraction for terrain reconstruction using a dynamic vision sensor

Mobile robots need to know the terrain in which they are moving for path planning and obstacle avoidance. This paper proposes the combination of a bio-inspired, redundancy-suppressing dynamic vision sensor (DVS) with a pulsed line laser to allow fast terrain reconstruction. A stable laser stripe extraction is achieved by exploiting the sensor's ability to capture the temporal dynamics in a scene. An adaptive temporal filter for the sensor output allows a reliable reconstruction of 3D terrain surfaces. Laser stripe extractions up to pulsing frequencies of 500 Hz were achieved using a line laser of 3 mW at a distance of 45 cm using an event-based algorithm that exploits the sparseness of the sensor output. As a proof of concept, unstructured rapid prototype terrain samples have been successfully reconstructed with an accuracy of 2 mm.


INTRODUCTION
Motion planning in mobile robots requires knowledge of the terrain structure in front of and underneath the robot; possible obstacles have to be detected and their size has to be evaluated. Especially legged robots need to know the terrain on which they are moving so that they can plan their steps accordingly. A variety of 3D scanners such as the Microsoft Kinect © (Palaniappa et al., 2011) or LIDAR (Yoshitaka et al., 2006;Raibert et al., 2008) devices can be used for this task but these sensors and their computational overhead typically consume on the order of several watts of power while having a sample rate limited to tens of Hertz. Passive vision systems partially overcome these limitations but they exhibit a limited spatial resolution because their terrain reconstruction is restricted to a small set of feature points (Weiss et al., 2010).
Many of the drawbacks in existing sensor setups (active as well as passive) arise from the fact that investigating visual scenes as a stroboscopic series of (depth) frames leads to redundant data that occupies communication and processing bandwidth and limits sample rates to the frame rate. If the redundant information is already suppressed at the sensor level and the sensor asynchronously reports its output, the output can be evaluated faster and at a lower computational cost. In this paper such a vision sensor, the so called dynamic vision sensor (DVS; Lichtsteiner et al., 2008) is combined with a pulsed line laser, forming an active sensor to reconstruct the terrain in front of the system while it is moved. This terrain reconstruction is based on a series of surface profiles based on the line laser pulses. The proposed algorithm allows extracting the laser stripe from the asynchronous temporal contrast events generated by the DVS using only the event timing so that the laser can be pulsed at arbitrary frequencies from below 1 Hz up to 500 Hz. The flexibility in choosing the pulsing frequencies allows fast and detailed surface reconstructions for fast robot motions as well as saving laser power for slow motions.

THE DYNAMIC VISION SENSOR (DVS)
The DVS used in this setup is inspired by the functionality of the retina and senses only changes in brightness (Lichtsteiner et al., 2008). Each pixel reports a change in log-illuminance larger than a given threshold by sending out an asynchronous address-event: if it becomes brighter it generates a so called "ON event," and if darker, it generates an "OFF event." The asynchronously generated address-events are communicated to a synchronous processing device by a complex programmable logic device (CPLD) which also transmits the time in microseconds at which the event occurred. Each event contains the pixel horizontal and vertical address (u,v), its polarity (ON/OFF) and the timestamp. After the event is registered, it is written into a FIFO buffer which is transferred through a high-speed USB 2.0 interface to the processing platform. Real-time computations on the processing platform operate on the basis of so called event packets which can contain a variable number of events but are delivered at a minimum frequency of 1 kHz. This approach of sensing a visual scene has the following advantages: 1. The absence of a global exposure time lets each pixel settle to its own operating point which leads to a dynamic range of more than 120 dB. 2. Because the pixels only respond to brightness changes, the output of the sensor is non-redundant. This leads to a decrease in processor load and therefore to a reduction in power consumption of the system. 3. The asynchronous readout allows a low latency of as little as 15 us. This latency allows to close control loops very quickly as demonstrated in Delbruck and Lichtsteiner (2007); Conradt et al. (2009);Ni et al. (2012). Figure 1 shows the speed of the DVS, which is capable of resolving fast movements such as a wheel spinning at 3000 rpm. 4. Since the events are timestamped as they occur (with a temporal resolution of 1 us), the output allows a detailed analysis of the dynamics in a scene or to process its output using temporal filters.
In the following, the output of the DVS is described as a set of events and each event Ev carries its u-and v-address, a timestamp and its polarity as a value of +1 if it is an ON event and a −1 for OFF events [with notation adapted from Ni et al. (2012)].
Ev (u, v, t where ln (I u,v ) denotes the change in illumination at the pixel with coordinates u,v since the last event. ON and OFF denote the event thresholds that must be crossed to trigger an event. These thresholds can be set independently which allows balancing the number of ON and OFF events. In addition to these visually triggered events, the DVS allows the injection of special, timestamped trigger events to the output stream by applying a pulse to a pin on the back of the sensor. These Et events are numbered in software so that they carry a pulse number and a timestamp: (2)

HARDWARE SETUP
As reviewed in Forest and Salvi (2002), there are several variations of combining a line laser and a camera to build a 3D scanner. Since it is intended to apply this scanner setup on a mobile robot that already has a motion model for the purpose of navigation, a mirror free, fixed geometry setup was chosen. As shown in Figure 2, a red line laser (Laser Components GmbH LC-LML-635) with a wavelength of 635 nm and an optical power of about 3 mW was mounted at a fixed distance above the DVS. (The laser power consumption was 135 mW.) The relative angle of the laser plane and the DVS was fixed. To run the terrain reconstruction, the system is moved over the terrain while the laser is pulsed at a frequency f p . Each pulse of the laser initiated the acquisition of a set of events for further analysis and laser stripe extraction. The background illumination level was a brightly-lit laboratory at approximately 500 lx. For the measurements described in the results section, the system was fixed and the terrain to scan was moved on an actuated sled on rails underneath it. This led to a straight-forward camera motion model controlled by the speed of the DC motor that pulled the sled toward the sensor system. The sled was fixed to rails which locked the system in one dimension and led to highly repeatable measurements. The DVS was equipped with a lens having a focal length of 10 mm and it was aimed at the terrain from a distance of 0.45 m. The laser module was placed at a distance of 55 mm from the sensor at an inclination angle α L of 8 • with respect to the principal axis of the DVS. The system observed the scene at an inclination angle α C of 39 • .
To enhance the signal to noise ratio, i.e., the percentage of events originating from the pulsed laser line, the sensor was equipped with an optical band pass filter (Edmund Optics NT65-167) centered at 636 nm. The filter has full width at half maximum (FWHM) of 10 nm and a transmittance of 85% in the pass band and less than 0.01% in the stop band (optical density 4.0).
To mark the laser pulses within the event stream, the event trigger pin on the back of the DVS was connected to the function generator triggering the laser.

CALIBRATION
To extract the laser stripe, i.e., the pixels whose events originate from the laser line, the sensor is calibrated based on the approach described in Siegwart (2011). The model was simplified by the following assumptions: 1. For the intrinsic camera model, rectangular pixels with orthogonal coordinates u,v are assumed. This leads to the following transformation from pixel coordinates to camera coordinates x C , y C , z C : where k denotes the inverse of the pixel size, f l the focal length in pixels, and u 0 , v 0 the center pixel coordinates. 2. For the extrinsic camera model it was assumed that the rail restricts the origin of the camera x C0 , y C0 , z C0 to a planar translation (by t y and t z ) within a plane spanned by the y-and z-axis of the world reference frame x R , y R , and z R as depicted in Figure 3. In the setup used for the measurement, the rotational degrees of freedom of the system were constrained so that the that the camera could only rotate (by α C ) around its x-axis which leads to following transformation from camera to world coordinates: The fact that the DVS does not produce any output for static scenes makes it difficult to find and align correspondences and therefore the typical checkerboard pattern could not be used for calibration. As an alternative, the laser was pulsed onto two striped blocks of different heights as depicted in Figure 4. The black stripes on the blocks absorb sufficient laser light to not excite any events in the DVS. This setup allows finding sufficient correspondence points between the real world coordinates and the pixel coordinates to solve the set of calibration equations (Equations 3-5). This procedure is done manually in Matlab but needs only to be done once.

LASER STRIPE EXTRACTION
The stripe extraction method is summarized in Figure 5. Most laser stripe extraction algorithms perform a simple columnwise maximum computation to find the peak in light intensity e.g., Robinson et al. (2003); Orghidan et al. (2006). Accordingly for the DVS the simplest approach to extract the laser stripe would be to accumulate all events after a laser pulse and find the column-wise maximum in activity. This approach performs poorly due to background activity: Even with the optical filter in place, contrast edges that move relative to the sensor also induce events which corrupt the signal to noise ratio. For a more robust laser strip extraction, spatial constraints could be introduced but this would restrict the generality of the approach (Usamentiaga et al., 2010). Instead the proposed approach exploits the highly resolved temporal information of the output of the DVS.
FIGURE 3 | The coordinate systems used along the scanning direction. y R , z R are the real world coordinates, y C , z C the ones of the camera. x L is the distance of the laser line plane perpendicular to n L from the camera origin. α C is the inclination angle of the sensor with respect to the horizontal plane and α L the laser inclination angle with respect to the camera. With the help of the laser trigger events Et n , the event stream can be sliced into a set of time windows W n each containing a set of events S n where n denotes the n'th trigger event. ON and OFF events are placed into separate sets (for simplicity only the formulas for the ON events are shown): The timing of the events is jittered by the asynchronous communication and is also dependent on the sensor's bias settings and light conditions. Our preliminary experiments showed that it is not sufficient to only accumulate the events in a fixed time window after the pulse. Instead a stable laser stripe extraction algorithm must adaptively collect relevant events. This adaptation is achieved by using of a temporal scoring function P which is continually updated as illustrated in Figure 6.
The scoring function is used as follows: Each event obtains a score s = P(Ev) depending only on its time relative to the last trigger. From these s a score map M n (Figure 5) is established where each pixel (u,v) of M n contains the sum of the scores of all the events with address (u,v) within the set S n [these subsets of S n are denoted as C n (u, v)]. In other words, M n is a 2D histogram of event scores. This score map tells us for each pixel how well-timed the events were with respect to the n'th trigger event, and it is computed by Equations 8-9: The scoring function P that assigns each event a score indicating how probable it is that it was caused by the laser pulse Et n is obtained by using another histogram-based approach. The rationale behind this approach is the following: All events that are caused by the laser pulse should be temporally correlated with it while noise events should show a uniform temporal distribution. In a histogram with binned relative times the events triggered by the laser pulse should form peaks. In the proposed algorithm, the histogram H n consists of k bins B n of width fk. For stability, H n is an average over m laser pulses. H n is constructed by Equations 10-12: FIGURE 5 | Schematic overview of the laser stripe extraction filter. At the arrival of each laser pulse the temporal histograms are used to adapt the scoring function P, and each event's score is calculated and mapped on the score maps. The maps are averaged and the laser stripe is extracted by selecting the maximum scoring pixel for each column, if it is above the threshold θ peak .
where f is the laser frequency, l is the bin index, k is the number of bins, D n (l) is a temporal bin of the set S n , B n (l) is a bin of the averaged histogram over the m and the histogram H n is the set of all bins B n . It is illustrated in Figure 6A.
To obtain the scoring function P, the H ON n and H OFF n histograms are normalized by the total number T of events in them. To penalize bins that have a count below the average i.e., bins that are dominated by the uniformly distributed noise, the average bin count T/k is subtracted from each bin. An event can have a negative score. This is the case if it is more probable that it is noise than signal. T n is computed from Equation 13: The n'th scoring function P n (illustrated in Figure 6B) is computed from Equation 14: To extract the laser stripe, the last o score maps are averaged and the maximum score s(u,v) and its y value are determined for each column. If the maximum value is above a threshold ϑ peak it is considered to be a laser stripe pixel. If the neighboring pixels are also above the threshold, a weighted average is applied among them to determine the center of the laser stripe. The positions of the laser stripe are then transformed into real world coordinates using Equations 3-5 and thus mapped as surface points.
The pseudo-code shown in Algorithm 1 illustrates how the algorithm is executed: Only on the arrival of a new laser trigger event, the histograms are averaged, the score maps are averaged to an average score map and the laser stripe is extracted. Otherwise, for each DVS event only its contribution to the current score map is computed, using the current scoring function. The laser stripe extraction and computation of the scoring function operate on different time scales. While the length o of the moving average for the scoring function is chosen as small as possible to ensure a low latency, the number of histograms m to be averaged is chosen as large as possible to obtain higher stability and dampen the effect of variable background activity.

Algorithm optimization
To reduce the memory consumption and the computational cost of this "frame-based" algorithm, the computations of the scoring function, the accumulation of evidence into a score map, and the search for the laser line columns were optimized to be event-based.
The average histogram changes only on a long time scale (depending on lighting conditions and sensor biasing) and this fact is exploited by only updating the averaged histogram every m'th pulse. The m histograms do not have to be memorized and each event only increases the bin count. The new score function is computed from the accumulated histogram by normalizing it only after the m'th pulse.
The score map computation is optimized by accumulating event scores for o laser pulses. Each event requires a lookup of its score and a sum into the score map. After each sum, if the new score value is higher than the previous maximum score for that column, then the new maximum score value and its location are stored for that column. This accumulation increases the latency by a factor of o, but is necessary in any case when the DVS events are not reliably generated by each pulse edge.
After the o laser pulses are accumulated, the search of the column wise maxima laser line pixels is based on the maximum values and their locations stored during accumulation. For each column, the weighted mean location of the peak is computed starting at the stored peak value and iterating over pixels up and down from the peak location until the score drops below the threshold value. This way, only a few pixels of the score map are inspected for each column.
The final step is to reset the accumulated score map and peak values to zero. This low-level memory reset is done by microprocessor logic hardware and is very fast.
Results of these optimizations are reported in Results.

PARAMETER SETTINGS
Because the DVS does analog computation at the pixel level, the behavior of the sensor depends on the sensor bias settings. These settings can be used to control parameters such as the temporal contrast cutoff frequency and the threshold levels. For the experiments described in the following, the bias settings were optimized to report small as well as fast changes. These settings lead to an increase in noise events which does not affect the performance because they are filtered out successfully with the algorithm described previously. Furthermore, the biases are set to produce a clear peak in the temporal histogram of the OFF events (Figure 6). The variation in the peak form for ON and OFF events is caused by the different detection circuits for the two polarities in the pixel (Lichtsteiner et al., 2008) and different starting illumination conditions before the pulse edges. The parameters for the algorithm are chosen heuristically: The bin size is fixed to 50 us, the scoring function average is taken over a sliding window size m = 1000 histograms, the stripe detection is set to average o = 3 probability maps, and the peak threshold for the line detection is chosen to be peak = 1.5.
Firstly, the performance of the stripe extraction algorithm was measured. Because the performance of the system is limited by the strength of the laser used, the capabilities of the DVS using a stronger laser were characterized to investigate the limits of the approach. Finally, a complex 3D terrain was used to assess the performance under more realistic conditions.

RESULTS
The laser stripe extraction results presented in the following were run in real-time as the open-source jAER-filter FilterLaserLine (jAER, 2007) on an Intel Core i7 975 @ 3.33 GHz Windows 7 × 64 platform using Java 1.7u45. The 3D reconstruction was run off-line in Matlab on the same platform.
Comparing the computational cost to process an event (measured in CPU time) between the frame-based and the event-based algorithm with o = 10 pulses showed an 1800% improvement from 900 to 50 ns per event. This improvement is a direct result of the sparse sensor output: For each laser line point update, only a few active pixels around the peak value in the score map column are considered, rather than the entire column. At the typical event rate of 500 keps observed in the terrain reconstruction example, using a laser pulse frequency of 500 Hz, a single core of this (powerful) PC is occupied 2.5% of its available processor time using the event-based algorithm. Turning off the scoring function histogram update further decreases compute time to an average of 30 ns/event, only 25 ns more than processing event packets with a "no operation" jAER filter that iterates over packets of DVS events without doing anything else.

EXTRACTION PERFORMANCE
To assess the line-detection performance of the stripe extraction algorithm, a ground truth was manually established for a scenario in which a plain block of uniform height was passed under the setup. The block was moved at about 2 cm/s to investigate the performance of the laser stripe extraction algorithm at different frequencies. In Table 1, the results of these measurements are displayed: "False positives" designates the ratio of events wrongly associated to the line over the total number of events. The performance of the algorithm drops at a frequency of 500 Hz and because the DVS should be capable of detecting temporal contrasts in the kHz regime, this was further investigated. For optimal algorithm performance, each pulse should at least excite one event per column. This is not the case for the line laser pulsed at 500 Hz because the pixel bandwidth at the laser intensity used is The line laser is not strong enough to perform well at frequencies above 200 Hz. limited to about this frequency. Therefore, not every pulse results in a DVS event, and so the laser stripe can only be found in a few columns which leads to a degradation of the reconstruction quality.
To explore how fast the system could go, another laser setup was used: A stronger point laser (4.75 mW, Class C) was pulsed using a mechanical shutter to avoid artifacts from the rise and fall time of the electronic driver. This point was recorded with the DVS to investigate whether it can elicit more at least one event per polarity and pulse at high frequencies. The measurements in Figure 7 show that even at frequencies exceeding 2 kHz sufficient events are triggered by the pulse. The mechanical shutter did not allow pulsing the laser faster than 2.1 kHz so the DVS might even go faster. The increase of events per pulse above 1.8 kHz is probably caused by resonances in the DVS photoreceptor circuits which facilitate the event generation. These findings indicate that a system using a sufficiently strong line laser should be capable of running at up to 2 kHz.

TERRAIN RECONSTRUCTION
As a proof of concept and as well for studying possible applications and shortcomings of the approach, an artificial terrain was designed with a CAD program and it was fabricated on a 3D printer (Figure 8). The sensor setup of Figure 2 was used together with the sled to capture data at a speed of 1.94 cm/s over this terrain using a laser pulse frequency of 200 Hz, translating in the t y direction (Equation 5). (This slow speed was a limitation of the DC motor driving the sled.) Figure 9 shows results of these measurements: Figure 9A shows the CAD model and Figure 9B shows the raw extracted line data after transformation through Equation 5 using the calibration parameters and the measured sled speed. The blind spots where the laser did not reach the surface and the higher sampling density on front surfaces are evident. These blind spots were filled by applying the MATLAB © function TriScatteredInterp on the sample points as shown in Figure 9C. Finally, Figure 9D shows the error between the reconstruction and model as explained in the next paragraph.
To quantify the error, the data was compared to the ground truth of the CAD model. However, the model and data lack alignment marks and therefore they were first aligned by hand using a global translation. Next, the alignment was refined using the iterative closest point algorithm (ICP; Besl and McKay, 1992), which slightly adjusted the global translation and rotation to minimize the summed absolute distance errors. Thirdly the closest 3D point of the model was determined for each point of the non-interpolated Figure 9B raw data and fourthly the distance to this model point was measured. The resulting accuracy i.e., the mean 3D distance between these two points in the 3D data is 1.7 ± 1.1 mm, i.e., the mean absolute distance between the sample and data points is 1.7 mm but the errors vary with a standard deviation of 1.1 mm. This accuracy represents ±0.25 pixel precision of measurement of the laser line given the geometry of the measurement setup. In the resampled, linearly interpolated data shown in Figure 9D, most of the error originates from the parts of the surface where the line laser is occluded by the surface, which are interpolated as flat surfaces, and in particular the bottoms of the valleys show the worst error, as could be expected.
An online movie showing the stripe extraction for the terrain reconstruction using a higher laser pulse frequency of 500 Hz is available (Adaptive filtering of DVS pulsed laser line response for terrain surface reconstruction, 2013). This video also shows various stages of the sensor output and laser line extraction. This recording is done at a sled speed of about 1 m/s using a free-falling sled on an incline, which was not limited by the DC motor speed. In this movie it is also clear that some parts of the terrain where the laser hits the surface at a glancing angle do not generate line data. The movie also shows that background DVS activity caused by image contrast is also effectively filtered out by the algorithm although at this high frequency many pixels do not generate events on each laser pulse.

DISCUSSION
In this paper the first application of a DVS as a sensing device for terrain reconstruction was demonstrated. An adaptive eventbased filtering algorithm for efficiently extracting the laser line position was proposed. The proposed application of DVSs in active sensor setups such as 3D scanners allows terrain reconstruction with high temporal resolution without the necessity of using a power-consuming high-speed camera and subsequent high frame rate processing or any moving parts. The event-based output of DVSs has the potential to reduce the computational load and thereby decreasing the latency and power consumption of such systems. The system benefits from the high dynamic range and the sparse output of the sensor as well as the highly resolved time information on the dynamics in a scene. With the proposed algorithm, temporal correlations between the pulsed stimulus and the recorded signal can be extracted as well as used as filtering criterion for the stripe extraction.
Further improvements to the system are necessary to realize the targeted integration to mobile robots. The Java and jAER overhead would have to be removed and the algorithm would have to be implemented on a lower level programming language (such as C) using the optimized event-based algorithm. A camera motion model and surface reconstruction would have to be integrated into the software and for portability of the system it would need to be embedded in a camera such as the eDVS (Conradt et al., 2009). Motion models could be obtained from 3D surface SLAM algorithms (Newcombe et al., 2011) and/or inertial measurement units (IMUs). The use of DVSs with a higher sensitivity (Serrano-Gotarredona and Linares-Barranco, 2013) would allow using weaker lasers to save power. Higher resolution sensors that include a static readout (Posch et al., 2011;Berner et al., 2013) would facilitate the calibration and increase the resolution. The use of a brighter line laser would allow higher laser pulsing frequencies, a wider sensing range as well as possible outdoor applications.
But despite its immature state, the proposed approach compares well to existing commercial depth sensing systems like the Microsoft Kinect © and a LIDAR optimized for mobile robots such as the SOKUIKI (comparison shown in Table 2). The system has a higher maximal sampling rate than the other sensors, a much lower average latency of 5 ms at a 200 Hz pulse rate, and it is more accurate at short distances. These features are crucial for motion planning and obstacle avoidance in fast moving robots. The latency of the proposed approach is, however, dependent on the reliability of the DVS pixel responses, so there is a tradeoff between latency and noise that has not yet been fully  (Besl and McKay, 1992).
This section of the reconstruction was chosen for display because in the surrounding area border effects were observed caused by the Gaussian profile of the laser line that reduced the DVS event rate to be too low to result in acceptable reconstruction.  (Viager, 2011) but spatial pattern used reduces to ∼ ±1 pixel in each direction (Andersen et al., 2012). c Khoshelham and Elberink, 2012. d (Livingston et al., 2012).
studied, and this tradeoff will also depend on other conditions such as background lighting and surface reflectance. On the downside, the system's spatial resolution is limited by the use of the first-generation DVS128 camera and the field of view for the proposed system is narrow. But these drawbacks are not fundamental and they can easily be improved (e.g., by using newer sensors, shorter lenses and stronger lasers). The limitation that the system does not deliver depth maps but surface profiles could be overcome by projecting sparse 2D light patterns instead of a laser line. The power consumption of 500 mW for the USB camera and laser does not include the power to process the events nor to reconstruct the surface but because the sensor system power consumption is comparably lower, the data processing will probably fit into the power budget of the other two approaches when embedded into a 32-bit ARM-based microcontroller, e.g., as in Conradt et al. (2009). In summary, this paper demonstrates the applicability of DVSs combined with pulsed line lasers to provide surface profile measurement with low latency and low computational cost, but integration onto mobile platforms will require further work.