A Novel Low-Cost Adaptive Scanner Concept for Mobile Robots

A fundamental problem in mobile robot applications is the need for accurate knowledge of the position of a vehicle for localizing itself and for avoiding obstacles in its path. In the search for a solution to this problem, researchers and engineers have developed different sensors, systems and techniques. Modern mobile robots rely on information obtained from a variety of sensors and on sophisticated data fusion algorithms. In this paper, a novel concept for a low-cost adaptive scanner based on a projected light pattern is proposed. The main advantage of the proposed system is its adaptivity, which enables both the rapid scanning of the robot’s surroundings in search of obstacles and a more detailed scan of a single object to retrieve its surface configuration and perform some limited analyses. This paper addresses the concept behind such a scanner, where a proof-of-concept is achieved using an office DLP projector. During the measurements, the accuracy of the proposed system was tested on obstacles and objects with known configurations. The obtained results are presented and analyzed, and conclusions about the system’s performance and possible improvements are discussed.


Introduction
A fundamental problem in mobile robot applications is the need for accurate knowledge of the position of a vehicle with respect to its surroundings (Borenstein, 1997; Kucsera, 2006). Obtaining accurate information about a robot's environment in a fast and reliable manner is an essential step in the development of successful navigation systems for robots. Current systems usually employ a variety of strategically placed sensors or are guided by an operator (Abu Dalhoum, 2008). Designing autonomous mobile robots requires the integration of many sensors and actuators, a task that requires many compromises (Aufrere, 2003; Fotiadis, 2013; Liu, 2012; Perrollaz, 2006). Thus, mobile robots are provided with one or more perception systems, whose sole function is to detect surrounding objects, to avoid possible collisions and to avoid situations of potential risk during navigation. A single sensor is unable to adequately and reliably capture all objects in a robot's vicinity. To overcome this problem, it is necessary to combine data from different sensors, a process known as sensor (data) fusion (Fotiadis, 2013; Perrollaz, 2006).
Mobile robots' sensors, according to their functionality, can be generally divided into two groups (each of which can be further subdivided):

• Obstacle avoidance sensors, which sense dynamic or static obstacles in the robot's vicinity and allow the robot to avoid collisions.
• Localization sensors, which collect data to determine an accurate position of the robot for navigational purposes.
Localization sensors that enable robot navigation are commonly divided into two subgroups: relative position and absolute position sensors (Borenstein, 1997). Odometry and inertial navigation systems, which are the most common relative position measuring systems, are capable of measuring the displacements of mobile robots but are unable to detect surrounding obstacles. The same limitation applies to absolute measuring systems, such as active beacons (or GPS for outdoor use). These systems are used solely in limited environments, where an accurate map of an area and localization are sufficient for navigating mobile robots.
Mobile robots can also use landmarks (distinct features) for improving localization accuracy.
Obstacle detection is a primary requirement for any autonomous robot (Darms, 2009). Various types of sensors can be used for obstacle detection. Commonly used sensing devices for obstacle detection include contact sensors, infrared sensors, ultrasonic sensors, laser range finders and vision systems (Fayad, 2007; Sungbok, 2010). In contrast to contact sensors, ranging sensors require no physical contact with the object being detected. They allow a robot to detect an obstacle without having to come into contact with it. Ranging sensors that collect information from the surrounding world can be generally divided into two subgroups according to their operating principle (Kucsera, 2006). Time-of-flight (TOF) sensors make use of the propagation speed of an emitted signal and measure the traveling time or phase shifts between transmitted and detected signals. The transmitted signal can be sound, light or radar waves. An ultrasonic sensor's basic principle is to transmit ultrasonic waves, generated by a piezo transducer, and to measure the time the signal takes to return to the receiver. The main disadvantage of ultrasonic sensors is that while they can determine if there is an obstacle in an area, they cannot provide any additional information about the detected obstacle. A set of ultrasonic sensors is commonly installed at regular intervals along the sides of mobile robots (Sungbok, 2010). If more than one sensor is used, interference needs to be avoided, which increases scanning time. The same principle is applied when other types of TOF sensors are used, where the transmitted wave determines the sensors' minimum and maximum range, resolution and speed.
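The time-of-flight principle described above can be illustrated with a minimal sketch. The function name and the speed-of-sound constant (dry air at roughly 20 °C) are assumptions made for illustration and do not come from any of the cited systems.

```python
# Assumed propagation speed: sound in dry air at about 20 degrees Celsius.
SPEED_OF_SOUND_M_S = 343.0


def tof_distance(round_trip_time_s: float,
                 speed_m_s: float = SPEED_OF_SOUND_M_S) -> float:
    """Return the one-way distance for a measured round-trip echo time.

    The emitted signal travels to the obstacle and back, so the path
    length (speed * time) is halved.
    """
    if round_trip_time_s < 0:
        raise ValueError("round-trip time must be non-negative")
    return speed_m_s * round_trip_time_s / 2.0


# Example: an echo received 10 ms after transmission.
distance_m = tof_distance(0.010)
print(f"{distance_m:.3f} m")
```

The same relation holds for light or radar signals if the propagation speed is replaced accordingly, which is why the transmitted wave determines the achievable range and resolution.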
Sensors that utilize simple geometric principles (Hartley, 2003; Shapiro, 2001) are members of a second subgroup, which includes multi-camera systems and various laser scanners. The most commonly used sensor is a camera that is sensitive to either visible or infrared light in a variety of setups, such as monocular or stereo vision. The basic principle is similar to the principle governing human vision (Hartley, 2003). Two downsides of this method are that it is very complex and calculation intensive. In general, robots' embedded computers do not have sufficient computing power to fully benefit from vision systems. Constraints on the on-board computational power, because of the need for real-time processing, make this problem especially demanding. In contrast, if simple sensors or sensors that output simple analog or digital signals are used, guidance by a relatively simple onboard computer can be achieved. Kumari et al. (2012) created a mobile robot that can navigate in a building without the need for human intervention.
Signals received from infrared sensors are analyzed by an Atmega 32 microcontroller, which controls actuators and guides the robot through an environment. In contrast to systems that are designed for indoor use, mobile robots that are primarily used outdoors, where a more dynamic environment is expected, require more complex sensors that output complex signals. Real-time vision systems used in unmanned ground vehicles (UGVs) (Fisher, 2013) are constrained to lower resolutions because of their limited processing power and low payload capacity.
In contrast to vision systems, laser scanners offer simpler calculations of an object's location. Most common laser-based devices use laser beams pointed toward rotating mirrors, where fast scanning can be accomplished in the vertical direction, but the entire scanner has to be able to move horizontally if 3D scanning is required. Because the number of points measured is relatively low, detection based solely on laser data may be unreliable (Fotiadis, 2013). An additional drawback of laser-based scanners is that in normal daylight, the system cannot distinguish the reflected laser beam from the surrounding sunlight (Kucsera, 2006). This drawback can be addressed by using higher power lasers and optical filters at the wavelength of the utilized laser. A laser ranging system typically covers an area of a half-circle with a 50-m radius for large objects (more than 1 m as viewed from the scanner) (Mertz, 2013). The CMU-RI Navlab group has developed such a system that uses a laser scanner as its primary sensor (Mertz, 2013). The MIT Urban Grand Challenge team has made their data available (Huang, 2010). Their vehicle had a Velodyne, five cameras, and 12 SICK laser scanners. Camera vision systems generally provide more information than do laser systems and include information about an object's texture and shape, which makes them ideal for object recognition (Fotiadis, 2013).
There are three well-known reasons as to why multiple sensors are installed on mobile robots (Borenstein, 1997; Kucsera, 2006). The first reason is that one sensor alone cannot cover the entire area of interest. The second reason is that the combination of sensors with different properties achieves greater robustness and higher quality detection results that cannot be achieved with one type of sensor. These properties include different ranges, resolutions, update rates, and even different sensing modalities. The third reason is redundancy. A fast obstacle detection algorithm for mobile robots based on the fusion of a vision sensor and an ultrasonic sensor was proposed by Liu et al. (Liu, 2012). The distance between the robot and the rectangular obstacle is obtained by the ultrasonic sensor located on the head of the robot. The purpose of the described algorithm is local path planning for a hexapod robot. Using obstacle detection sensors in systems that generally operate in closed and well-known environments is a useful improvement, especially when the robot has to avoid humans, animals or other types of obstacles. In recent years, great effort has been made toward developing systems for pedestrian detection and avoidance (Fuerstenberg, 2005; Gandhi, 2007; Navarro-Serment, 2008; Premebida, 2009). Small and low-cost mobile robots are limited in terms of the selection of sensors. The complexity and high processing power requirements of stereo vision systems (or any vision system) preclude the use of such systems in robots with limited payloads. Industry-grade laser scanners (SICK) (Cang, 2002) are in some cases more expensive than the robot itself or are too bulky to be efficiently utilized. The recent development of the Microsoft Kinect system, whose primary function is as a game console input device, inspired researchers to use it as a robot navigation device. Correa et al. (Correa, 2002) successfully implemented the Microsoft Kinect as a sensor for the indoor navigation of an autonomous surveillance mobile robot. However, it does not currently have adaptive capabilities, which could reduce costs, especially for smaller robots. Based on the literature review, it is evident that a simple and cost-effective solution is required, which would combine properties of fast obstacle detection sensors and high-resolution sensors for recognizing objects in a scene. This article addresses the analysis and proof of such a concept.

3D Scanner System
In this paper, a simple and low-cost laser range finder with 3D scanning capabilities is proposed. The system should enable mobile robots to detect obstacles in their vicinity and, when required, to obtain detailed surface scans of a single object to analyze various properties such as shape and size. A detailed surface scan might be required in situations where a robot has to distinguish between humans or other living objects in its workspace to avoid collisions and to predict their trajectories.
The adaptability of the proposed scanner system is achieved by allowing the scanner to project multiple patterns at the target area. A longer scanning time with more than one projected pattern enables a more detailed scan, which enables limited object recognition, while a short scanning time (a single pattern is projected) allows the scanner to detect obstacles without any detailed analysis. A more detailed description of the adaptability of the system is contained in the following subsections.

Materials and methods
The proposed concept of an adaptive scanner was tested in a laboratory with an off-the-shelf DLP (Digital Light Processing) projector as its main component (Figure 1). The DLP projector projects light patterns as would a laser projector. The DLP projector paired with a compact digital camera creates a simple stereovision system (Rocchini, 2001), where the projector is the active component and the camera is the passive one. The same stereovision principle is implemented in human vision to detect a range map of the surrounding world (Hartley, 2003). The video projector used was an Optoma EP739 DLP, projecting video at a resolution of 1024 x 768 pixels at a frame rate of 60 Hz, while the Canon G9 digital camera was recording at a resolution of 640 x 480 pixels at a frame rate of 60 Hz. The described components are to be used in proof-of-concept tests, while the completed system would include a laser light source instead of a projector and a high-speed camera, enabling scans to be executed in a matter of a few milliseconds.
Stereovision system and calibration: If a single light ray is projected from a projector (denoted as B in Figure 1), it passes through the projector frame (denoted as point B'), and it hits the target at point C. The reflected light ray is captured by the camera at its plane (denoted as point A'). The pixel on the camera plane (A') and the pixel on the projector plane (B') correspond to an angle between the camera and object (α) and an angle between the projector and object (β). If the exact position of the camera and projector in the world coordinate frame is known, the problem of reconstructing the exact position of point C is reduced to a simple triangulation problem. Calibration is a mandatory requirement of any multi-camera vision system (Heikkila, 1997; Tsai, 1987; Zhang, 1999; Zollner, 2004). It is used for the calculation of the relations between cameras and scenes, thus enabling the simple reconstruction of objects if their locations on the camera and projector planes are known (A' and B'). The measurement procedure is initiated with a calibration step, which is performed once when the camera and projector are placed at the desired locations with respect to each other. The outputs of the calibration process are the camera and projector matrices Pc and Pp, respectively, which contain orientation and position information in the coordinate frame defined by the calibration object. The extrinsic part of the calibration procedure requires the scanning of a scene with a well-known configuration (covering most of the system's field of view). Given a set of corresponding pixel pairs, system calibration is achieved using an approach proposed by Heikkila (Heikkila, 1997). To achieve better calibration results and to minimize possible errors due to manual key point selection, a certain level of redundancy was built into the calibration process through the use of nine calibration points, denoted as 1-9 (Figure 2), with five objects present in the scene. The intrinsic part of the calibration was not performed because lens distortion from both the projector and camera was found to be minimal. Using a Direct Linear Transformation (DLT), the coordinates for each point on the scanned surface are calculated (Abdel-Aziz, 1971). The position of point X in the reference coordinate frame is derived using equations (1) and (2), where Xc and Xp are its coordinates in the camera and projector planes, respectively; Pc and Pp are the camera and projector matrices, respectively; τ is a triangulation function; and H is the linear transformation that relates the coordinates in the two planes (Hartley, 2003).
$$X = \tau(X_c, X_p, P_c, P_p) \qquad (1)$$
$$X_c = H X_p \qquad (2)$$
Projected patterns: The projection must contain as many points as possible, arranged in such a way that any point from the pattern can be uniquely identified during a scan. Instead of projecting a single point onto the target area, the proposed system projects a pattern containing 49 points in a star configuration. The proposed configuration consists of 6 star segments (arms) with 8 points each and one central point (49 points in total). A total of 8 points per segment was selected considering the camera's resolution (480 vertical lines), allowing at least 2 pixels of vertical distance between projected points in the full scan pattern. A higher number of points in the star segment would not dramatically increase the system's detection accuracy, while a lower number of points would decrease the scanner's total resolution. A total of six star segments was the maximum number of segments that would not project neighboring points too close to each other. A more advanced camera (in terms of resolution) would allow for a larger number of points in the star configuration. Several other pattern configurations were tested but with less promising results in terms of the trade-off between the algorithm's complexity (point recognition and tracking) and the area coverage. Thus, the proposed configuration was considered to be optimal in that sense. By projecting multiple points at the target area, the system is able to scan the desired area in a single frame (if the camera and projector are synchronized). By aligning points in the form of dotted lines, identifying each point from the camera image is made easier. A simple prediction algorithm based on pattern dynamics is implemented and predicts the location of a point in the next scanning frame, thus minimizing errors when points overlap or are not visible in some frames. In a single projected pattern (Figure 3, left), points are arranged (the sides of the dotted lines are shifted) in such a way that when the matrix rotates, the points do not overlap previous projections. One full scan cycle (Figure 3, right) consists of several single projected patterns. The center of a projected pattern is fitted with more points, resulting in a higher resolution scan at the center and a lower resolution scan at the projection edges. The final result of the 3D structured light scanning process is a "point cloud" that represents a scanned surface.
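The star configuration described above can be sketched as follows. This is only an illustrative sketch: the uniform radial spacing, the normalized radius, the function names and the choice of rotating the pattern by an equal angular step per frame are assumptions, since the paper does not specify the exact point layout or rotation increment. The figure of 10 frames per cycle is taken from the experimental section.

```python
import math


def star_pattern(arms=6, points_per_arm=8, radius=1.0, rotation_deg=0.0):
    """Return (x, y) points: one central point plus points along each arm.

    Assumption: points are spaced uniformly along each arm and the arms
    are separated by equal angles.
    """
    points = [(0.0, 0.0)]  # central point
    for arm in range(arms):
        angle = math.radians(rotation_deg + arm * 360.0 / arms)
        for k in range(1, points_per_arm + 1):
            r = radius * k / points_per_arm
            points.append((r * math.cos(angle), r * math.sin(angle)))
    return points


def full_scan_cycle(frames=10, arms=6, **kwargs):
    """Return one pattern per frame, each rotated by an equal step.

    Assumption: the angular gap between two arms is divided evenly among
    the frames so that rotated arms never coincide with earlier ones.
    """
    step_deg = 360.0 / arms / frames
    return [star_pattern(arms=arms, rotation_deg=f * step_deg, **kwargs)
            for f in range(frames)]


partial_scan = star_pattern()
full_scan = full_scan_cycle()
print(len(partial_scan))                 # 49 points in a single pattern
print(sum(len(p) for p in full_scan))    # 490 projected points per cycle
```

Because every arm passes through the center, the accumulated points of a full cycle are densest near the center of the projection, which matches the higher central resolution noted in the text.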

Experimental Measurement
The system was tested in two scenarios, both of which could occur during normal operation. A set of five rectangular objects with known dimensions was arranged similar to what is shown in Figure 2. A set of nine small markers was placed in the scene; the markers are denoted as 1-9 in Figure 2. The locations of the markers were measured with a precision caliper (0.2 mm) and were used in the calibration process. The object dimensions were also measured with a caliper, and these measurements were used in later accuracy trials. The minimum allowed distance of objects from the experimental scanner was 20 cm, while the maximum distance was constrained by the light intensity of the projected patterns (up to 5 m under our laboratory conditions). In the experiments, the measurements were performed with the scanner at a distance of 150 cm from the scanned objects.
After calibration, a partial scan was executed (with only one projected frame), after which a full scan was executed (full cycle of 10 projected frames). Each projected and detected point was paired, from which a point cloud was derived. Using a simple image interpolation method (2D interpolation) and data from each point, a simple depth map was constructed. The projector and camera were set to operate at 60 Hz. Because they were not synchronized, the resulting scanner frame rate was lowered to 30 Hz. Because a full scan cycle contains 10 frames, a cycle is completed in approximately 330 ms (approximately 33 ms per projected frame).
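Deriving a point cloud from paired points amounts to triangulating each camera-projector pixel pair. The following is a minimal sketch of linear (DLT-style) triangulation in pure Python, solving the resulting least-squares problem through its normal equations; it is not the authors' implementation. The function names and the example matrices `Pc` and `Pp` are hypothetical values chosen only to make the sketch runnable, and in practice a library routine such as `numpy.linalg.lstsq` would usually replace the hand-written solver.

```python
def project(P, X):
    """Project 3D point X with a 3x4 matrix P; returns pixel (u, v)."""
    Xh = list(X) + [1.0]
    u, v, w = (sum(p * x for p, x in zip(row, Xh)) for row in P)
    return u / w, v / w


def solve3(M, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    A = [row[:] + [r] for row, r in zip(M, rhs)]
    for i in range(3):
        piv = max(range(i, 3), key=lambda r: abs(A[r][i]))
        A[i], A[piv] = A[piv], A[i]
        for r in range(i + 1, 3):
            f = A[r][i] / A[i][i]
            for c in range(i, 4):
                A[r][c] -= f * A[i][c]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (A[i][3] - sum(A[i][c] * x[c] for c in range(i + 1, 3))) / A[i][i]
    return x


def triangulate(xc, xp, Pc, Pp):
    """Estimate the 3D point seen at pixel xc (camera) and xp (projector)."""
    rows, rhs = [], []
    for (u, v), P in ((xc, Pc), (xp, Pp)):
        for coord, prow in ((u, P[0]), (v, P[1])):
            # From coord = (prow . Xh) / (P[2] . Xh): one linear equation in X.
            rows.append([coord * P[2][j] - prow[j] for j in range(3)])
            rhs.append(prow[3] - coord * P[2][3])
    # Least-squares solution of the 4x3 system via the normal equations.
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(3)]
    return solve3(AtA, Atb)


# Hypothetical matrices: camera at the origin, projector offset by 0.2 m.
Pc = [[800.0, 0.0, 320.0, 0.0], [0.0, 800.0, 240.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
Pp = [[1000.0, 0.0, 512.0, -200.0], [0.0, 1000.0, 384.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
X_true = (0.1, -0.05, 1.5)
X_est = triangulate(project(Pc, X_true), project(Pp, X_true), Pc, Pp)
print(X_est)
```

Repeating this step for every paired point in a frame yields the point cloud, from which the depth map is then interpolated.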

Results and Discussion
The results are derived from the reconstructed depth map obtained in the previous step, where partial and full scan cycles were independently analyzed. The depth map of the original scene (object formation from Figure 2) is shown in Figure 4a. Each of the letter tags A-F marks a single object in the scene, while the circle shows the area covered by the projected patterns. All the calculations and the graphical presentation were performed using Matlab 2010. The resulting depth map of one full scan cycle is shown in Figure 4b, which contains all objects from the original scene with sizes and shapes that resemble the objects in the original configuration. Using only the depth map from Figure 4b, simple calculations of the objects' sizes and types could be performed.
The depth map reconstructed from a partial scan is shown in Figure 4c. Nearby objects are detected, and the distance from them is calculated, but no other useful information about the object shape and size can be extracted from the depth map. When the partial scan is performed, it is possible that some smaller objects are not detected by the scanner in the current frame. As shown in Figure 4c, object C is not detected, which is because the object is not covered with the pattern in the current projection frame. The rotation of the projected pattern in the next frames enables the detection of a previously undetected object. The introduction of a high-speed camera (and laser projector) enables shorter scanning times (less than 33 ms per frame), which minimizes the possibility of an obstacle not being detected during navigation.
When increasing the number of points in the projected pattern (full scan), the resulting depth map is more similar to the ideal depth map (Figure 4a), and some limited analyses of the objects' characteristics are feasible. Table 1 shows the results of the depth reconstruction for the full scan cycle for all five objects, denoted as objects A, B, C, D, E, and the reference background, denoted as F. The last row (Table 1) shows the results for all objects (all points detected on objects). One can conclude that the system's depth reconstruction performs with a mean error of 1.5 mm and an RMSE better than 1 cm, which is more than adequate for its intended application. The accuracy of the proposed system could be improved by implementing a sub-pixel point center detection algorithm (Ling, 2005; Stancic, Grujic, 2013), while including more points in the projected pattern would increase the system's final resolution. To summarize, the only difference between the partial and full scans is the number of points projected onto the area, and thus, the resolution of the scan is different.
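For readers who wish to reproduce this kind of accuracy summary, the two reported metrics can be computed as in the sketch below. It assumes that the mean error is the signed average of the depth differences, which the paper does not state explicitly; the function names and the sample depth values are purely illustrative and are not measurements from the experiments.

```python
import math


def mean_error(measured, reference):
    """Signed mean of (measured - reference); assumed definition."""
    return sum(m - r for m, r in zip(measured, reference)) / len(measured)


def rmse(measured, reference):
    """Root-mean-square error between measured and reference depths."""
    squared = [(m - r) ** 2 for m, r in zip(measured, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative depths in millimetres (not data from the paper).
reference_mm = [1500.0, 1500.0, 1500.0, 1500.0]
measured_mm = [1503.0, 1496.0, 1506.0, 1499.0]

print(mean_error(measured_mm, reference_mm))
print(rmse(measured_mm, reference_mm))
```

A signed mean error can be much smaller than the RMSE because positive and negative deviations partially cancel, whereas the RMSE reflects the overall spread of the errors.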

Future developments
The final system should use a laser projector and a high-speed camera (Figure 5). A single source laser beam would pass through a matrix that scatters light in a desired configuration. Motor 1 would rotate the laser matrix with a controlled speed, thus controlling the system's scanning speed. A slower rotation means a higher resolution scan at the cost of longer scanning times. This scan setting is useful for the recognition of objects. Increasing the matrix rotation speed results in a lower resolution scan, which can be completed quickly. An evident drawback of this scan setting is the inability to recognize and analyze the object's properties. A small alteration of the scanner's field of view can be achieved by modifying the distance between the laser and the matrix, thus enabling an even higher density of projected points at the desired area. Finally, by increasing the rotation speed with the largest FOV (field of view), a mobile robot can obtain information only about the obstacles present in its vicinity, but it can do so even while the robot is on the move. By pointing the scanner module at a single target and by reducing the scanning speed (Motor 1 rotation speed), a scanned object could be detected with more points, thus enabling object recognition and analysis. The scanner module is intended to be placed on a rotating platform, which enables the scanner to horizontally scan the area. The algorithm for scene reconstruction has to be modified in a way that considers the rotation of the scanner module relative to the robot's main body and the displacement of the robot using odometry or a similar technique. As shown in (Stancic, Music, 2013), vibrations slightly affect a scanner's accuracy, but this can be partially compensated for by introducing an accelerometer in the scanner module. This improvement is planned as a component of robotic aids for blind and low-vision persons.

Conclusions
Information about the exact position of a robot in a coordinate system and sensing obstacles in its vicinity is a fundamental requirement for successful mobile robot navigation. To this end, modern mobile robots rely on information obtained from multiple sensors or are guided by an operator. Physically small or low-cost robots do not contain advanced scanning sensors or vision systems. Therefore, a novel concept for a low-cost, adaptive 3D scanner and range finder was proposed in this paper. The scanner is capable of working in two modes: fast scanning, which is used for detecting nearby obstacles and measuring distances to obstacles, and slow scanning, which is useful when a more detailed scan of a single object is required. When a high-resolution scan of static objects is needed, the robot may be required to stop, whereas in obstacle detection mode, the system compensates for the robot's motion. The accuracy of both modes is similar, and longer scan times only provide higher resolution and consequently the capability of limited object analyses. A partial scan covers an area with 49 points, while a full scan covers the scanned area with up to 490 points. The mean error for the depth map reconstruction was 1.52 mm, with an RMSE below 1 cm, which is more than adequate for its intended application. The inclusion of a laser projector instead of a DLP projector and a high-speed camera instead of a compact digital camera would greatly increase the system's operating speed.

Figure 1. Stereovision principle used in the proposed 3D scanner

Figure 2. Arrangements of rectangular objects of different sizes and shapes in test area

Figure 3. Projected pattern configuration for partial scan (left) and full scan (right)

Figure 5. Laser based measurement system

Table 1. Results of the accuracy for the full scan depth reconstruction