Adaptive Feedback in Local Coordinates for Real-time Vision-Based Motion Control Over Long Distances

We studied the differences in noise effects, the depth-correlated behavior of sensors, and the errors caused by mapping between coordinate systems in robotic applications of machine vision. In particular, the strongly range-dependent noise densities of semi-unknown object detection were considered. An equation is proposed to adapt the estimation rules to dramatic changes of noise over longer distances. The algorithm also supports smooth wheel-level feedback that overcomes the variable latencies of visual perception. An experimental evaluation of the integrated system, with and without the algorithm, is presented to highlight its effectiveness.


Introduction
In service robotics, the majority of tasks are based on robot interaction with known or unknown objects in indoor environments where global positioning systems are not accessible. Service robots benefit from using visual information to deal with objects in their neighborhood. As reviewed in Walter et al. [1], state estimation and navigation for mobile platforms are commonly done using either global or local coordinates. Generally speaking, measurements in global coordinates (e.g. absolute position) are coarse and low-frequency. Measurements in local coordinates, in which wheel odometry is available, are known to be smoother, faster and of higher resolution. In our case, the estimate from the environment is available at 30-40 Hz while the dead-reckoning updates arrive at 200 Hz. To obtain smooth and globally valid feedback, fusing these two types of measurements is a necessity, but it is challenging and prone to drift [2], especially in heading estimation, which can affect coordinate mappings [3]. Some methods are based on stop, detect, look ahead and move [4]. However, our aim is to provide the object pose as real-time feedback without the need for frequent replanning. Research concerned with real-time detection [5] pays more attention to the precision of the image and less attention to the actual coordinates of the robot. For manipulation tasks, the repeatability of detection plays a vital role because the system might have physical interactions with the environment. Moreover, a path-following controller with exact solutions needs smooth and low-latency localization. The object detection (ODT) results in this paper are comparable to Ha et al. [6]. In contrast with their conclusion, however, our method is capable of reaching smoother and more precise positioning of the robot with respect to the object at longer distances, without involving any image-based visual servoing.

International Conference on Robotics and Mechatronics (ICRoM 2017), IOP Publishing. IOP Conf. Series: Materials Science and Engineering 320 (2018) 012009, doi:10.1088/1757-899X/320/1/012009

Long-distance detections are important for non-holonomic mobile bases and autonomous vehicles because they need some space for steering and positioning before approaching the target area, regardless of whether they are dealing with known markers or semi-unknown geometrical primitives.

Problem Definition and Architecture
The main goal of this research is to position a non-holonomic mobile manipulator at a location defined by the target object(s). Each target object is recognizable by its longitudinal axis and geometric centroid, which together define a position/orientation tensor T_CM with respect to the camera frame. Since the result of ODT will be used as feedback for positioning tasks, it must be precise, smooth, well-behaved and available in real-time, not only in terms of image coordinates but also in terms of the robot's and the object's coordinates. This paper is a generalization of Aref et al. [7] to markerless detections. Figure 1 illustrates the hardware architecture for the experiments and Figure 2 shows the test-case robot with the position-gauge marker. Compared to Aref et al. [7], we use the ODT described in Sec.2.1 instead of ROS and ALVAR. The method uses a stereo rig with two JAI-GO5000 USB3.0 cameras on a 0.16 m baseline. Image grabbing and processing are done in C++ with OpenCV 2.4.9. The sensor resolution of the cameras is 2560x2048 pixels, from which a cropped, centered view of 1920x1080 pixels is used. The cameras operate at 35 FPS, which corresponds to the shortest usable exposure time under the illumination conditions. These modules run on a Lenovo laptop with a Core i7 CPU. The latest results are transmitted to the Real-Time Target (RTT) over a LAN. The algorithms described in Sec.2 are implemented on the RTT.
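Since the ODT output T_CM is expressed in the camera frame while the controller works in the robot's coordinates, the pose has to be mapped between frames. A minimal sketch of that mapping, assuming a known hand-eye transform; the transform values and names below are illustrative, not taken from the paper:

```python
import numpy as np

def make_T(R, t):
    """Build a 4x4 homogeneous transform from a rotation matrix and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Illustrative hand-eye calibration: camera mounted 0.5 m ahead of the robot origin.
T_RC = make_T(np.eye(3), [0.5, 0.0, 0.0])   # camera pose in robot frame

# Illustrative ODT output: object detected 2 m in front of the camera.
T_CM = make_T(np.eye(3), [2.0, 0.0, 0.0])   # object pose in camera frame

# Chaining the transforms expresses the object pose in robot coordinates.
T_RM = T_RC @ T_CM
print(T_RM[:3, 3])   # object position in the robot frame
```

Any drift in the estimated robot heading enters this chain through the rotation part of the transform, which is why heading errors corrupt the coordinate mapping.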

Position Estimation for Manipulation in Local Coordinates
The main aim of our project is to explore the potential of stereoscopic imaging for remote handling. The stereo camera setup follows the topology design, which takes viewing angle, stereoscopic precision, and lighting into account. Figure 3 describes the processing stages of the 3D Node module, i.e., the eye-in-hand vision system.

Object Detection (ODT)
ODT follows that of Astola et al. [8], where a stereo camera system was used for object detection and pose estimation at 30-40 Hz. In their research, object detection was based on color-image thresholding in the hue channel and subsequent contour estimation. Object pose estimation performs stereo matching on the contour to derive the pose of the object in 3D coordinates. In Astola et al. [8], two versions of contour processing were studied: one using the convex hull and the other using the full contour. While the convex-hull version was computationally faster, the use of full contours with an additional filtering step provided superior results. The output of ODT is the object pose tensor T_CM, which is constructed by the pose estimation algorithm from the centroid and orientation of the detected object. In Astola et al. [8], the immediate goal of the ODT was to maintain a high enough rate of detection and pose estimation, as the sensor fusion prefers high-frequency outputs from ODT. As a consequence, virtually no filtering or image processing is done jointly on the acquired frames and pose estimates. This results in a fair amount of high-frequency pose-estimation noise originating from ODT. The accuracy of the pose estimation is heavily range-dependent, as the object's dimensions in the image plane diminish as a function of distance. Additionally, for a fixed stereo baseline, the system's ability to accurately estimate object depth also decreases with distance. The distance-based non-linearities in pose estimation, especially in the heading, are accommodated in a novel sensor fusion approach.
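The range dependence noted above follows directly from pin-hole stereo triangulation: with focal length f (in pixels), baseline B and disparity d, depth is Z = fB/d, so a fixed disparity error produces a depth error that grows roughly with Z squared. A small sketch using the 0.16 m baseline of our rig and an assumed, purely illustrative focal length:

```python
def depth_from_disparity(f_px, baseline_m, disparity_px):
    """Pin-hole stereo depth: Z = f * B / d."""
    return f_px * baseline_m / disparity_px

def depth_error(f_px, baseline_m, z_m, disparity_err_px=1.0):
    """First-order depth uncertainty: dZ ~ Z^2 / (f * B) * dd."""
    return z_m ** 2 / (f_px * baseline_m) * disparity_err_px

f_px, baseline = 2400.0, 0.16   # focal length in pixels is an assumed value
for z in (1.0, 3.0, 6.0):
    # one pixel of disparity error hurts far more at long range
    print(z, depth_error(f_px, baseline, z))
```

The quadratic growth of depth_error with range is the reason the estimator's noise model cannot stay fixed over a long approach.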

Experimental Results on Open-loop
To compare open-loop performance, we used a controller independent of the vision feedback, which smoothly drives the robot in a straight line: it accelerates in the beginning and then maintains a constant velocity. As shown in Figure 4, when both systems are given the same input data, the M-Estimator generates smoother feedback. The acceleration moment of the robot is magnified in the graph to highlight the fast and smooth reaction of the filter. As illustrated, the M-Estimator not only provides faster motion feedback, it also remains smoother across longer distances.

Integration in feedback
Exploiting the results of ODT in a closed-loop control system requires smoothness, availability and consistency of the data at each sample time. On its own, an ODT module does not have these characteristics. Therefore, in this section we propose an algorithm that integrates the ODT result into a real-time controller loop. To highlight the differences, we compare the following subsystems at each step:
• M-Estimator [3].
• Modifications of the M-Estimator: gain adaptation to measurement changes in Sec.2.4, and gain adaptation to depth in Sec.2.5.

M-Estimation for Real-time Feedback
For each component x of the pose tensor, such as the heading, while we map all the vectors into the object's local coordinate frame {M}, we apply the following estimation equations to estimate x̂:

  w_k = exp( -((z_k - x̂_k^-) / c)^2 )                                (1)
  x̂_k = x̂_k^- + w_k (z_k - x̂_k^-),   with  x̂_k^- = x̂_{k-1} + v Δt   (2)

where v is the corresponding velocity component from the interoceptive sensor, such as wheel odometry [7], z_k is the new ODT measurement, and the weight w accommodates the varying noise statistics by means of the so-called robust Welsch function. Although this filter is capable of generating smooth and robust feedback for marker-based ODT, for marker-less ODT the parasitic disturbances are heavily range-dependent (as shown in Figures 4 and 5) and should be accommodated accordingly.
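A minimal numerical sketch of one such update step, assuming a complementary form: the odometry velocity propagates the estimate between camera frames, and a Welsch weight suppresses camera innovations that disagree strongly with the prediction. The gain c and all numbers here are illustrative, not the paper's tuned values:

```python
import math

def welsch_weight(residual, c):
    """Welsch robust weight: near 1 for small residuals, tends to 0 for outliers."""
    return math.exp(-(residual / c) ** 2)

def estimate_step(x_hat, z, v, dt, c):
    """One filter step for a single pose component in object-local coordinates.

    x_hat: previous estimate, z: new ODT measurement,
    v: matching velocity component from wheel odometry.
    """
    x_pred = x_hat + v * dt               # dead-reckoning prediction
    r = z - x_pred                        # camera innovation
    return x_pred + welsch_weight(r, c) * r

x = 0.0
x = estimate_step(x, z=5.0, v=0.1, dt=0.01, c=0.05)    # gross outlier: nearly ignored
x_after_outlier = x
x = estimate_step(x, z=0.002, v=0.1, dt=0.01, c=0.05)  # consistent reading: accepted
```

Between camera frames the estimate keeps advancing on odometry alone, which is what makes the feedback available at the controller's 200 Hz rate despite the slower vision updates.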

Adaptive Filtering
As shown in Figures 4 and 5, the method described in Sec.2.4 is capable of rejecting noise and disturbances initially. However, at longer distances the object's image size becomes too small (e.g. Figure 4) and the detection characteristics change, especially for the orientations. For the scene illustrated in Figure 3, assume that the triangulation between the image frame and the actual object positions under a pin-hole camera model is valid for objects 1, 2 and 3. Then the ratio z_1/d'_1 linearly affects the tangent of the object orientation. In practice, this deterministic calculation is not valid for object positions such as 3, where the distance d_2 is comparable to the physical pixel width p. Quantization errors can easily equal or exceed p, and their contribution to the noise is considerable. This corrupts the measurement term in (2). Therefore, the weight w of (1), which is supposed to prevent the integration of unexpected measurements at longer distances, also prevents convergence to the final value, as shown in Figure 6 and the left graph of Figure 7. From the lessons learned during the experimental studies on the system, the Welsch weight should be made adaptive to the camera-object distance, where d̂ is the estimated distance to the object and d_min is the minimum detection distance, at which we want the value c to converge to k_w1 + k_w3. Usually, this distance corresponds to the grasping point. According to the experiments, for Ver.1 there is also a significant change of the mean values while approaching the object. Thus, Ver.1 is not useful for the closed-loop control loop, as it adds significant non-linear behavior to the position estimation in local coordinates. Hence, (1) is suitable for ODTs similar to Ver.2, where the quantization errors do have a dramatic effect on the convergence rate. For these ODTs, substitution of (1) into (2) compensates for the dependency of the noise dynamics on the distance.
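The distance adaptation can be sketched as a schedule for the Welsch parameter c. The exponential form below is only an assumption chosen to match the behavior described above (c relaxes at long range and converges to k_w1 + k_w3 as the estimated distance d̂ approaches the minimum detection distance d_min); it is not the paper's exact equation, and all gain values are illustrative:

```python
import math

def adaptive_c(d_hat, d_min, k_w1, k_w2, k_w3):
    """Distance-adaptive Welsch parameter (assumed exponential schedule).

    A larger c far from the object keeps noisy long-range measurements
    from being rejected outright; c -> k_w1 + k_w3 as d_hat -> d_min
    (typically the grasping distance).
    """
    return k_w1 + k_w3 + k_w2 * (1.0 - math.exp(-(d_hat - d_min)))

# Illustrative gains: the tolerance shrinks as the robot approaches the object.
for d in (0.5, 1.0, 3.0, 6.0):
    print(d, adaptive_c(d, d_min=0.5, k_w1=0.02, k_w2=0.1, k_w3=0.03))
```

The design intent is that far measurements, dominated by quantization noise, still contribute to convergence, while near the grasping point the filter becomes strict again.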
As shown in Figure 7b, the adaptive M-Estimator is capable of following the camera output even after a considerable amount of outliers. By adapting the estimation weights, these methods provide the robot with real-time position feedback suitable for closed-loop position-based visual servoing, as shown in Figure 8. Some of the tests are also demonstrated in the YouTube video "Object tracking on the go, by iMoro" (https://www.youtube.com/watch?v=m4vsr1rvxUg).

Conclusions
In this paper, building on our previous experience with machine-vision applications on non-holonomic steerable platforms for mobile manipulation, we modified the widely applied principles of machine vision and of complementary and Kalman filtering for pose estimation. On one hand, we demonstrated how modifications in the ODT can reduce the non-linearity of detection for unknown objects. On the other hand, by proposing an adaptive estimation method, we achieved nearly the same quality of mobile-robot positioning as marker-based ODT at distances that had previously been considered too far for position-based motion control. Long-range ODT and position estimation are essential for non-holonomic robots to keep their maneuvers smooth without stopping and replanning. The results are experimentally demonstrated on a video for detection, estimation and positioning on the go. Extension of this work to higher degrees of freedom is left as future work.