Pattern Recognition

Volume 33, Issue 3, March 2000, Pages 483-501

Localizing a polyhedral object in a robot hand by integrating visual and tactile data

https://doi.org/10.1016/S0031-3203(99)00059-X

Abstract

We present a novel technique for localizing a polyhedral object in a robot hand by integrating visual and tactile data. Localization is performed by matching a hybrid set of visual and tactile features with corresponding model features. The matching process first determines a subset of the object's six degrees of freedom (DOFs) using the tactile feature. The remaining DOFs, which cannot be determined from the tactile feature, are then obtained by matching the visual feature. A couple of touch- and vision/touch-based filtering techniques are developed to reduce the number of model feature sets that are actually matched with a given scene set. We demonstrate the performance of the technique using simulated and real data. In particular, we show its superiority over vision-based localization in the following aspects: (1) capability of determining the object pose under heavy occlusion, (2) number of generated pose hypotheses, and (3) accuracy of estimating the object depth.

Introduction

Determining the pose of an object in a robot hand is important in many robotic tasks that involve manipulation of objects (e.g., automated assembly). Such pose information is needed to close the feedback loop of the robot controller. This enables the controller to intelligently react to undesirable object movements, which can be due to slippage or collision, for example.

There are two types of sensory data that can be utilized in the task of localizing an object in a robot hand: visual and tactile. The visual data can be provided by a visual sensor monitoring the robot workspace, while the tactile data can be obtained from tactile sensors mounted on the hand. These types of data have different characteristics. Firstly, visual data are relatively global; i.e., they capture the visible part of the object as seen from the visual-sensor viewpoint. Unfortunately, visual data provide only a 2D projection of the 3D world, so there is a substantial loss of 3D information. Tactile data, on the other hand, provide 3D information about the object, but these data are local and in many cases insufficient to localize the object. Secondly, visual sensing is sensitive to occlusion of the object by visual obstacles (e.g., the robot hand), a problem that is not experienced by tactile sensing. In fact, tactile data often provide information about visually occluded parts of the object. Thirdly, there is generally no guarantee that a given visual feature (e.g., an edge or a junction) belongs to the model object. This is not the case with tactile features, since they come from direct contact with the object. From this comparison, summarized in Table 1, it is reasonable to expect that by integrating both types of sensory data, more robust object localization can be achieved.

The problem of 3D object recognition has received significant attention during the last two decades (e.g., see surveys [1], [2], [3], [4]). Most 3D object recognition systems rely on a single type of sensory data such as vision [5], [6], [7], [8], [9], [10], range [1], [11], [12], [13], or touch [14], [15], [16]. Thus, they are unsuitable for utilizing the various types of sensory data that may be readily available in some tasks, such as ours, in order to improve efficiency and robustness. There have been few efforts to integrate visual and tactile data in the context of 3D object recognition. Luo and Tsai [17] use a pre-compiled decision tree to recognize 3D objects on a plane. The first level of this tree utilizes moment invariants of the object silhouette, while the later levels use tactile data to completely discriminate between model objects. The tactile data are provided by two tactile sensing arrays mounted on a parallel-jaw gripper that approaches the object from pre-determined directions. Allen [18] uses passive stereo vision to guide a tactile sensor, mounted on a robot arm, to explore and construct a partial 3D description of the scene object. This description is then matched with model objects to recognize the scene object. Ambiguity is resolved by further tactile exploration of the scene object. These two approaches assume that the object to be recognized is not grasped by a robot hand, and so they are unsuitable in our case. Perhaps the technique most relevant to ours is the one presented by Browse and Rodger [19]. They propose a system to recognize 3D objects with three discretized degrees of freedom (one rotational and two translational). For each visual or tactile feature, the discrete set of consistent object/pose hypotheses is generated. These sets are then intersected to identify the object and determine its pose. The limitations of this approach are its restriction to objects with only three degrees of freedom and its limited accuracy due to discretization of the pose space.

In this paper, we present a novel technique for localizing a polyhedral object in a robot hand by integrating visual and tactile data. These data are assumed to be provided by a monocular visual sensor and a square planar-array tactile sensor. Hypotheses about the object pose are generated by matching a hybrid set of visual and tactile features with corresponding model features. The scene feature set consists of a visual junction (with any number of edges) and a tactile feature. For the sake of presentation, we consider two types of tactile features resulting from the following cases of tactile contact: (1) the object surface in contact with the tactile sensor totally covers the sensor array (see Fig. 1a), and (2) the contact surface partially covers the sensor and only one edge appears in the sensor array (see Fig. 1b). We refer to the polygonal patches resulting from these two contacts as S-patch and SE-patch, respectively. These two types of tactile features correspond to a model surface and a surface-edge pair, respectively; we refer to these model features as S-surface and SE-surface, respectively. The matching process first determines a subset of the object's six degrees of freedom (DOFs) using the tactile feature. The remaining DOFs, which cannot be determined from the tactile feature, are then obtained by matching the visual feature. In addition, we present a couple of filtering techniques to reduce the number of model feature sets that are actually matched with a given scene feature set. One technique utilizes touch-based constraints on the 3D space occupied by the object. The other uses vision/touch-based constraints on a number of transformation-invariant attributes associated with the model set.
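To make the decomposition concrete, the following minimal Python sketch shows how an S-patch contact alone constrains the pose: aligning the hypothesized model surface with the known tactile-sensor plane fixes two rotational DOFs and the translation along the plane normal. The helper names, argument conventions, and use of NumPy are our own illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def rotation_aligning(a, b):
    """Rotation matrix mapping unit vector a onto unit vector b (Rodrigues formula)."""
    a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)
    v, c = np.cross(a, b), float(np.dot(a, b))
    if np.isclose(c, -1.0):                                  # a and b opposite: rotate pi
        w = np.cross(a, np.eye(3)[np.argmin(np.abs(a))])     # about any axis orthogonal to a
        w /= np.linalg.norm(w)
        return 2.0 * np.outer(w, w) - np.eye(3)
    vx = np.array([[0.0, -v[2], v[1]],
                   [v[2], 0.0, -v[0]],
                   [-v[1], v[0], 0.0]])
    return np.eye(3) + vx + vx @ vx / (1.0 + c)

def tactile_partial_pose(face_normal, face_point, sensor_normal, sensor_point):
    """Partial pose hypothesis from an S-patch contact (hypothetical helper).

    The hypothesized model face (outward normal `face_normal`, through `face_point`)
    is made flush with the tactile-sensor plane (normal `sensor_normal`, through
    `sensor_point`). This fixes 2 rotational DOFs and the translation along the
    sensor normal; the in-plane rotation and two in-plane translations remain free.
    """
    n = np.asarray(sensor_normal, float)
    n /= np.linalg.norm(n)
    R0 = rotation_aligning(np.asarray(face_normal, float), -n)   # face normal opposes sensor normal
    d = float(np.dot(np.asarray(sensor_point, float) - R0 @ np.asarray(face_point, float), n))
    t0 = d * n                                                    # slide along the normal onto the plane
    return R0, t0
```

The three remaining DOFs, a rotation about the sensor normal and two translations in the sensor plane, are exactly the part left for the visual junction to determine.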

The proposed vision/touch-based localization technique has several advantages over common vision-based localization techniques. Firstly, it is capable of determining the object pose in heavily occluded visual images that are problematic for vision-based methods. Secondly, the average number of hypotheses generated per scene feature set is considerably smaller than the number generated visually. This reduces the computational load on subsequent processes that verify the generated hypotheses (e.g., [8], [20], [21], [22]). Thirdly, the accuracy of estimating the object depth (with respect to the visual sensor) can be significantly better when vision is integrated with touch. These advantages are demonstrated experimentally in Section 4.

The rest of the paper is organized as follows. In the next section, we describe the technique used to generate pose hypotheses in detail. Touch- and vision/touch-based filtering techniques are presented in Section 3. In Section 4, we present experimental results using simulated and real data, and finally, in Section 5, we provide conclusions.

Section snippets

Pose estimation

In this section, we present the technique used to generate pose hypotheses by integrating visual and tactile data.
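As a rough illustration of what such hypotheses look like for an S-patch contact, the sketch below (reusing the hypothetical `tactile_partial_pose` output from above) parameterizes the remaining DOFs as a rotation theta about the sensor normal plus an in-plane translation (tx, ty), and composes them with the tactile partial pose. The candidate values of (theta, tx, ty) would come from matching the visual junction with a model junction, which is not reproduced here; the parameterization itself is our own assumption of a convenient form, not the paper's notation.

```python
import numpy as np

def compose_pose(theta, tx, ty, R0, t0, sensor_normal, sensor_point, in_plane_axes):
    """Compose the tactile partial pose (R0, t0) with the three remaining DOFs:
    a rotation by theta about the sensor-plane normal (through a point on the
    plane) and a translation (tx, ty) along two orthonormal in-plane axes.
    Any such motion maps the sensor plane onto itself, so the touch constraint
    stays satisfied. A model point p is mapped to R @ p + t."""
    n = np.asarray(sensor_normal, float)
    n /= np.linalg.norm(n)
    nx = np.array([[0.0, -n[2], n[1]],
                   [n[2], 0.0, -n[0]],
                   [-n[1], n[0], 0.0]])
    R_theta = (np.cos(theta) * np.eye(3) + np.sin(theta) * nx
               + (1.0 - np.cos(theta)) * np.outer(n, n))
    u, v = (np.asarray(a, float) for a in in_plane_axes)      # orthonormal vectors spanning the plane
    c = np.asarray(sensor_point, float)                        # rotation centre on the sensor plane
    R = R_theta @ R0
    t = R_theta @ (t0 - c) + c + tx * u + ty * v
    return R, t
```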

Filtering techniques

In this section, we describe a couple of filtering techniques for reducing the number of model junctions that are actually matched with a visual junction. These techniques are especially important since the size of the junction database can be large. For a polyhedron with N_S surfaces and N_E edges per surface, the total number of junctions in the junction database is of order O(N_S²N_E) and O(N_S²N_E²) for the S- and SE-patch cases, respectively. Section 3.1 presents a filtering technique that utilizes touch-based constraints on the 3D space occupied by the object; a second technique then applies vision/touch-based constraints on transformation-invariant attributes associated with the model set.
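The order-of-magnitude figures above already hint at why such filtering pays off. A back-of-the-envelope check is sketched below; the numbers are illustrative, and the paper's exact pairing rules and constants are not reproduced.

```python
def junction_db_size(n_s, n_e):
    """Rough junction-database size for a polyhedron with n_s surfaces and n_e
    edges per surface: O(N_S^2 N_E) candidates for the S-patch case and
    O(N_S^2 N_E^2) for the SE-patch case."""
    return n_s ** 2 * n_e, n_s ** 2 * n_e ** 2

# A modest polyhedron with 10 surfaces and 4 edges per surface:
s_sets, se_sets = junction_db_size(10, 4)
print(s_sets, se_sets)   # 400 and 1600 candidate model sets per scene feature set
```

Since every surviving candidate goes through full matching, even cheap per-candidate tests such as the touch- and vision/touch-based filters of this section reduce the matching work roughly in proportion to the fraction of candidates they reject.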

Experimental results

In this section, we present a number of simulation and real experiments to demonstrate the performance of the proposed vision/touch-based localization technique.

Conclusions

A novel technique has been presented for localizing a polyhedral object in a robot hand by integrating visual and tactile data. Localization is performed using a hybrid set of visual and tactile features. The matching process first determines a subset of the object's DOFs using the tactile feature. The remaining DOFs, which cannot be obtained from the tactile feature, are then determined by matching the visual feature. A couple of touch-based and vision/touch-based filtering techniques are developed to reduce the number of model feature sets that are actually matched with a given scene feature set.


References (26)

  • K. Kanatani

    Constraints on length and angle

    Comput. Vision Graphics Image Process.

    (1988)
  • D.G. Lowe

    Three-dimensional object recognition from single two-dimensional images

    Artificial Intell.

    (1987)
  • K.C. Wong et al.

    Recognizing polyhedral objects from a single perspective view

    Image and Vision Comput.

    (1993)
  • F. Arman et al.

    Model-based object recognition in dense-range images – a review

    ACM Comput. Surveys

    (1993)
  • P.J. Besl et al.

    Three-dimensional object recognition

    ACM Comput. Surveys

    (1985)
  • R.T. Chin et al.

    Model-based recognition in robot vision

    ACM Comput. Surveys

    (1986)
  • P. Suetens et al.

    Computational strategies for object recognition

    ACM Comput. Surveys

    (1992)
  • H.H. Chen

Pose determination from line-to-plane correspondences: existence condition and closed-form solutions

    IEEE Trans. Pattern Anal. Mach. Intell.

    (1991)
  • M. Dhome et al.

    Determination of the attitude of 3D objects from a single perspective view

    IEEE Trans. Pattern Anal. Mach. Intell.

    (1989)
  • S. Ullman et al.

    Recognition by linear combinations of models

    IEEE Trans. Pattern Anal. Mach. Intell.

    (1991)
  • P. Flynn et al.

BONSAI: 3D object recognition using constrained search

    IEEE Trans. Pattern Anal. Mach. Intell.

    (1991)
  • W. Kim et al.

    3D object recognition using bipartite matching embedded in discrete relaxation

    IEEE Trans. Pattern Anal. Mach. Intell.

    (1991)
  • F. Stein et al.

Structural indexing: efficient 3D object recognition

    IEEE Trans. Pattern Anal. Mach. Intell.

    (1992)

    About the Author—MICHAEL BOSHRA received B.Sc. and M.Sc. degrees in Computer Science from the University of Alexandria, Egypt, in 1988 and 1992, respectively, and Ph.D. degree in Computing Science from the University of Alberta, Canada, in 1997. He is currently a post-doctoral fellow in the Center for Research in Intelligent Systems (CRIS) at the University of California, Riverside. From 1989 to 1992, he worked as a research assistant at the National Research Center, Giza, Egypt. His current research interests include object recognition, sensor fusion, performance prediction, and multi-dimensional indexing structures.

    About the Author—HONG ZHANG is an Associate Professor in the Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada. He is director of the Robotics Research Laboratory and his current research interests include sensor data fusion, tactile sensing, dextrous manipulation, tele-robotics, and collective robotics. He received his B.S. degree from Northeastern University in 1982 and his Ph.D. degree from Purdue University in 1986, both in Electrical Engineering. He subsequently worked as a post-doctoral fellow in the GRASP Laboratory, University of Pennsylvania, for 18 months before joining University of Alberta. His other work experience includes Kajima visiting professorship at Kagoshima University in 1993 and an STA fellowship at the Mechanical Engineering Laboratory, Japan, in 1994–1995.

This research was partially supported by the Natural Sciences and Engineering Research Council of Canada.
