Pattern Recognition

Volume 36, Issue 12, December 2003, Pages 2927-2944

A review on egomotion by means of differential epipolar geometry applied to the movement of a mobile robot

https://doi.org/10.1016/S0031-3203(03)00183-3

Abstract

The estimation of camera egomotion is an old problem in computer vision. Since the 1980s, many approaches based on both the discrete and the differential epipolar constraint have been proposed. The discrete case is used mainly in self-calibrated stereoscopic systems, whereas the differential case deals with a single moving camera. This article surveys several methods for 3D motion estimation under a unified mathematical notation, and then adapts them to the common case of a mobile robot moving on a plane. Experimental results are given on synthetic data covering more than 0.5 million estimations. The surveyed algorithms have been implemented and are available on the Internet.

Introduction

Camera calibration is the first step toward computational computer vision. The use of precisely calibrated cameras makes it possible to measure distances in a metric world from their projections on the image plane. The camera model is a mathematical description of the geometric relationship between 3D entities and their 2D projections on the image plane. It consists of a set of intrinsic parameters, which describe the internal geometry and optics of the camera, and a set of extrinsic parameters, which describe the position and orientation of the camera in the scene. Perspective cameras can be represented by several models depending on the desired level of accuracy.

Given a 3D point p in metric coordinates with respect to the world coordinate system {W}, its projection m in pixels with respect to the image coordinate system {I} can be computed through a set of linear (and sometimes non-linear) equations. This set of equations encapsulates several transformations, which can be broken down into four steps (see Fig. 1); a minimal numerical sketch of the whole chain is given after the list.

  1. First, the coordinates of point p in the world coordinate system are transformed into the camera coordinate system by a Euclidean transformation.

  2. Next, point p is projected onto the image plane by a projective transformation, obtaining point q.

  3. The third step models lens distortion, which causes a disparity between the ideal and the real projection on the image plane; point q is thus mapped to the real projection m.

  4. Finally, point m is transformed from the metric coordinate system of the camera into the image coordinate system of the computer, expressed in pixels.
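
As an illustration only, the following short sketch traces these four steps for a single point, assuming a pinhole camera with a single radial distortion coefficient; all symbols (R, t, f, k1, ku, kv, u0, v0) are hypothetical example parameters, not values used in this article.

    import numpy as np

    def project_point(p_world, R, t, f, k1, ku, kv, u0, v0):
        # 1. World -> camera coordinates (Euclidean transformation).
        p_cam = R @ p_world + t
        # 2. Perspective projection onto the image plane (metric coordinates).
        x = f * p_cam[0] / p_cam[2]
        y = f * p_cam[1] / p_cam[2]
        # 3. Radial lens distortion (single-coefficient model, an assumption).
        r2 = x * x + y * y
        xd = x * (1.0 + k1 * r2)
        yd = y * (1.0 + k1 * r2)
        # 4. Metric image coordinates -> pixel coordinates.
        return np.array([ku * xd + u0, kv * yd + v0])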

Small variations in the definition of the geometric transformations used imply different camera models, resulting in different calibration techniques. For instance, the technique proposed by Hall [1] in 1982 is based on an implicit linear camera calibration, computing the 3×4 transformation matrix which relates 3D object points with their 2D image projections. The later work of Faugeras [2], proposed in 1986, was based on extracting the physical camera parameters from this sort of transformation. Some years later, the work proposed by Faugeras was adapted to include radial lens distortion [3]. Two other interesting contributions are the widely used method of Tsai [4], based on a two-step technique modelling only radial lens distortion, and the complete model of Weng [5], proposed in 1992, which includes three different types of lens distortion. Research efforts are still being devoted to new camera models that improve both the accuracy of the computed optical ray and the extraction of the camera parameters which best model reality. For additional details concerning camera calibration methods, see the recent calibration survey [6].

When we move to the binocular case (that is, two views from a stereoscopic system or two different views from a single moving camera), another interesting relationship arises: the so-called epipolar geometry. This information is contained in the fundamental matrix, which encodes the intrinsic parameters of both cameras and the position and orientation of one camera with respect to the other. The fundamental matrix can be used to simplify the matching process between the views and to obtain the camera parameters in active systems, where the optical and geometrical characteristics may change dynamically depending on the imaged scene. In this case, the camera parameters can be extracted by using the Kruppa equations [7]. Moreover, the epipolar geometry can be considered from both a continuous (differential) and a discrete point of view.

Probably the best-known viewpoint is the discrete epipolar constraint formulated by Longuet-Higgins [8], Huang [9] and Faugeras [10]. In this case the relative 3D displacement between both views is recovered through the epipolar constraint from a set of correspondences in both image planes. Given an object point p with respect to one of the two camera coordinate systems and its 2D projections q and q′ on both image planes (in metric coordinates), the three points define a plane Π which intersects both image planes at the epipolar lines l_q and l_q′, respectively, as shown in Fig. 2. Note that the same plane Π can be computed using both focal points c and c′ and a single 2D projection, which is the principle used to reduce the correspondence problem to a single search along the epipolar line. Moreover, the intersection of all the epipolar lines defines an epipole on each image plane; the epipoles can also be obtained by intersecting the line defined by both focal points c and c′ with both image planes. All the epipolar geometry is contained in the so-called fundamental matrix [8], as shown in Eq. (1):

m^T F m′ = 0,    (1)

where the fundamental matrix depends on the intrinsic parameters of both cameras and the rigid transformation between them:

F = A^{-T} R^T t̂ A′^{-1}.    (2)
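
A minimal numerical sketch of Eqs. (1) and (2), assuming example intrinsic matrices A and A′ and an example rigid transformation (R, t); it only builds F and evaluates the residual of the constraint for a candidate correspondence, and is not the article's estimation method.

    import numpy as np

    def skew(t):
        # Skew-symmetric matrix t_hat such that skew(t) @ x == np.cross(t, x).
        return np.array([[0.0, -t[2], t[1]],
                         [t[2], 0.0, -t[0]],
                         [-t[1], t[0], 0.0]])

    def fundamental_matrix(A, A_prime, R, t):
        # F = A^{-T} R^T t_hat A'^{-1}, consistent with E = R^T t_hat below.
        return np.linalg.inv(A).T @ R.T @ skew(t) @ np.linalg.inv(A_prime)

    def epipolar_residual(m, m_prime, F):
        # m^T F m' for homogeneous pixel coordinates; zero for a true match.
        return float(np.append(m, 1.0) @ F @ np.append(m_prime, 1.0))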

When the intrinsic camera parameters are known, Eqs. (1) and (2) can be simplified, obtaining

q^T E q′ = 0,    (3)

where q = A^{-1} m, q′ = A′^{-1} m′ and E = R^T t̂. The matrix E is called the essential matrix [9].
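
Conversely, once A and A′ are known, the essential matrix can be recovered from F and image points can be normalized to camera coordinates; a small sketch under the same assumed notation (the arrays are illustrative):

    import numpy as np

    def essential_from_fundamental(F, A, A_prime):
        # From F = A^{-T} E A'^{-1} it follows that E = A^T F A'.
        return A.T @ F @ A_prime

    def normalize(m, A):
        # q = A^{-1} m, with m in homogeneous pixel coordinates.
        return np.linalg.inv(A) @ np.append(m, 1.0)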

Many papers describe different methods to estimate the fundamental matrix [11], [12], [13], [14].

The differential case is the infinitesimal version of the discrete case, in which both views are given by a single moving camera. If the velocity of the camera is low enough and the frame rate is very high, the relative displacement between two consecutive images becomes very small. The 2D displacement of image points can then be obtained from an image sequence using the optical flow. In this case, the 3D camera motion is described by a rigid motion composed of a rotation matrix and a translation vector (see Fig. 3):

p(t) = R(t) p(0) + t(t),    (4)

which, after differentiating, gives

ṗ(t) = Ṙ(t) p(0) + ṫ(t).    (5)

Then, substituting p(0) = R^{-1}(t) [p(t) − t(t)] into Eq. (5), the following equation is obtained:

ṗ(t) = Ṙ(t) R^{-1}(t) p(t) + ṫ(t) − Ṙ(t) R^{-1}(t) t(t),    (6)

which leads to the following differential epipolar constraint:

q^T υ̂ q̇ + q^T ω̂ υ̂ q = 0,    (7)

where ω = (ω_1, ω_2, ω_3)^T is the angular velocity of the camera and υ = (υ_1, υ_2, υ_3)^T is its linear velocity. By projecting p and ṗ onto the image plane, the point q in camera coordinates and its corresponding optical flow q̇ are obtained.
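
The constraint of Eq. (7) can be evaluated numerically point by point; the sketch below (with skew() as defined earlier and purely illustrative inputs) returns the residual that the estimation methods seek to drive towards zero.

    def differential_epipolar_residual(q, q_dot, v, omega):
        # q^T v_hat q_dot + q^T omega_hat v_hat q, Eq. (7); q and q_dot are the
        # image point in camera coordinates and its optical flow (homogeneous).
        v_hat = skew(v)
        omega_hat = skew(omega)
        return float(q @ v_hat @ q_dot + q @ omega_hat @ v_hat @ q)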

For a complete derivation, the reader is referred to Chapter 15 of Haralick's book [15], where the movement of a rigid body relative to a camera is described. In our case, the derivation is used to describe the movement of a camera relative to a static object, in which only the sign of the obtained velocities differs from the previous one. Eq. (7) can also be derived in other ways, as explained by Viéville [16] and Brooks [17]. An equivalent form of Eq. (7) is shown in Eq. (8); in this case, since the matrix S is symmetric, it contributes only six unknowns:

q^T υ̂ q̇ + q^T S q = 0,  where  S = ½ (ω̂ υ̂ + υ̂ ω̂).    (8)
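
A small sketch of the symmetric form, again with skew() as above: building S from (ω, υ) shows why it contributes only six distinct unknowns, which is what makes Eq. (8) convenient for linear estimation.

    def symmetric_S(omega, v):
        # S = 1/2 (omega_hat v_hat + v_hat omega_hat), Eq. (8); symmetric, so
        # only its six upper-triangular entries are independent unknowns.
        omega_hat, v_hat = skew(omega), skew(v)
        return 0.5 * (omega_hat @ v_hat + v_hat @ omega_hat)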

The existence of these two forms indicates that a redundancy exists in Eq. (7) (for a proof see Viéville [16], Brooks [17] and Ma [18]). Several books describe optical flow, such as Trucco and Verri [19], and the article published by Barron et al. [20] gives the state of the art in optical flow estimation.

Comparing the two cases, the discrete epipolar equation involves a single matrix, whereas the differential epipolar equation involves two matrices, which encode information about the linear and angular velocities of the camera [15].

Approaches to motion estimation can be classified into discrete and differential methods depending on whether they use a set of point correspondences or optical flow. Another possible classification takes into account the estimation techniques used for motion recovery (linear or non-linear techniques). In Table 1, the algorithms are summarized and classified in terms of their nature (discrete and differential case), and estimation method (linear and non-linear technique).
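
As a rough illustration of what a linear technique means here, the sketch below stacks one equation of the symmetric form (8) per flow vector and solves the resulting homogeneous system by SVD; it is a generic least-squares sketch under the notation above, not a reproduction of any specific surveyed algorithm.

    import numpy as np

    def linear_differential_estimate(qs, q_dots):
        # Each point q with flow q_dot gives one row of a homogeneous system in
        # the nine unknowns (v, s11, s12, s13, s22, s23, s33), since
        # q^T v_hat q_dot = v . (q_dot x q) and q^T S q is linear in s.
        rows = []
        for q, qd in zip(qs, q_dots):
            rows.append(np.concatenate([
                np.cross(qd, q),
                [q[0]**2, 2*q[0]*q[1], 2*q[0]*q[2],
                 q[1]**2, 2*q[1]*q[2], q[2]**2]]))
        # The solution is the right singular vector of the smallest singular value.
        _, _, vt = np.linalg.svd(np.asarray(rows))
        x = vt[-1]
        return x[:3], x[3:]   # estimated v (up to scale) and the six entries of S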

This article analyzes several different algorithms for camera motion estimation based on differential image motion. The surveyed methods have been compared with one another and experimental results are given. Moreover, this article analyzes the adaptation of general methods designed for free 3D movement to planar motion, which corresponds to the common case of a robot moving on a plane, with the aim of studying how much accuracy improves when the camera movement is constrained. Hence, this article focuses on linear techniques, as the motion has to be recovered in real time.

This article is structured as follows. Section 2 describes up to 12 algorithms for 3D motion estimation based on optical flow. Section 3 focuses on the estimation of planar motion by constraining the free movement explained in the previous section. Then, Section 4 deals with the experimental results obtained. The article ends with conclusions.

Section snippets

Overview of 3D motion estimation

In this section, we detail some methods used for the recovery of all six degrees of freedom (6-DOF) of camera motion from optical flow, providing insights into the complexity of the problem. The surveyed methods have been classified according to whether or not they are based on the differential epipolar constraint.

Adaptation to a mobile robot

The aim of this work is to estimate the motion of a mobile robot. Because the permitted movements of such a robot are limited, some modifications can be introduced into the differential epipolar equation by applying new constraints. With these modifications, the number of potential solutions is reduced, so the obtained results improve considerably.

Our robot (see Fig. 5) is constrained to only two independent movements: a translation along the x_r axis and a rotation around the z_r axis.
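
As a hedged illustration of this constraint, the sketch below builds the robot velocities allowed by the planar motion model and maps them into the camera frame through a mounting rotation R_cr; R_cr and the variable names are assumptions for the example, not the article's robot-camera calibration.

    import numpy as np

    def planar_velocities(vx, wz, R_cr=np.eye(3)):
        # Planar robot motion: translation along x_r only, rotation about z_r only.
        v_robot = np.array([vx, 0.0, 0.0])
        omega_robot = np.array([0.0, 0.0, wz])
        # Express both velocities in the camera frame (R_cr: robot -> camera).
        return R_cr @ v_robot, R_cr @ omega_robot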

Experimental results

All the methods surveyed have been programmed and tested under the same image-noise conditions with the aim of giving an exhaustive comparison of most 6-DOF motion estimation methods. Hence, Section 4.1 compares the twelve surveyed methods of 3D motion estimation and Section 4.2 deals with the six proposed adaptations to 2-DOF mobile robot movement estimation. Section 4.3 shows results on real image sequences.

Conclusions

This article presents an up-to-date classification of the methods and techniques used to estimate the movement of a single camera. Several motion recovery methods are surveyed and experimental results are given on synthetic data, considering both Gaussian noise and outliers.

The general methods for estimating a 6-DOF movement have been adapted to the common case of a mobile robot moving on a plane, obtaining better results and stability even under severe noise conditions.

Acknowledgements

We greatly appreciate Dr. Tina Y. Tian, Dr. Carlo Tomasi and Dr. David J. Heeger, who implemented the methods explained in Section 2.2, which have been compared with the rest of the methods explained in this article; we especially thank Dr. Heeger, who gave us insightful information and the source code of these methods.

References (48)

  • J. Weng et al.

    Camera calibration with distortion models and accuracy evaluation

    IEEE Trans. Pattern Anal. Mach. Intell.

    (1992)
  • R.I. Hartley

    Kruppa's equations derived from the fundamental matrix

    IEEE Trans. Pattern Anal. Mach. Intell.

    (1997)
  • H.C. Longuet-Higgins

    A computer algorithm for reconstructing a scene from two projections

    Nature

    (1981)
  • T.S. Huang et al.

    Some properties of the E matrix in two-view motion estimation

    IEEE Trans. Pattern Anal. Mach. Intell.

    (1989)
  • O.D. Faugeras

    Three-Dimensional Computer Vision

    (1993)
  • Z. Zhang

    Determining the epipolar geometry and its uncertainty: a review

    Int. J. Comput. Vision

    (1998)
  • Q.-T. Luong et al.

    The fundamental matrix: theory, algorithms, and stability analysis

    Int. J. Comput. Vision

    (1996)
  • P.H.S. Torr et al.

    The development and comparison of robust methods for estimating the fundamental matrix

    Int. J. Comput. Vision

    (1997)
  • R.M. Haralick, L.G. Shapiro, Computer and Robot Vision, Vol. 2, Addison-Wesley Publishing Company, Reading, MA,...
  • M.J. Brooks et al.

    Determining the ego-motion of an uncalibrated camera from instantaneous optical flow

    J. Opt. Soc. Am.

    (1997)
  • Y. Ma et al.

    Linear differential algorithm for motion recovery: a geometric approach

    Int. J. Comput. Vision

    (2000)
  • E. Trucco et al.

    Introductory Techniques for 3D Computer Vision

    (1998)
  • J. Barron, D. Fleet, S. Beauchemin, T. Burkitt, Performance of optical flow techniques, in: Proceedings of the IEEE...
  • X. Zhuang, R.M. Haralick, Rigid body motion on optical flow image, in: Proceedings of the First International...

About the Author: XAVIER ARMANGUÉ received the B.S. degree in Computer Science from the University of Girona in 1999 before joining the Computer Vision and Robotics Group. At present he is involved in the study of stereovision systems for mobile robotics and is working toward his Ph.D. in the Computer Vision and Robotics Group at the University of Girona and in the Institute of Systems and Robotics at the University of Coimbra.

About the Author: HELDER ARAÚJO is currently Associate Professor at the Department of Electrical and Computer Engineering of the University of Coimbra. He is Deputy Director of the Institute for Systems and Robotics, Coimbra. His main research interests are computer vision and mobile robotics. He has been working in vision and robotics for the last 13 years.

About the Author: JOAQUIM SALVI graduated in Computer Science from the Polytechnic University of Catalonia in 1993. He joined the Computer Vision and Robotics Group at the University of Girona, where he received the M.S. degree in Computer Science in July 1996 and the Ph.D. in Industrial Engineering in January 1998. He received the best thesis award in Industrial Engineering of the University of Girona. At present, he is an associate professor in the Electronics, Computer Engineering and Automation Department of the University of Girona. His current interests are in the field of computer vision and mobile robotics, focusing on structured light, stereovision and camera calibration.

1 This research is partly supported by Spanish project CICYT-TAP99-0443-C05-01.
