Pattern Recognition

Volume 36, Issue 12, December 2003, Pages 2927-2944

A review on egomotion by means of differential epipolar geometry applied to the movement of a mobile robot

https://doi.org/10.1016/S0031-3203(03)00183-3

Abstract

The estimation of camera egomotion is an old problem in computer vision. Since the 1980s, many approaches based on both the discrete and the differential epipolar constraint have been proposed. The discrete case is used mainly in self-calibrated stereoscopic systems, whereas the differential case deals with a single moving camera. This article surveys several methods for 3D motion estimation under a unified mathematical notation, and then adapts them to the common case of a mobile robot moving on a plane. Experimental results are given on synthetic data covering more than 0.5 million estimations. The surveyed algorithms have been implemented and are available on the Internet.

Introduction

Camera calibration is the first step toward computational computer vision. The use of precisely calibrated cameras makes it possible to measure distances in a metric world from their projections on the image plane. The camera model is a mathematical description of the geometric relationship between 3D entities and their 2D projections on the image plane. It consists of a set of intrinsic parameters, which describe the internal geometry and optics of the camera, and a set of extrinsic parameters, which describe the position and orientation of the camera in the scene. Perspective cameras can be represented by several models depending on the desired level of accuracy.

Given a 3D point p in metric coordinates with respect to the world coordinate system {W}, its projection m in pixels with respect to the image coordinate system {I} can be computed through a set of linear (and sometimes non-linear) equations. This set of equations encapsulates several transformations, which can be broken down into four steps (see Fig. 1); a minimal numerical sketch of the whole chain is given after the list.

  1. First, the coordinates of point p in the world coordinate system are transformed into the camera coordinate system by a Euclidean transformation.

  2. Next, point p is projected onto the image plane by a projective transformation, obtaining point q.

  3. The third step models lens distortion, which causes a disparity between the ideal and the real projection on the image plane; point q is thus mapped to the real projection m.

  4. Finally, point m is transformed from the metric coordinate system of the camera into the image coordinate system of the computer, expressed in pixels.
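
As an illustration only, the following short sketch traces these four steps for a single point, assuming a pinhole camera with a single radial distortion coefficient; all symbols (R, t, f, k1, ku, kv, u0, v0) are hypothetical example parameters, not values used in this article.

    import numpy as np

    def project_point(p_world, R, t, f, k1, ku, kv, u0, v0):
        # 1. World -> camera coordinates (Euclidean transformation).
        p_cam = R @ p_world + t
        # 2. Perspective projection onto the image plane (metric coordinates).
        x = f * p_cam[0] / p_cam[2]
        y = f * p_cam[1] / p_cam[2]
        # 3. Radial lens distortion (single-coefficient model, an assumption).
        r2 = x * x + y * y
        xd = x * (1.0 + k1 * r2)
        yd = y * (1.0 + k1 * r2)
        # 4. Metric image coordinates -> pixel coordinates.
        return np.array([ku * xd + u0, kv * yd + v0])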

Small variations in the definition of the geometric transformations used imply different camera models, resulting in different calibration techniques. For instance, the technique proposed by Hall [1] in 1982 is based on an implicit linear camera calibration, computing the 3×4 transformation matrix which relates 3D object points with their 2D image projections. The later work of Faugeras [2], proposed in 1986, was based on extracting the physical camera parameters from this sort of transformation. Some years later, the work proposed by Faugeras was adapted to include radial lens distortion [3]. Two other interesting contributions are the widely used method of Tsai [4], based on a two-step technique modelling only radial lens distortion, and the complete model of Weng [5], proposed in 1992, which includes three different types of lens distortion. Research efforts are still being devoted to new camera models that improve both the accuracy of the computed optical ray and the extraction of the camera parameters which best model reality. For additional details concerning camera calibration methods, see the recent calibration survey [6].

When we move to the binocular case (that is, two views from a stereoscopic system or two different views from a single moving camera), another interesting relationship arises: the so-called epipolar geometry. This information is contained in the fundamental matrix, which encodes the intrinsic parameters of both cameras and the position and orientation of one camera with respect to the other. The fundamental matrix can be used to simplify the matching process between the views and to obtain the camera parameters in active systems, where the optical and geometrical characteristics may change dynamically depending on the imaged scene. In this case, the camera parameters can be extracted by using the Kruppa equations [7]. Moreover, the epipolar geometry can be considered from both a continuous (differential) and a discrete point of view.

Probably the best-known viewpoint is the discrete epipolar constraint formulated by Longuet-Higgins [8], Huang [9] and Faugeras [10]. In this case the relative 3D displacement between both views is recovered through the epipolar constraint from a set of correspondences in both image planes. Given an object point p with respect to one of the two camera coordinate systems and its 2D projections q and q′ on both image planes (in metric coordinates), the three points define a plane Π which intersects both image planes at the epipolar lines l_q and l_q′, respectively, as shown in Fig. 2. Note that the same plane Π can be computed using both focal points c and c′ and a single 2D projection, which is the principle used to reduce the correspondence problem to a single search along the epipolar line. Moreover, the intersection of all the epipolar lines defines an epipole on each image plane; the epipoles can also be obtained by intersecting the line defined by both focal points c and c′ with both image planes. All the epipolar geometry is contained in the so-called fundamental matrix [8], as shown in Eq. (1):

m^T F m′ = 0,    (1)

where the fundamental matrix depends on the intrinsic parameters of both cameras and the rigid transformation between them:

F = A^{-T} R^T t̂ A′^{-1}.    (2)
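
A minimal numerical sketch of Eqs. (1) and (2), assuming example intrinsic matrices A and A′ and an example rigid transformation (R, t); it only builds F and evaluates the residual of the constraint for a candidate correspondence, and is not the article's estimation method.

    import numpy as np

    def skew(t):
        # Skew-symmetric matrix t_hat such that skew(t) @ x == np.cross(t, x).
        return np.array([[0.0, -t[2], t[1]],
                         [t[2], 0.0, -t[0]],
                         [-t[1], t[0], 0.0]])

    def fundamental_matrix(A, A_prime, R, t):
        # F = A^{-T} R^T t_hat A'^{-1}, consistent with E = R^T t_hat below.
        return np.linalg.inv(A).T @ R.T @ skew(t) @ np.linalg.inv(A_prime)

    def epipolar_residual(m, m_prime, F):
        # m^T F m' for homogeneous pixel coordinates; zero for a true match.
        return float(np.append(m, 1.0) @ F @ np.append(m_prime, 1.0))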

When the intrinsic camera parameters are known, Eqs. (1) and (2) can be simplified, obtaining

q^T E q′ = 0,    (3)

where q = A^{-1} m, q′ = A′^{-1} m′ and E = R^T t̂. The matrix E is called the essential matrix [9].
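
Conversely, once A and A′ are known, the essential matrix can be recovered from F and image points can be normalized to camera coordinates; a small sketch under the same assumed notation (the arrays are illustrative):

    import numpy as np

    def essential_from_fundamental(F, A, A_prime):
        # From F = A^{-T} E A'^{-1} it follows that E = A^T F A'.
        return A.T @ F @ A_prime

    def normalize(m, A):
        # q = A^{-1} m, with m in homogeneous pixel coordinates.
        return np.linalg.inv(A) @ np.append(m, 1.0)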

Many papers describe different methods to estimate the fundamental matrix [11], [12], [13], [14].

The differential case is the infinitesimal version of the discrete case, in which both views are given by a single moving camera. If the velocity of the camera is low enough and the frame rate is very high, the relative displacement between two consecutive images becomes very small. The 2D displacement of image points can then be obtained from an image sequence using the optical flow. In this case, the 3D camera motion is described by a rigid motion composed of a rotation matrix and a translation vector (see Fig. 3):

p(t) = R(t) p(0) + t(t),    (4)

which, after differentiating, gives

ṗ(t) = Ṙ(t) p(0) + ṫ(t).    (5)

Then, substituting p(0) = R^{-1}(t) [p(t) − t(t)] into Eq. (5), the following equation is obtained:

ṗ(t) = Ṙ(t) R^{-1}(t) p(t) + ṫ(t) − Ṙ(t) R^{-1}(t) t(t),    (6)

which leads to the following differential epipolar constraint:

q^T υ̂ q̇ + q^T ω̂ υ̂ q = 0,    (7)

where ω = (ω_1, ω_2, ω_3)^T is the angular velocity of the camera and υ = (υ_1, υ_2, υ_3)^T is its linear velocity. By projecting p and ṗ onto the image plane, the point q in camera coordinates and its corresponding optical flow q̇ are obtained.
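
The constraint of Eq. (7) can be evaluated numerically point by point; the sketch below (with skew() as defined earlier and purely illustrative inputs) returns the residual that the estimation methods seek to drive towards zero.

    def differential_epipolar_residual(q, q_dot, v, omega):
        # q^T v_hat q_dot + q^T omega_hat v_hat q, Eq. (7); q and q_dot are the
        # image point in camera coordinates and its optical flow (homogeneous).
        v_hat = skew(v)
        omega_hat = skew(omega)
        return float(q @ v_hat @ q_dot + q @ omega_hat @ v_hat @ q)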

For a complete derivation, the reader is referred to Chapter 15 of Haralick's book [15], where the movement of a rigid body relative to a camera is described. In our case, the derivation is used to describe the movement of a camera relative to a static object, in which only the sign of the obtained velocities differs from the previous one. Eq. (7) can also be derived in other ways, as explained by Viéville [16] and Brooks [17]. An equivalent form of Eq. (7) is shown in Eq. (8); in this case, since the matrix S is symmetric, it contributes only six unknowns:

q^T υ̂ q̇ + q^T S q = 0,  where  S = ½ (ω̂ υ̂ + υ̂ ω̂).    (8)
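
A small sketch of the symmetric form, again with skew() as above: building S from (ω, υ) shows why it contributes only six distinct unknowns, which is what makes Eq. (8) convenient for linear estimation.

    def symmetric_S(omega, v):
        # S = 1/2 (omega_hat v_hat + v_hat omega_hat), Eq. (8); symmetric, so
        # only its six upper-triangular entries are independent unknowns.
        omega_hat, v_hat = skew(omega), skew(v)
        return 0.5 * (omega_hat @ v_hat + v_hat @ omega_hat)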

The existence of these two forms indicates that a redundancy exists in Eq. (7) (for a proof see Viéville [16], Brooks [17] and Ma [18]). Several books describe optical flow, such as Trucco and Verri [19], and the article published by Barron et al. [20] gives the state of the art in optical flow estimation.

Comparing the two cases, the discrete epipolar equation involves a single matrix, whereas the differential epipolar equation involves two matrices, which encode information about the linear and angular velocities of the camera [15].

Approaches to motion estimation can be classified into discrete and differential methods depending on whether they use a set of point correspondences or optical flow. Another possible classification takes into account the estimation techniques used for motion recovery (linear or non-linear techniques). In Table 1, the algorithms are summarized and classified in terms of their nature (discrete and differential case), and estimation method (linear and non-linear technique).
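
As a rough illustration of what a linear technique means here, the sketch below stacks one equation of the symmetric form (8) per flow vector and solves the resulting homogeneous system by SVD; it is a generic least-squares sketch under the notation above, not a reproduction of any specific surveyed algorithm.

    import numpy as np

    def linear_differential_estimate(qs, q_dots):
        # Each point q with flow q_dot gives one row of a homogeneous system in
        # the nine unknowns (v, s11, s12, s13, s22, s23, s33), since
        # q^T v_hat q_dot = v . (q_dot x q) and q^T S q is linear in s.
        rows = []
        for q, qd in zip(qs, q_dots):
            rows.append(np.concatenate([
                np.cross(qd, q),
                [q[0]**2, 2*q[0]*q[1], 2*q[0]*q[2],
                 q[1]**2, 2*q[1]*q[2], q[2]**2]]))
        # The solution is the right singular vector of the smallest singular value.
        _, _, vt = np.linalg.svd(np.asarray(rows))
        x = vt[-1]
        return x[:3], x[3:]   # estimated v (up to scale) and the six entries of S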

This article analyzes several different algorithms for camera motion estimation based on differential image motion. The surveyed methods have been compared with one another and experimental results are given. Moreover, this article analyzes the adaptation of general methods designed for free 3D movement to planar motion, which corresponds to the common case of a robot moving on a plane, with the aim of studying how much accuracy improves when the camera movement is constrained. Hence, this article focuses on linear techniques, as the motion has to be recovered in real time.

This article is structured as follows. Section 2 describes up to 12 algorithms for 3D motion estimation based on optical flow. Section 3 focuses on the estimation of planar motion by constraining the free movement explained in the previous section. Then, Section 4 deals with the experimental results obtained. The article ends with conclusions.

Section snippets

Overview of 3D motion estimation

In this section, we detail some methods used for the recovery of all six degrees of freedom (6-DOF) of camera motion from optical flow, providing insights into the complexity of the problem. The surveyed methods have been classified according to whether or not they are based on the differential epipolar constraint.

Adaptation to a mobile robot

The aim of this work is to estimate the motion of a mobile robot. Because the permitted movements of such a robot are limited, some modifications can be introduced into the differential epipolar equation by applying new constraints. With these modifications, the number of potential solutions is reduced, so the obtained results improve considerably.

Our robot (see Fig. 5) is constrained to only two independent movements: a translation along the x_r axis and a rotation around the z_r axis.
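
As a hedged illustration of this constraint, the sketch below builds the robot velocities allowed by the planar motion model and maps them into the camera frame through a mounting rotation R_cr; R_cr and the variable names are assumptions for the example, not the article's robot-camera calibration.

    import numpy as np

    def planar_velocities(vx, wz, R_cr=np.eye(3)):
        # Planar robot motion: translation along x_r only, rotation about z_r only.
        v_robot = np.array([vx, 0.0, 0.0])
        omega_robot = np.array([0.0, 0.0, wz])
        # Express both velocities in the camera frame (R_cr: robot -> camera).
        return R_cr @ v_robot, R_cr @ omega_robot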

Experimental results

All the methods surveyed have been programmed and tested under the same image-noise conditions with the aim of giving an exhaustive comparison of most 6-DOF motion estimation methods. Hence, Section 4.1 compares the twelve surveyed methods of 3D motion estimation and Section 4.2 deals with the six proposed adaptations to 2-DOF mobile robot movement estimation. Section 4.3 shows results on real image sequences.

Conclusions

This article presents an up-to-date classification of the methods and techniques used to estimate the movement of a single camera. Several motion recovery methods are surveyed and experimental results are given on synthetic data, considering both Gaussian noise and outliers.

The general methods for estimating a 6-DOF movement have been adapted to the common case of a mobile robot moving on a plane, obtaining better results and stability even under severe noise conditions.

Acknowledgements

We greatly appreciate Dr. Tina Y. Tian, Dr. Carlo Tomasi and Dr. David J. Heeger, who implemented the methods explained in Section 2.2, which have been compared with the rest of the methods explained in this article; we especially thank Dr. Heeger, who gave us insightful information and the source code of these methods.

References (48)

  • J. Weng et al.

    Camera calibration with distortion models and accuracy evaluation

    IEEE Trans. Pattern Anal. Mach. Intell.

    (1992)
  • R.I. Hartley

    Kruppa's equations derived from the fundamental matrix

    IEEE Trans. Pattern Anal. Mach. Intell.

    (1997)
  • H.C. Longuet-Higgins

    A computer algorithm for reconstructing a scene from two projections

    Nature

    (1981)
  • T.S. Huang et al.

    Some properties of the E matrix in two-view motion estimation

    IEEE Trans. Pattern Anal. Mach. Intell.

    (1989)
  • O.D. Faugeras

    Three-Dimensional Computer Vision

    (1993)
  • Z. Zhang

    Determining the epipolar geometry and its uncertainty: a review

    Int. J. Comput. Vision

    (1998)
  • Q.-T. Luong et al.

    The fundamental matrix: theory, algorithms, and stability analysis

    Int. J. Comput. Vision

    (1996)
  • P.H.S. Torr et al.

    The development and comparison of robust methods for estimating the fundamental matrix

    Int. J. Comput. Vision

    (1997)
  • R.M. Haralick, L.G. Shapiro, Computer and Robot Vision, Vol. 2, Addison-Wesley Publishing Company, Reading, MA,...
  • M.J. Brooks et al.

    Determining the ego-motion of an uncalibrated camera from instantaneous optical flow

    J. Opt. Soc. Am.

    (1997)
  • Y. Ma et al.

    Linear differential algorithm for motion recovery: a geometric approach

    Int. J. Comput. Vision

    (2000)
  • E. Trucco et al.

    Introductory Techniques for 3D Computer Vision

    (1998)
  • J. Barron, D. Fleet, S. Beauchemin, T. Burkitt, Performance of optical flow techniques, in: Proceedings of the IEEE...
  • X. Zhuang, R.M. Haralick, Rigid body motion on optical flow image, in: Proceedings of the First International...

About the Author: XAVIER ARMANGUÉ received the B.S. degree in Computer Science from the University of Girona in 1999 before joining the Computer Vision and Robotics Group. At present he is involved in the study of stereovision systems for mobile robotics and is working toward his Ph.D. in the Computer Vision and Robotics Group at the University of Girona and in the Institute of Systems and Robotics at the University of Coimbra.

About the Author: HELDER ARAÚJO is currently Associate Professor at the Department of Electrical and Computer Engineering of the University of Coimbra. He is Deputy Director of the Institute for Systems and Robotics, Coimbra. His main research interests are computer vision and mobile robotics. He has been working in vision and robotics for the last 13 years.

About the Author: JOAQUIM SALVI graduated in Computer Science from the Polytechnic University of Catalonia in 1993. He joined the Computer Vision and Robotics Group at the University of Girona, where he received the M.S. degree in Computer Science in July 1996 and the Ph.D. in Industrial Engineering in January 1998. He received the best thesis award in Industrial Engineering of the University of Girona. At present, he is an associate professor in the Electronics, Computer Engineering and Automation Department of the University of Girona. His current interests are in the field of computer vision and mobile robotics, focusing on structured light, stereovision and camera calibration.

1 This research is partly supported by Spanish project CICYT-TAP99-0443-C05-01.
