Self-calibration for a non-central catadioptric camera with approximate epipolar geometry

In spite of the wide usage of catadioptric cameras, self-calibration of a non-central catadioptric camera remains a very challenging task. In this paper, a novel self-calibration method for a non-central catadioptric camera is proposed. First, an approximate central projection is introduced. With a freely movable virtual viewpoint, the approximate central projection compensates well for the bias caused by the misalignment. An approximate epipolar geometry (AEG) for the non-central camera in the two-view case is then derived. Based on the AEG, a full self-calibration method using only two images is designed. Combined with an automatic initialization algorithm, the non-central catadioptric camera can be calibrated robustly without any manual intervention. Experiments on both synthetic data and real images verify the effectiveness of the proposed method.


Introduction
Featuring a large field of view, omni-directional catadioptric cameras have in recent years become increasingly popular in many areas such as robot navigation [1], video surveillance, etc. Their special imaging geometry greatly facilitates applications such as 3D reconstruction and map building [2]. Generally, catadioptric cameras can be classified into two categories: central or non-central. Central catadioptric cameras, which have an effective viewpoint, are usually composed of a special mirror and an appropriate projective camera aligned with the mirror's axis [3]. The most commonly used mirror types are parabolic, hyperbolic and elliptical. The imaging model and the calibration of central catadioptric cameras have been studied in depth in recent years. Geyer et al [4] presented a unifying theory for all central catadioptric systems, making the central model much more concise and clear, and also verifying that a central catadioptric camera can be calibrated from one image containing more than three lines. Based on this model, a calibration method using planar patterns was proposed by Mei and Rives [5]. The epipolar geometry for central catadioptric cameras is given in [6].
In practice, however, it is very hard to obtain an absolute central catadioptric camera considering the manufacturing error of the mirror and the existence of the misalignment between the reflective mirror and the projective camera. Furthermore, many mirrors do not belong to the very limited 'central list'. Therefore, the non-central systems are more general and have to be well studied. In recent years, great efforts have been made to model and calibrate non-central catadioptric cameras. So far two kinds of model are frequently used: the black-box model [7] and the separate model [8][9][10].
In the black-box model, the system is considered a black box and the calibration task is converted into setting up a list of correspondences between the image pixels and their 3D rays. Special structured light patterns [11] can be used to accomplish this task. The caustic surface representation [11] can also be classified into this category. The black-box model is effective for any mirror shape, but suffers from non-compact model parameters and a complex calibration process. On the other hand, the separate model considers the system as a combination of a reflective mirror and a projective camera, and models them separately with more than 20 parameters [8][9][10]. In [8], a two-step calibration algorithm is proposed, using the black-box model [7] first, followed by an iterative bundle adjustment. In particular, given a known mirror and the camera's intrinsic parameters, a simplified model concerning only the relative position between the mirror and the camera can be computed in closed form from the imaged mirror boundary [10, 12]. Epipolar geometry is the key theory for self-calibration and structure from motion with common perspective cameras [13]. In the robot navigation area, it has also been widely used for visual control purposes [14]. However, without a single projection center, the original epipolar geometry for central cameras becomes invalid for non-central systems, which poses a great challenge for self-calibration research. Little work has been done in this area. Knowing the mirror type and assuming coaxial mounting of the mirror and the camera, Micusik et al [9] proposed a self-calibration method for non-central cameras: first an approximate central camera is calibrated, and then the non-central model is used to complete the 3D reconstruction. The method still suffers from a complex model, which has to be set up separately for each type of mirror.
In this paper, a novel self-calibration approach for a non-central catadioptric camera using only a pair of images is presented. We mainly focus on non-central catadioptric systems caused by misalignment. No a priori knowledge about the camera and the mirror shape, nor the constraint of coaxial assembly, is needed in our method. First, an approximate central projection for non-central systems is presented. Then the resulting approximate epipolar geometry (AEG) theory, which is the essential constraint for the two-view based self-calibration, is established. Finally, a self-calibration method is proposed to compute the system parameters and reconstruct the scene. The excellent 3D reconstruction accuracy obtained in experiments demonstrates that the proposed AEG is reasonable and suitable for non-central catadioptric cameras.
The main contributions of this work are the following: (1) a novel approximate epipolar geometry for non-central catadioptric cameras is presented; (2) based on AEG, a full self-calibrating algorithm for non-central systems using only two images is proposed; (3) the universality of the calibration method is verified by testing on different types of non-central camera.
Section 2 introduces the approximate central projection. The resulting approximate epipolar geometry of the non-central systems under the two-view case is described in section 3. The full self-calibration method, as well as the initialization algorithm, is presented in section 4. Experimental results are reported in section 5, and section 6 concludes the paper.

The approximate central projection
For a central system, the unified central model [4] is popular for its compactness and simplicity. Figures 1(a) and (b) show the classical configuration of a central catadioptric camera and its corresponding unified central model [4], respectively. When misalignment happens, an extra unknown rigid body transformation exists between the camera and the mirror, as shown in figure 1(c). The central camera system hence becomes non-central and the model in (b) is no longer applicable. By carefully observing the images obtained, we find that in most cases such a resulting non-central system can still be approximated by a single effective viewpoint. However, the position of the viewpoint is changed and becomes unknown. Applying a rigid body transformation to the frame of the effective viewpoint F p in the central model, an approximate model is obtained, as shown in figure 1(d).
By simply rotating the frame F_m to make its Z-axis parallel with the z-axis of the frame F_p in figure 1(d), the resulting approximate central projection is obtained, as shown in figure 2. Unlike the central model, the new model does not require Z_S, the optical axis of the effective projection, to pass through the unit sphere center. The original effective projection center C'_p is hence replaced by C_p, which can be moved freely; its position in the mirror frame is described by the vector ξ = (ξ_1, ξ_2, ξ_3)^T. The projection of a world point X_w then proceeds as follows. First, X_w is transformed into the mirror frame:

X_m^(Fm) = R_w X_w + T_w,    (1)

where R_w and T_w are the rotation and translation between the world coordinate frame and the mirror coordinate frame respectively. Then X_m^(Fm) is projected to the point X_s^(Fm) on the unit sphere:

X_s^(Fm) = X_m^(Fm) / ||X_m^(Fm)||.    (2)

Expressed in the viewpoint frame F_p centered at C_p, the sphere point is

X_s^(Fp) = X_s^(Fm) - ξ.    (3)

The following step is a perspective projection of X_s^(Fp) to m_u = (x, y, 1)^T on the normalized imaging plane Π_mu:

m_u = (X_s,x^(Fp) / X_s,z^(Fp), X_s,y^(Fp) / X_s,z^(Fp), 1)^T.    (4)

Finally, with a generalized perspective projection matrix K, m_u is projected to p = (u, v, 1)^T on the image plane Π_p:

p = K m_u,    K = [γ_1 α u_0; 0 γ_2 v_0; 0 0 1],    (5)

where γ_1 = f_1 η and γ_2 = f_2 η are the focal lengths of the generalized perspective projection, with f_1 and f_2 the focal lengths of the real perspective camera and η a ratio that depends on the mirror shape. Within this model, we no longer consider the catadioptric camera as a separate mirror and camera but as a global system: f_1, f_2 and η cannot be estimated independently, and only γ_1 and γ_2 can be estimated. α is the skew factor and (u_0, v_0) the principal point. In a real system, distortion of the projective camera always exists. To compensate for radial distortion with coefficients k_1 and k_2, the normalized coordinates are corrected as

m_u ← (x(1 + k_1 ρ² + k_2 ρ⁴), y(1 + k_1 ρ² + k_2 ρ⁴), 1)^T,    ρ² = x² + y².    (6)
Unlike the case in [5], tangential distortion is no longer needed in the new model, since its effect is already well represented by ξ_1 and ξ_2.
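The forward projection (1)-(5), together with the radial distortion of (6), can be sketched compactly. The following is a minimal Python illustration, not the paper's implementation: the function name `project_point` and the exact sign convention for the viewpoint offset ξ are assumptions made for this sketch.

```python
import numpy as np

def project_point(X_w, R_w, T_w, xi, K, k1, k2):
    """Sketch of the approximate central projection (world point -> pixel).

    X_w      : 3-vector, world point
    R_w, T_w : world-to-mirror rotation and translation
    xi       : (xi1, xi2, xi3), assumed position of the movable viewpoint C_p
               relative to the unit-sphere centre
    K        : 3x3 generalized perspective matrix [[g1, a, u0], [0, g2, v0], [0, 0, 1]]
    k1, k2   : radial distortion coefficients
    """
    # (1) world -> mirror frame
    X_m = R_w @ X_w + T_w
    # (2) project onto the unit sphere
    X_s = X_m / np.linalg.norm(X_m)
    # (3)-(4) perspective projection from the shifted viewpoint C_p = xi
    Y = X_s - np.asarray(xi, dtype=float)
    x, y = Y[0] / Y[2], Y[1] / Y[2]
    # (6) radial distortion on the normalized plane
    r2 = x * x + y * y
    d = 1.0 + k1 * r2 + k2 * r2 * r2
    m_d = np.array([x * d, y * d, 1.0])
    # (5) apply K to obtain pixel coordinates
    p = K @ m_d
    return p[:2] / p[2]
```

For example, with identity pose and no distortion, a point is mapped through the sphere and the shifted viewpoint exactly as the equations describe.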

The backward projection
Epipolar geometry describes the constraint between the rays connecting a world point to the two projection centers. To obtain the epipolar geometry, the backward projection from an image point p = (u, v, 1)^T to the ray X_s^(Fm) should first be derived.
Reversing the generalized perspective projection (after compensating for the radial distortion of (6)), the point m_u on the normalized plane is

m_u = K^(-1) p.    (7)

Then from (4), the sphere point can be written as

X_s^(Fm) = ξ + λ m_u,    (9)

where λ > 0 is an unknown scale. Imposing the unit-sphere constraint

||ξ + λ m_u||² = 1    (10)

and unfolding (10),

||m_u||² λ² + 2(m_u^T ξ) λ + ||ξ||² - 1 = 0    (11)

can be obtained. Then we can solve for λ:

λ = ( -(m_u^T ξ) ± sqrt( (m_u^T ξ)² - ||m_u||² (||ξ||² - 1) ) ) / ||m_u||².    (12)

The two λ values correspond to the pair of points where the ray emitted from C_p intersects the unit sphere. According to the approximate model shown in figure 2, the larger λ is the correct one. Hence

λ = ( -(m_u^T ξ) + sqrt( (m_u^T ξ)² - ||m_u||² (||ξ||² - 1) ) ) / ||m_u||².    (13)

Substituting (13) into (9) results in

X_s^(Fm) = ξ + [ ( -(m_u^T ξ) + sqrt( (m_u^T ξ)² - ||m_u||² (||ξ||² - 1) ) ) / ||m_u||² ] m_u.    (14)

With (14), the unit vector X_s^(Fm) can be obtained directly from the image point p and the parameters of the projection model.
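The backward projection of (9)-(14) amounts to intersecting a ray from C_p with the unit sphere. A minimal sketch, assuming distortion has already been removed from p and using the same hypothetical conventions as the forward model above:

```python
import numpy as np

def backproject(p, xi, K):
    """Back-project pixel p to the unit-sphere point X_s in the mirror frame.

    Assumes the viewpoint C_p sits at xi and that radial distortion has
    already been removed from p.
    """
    m_u = np.linalg.solve(K, np.array([p[0], p[1], 1.0]))  # m_u = K^-1 p
    xi = np.asarray(xi, dtype=float)
    # X_s = xi + lam * m_u must lie on the unit sphere: ||X_s|| = 1,
    # which gives a quadratic a*lam^2 + b*lam + c = 0 in lam.
    a = m_u @ m_u
    b = 2.0 * (m_u @ xi)
    c = xi @ xi - 1.0
    # Two intersections of the ray with the sphere; the larger root is
    # the visible one in the approximate model (cf. equation (13)).
    lam = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return xi + lam * m_u
```

By construction the returned vector has unit norm, so it can be fed directly into the epipolar constraint of the next section.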

The approximate epipolar geometry
Viewing the same scene from different positions, several images could be acquired. These images are related by the constraint of epipolar geometry. Here only the two-view case is considered, as shown in figure 3.
Let frames F_m1 and F_m2 describe the two different positions of the camera, with R and T the rotation and translation between them. For a 3D point X_w = (x_w, y_w, z_w)^T in the world coordinate frame, p_1 = (u_1, v_1, 1)^T and p_2 = (u_2, v_2, 1)^T are the corresponding image points of X_w in the two views. With (14), X_s1^(Fm1) and X_s2^(Fm2) can easily be calculated from p_1 and p_2. As in the central camera case, X_s1^(Fm1) and X_s2^(Fm2) should satisfy the following equation:

(X_s2^(Fm2))^T [T]_× R X_s1^(Fm1) = 0.    (15)

Denoting the essential matrix E = [T]_× R, the constraint between X_s1^(Fm1) and X_s2^(Fm2) can be rewritten as

(X_s2^(Fm2))^T E X_s1^(Fm1) = 0.    (16)

Substituting (14) into (16), the epipolar constraint between the two views is obtained, as shown in (17).
Equation (17) is the approximate epipolar geometry for non-central catadioptric cameras. In (17), ξ and K are the unknown intrinsic parameters of the camera, and E encodes the unknown extrinsic parameters relating the two views. According to AEG, given a set of matched image feature points (p_1, p_2), the essential matrix E can be computed if the camera's intrinsic parameters are known.
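The coplanarity constraint (15)-(16) can be verified numerically. The sketch below checks the generic two-view identity x_2^T [T]_× R x_1 = 0 for matched rays, with X_2 = R X_1 + T; the function names are illustrative and this is not the full AEG constraint (17), which additionally depends on ξ and K through (14).

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def epipolar_residual(Xs1, Xs2, R, T):
    """Residual Xs2^T E Xs1 with E = [T]_x R (equation (16)).

    For a perfectly matched pair of rays the residual is zero; its
    magnitude is the algebraic error used for RANSAC scoring.
    """
    E = skew(T) @ R
    return Xs2 @ E @ Xs1
```

In the self-calibration pipeline, this residual (built from the back-projected sphere points) is what the eight-point algorithm minimizes over candidate essential matrices.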

Self-calibration
In the AEG constraint equation (17), besides the rigid body transformation between the two views, the misaligned projection center ξ, the projection matrix K and the radial distortion coefficients k_1 and k_2 are all unknown parameters that have to be estimated through calibration. The self-calibration proceeds as follows.

(a) Given two images of the scene, extract and match SIFT [15] features to produce a set of initial matched feature points (p_1, p_2). SIFT features are widely used in camera self-calibration algorithms due to their outstanding performance in extracting and describing scale- and rotation-invariant feature points [16]. Although more recent variants such as SURF [17] or BRISK improve computational efficiency, we found that SIFT achieved the best matching performance in this application.

(b) Initialize the intrinsic parameters using the algorithm described in section 4.2.

(c) Given the intrinsic parameters, eliminate some of the outliers by computing E with the normalized eight-point algorithm inside RANSAC [18], using (17).

(d) Given the inliers, compute the intrinsic parameters by minimizing the reprojection error with a nonlinear optimization method. With the essential matrix E at hand, four solutions for R and T can be computed by singular value decomposition (SVD). Imposing a positive-depth constraint, the correct R and T as well as the 3D points can be estimated. As in any other two-view reconstruction, T and the reconstructed 3D points can only be determined up to a scale factor. Owing to noise, the midpoint of the common perpendicular of the two reconstructing rays is taken as the reconstructed 3D point. Then, with the forward projection (1)-(5), the reprojections of the 3D points are obtained.
Let X = [γ_1, γ_2, α, u_0, v_0, ξ_1, ξ_2, ξ_3, k_1, k_2]^T be the unknown intrinsic parameters; then

min_X Σ_i Σ_{j=1,2} || p_ji - G_ji(X) ||²,    (18)

where G_ji(·) (j = 1, 2) is the composition of (1)-(5), representing the imaging of the ith 3D point in the jth view given the intrinsic parameters X, and p_ji (j = 1, 2) is the corresponding observed image point in the jth view. A nonlinear optimization method, namely the Levenberg-Marquardt algorithm, is employed to solve (18).
(e) The parameters obtained from step (d) are not yet accurate enough, owing to remaining outliers and an insufficient number of true inliers. Because of the inaccurate initialization, some inliers are also wrongly discarded as outliers. By iterating steps (c) and (d), more outliers are filtered out and more inlier point pairs are retrieved, gradually leading to a better estimation of the parameters. This strategy greatly improves the accuracy and robustness of the whole self-calibration method. In practice, fewer than five iterations are needed for final convergence.
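The midpoint reconstruction used in step (d) can be sketched as follows. The closed form below is the standard least-squares solution for the closest points of two skew lines; the argument names are illustrative, with the ray origins being the two effective viewpoints and the directions coming from the backward projection.

```python
import numpy as np

def midpoint_triangulate(o1, d1, o2, d2):
    """Midpoint of the common perpendicular of two (possibly skew) rays.

    o1, o2 : ray origins (the two effective viewpoints)
    d1, d2 : ray directions (back-projected vectors)
    Minimizes ||(o1 + s*d1) - (o2 + t*d2)||^2 over s and t, then returns
    the midpoint of the two closest points.
    """
    w = o1 - o2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w, d2 @ w
    denom = a * c - b * b          # zero only for parallel rays
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    return 0.5 * ((o1 + s * d1) + (o2 + t * d2))
```

When the two rays actually intersect, the midpoint coincides with the intersection; under noise it is the natural symmetric compromise between the two views.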

The initialization algorithm
To quickly find the unknown intrinsic parameters, an initial estimate of the model is needed. In this paper, an effective algorithm is presented to estimate the initial values. The center of the ellipse formed by the mirror border in the image can be used to initialize the principal point (u_0, v_0). Several methods can be used to extract the mirror border automatically, such as the one used in [12].
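As a rough illustration of this principal-point initialization, the center of the imaged mirror border can be estimated by fitting a general conic to boundary points. The SVD-based sketch below is a simple stand-in, not the extraction method of [12]; the function name and normalization scheme are assumptions.

```python
import numpy as np

def ellipse_center(pts):
    """Estimate the center of an ellipse from boundary points (N x 2).

    Fits a general conic A x^2 + B xy + C y^2 + D x + E y + F = 0 in a
    least-squares sense via SVD, then returns the conic center, i.e. the
    point where the conic's gradient vanishes.
    """
    pts = np.asarray(pts, dtype=float)
    mean = pts.mean(axis=0)
    scale = pts.std()
    q = (pts - mean) / scale                  # normalize for conditioning
    x, y = q[:, 0], q[:, 1]
    M = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    _, _, Vt = np.linalg.svd(M)
    A, B, C, D, E, _ = Vt[-1]                 # smallest singular vector
    # Center: solve grad = 0 -> [2A B; B 2C] [xc yc]^T = -[D E]^T
    ctr = np.linalg.solve(np.array([[2 * A, B], [B, 2 * C]]),
                          -np.array([D, E]))
    return ctr * scale + mean                 # back to pixel coordinates
```

With reasonably clean border points this recovers the ellipse center to sub-pixel accuracy, which is sufficient as a starting value for (u_0, v_0).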
For initialization, assuming k_1 ≈ k_2 ≈ α ≈ ξ_1 ≈ ξ_2 ≈ 0, ξ_3 ≈ 1 and γ_1 ≈ γ_2 ≈ γ is appropriate. With these initial values for K and ξ, the AEG constraint (17) can be simplified to (19), a polynomial equation in the single unknown γ. Equation (19) can be repacked as (21), which corresponds to a polynomial eigenvalue problem (PEP); an efficient recent solution can be found in [19]. In MATLAB, the dedicated function 'polyeig' solves this problem directly. Generally there are 36 solutions to (21), most of which are zero, infinite or complex; only the few real solutions are useful and worth considering. According to (21), every nine feature point pairs construct a resolvable PEP, hence a RANSAC-like estimation process is adopted to obtain the best solution. During RANSAC, not every PEP produces a real solution, owing to noise and outliers; in such cases the solutions from that PEP are simply discarded. Finally, the best γ is obtained by choosing the solution with the minimum reprojection error in the images.
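Outside MATLAB, a PEP can be solved by companion linearization. The numpy sketch below mimics 'polyeig' for the eigenvalues only; it assumes the leading coefficient matrix is invertible (MATLAB's polyeig also handles singular leading blocks, which this sketch does not).

```python
import numpy as np

def polyeig(*A):
    """Eigenvalues of the PEP (A[0] + l*A[1] + ... + l^k*A[k]) v = 0.

    Companion linearization: stack z = [v, l*v, ..., l^(k-1)*v] and solve
    the equivalent generalized problem C0 z = l C1 z.  Assumes A[k] is
    invertible so the pencil reduces to a standard eigenvalue problem.
    """
    k, n = len(A) - 1, A[0].shape[0]
    C0 = np.zeros((k * n, k * n))
    C0[:(k - 1) * n, n:] = np.eye((k - 1) * n)   # rows enforcing z_{j+1} = l*z_j
    for i in range(k):
        C0[-n:, i * n:(i + 1) * n] = -A[i]       # polynomial row block
    C1 = np.eye(k * n)
    C1[-n:, -n:] = A[k]
    return np.linalg.eigvals(np.linalg.solve(C1, C0))
```

For a degree-4 PEP with 9x9 coefficient matrices, as arises from nine point pairs here, this yields the expected 36 candidate solutions, from which the real ones are kept.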

Experiments
To verify the AEG theory and the proposed self-calibration method, simulation as well as real data experiments are carried out. For both experiments, the calibration algorithms are implemented using MATLAB R2010b version 7.11.0.584. The computer is equipped with an Intel Core2 duo E4500 2.2 GHz processor with 2 GB RAM, running with Windows 7 Professional operating system.

Simulation results
The simulated catadioptric camera consists of a slightly misaligned (about 5°) hyperbolic mirror (a = 0.0281 m, b = 0.0234 m, c = 0.0366 m, d = 2c = 0.0731 m, p = b²/2a = 0.0097 m) and a perspective camera (resolution 800 × 600, u_0 = 400, v_0 = 300, α = 0, f_1 = f_2 = 937.5). The synthetic scene is a topless cube whose inner surfaces are pasted with textured images. Two synthetic images captured at different positions are produced using POV-Ray [20], free software capable of producing high-quality synthetic graphics from text-based three-dimensional scene descriptions. The first (left) image is shown in figure 5(a). The initial matching of SIFT feature points is shown in figure 5(b), where some mismatches exist. Figure 5(c) shows the final inliers after optimization, where almost no mismatches remain.
By applying the proposed self-calibration method introduced in section 4, the full intrinsic parameters of the approximate central projection as well as the relative position of the two views are obtained, as shown in table 1. All of the parameters are correctly calibrated and the reprojection error is very small. In the experiment, the algorithm converges quickly using only four iterations. Table 2 illustrates the change of inliers after each iteration.
Knowing the intrinsic parameters and the relative position, the scene can be reconstructed. Figure 6 shows the reconstructed feature points viewed from three orthogonal directions. It can be seen that all of the planes of the cube are correctly reconstructed.
Under high image noise the method degrades. Fortunately, in practice the typical noise covariance is under 0.5, so stable and accurate calibration results can always be obtained, as shown in a later section.
Different levels of misalignment can also be induced in the camera system. Usually the image distortion caused by rotational misalignment is much larger than that caused by translation. Here misalignment is simulated by rotating the mirror around its focus laterally (around the X or Y axis) at angles of up to 20 degrees. For each configuration, the camera is calibrated by the AEG self-calibration method.
For comparison, the standard epipolar geometry derived from the unifying model [5] is also employed for calibration. The results are shown in figure 8. Although the reprojection errors are comparable, the reconstructed angles computed with AEG are much more accurate than those from the model in [5].
To verify the universality of the AEG theory and the calibration method, similar experiments are carried out on synthetic misaligned catadioptric cameras with parabolic and spherical mirrors, respectively. Considering the rotational symmetry of the spherical mirror, that camera is misaligned by translating the mirror laterally (along the X or Y axis). The calibration results are shown in figures 9 and 10, respectively. Again, for both mirrors the average angle errors from AEG are much smaller than those from [5]. However, for the spherical mirror the performance gap between the two models becomes smaller: under a misalignment of 5 mm, the angle error from AEG is 3.05 degrees, slightly worse than the 2.62 degrees obtained with the method in [5]. This is due to the symmetry of the spherical mirror, whose translational misalignment can be largely compensated simply by moving the principal point. In this case, the noise and the non-central nature of the spherical mirror account for the remaining small reconstruction error.

Real data experiments
The self-calibration method is further tested on real images captured by a catadioptric camera consisting of an H3S mirror and a perspective camera. Real cameras with different levels of misalignment are calibrated. In each experiment the relative position between the mirror and the camera is first computed by our previous method [12], and the misalignment angles are recorded as the reference configuration. Figure 11 shows one of the captured real images, its initial feature matches and the final matched inliers. As in the simulation experiments, a trihedral object is employed and the included angles between its planes are computed.
The mean values of the reconstructed angle errors with respect to the misalignment levels are shown in figure 12. It illustrates similar performance to that in the simulation, exhibiting the accuracy and stability of the proposed self-calibration method.
To further verify the accuracy of the computed R and T between the left and right images, an additional experiment is introduced. A catadioptric camera with a misalignment of about 10° is employed. By positioning the catadioptric camera around a circle, a series of images IMG_1, IMG_2, IMG_3, ..., IMG_m is acquired (in this experiment m = 19, and IMG_1 and IMG_19 are the same image). After calibrating the catadioptric camera, R and T between each pair of adjacent images can be obtained. A unified scale for T can be computed from the correspondences between the reconstructed 3D points. All of the acquisition positions can then be drawn in a unified frame, as shown in figure 13. The red circle at (0, 0) is the position of IMG_1. The green circle is the computed position of the last image, IMG_19, which should overlap the red circle in the ideal case. As illustrated in the figure, the overlap error is very small, and the sequence of R and T estimated by the AEG method is quite accurate even under such severe misalignment.

Conclusions and future work
A novel self-calibration method for non-central catadioptric cameras has been proposed in this paper. To make epipolar geometry applicable to the calibration of a non-central system, an improved version named the approximate epipolar geometry was proposed. Based on a movable virtual viewpoint, AEG establishes a reasonable ray mapping between two non-central catadioptric views. Based on the AEG, a full self-calibration method using a pair of images was presented, together with an effective initialization algorithm for the optimizer. The whole calibration process is automatic and no specific calibration target is needed. The method is flexible and requires no a priori knowledge about the camera or the mirror shape. Simulated and real data produced with different mirror types and varying levels of misalignment were used in the experiments, and the results demonstrate the accuracy and robustness of the proposed method.
Future work will focus on improving the computational efficiency of the calibration algorithm and making it applicable for on-the-fly calibration.