A Method for Extrinsic Parameter Calibration of Rotating Binocular Stereo Vision Using a Single Feature Point

Nowadays, binocular stereo vision (BSV) is extensively used in real-time 3D reconstruction, which requires cameras to quickly implement self-calibration. At present, the camera parameters are typically estimated through iterative optimization. The calibration accuracy is high, but the process is time consuming. Hence, a system of BSV with rotating and non-zooming cameras is established in this study, in which the cameras can rotate horizontally and vertically. The cameras’ intrinsic parameters and initial position are estimated in advance by using Zhang’s calibration method. Only the yaw rotation angle in the horizontal direction and pitch in the vertical direction for each camera should be obtained during rotation. Therefore, we present a novel self-calibration method by using a single feature point and transform the imaging model of the pitch and yaw into a quadratic equation of the tangent value of the pitch. The closed-form solutions of the pitch and yaw can be obtained with known approximate values, which avoid the iterative convergence problem. Computer simulation and physical experiments prove the feasibility of the proposed method. Additionally, we compare the proposed method with Zhang’s method. Our experimental data indicate that the averages of the absolute errors of the Euler angles and translation vectors relative to the reference values are less than 0.21° and 6.6 mm, respectively, and the averages of the relative errors of 3D reconstruction coordinates do not exceed 4.2%.


Introduction
In recent years, with the advancement in computer vision and image technology, binocular stereo vision has been extensively used in three-dimensional (3D) reconstruction, navigation, and video surveillance. This vision method requires that the cameras possess a higher calibration accuracy and better real-time calibration to satisfy the requirements of practical engineering applications. Therefore, research on camera calibration technology of binocular stereo vision is important both theoretically and practically.
Camera calibration technology is primarily divided according to traditional [1][2][3] and self-calibration [4][5][6] methods. Traditional calibration methods require precision-machined targets, which employ the known 3D world coordinates of control points and their image coordinates to calculate the cameras' intrinsic and extrinsic parameters. These methods include the direct linear transformation (DLT) [7], Tsai's radial alignment constraint (RAC) [8], Wen's iterative calibration [9], and double-plane calibration [10]. The traditional methods have high precision, but their algorithm is complex and the calibration process is time consuming and laborious. Hence, its practical application will be significantly limited. In the 1990s, Faugeras and Maybank [11] first proposed the idea of self-calibrating cameras. The self-calibration methods only require the constraint relation from the image sequence without the aid of a calibration block, possibly allowing the camera parameters to be obtained online and in real time. This method can satisfy the requirements in some special occasions where the cameras' focal length should often be adjusted or the cameras' position will move according to the surrounding environment.
In 1992, Faugeras proposed a self-calibration method by directly solving the Kruppa equation [12], which is computationally complex and sensitive to noise. Given the difficulty in solving the Kruppa equation, Hartley et al. proposed a QR decomposition method in 1994 by using the hierarchical step-by-step calibration [13]. The projection matrix is decomposed by QR, but the method requires the initial value for an effective calibration. Triggs et al. proposed in 1997 a camera self-calibration method based on an absolute quadratic surface [14], which is more effective than the method based on the Kruppa equation with regard to inputting multiple images. The camera self-calibration method based on an active vision system, which controls the camera to perform special motions and uses multiple images captured from various positions to calibrate the camera, is represented using the linear method according to two groups of three-direction orthogonal motions proposed by Ma in 1996 [15]. Generally, the self-calibration methods are poor robustness, and calibration accuracy is lower than that of traditional methods. However, the self-calibration method can effectively utilize various constraints unrelated to the motion of the cameras and some prior knowledge, and render the algorithm more simple and practical. Therefore, the self-calibration method is more flexible and can be utilized in a wider range of applications.
In this work, we primarily investigate the implementation of the camera self-calibration. Hitherto, researchers have performed numerous related works. The typical self-calibration algorithms are to estimate camera parameters through iterative optimization with numerous matching features. For example, Zhang proposed a local-global hybrid iterative optimization method by using the bundle adjustment algorithm and SIFT points matching relationship [16]. Wu proposed a complete model for a pan-tilt-zoom (PTZ) camera. The camera parameters can be quickly and accurately estimated using a series of simple initialization steps followed by a nonlinear optimization from ten images [17]. Junejo provided a novel optimized calibration method for cameras with pure rotation or pan-tilt rotation from two images [18]. These methods presented good calibration accuracy, but were time consuming and unable to implement online calibration. Presently, most camera self-calibration algorithms utilize various constraints in natural scenes. Echigo presented a camera calibration method by using three sets of parallel lines in which the rotation parameters were decoupled from the translation parameters [19]. Song established a calibration method for a dynamic PTZ camera overlooking a traffic scene, which automatically used a set of parallel lane markings and the lane width to compute the focal length, tilt angle, and pan angle [20]. Schoepflin presented a new three-stage algorithm to calibrate roadside traffic management cameras by estimating the lane boundaries and the vanishing point of the lines along the roadway [21]. Kim and Hong proposed a nonlinear self-calibration method of rotating and zooming cameras by using inter-image homography from refined matching lines [22]. However, these methods are suitable for the camera calibration under static conditions, which will not work if the special markers like parallel lines are missing during the rotation of the cameras. In order to solve this problem, Muñoz Rodríguez performed binocular self-calibration by means of an adaptive genetic algorithm based on a laser line [23] and presented an online self-camera orientation for mobile vision based on laser metrology and computer algorithms [24], which avoided calibrated references and physical measurements. Self-calibration algorithms were also performed well by combining with the active vision measurement method. Ji introduced a new rotation-based camera self-calibration method that requires the camera to rotate around an unknown but fixed axis twice [25]. Cai proposed a simple method to obtain the camera intrinsic parameters by observing one planar pattern in at least three different orientations [26]. These methods improved traditional self-calibration algorithm, but could not satisfy the requirements of real-time calibration. In order to achieve fast calibration of the camera, several methods simplified the camera model. Tang proposed a non-iterative self-calibration algorithm for upgrading the projective space to the Euclidean space [27], which combined the typically used metric constraints, including zero skew and unit aspect ratio constant principal points. Yu [28] presented a self-calibration method for moving stereo cameras and introduced some reasonable assumptions, such as principle points located at the image center, camera axis perpendicular to the camera plane, and constant focal length of stereo cameras during movement, to simplify the camera model. De Agapito proposed a linear self-calibration method for a stationary but rotating camera under the minimal assumption of zero skew, known pixel aspect ratio, and known principal point [29]. Once the camera model is simplified, the number of feature points for calibration can be reduced. Sun [30] proposed a novel and effective self-calibration approach for robot vision, which both the camera intrinsic parameters and the hand-eye transformation were estimated by using only two arbitrary feature points. Chen put forward a two-point calibration method for soccer cameras in a narrow field of view, in which only two point correspondences were required given the prior knowledge of base location and orientation of a PTZ camera [31]. The above two methods are rapid and avoid the convergence problems of iterative algorithms.
In this study, we present a novel self-calibration method of BSV with rotating and non-zooming cameras by using a single feature point. The intrinsic parameters of the left and right cameras are estimated in advance by using Zhang's method, as well as the rotation matrix and translation vector at the initial position. The left and right cameras can rotate in the horizontal and vertical directions. Thus, only the two rotation angles for each camera, denoted by pitch and yaw, must be calculated after the rotation. According to the homography of the feature point before and after the rotation, one quadratic equation for the tangent value of the pitch for each camera can be derived. The closed-form solutions of the pitch and yaw can be obtained with known approximate values of the pitch obtained using the SBG angle measuring system. Thus, the rotation angles of the left and right cameras in two directions can be calculated linearly, and then, the extrinsic parameters of the binocular stereo vision after rotation can be obtained. The proposed calibration algorithm is non-iterative and can quickly complete the extrinsic parameter calibration of the rotating cameras, rendering the possibility of estimating the 3D coordinates in real time for the dynamic stereo vision, which can be used for fast positioning of dynamic targets.
The remainder of this paper is organized as follows: Section 2 primarily describes the mathematical model of the BSV with rotating and non-zooming cameras and introduces the extrinsic parameter calibration algorithm by using a single feature point. Section 3 discusses the feasibility of the calibration method. Section 4 explains the virtual simulation and physical experiments to verify the performance of the proposed method. Finally, Section 5 elaborates the conclusions.

Camera model of the Binocular Stereo Vision
The binocular stereo vision (BSV) system with rotating and non-zooming cameras is established, as shown in Figure 1. We describe the intrinsic parameter matrices K 1 and K 2 of the left and right camera as Equation (1), where f uL , f vL and f uR , f vR represent the focal lengths in the column and row directions for the left and right cameras, respectively, and s denotes the skewness of the two image axes, which remains zero in this study: where (u0, v0) is the camera's principal point and fu, fv represent the focal lengths in the column and row directions, (xn, yn) and (xd, yd) are respectively the de-distortion coordinates and distortion coordinates on the imaging plane with normalized focal length. The extrinsic parameters of BSV include rotation matrix R and translation vector T(Tx, Ty, Tz). The vector om is the Rodrigues representation of the rotation matrix, and the rotation matrix can be easily obtained using the Rodrigues transformation. In this study, we use three Euler angles rotating around the X-axis, Y-axis, and Z-axis (denoted by rx, ry, and rz, respectively) to represent the rotation matrix R as shown in Equation (4). Subsequently, rx, ry, rz can be converted by matrix R according to Equation (5), where Rij (i, j = 1, 2, 3) represents the element of matrix R in the i-th row and j-th column: where cos ) sin ) sin ) cos ) The coordinate systems of the left and right cameras at the initial position are Oc1-Xc0Yc0Zc0 and Oc2-Xc0'Yc0'Zc0', respectively. The cameras are stationary but can rotate in horizontal and vertical directions. The coordinate systems of the two cameras after the j-th rotation are denoted by Oc1-XcjYcjZcj and Oc2-Xcj'Ycj'Zcj'. We define the rotation matrix and the translation vector of the coordinate We define the pixel coordinate of the principal point of the left and right cameras' image plane as (u 0L , v 0L ) and (u 0R , v 0R ), respectively. The distortion coefficient vector of a single camera is defined as k c = [k c1 , k c2 , k c3 , k c4 , k c5 ], where k c1 , k c2 and k c5 are respectively the two, four, and six order radial distortion coefficients, and k c3 , k c4 are tangential distortion coefficients. If we have the distortion coordinates (u d , v d ) of one point on the imaging plane, the de-distortion coordinates (u, v) of this point can be obtained according to Equations (2) and (3): f v x d = x n (1 + k c1 r 2 + k c2 r 4 + k c5 r 6 ) + 2k c3 x n y n + k c4 (r 2 + 2x n 2 ) y d = y n (1 + k c1 r 2 + k c2 r 4 + k c5 r 6 ) + k c3 (r 2 + 2y n 2 ) + 2k c4 x n y n r 2 = x n 2 + y n 2 where (u 0 , v 0 ) is the camera's principal point and f u , f v represent the focal lengths in the column and row directions, (x n , y n ) and (x d , y d ) are respectively the de-distortion coordinates and distortion coordinates on the imaging plane with normalized focal length. The extrinsic parameters of BSV include rotation matrix R and translation vector T(T x , T y , T z ). The vector om is the Rodrigues representation of the rotation matrix, and the rotation matrix can be easily obtained using the Rodrigues transformation. In this study, we use three Euler angles rotating around the X-axis, Y-axis, and Z-axis (denoted by r x , r y , and r z , respectively) to represent the rotation matrix R as shown in Equation (4). Subsequently, r x , r y , r z can be converted by matrix R according to Equation (5), where R ij (i, j = 1, 2, 3) represents the element of matrix R in the i-th row and j-th column: where The coordinate systems of the left and right cameras at the initial position are O c1 -X c0 Y c0 Z c0 and O c2 -X c0 'Y c0 'Z c0 ', respectively. The cameras are stationary but can rotate in horizontal and vertical directions. The coordinate systems of the two cameras after the j-th rotation are denoted by O c1 -X cj Y cj Z cj and O c2 -X cj 'Y cj 'Z cj '. We define the rotation matrix and the translation vector of the coordinate system O c1 -X c0 Y c0 Z c0 relative to the coordinate system O c2 -X c0 'Y c0 'Z c0 ' as R 0 and T 0 , as shown in Equation (6). Similarly, the rotation matrix and translation vector after the j-th rotation is denoted by R j and T j : An arbitrary point in 3D is denoted by P, which has two projection points, namely, p L (u L , v L ) and p R (u R , v R ), on the left and right camera image planes after the j-th rotation, respectively. It should be emphasized that the pixel coordinates below are de-distortion coordinates. We define the left camera coordinate system as the world coordinate system, and the origin is the optical center of the left camera. According to the geometric properties of the optical imaging, we have equations Equations (7) and (8), where d xL , d yL and d xR , d yR represent the physical dimensions of the unit pixel of the left and right cameras in the column and row directions, respectively. Given that the two cameras of the same configuration are used in this study, we define d xL = d xR = dx, d yL = d yR = dy. According to Equations (7) and (8), the 3D coordinates of P(X cj , Y cj , Z cj ) in world coordinates can be solved by the least square method after cameras intrinsic and extrinsic parameters are obtained:

Extrinsic Parameter Calibration Using a Single Feature Point
This study aims to calibrate the extrinsic parameters of the BSV system when the two cameras rotate. The intrinsic parameters of the left and right cameras and the translation vector of the two cameras at the initial position can be calibrated in advance offline. As shown in Figure 2a, the coordinate system of the left camera at the initial position is O c1 -X c0 Y c0 Z c0 , and point p in the image plane is the projection point of the feature point P in the 3D world coordinate system at initial time. Assuming the left camera rotates to the position of the blue dotted rectangle after the j-th rotation, the coordinate system of the camera at this moment is O c1 -X cj Y cj Z cj , and point p' in the image plane is the projection point of the same feature point P at this time. The rotation angle of the camera in horizontal and vertical directions is denoted by yaw and pitch, respectively. The approximate values of these two attitude angles can be obtained in real time using the SBG angle measuring instrument, as shown in Figure 2b, in which the precision is ±1 • . Let P(X, Y, Z) represent the 3D coordinates of point P in O c1 -X c0 Y c0 Z c0 , and p(u 0 , v 0 ) and p'(u j , v j ) represent the image pixel coordinates in the image plane.
To simplify the notation, sin(yaw) and cos(yaw) are abbreviated as Sy and Cy, respectively. Meanwhile, sin(pitch) and cos(pitch) are abbreviated as Sp and Cp, respectively. According to the geometric imaging principle, Equations (9) and (10) are obtained for the left camera:  Equations (9) and (10) are simply written as Thus, a mapping relationship exists between the projection points of the same single feature point in the image plane at the initial position and the image plane after the j-th rotation, expressed as Equation (11). This relationship is known as inter-image homography: (11) where μ is the scale factor and μ = Zcj/Zc0.
According to Equation (11), the following two linear equations for Sy and Cy, as shown in Equation (12), can be derived. Equation (12) can be simply represented by a matrix equation, that is, A2×2[Sy Cy] T = b2×1. So we can convert the imaging model into a linear model with respect to sine and cosine of the yaw, where each element in the augmented coefficient matrix is a function of the single variable pitch. Subsequently, the determinant value of the matrix A in Equation (13) can be obtained: If det(A) = 0, then VjCp-fSp = 0. Thus, tan(pitch) = Vj/f. In the following, the tangent value of the angle pitch is simply denoted by Tp. If det(A) ≠ 0, then the solution of Sy and Cy, expressed in Equations (14) and (15), can be obtained according to the linear equation principle: Equations (9) and (10) are simply written as Z c0 Thus, a mapping relationship exists between the projection points of the same single feature point in the image plane at the initial position and the image plane after the j-th rotation, expressed as Equation (11). This relationship is known as inter-image homography: where µ is the scale factor and µ = Z cj /Z c0 . According to Equation (11), the following two linear equations for Sy and Cy, as shown in Equation (12), can be derived. Equation (12) can be simply represented by a matrix equation, that is, A 2×2 [Sy Cy] T = b 2×1 . So we can convert the imaging model into a linear model with respect to sine and cosine of the yaw, where each element in the augmented coefficient matrix is a function of the single variable pitch. Subsequently, the determinant value of the matrix A in Equation (13) can be obtained: If det(A) = 0, then V j Cp-fSp = 0. Thus, tan(pitch) = V j /f. In the following, the tangent value of the angle pitch is simply denoted by Tp. If det(A) = 0, then the solution of Sy and Cy, expressed in Equations (14) and (15), can be obtained according to the linear equation principle: Given that Sy 2 + Cy 2 = 1, Equation (16) can be derived: Subsequently, a quadratic equation of variable Tp, expressed as Equation (17), can be derived if Equation (16) is divided by the square of Cp. So the linear model of the pitch and yaw are transformed into a quadratic equation of the tangent value of the pitch: In the first case, if m = 0, then Tp = −q/n. In the second, if m = 0, then the solution of Tp can be obtained using Equation (18) because the quadratic equation must contain real roots: Notably, Equation (18) contains two solutions at the most. Hence, we utilize the SBG angular measurement instrument in this study to obtain the approximate value of the rotation angle pitch to eliminate the wrong solution. Therefore, in the two cases mentioned above, the correct solution of Tp can be uniquely determined. Given that the rotating angle pitch of the camera in the vertical direction satisfies −90 • < pitch < 90 • , we have Cp > 0. Subsequently, the sine and cosine values of the angle pitch, namely, Sp and Cp, can be obtained using Equation (19) according to the trigonometric function theorem: Once the closed-form solutions of sine and cosine values of the angle pitch are obtained, the rotation angle yaw of the camera in the horizontal direction can be calculated linearly according to Equation (12), that is, [Sy Cy] T = A −1 b. Thus, the rotation angles of the left camera in two directions after the rotation can be uniquely determined. Subsequently, the rotation matrix R lj of the left camera relative to its initial position after the j-th rotation can be obtained. Similarly with the left camera, the closed-form solutions of the angle pitch and yaw of the right camera after the j-th rotation can also be calculated by using the same feature point with the aid of SBG system. Then the rotation matrix R rj of the right camera relative to its initial position after the j-th rotation can be obtained. The calibration method is fast and linear, thereby avoiding the problem of the local optimal solution of iterative algorithms.
In this work, the pedestal of the left and right cameras are fixed, the coordinate systems O c1 -X c0 Y c0 Z c0 and O c2 -X c0 'Y c0 'Z c0 ' of the left and right cameras at the initial position can be mapped to the coordinate systems O c1 -X cj Y cj Z cj and O c2 -X cj 'Y cj 'Z cj ', respectively, after the j-th rotation according to the rotation theorem with a zero translation vector, expressed as Equation (20): If the rotation matrix R 0 and translation vector T 0 of the BSV system at the initial position are known in advance, the mapping relationship between the coordinate system O c1 -X cj Y cj Z cj and O c2 -X cj 'Y cj 'Z cj ', as shown in Equation (21), can be obtained by combining Equations (6) and (20). As can be seen from Equation (21), the rotating matrix R j and translation vector T j (T x , T y , T z ) of the BSV system after the j-th rotation can be obtained as shown in Equation (22). The three Euler angles (i.e., r x , r y , and r z ) representing matrix R j can be solved by Equation (5), and 3D coordinates (X, Y, Z) of the feature point can be estimated by Equations (7) and (8):

Feasibility Analysis
In this work, the left and right cameras can rotate from −45 • to 45 • in the horizontal direction and from −10 • to 10 • in the vertical direction, respectively. To investigate the performance of the proposed method with respect to the posture of one arbitrary camera, we assume that the focal length of this camera is 16 mm and the image size is 1360 pixels × 600 pixels. We vary the angle pitch from −10 • to 10 • , and the angle yaw from −45 • to 45 • with an interval of 1 • . We simulate 256 pairs of matched feature points between the camera's initial position and the position after the rotation for each posture. Hence, 256 repetitive trials are implemented and the Gaussian noise with zero mean and a standard deviation of 0.5 pixel are added to the feature points. Given that the key of the algorithm mentioned in Section 2 lies in the calculation of the angle pitch, we measure the root mean square (RMS) errors between the true values of the pitch and its estimated values to evaluate the calculation accuracy. As shown in Figure 3, the RMS increases with the increasing rotation angle but does not exceed 0.007 • , implying that the proposed algorithm is suitable for all poses of the left and right cameras.
If the rotation matrix R0 and translation vector T0 of the BSV system at the initial position are known in advance, the mapping relationship between the coordinate system Oc1-XcjYcjZcj and Oc2-Xcj'Ycj'Zcj', as shown in Equation (21), can be obtained by combining Equations (6) and (20). As can be seen from Equation (21), the rotating matrix Rj and translation vector Tj(Tx, Ty, Tz) of the BSV system after the j-th rotation can be obtained as shown in Equation (22). The three Euler angles (i.e., rx, ry, and rz) representing matrix Rj can be solved by Equation (5), and 3D coordinates (X, Y, Z) of the feature point can be estimated by Equations (7) and (8):

Feasibility Analysis
In this work, the left and right cameras can rotate from −45° to 45° in the horizontal direction and from −10° to 10° in the vertical direction, respectively. To investigate the performance of the proposed method with respect to the posture of one arbitrary camera, we assume that the focal length of this camera is 16 mm and the image size is 1360 pixels × 600 pixels. We vary the angle pitch from −10° to 10°, and the angle yaw from −45° to 45° with an interval of 1°. We simulate 256 pairs of matched feature points between the camera's initial position and the position after the rotation for each posture. Hence, 256 repetitive trials are implemented and the Gaussian noise with zero mean and a standard deviation of 0.5 pixel are added to the feature points. Given that the key of the algorithm mentioned in Section 2 lies in the calculation of the angle pitch, we measure the root mean square (RMS) errors between the true values of the pitch and its estimated values to evaluate the calculation accuracy. As shown in Figure 3, the RMS increases with the increasing rotation angle but does not exceed 0.007°, implying that the proposed algorithm is suitable for all poses of the left and right cameras.  To investigate the performance with respect to the location of the pixel coordinates, we simulate a total of 8160 pixel points in the image at the initial position of the camera and simple at 10-pixel intervals. The corresponding matched pixel points after the rotation can also be simulated according to these pixel points. For each pixel coordinate location, 256 repetitive trials are implemented, and the Gaussian noise with zero mean and a standard deviation of 0.5 pixel are added to the feature points. The RMS mentioned above with respect to the pixel coordinates is shown in Figure 4. As presented in the figure, the RMS value is less than 0.007 • , implying that the proposed method is feasible regardless of the pixel coordinates of the feature point. To investigate the performance with respect to the location of the pixel coordinates, we simulate a total of 8160 pixel points in the image at the initial position of the camera and simple at 10-pixel intervals. The corresponding matched pixel points after the rotation can also be simulated according to these pixel points. For each pixel coordinate location, 256 repetitive trials are implemented, and the Gaussian noise with zero mean and a standard deviation of 0.5 pixel are added to the feature points. The RMS mentioned above with respect to the pixel coordinates is shown in Figure 4. As presented in the figure, the RMS value is less than 0.007°, implying that the proposed method is feasible regardless of the pixel coordinates of the feature point.

Computer Simulation
The algorithm proposed in Section 2 is a self-calibration method by using a single feature point.

Computer Simulation
The algorithm proposed in Section 2 is a self-calibration method by using a single feature point. In each simulation experiment, a pair of feature points is randomly selected for the extrinsic parameters estimation, and the experiment is repeated 216 times in each noise level. The RMS values of the three Euler angles (i.e., r x , r y , and r z ), the translation vector T = [T x , T y , T z ] T , and 3D reconstruction coordinates (X, Y, Z) are used to evaluate the calibration accuracy, as shown in Figure 5. The Gaussian noise with zero mean and standard deviation of 0.1-1 pixel with an interval of 0.1 pixel are added into the feature points. As shown in Figure 5, the RMS increases linearly with the increasing noise level, but the calibration error of the extrinsic parameters and the 3D reconstruction error are low when the noise level reaches 1 pixel.

Physical Experiment
To verify the proposed self-calibration method, the system of binocular stereo vision with rotating cameras is constructed, as shown in Figure 6. Two gigabit network cameras with the same configuration are installed on the horizontal platform. The two cameras can only rotate in the horizontal and vertical directions but is sufficient to adjust the field of view. The captured images are transmitted into the computer via a network cable. The image size of the left and right cameras is 1360 pixels × 600 pixels, and the physical size of the unit pixel in the column and row directions is 6.45 µm. Given that the left and right cameras are non-zooming, the intrinsic and extrinsic parameters of the cameras at the initial position can be calibrated in advance. Currently, Zhang's checkerboard calibration method is extensively used in binocular stereo vision because of its convenience, low cost, and high precision. The intrinsic parameters, including the focal length, principal point, and distortion coefficients, of the left and right cameras obtained by using Zhang's method are shown in Table 1, as well as the Rodrigues rotation om and translation vector T 0 at the initial position.

Physical Experiment
To verify the proposed self-calibration method, the system of binocular stereo vision with rotating cameras is constructed, as shown in Figure 6. Two gigabit network cameras with the same configuration are installed on the horizontal platform. The two cameras can only rotate in the horizontal and vertical directions but is sufficient to adjust the field of view. The captured images are transmitted into the computer via a network cable. The image size of the left and right cameras is 1360 pixels × 600 pixels, and the physical size of the unit pixel in the column and row directions is 6.45 µ m. Given that the left and right cameras are non-zooming, the intrinsic and extrinsic parameters of the cameras at the initial position can be calibrated in advance. Currently, Zhang's checkerboard calibration method is extensively used in binocular stereo vision because of its convenience, low cost, and high precision. The intrinsic parameters, including the focal length, principal point, and distortion coefficients, of the left and right cameras obtained by using Zhang's method are shown in Table 1, as well as the Rodrigues rotation om and translation vector T0 at the initial position.

Physical Experiment
To verify the proposed self-calibration method, the system of binocular stereo vision with rotating cameras is constructed, as shown in Figure 6. Two gigabit network cameras with the same configuration are installed on the horizontal platform. The two cameras can only rotate in the horizontal and vertical directions but is sufficient to adjust the field of view. The captured images are transmitted into the computer via a network cable. The image size of the left and right cameras is 1360 pixels × 600 pixels, and the physical size of the unit pixel in the column and row directions is 6.45 µ m. Given that the left and right cameras are non-zooming, the intrinsic and extrinsic parameters of the cameras at the initial position can be calibrated in advance. Currently, Zhang's checkerboard calibration method is extensively used in binocular stereo vision because of its convenience, low cost, and high precision. The intrinsic parameters, including the focal length, principal point, and distortion coefficients, of the left and right cameras obtained by using Zhang's method are shown in Table 1, as well as the Rodrigues rotation om and translation vector T0 at the initial position.  In this experiment, the top-left corner point A of the display screen in Figure 7a is selected as the single feature point. The remaining three feature points are used for accuracy validation. The edge of the display screen is extracted using the Canny detection algorithm, as shown in Figure 7b. The four straight lines of the display screen's contour can be identified through the Hough line detection, as shown in Figure 7c. The intersections of these lines are the four corners of the display screen, and the pixel coordinates of the corners can be extracted to the subpixel level. The feature point matching before and after the rotation is presented in Figure 7d. In this experiment, the top-left corner point A of the display screen in Figure 7a is selected as the single feature point. The remaining three feature points are used for accuracy validation. The edge of the display screen is extracted using the Canny detection algorithm, as shown in Figure 7b. The four straight lines of the display screen's contour can be identified through the Hough line detection, as shown in Figure 7c. The intersections of these lines are the four corners of the display screen, and the pixel coordinates of the corners can be extracted to the subpixel level. The feature point matching before and after the rotation is presented in Figure 7d. After the initial positions of the two cameras are determined, the first images of the display screen captured by the left and right cameras, are used as the reference frame. In each trial, the captured image is used as the current processing frame for the left and right camera after the cameras rotated to a certain position, and the output values of the SBG system are regarded as the approximate values of the rotation angle pitch and yaw. After the matching between the feature point in the current and reference images for the left and right cameras, respectively, is accomplished, the rotation angles of each camera in the two directions can be calculated using the proposed method in Section 2. Hence, the rotation matrix and the translation vector of the binocular stereo vision after the rotation can be obtained. In this work, Zhang's method is also used to implement the calibration after each rotation, and the calibration results are considered to be reference values and are compared with our data. After the initial positions of the two cameras are determined, the first images of the display screen captured by the left and right cameras, are used as the reference frame. In each trial, the captured image is used as the current processing frame for the left and right camera after the cameras rotated to a certain position, and the output values of the SBG system are regarded as the approximate values of the rotation angle pitch and yaw. After the matching between the feature point in the current and reference images for the left and right cameras, respectively, is accomplished, the rotation angles of each camera in the two directions can be calculated using the proposed method in Section 2. Hence, the rotation matrix and the translation vector of the binocular stereo vision after the rotation can be obtained. In this work, Zhang's method is also used to implement the calibration after each rotation, and the calibration results are considered to be reference values and are compared with our data. This experiment is repeated 12 times. Figure 8a presents the rotation angles of the left camera in the two directions obtained by using proposed method and the corresponding approximate values. Figure 8b presents that of the right camera. The angles in Figure 8 based on our calculations are close to the approximate values, thereby verifying the validity of the algorithm. Figure 9 shows the three Euler angles representing the rotation matrix, namely, r x , r y , and r z , obtained by using the proposed and Zhang's methods in each experimental trial. As shown in Figure 9, the trend of the three angles in proposed method is the same as that of Zhang's method. The absolute errors of the three Euler angles relative to the reference values are presented in Figure 10. The three averages of the absolute errors are less than 0.21 • , and the average error of r x is the largest. This experiment is repeated 12 times. Figure 8a presents the rotation angles of the left camera in the two directions obtained by using proposed method and the corresponding approximate values. Figure 8b presents that of the right camera. The angles in Figure 8 based on our calculations are close to the approximate values, thereby verifying the validity of the algorithm.  Figure 9 shows the three Euler angles representing the rotation matrix, namely, rx, ry, and rz, obtained by using the proposed and Zhang's methods in each experimental trial. As shown in Figure  9, the trend of the three angles in proposed method is the same as that of Zhang's method. The absolute errors of the three Euler angles relative to the reference values are presented in Figure 10. The three averages of the absolute errors are less than 0.21°, and the average error of rx is the largest.    Figure 9 shows the three Euler angles representing the rotation matrix, namely, rx, ry, and rz, obtained by using the proposed and Zhang's methods in each experimental trial. As shown in Figure  9, the trend of the three angles in proposed method is the same as that of Zhang's method. The absolute errors of the three Euler angles relative to the reference values are presented in Figure 10. The three averages of the absolute errors are less than 0.21°, and the average error of rx is the largest.    Figure 11 shows the translation vector, namely, Tx, Ty, and Tz, obtained by using the proposed method and Zhang's methods in each experimental trial. Undoubtedly, the calibration accuracy of Zhang's method must be much higher than that of our self-calibration method by using a single feature point. However, the calibration results estimated by using the proposed method are close to those of Zhang's method, which proves the reliability of the proposed method. The absolute errors of the translation vector relative to the reference values are presented in Figure 12. The three averages of the absolute errors are less than 6.6 mm, and the average error of Tx is the largest.   Figure 11 shows the translation vector, namely, T x , T y , and T z , obtained by using the proposed method and Zhang's methods in each experimental trial. Undoubtedly, the calibration accuracy of Zhang's method must be much higher than that of our self-calibration method by using a single feature point. However, the calibration results estimated by using the proposed method are close to those of Zhang's method, which proves the reliability of the proposed method. The absolute errors of the translation vector relative to the reference values are presented in Figure 12. The three averages of the absolute errors are less than 6.6 mm, and the average error of T x is the largest. method and Zhang's methods in each experimental trial. Undoubtedly, the calibration accuracy of Zhang's method must be much higher than that of our self-calibration method by using a single feature point. However, the calibration results estimated by using the proposed method are close to those of Zhang's method, which proves the reliability of the proposed method. The absolute errors of the translation vector relative to the reference values are presented in Figure 12. The three averages of the absolute errors are less than 6.6 mm, and the average error of Tx is the largest.  In each trial, we calculate the 3D coordinates of the feature points B, C, and D on the display screen after the rotation, as shown in Figure 7a. The averages of the 3D coordinates are used for the evaluation of calibration accuracy. As shown in Figure 13, we compare the data obtained by using the proposed method with that calculated by using Zhang's method. The figure shows that the 3D reconstruction coordinates of the proposed method is close to that of Zhang's method, which proves the validity of the proposed method. The relative errors of the 3D coordinates with respect to the reference values are presented in Figure 14. The averages of the relative errors are less than 4.2%, and the relative error on Y-axis is the largest. The proposed method is suitable for the occasions where high accuracy of 3D reconstruction is not required.  In each trial, we calculate the 3D coordinates of the feature points B, C, and D on the display screen after the rotation, as shown in Figure 7a. The averages of the 3D coordinates are used for the evaluation of calibration accuracy. As shown in Figure 13, we compare the data obtained by using the proposed method with that calculated by using Zhang's method. The figure shows that the 3D reconstruction coordinates of the proposed method is close to that of Zhang's method, which proves the validity of the proposed method. The relative errors of the 3D coordinates with respect to the reference values are presented in Figure 14. The averages of the relative errors are less than 4.2%, and the relative error on Y-axis is the largest. The proposed method is suitable for the occasions where high accuracy of 3D reconstruction is not required.
the proposed method with that calculated by using Zhang's method. The figure shows that the 3D reconstruction coordinates of the proposed method is close to that of Zhang's method, which proves the validity of the proposed method. The relative errors of the 3D coordinates with respect to the reference values are presented in Figure 14. The averages of the relative errors are less than 4.2%, and the relative error on Y-axis is the largest. The proposed method is suitable for the occasions where high accuracy of 3D reconstruction is not required.

Conclusions
In this study, we present a novel self-calibration method for extrinsic parameter estimation of a rotating binocular stereo vision by using a single feature point. This is achieved by assuming the intrinsic parameters of the left and right cameras are known in advance, as well as the rotation matrix and the translation vector at the initial position. Only the rotation angles of the left and right cameras in the vertical and horizontal directions, that is, pitch and yaw, must be calculated after the rotation. We transformed the geometric imaging model of the pitch and yaw into a quadratic equation of the tangent value of the pitch. The closed-form solutions of the pitch and yaw are obtained with the aid of SBG equipment. Once the pitch and yaw are uniquely determined, the rotation matrix of the BSV system and the three Euler angles representing rotation matrix can be calculated according to rotation theorem. The translation vector are estimated by the rotation matrix of the right camera and the initial translation vector.
The proposed method is non-iterative, thus addressing the problem of not obtaining a global optimal solution in iterative algorithms. Under extreme conditions, we limit the number of feature points for calibration to a minimum, remarkably shortening the consumption time of the feature point matching and allowing the possibility to calibrate the extrinsic parameters of the rotating binocular stereo vision in real time. Computer simulations prove the feasibility of the proposed method, and the error of the extrinsic parameters and 3D coordinate reconstruction are minimal when the noise level was high. To validate the feasibility of the proposed method, we compare it with Zhang's method. Although Zhang's method exhibits a much higher precision, our calibration results from repeated experiments are close to the reference values estimated by Zhang's method. The primary contribution of this study is that the proposed method can be used for real time 3D coordinate estimation of dynamic binocular stereo vision in when an extremely high calibration accuracy is not

Conclusions
In this study, we present a novel self-calibration method for extrinsic parameter estimation of a rotating binocular stereo vision by using a single feature point. This is achieved by assuming the intrinsic parameters of the left and right cameras are known in advance, as well as the rotation matrix and the translation vector at the initial position. Only the rotation angles of the left and right cameras in the vertical and horizontal directions, that is, pitch and yaw, must be calculated after the rotation. We transformed the geometric imaging model of the pitch and yaw into a quadratic equation of the tangent value of the pitch. The closed-form solutions of the pitch and yaw are obtained with the aid of SBG equipment. Once the pitch and yaw are uniquely determined, the rotation matrix of the BSV system and the three Euler angles representing rotation matrix can be calculated according to rotation theorem. The translation vector are estimated by the rotation matrix of the right camera and the initial translation vector.
The proposed method is non-iterative, thus addressing the problem of not obtaining a global optimal solution in iterative algorithms. Under extreme conditions, we limit the number of feature points for calibration to a minimum, remarkably shortening the consumption time of the feature point matching and allowing the possibility to calibrate the extrinsic parameters of the rotating binocular stereo vision in real time. Computer simulations prove the feasibility of the proposed method, and the error of the extrinsic parameters and 3D coordinate reconstruction are minimal when the noise level was high. To validate the feasibility of the proposed method, we compare it with Zhang's method. Although Zhang's method exhibits a much higher precision, our calibration results from repeated experiments are close to the reference values estimated by Zhang's method. The primary contribution of this study is that the proposed method can be used for real time 3D coordinate estimation of dynamic binocular stereo vision in when an extremely high calibration accuracy is not required. In our future work, a real-time and highly accurate method will be simultaneously investigated.