Research on 3D Reconstruction Methods Based on Binocular Structured Light Vision

There are many studies on 3D reconstruction based on monocular vision, but for parts with complex surfaces, contour occlusion occurs, which calls for binocular or multi-view vision. This paper studies 3D reconstruction using binocular structured light vision. The specific method is: use structured light coding to calibrate the binocular system, obtaining the calibration parameters of the projector and the left and right cameras, and then compute 3D coordinates based on the triangulation principle that the camera ray and the projector plane have exactly one intersection in space. Because binocular structured light is used, the point clouds of the left and right cameras must be merged to complete the stitching of the two point clouds. Verification shows that the stitching error is small, and the point cloud can later be streamlined to improve the registration rate.


Introduction
Vision-based 3D reconstruction collects pictures with a camera, obtains 3D coordinates according to the principle of triangulation, and thereby obtains a 3D model of the measured object. 3D reconstruction is mainly divided into two categories: contact and non-contact. Among contact methods, the coordinate measuring machine (CMM) is commonly used. This method is simple to operate and has good repeatability and high accuracy, but it can easily damage the surface of the object and is time-consuming and inefficient. At present, the most commonly used approach is optical-based non-contact 3D reconstruction [1]. Non-contact methods can be further subdivided into two types: active vision and passive vision. In active vision, laser scanning equipment is expensive and the 3D data obtained are not convenient for subsequent processing, so structured light is currently used more widely. In passive vision, there are many studies on monocular vision, but a single camera cannot resolve the occlusion caused by complex shapes of the measured object, whereas binocular vision can [2]. Binocular vision has one more camera than monocular vision, so the obtained 3D point clouds require point cloud fusion. This paper combines active and passive vision, namely the binocular vision and structured light methods, to study 3D reconstruction and point cloud stitching with binocular structured light vision.

3D reconstruction experiment platform
The experimental platform mainly consists of a structured light projector and binocular cameras, combining active and passive vision to perform high-precision 3D point cloud reconstruction of the object to be measured. The structure of the experimental platform is shown in Figure 1. The projector and the left and right cameras are placed on the same horizontal line. The distance of the measured object from the cameras and the angle between the optical axes of the left and right cameras are adjusted to obtain a suitable field of view. Pictures are collected through this platform for the subsequent 3D reconstruction and point cloud stitching.

3D reconstruction of binocular structured light vision
Structured light is an active measurement method. The structured light is coded, the code patterns are burned into the projector, and the patterns are then projected onto the surface of the object to be measured. After being modulated by the object's surface, the patterns carry the 3D information of that surface [3]. The modulated images are collected and decoded, and then, based on the parameters obtained by calibration and the principle of triangulation, the 3D information of the surface of the measured object is obtained.

Camera imaging model.
The camera model is a simplification of optical imaging, and the pinhole camera model is the most commonly used one. The image coordinates of any point p in space can be represented with the pinhole model: the projection of p in the image is the intersection of the line through the camera's optical center and p with the image plane. According to the camera imaging model, 3D world coordinates can be recovered from camera coordinates, but this requires the camera intrinsic parameter matrix and the camera extrinsic parameter matrix, both of which are obtained by camera calibration, introduced in the calibration section.
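As an illustration of the pinhole projection just described, the following minimal Python sketch projects a world point into pixel coordinates; the intrinsic values, rotation, and translation used here are placeholder assumptions, not calibrated parameters.

```python
import numpy as np

# Illustrative pinhole camera parameters (assumed, not calibrated)
K = np.array([[800.0,   0.0, 320.0],   # fx,  0, cx
              [  0.0, 800.0, 240.0],   #  0, fy, cy
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                  # rotation: world -> camera (identity here)
t = np.array([0.0, 0.0, 0.0])  # translation: world -> camera

def project(K, R, t, P_world):
    """Project a 3D world point onto the image plane of a pinhole camera."""
    P_cam = R @ P_world + t    # world coordinates -> camera coordinates
    p = K @ P_cam              # camera coordinates -> homogeneous pixel coords
    return p[:2] / p[2]        # perspective division -> (u, v) in pixels

print(project(K, R, t, np.array([0.1, -0.05, 2.0])))
```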

Structured light encoding and decoding.
Structured light coding is divided into time coding, spatial coding, and direct coding. Time coding projects a series of coding patterns in a time sequence [4]; it involves a large amount of computation, but its accuracy is high. Spatial coding projects only one pattern, so the amount of computation is smaller than for time coding and it can be used for real-time 3D scene measurement; however, information about adjacent points in space is lost during decoding, and the final 3D accuracy is not as high as with time coding. Direct coding encodes each pixel directly; like time coding, it needs to project multiple patterns onto the surface of the measured object, so it is not suitable for dynamic scenes, and it is very sensitive to noise, which greatly limits it. In summary, this article uses structured light time coding. Within time coding, Gray code is a reliable coding method: adjacent codewords differ in only one bit, so decoding errors can be minimized. Gray code encoding converts binary code into Gray code, and Gray code decoding converts Gray code back into binary code, as sketched below. Figure 2 is a schematic diagram of a 7-bit Gray code.
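The binary/Gray conversions described above can be written compactly. The following Python sketch shows one standard formulation (with an illustrative 3-bit example), not code from the paper.

```python
def binary_to_gray(b: int) -> int:
    """Binary -> Gray code: adjacent values differ in exactly one bit."""
    return b ^ (b >> 1)

def gray_to_binary(g: int) -> int:
    """Gray -> binary code by cascading XOR from the top bit down."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b

# 3-bit example: stripe indices 0..7 and their Gray codewords
for i in range(8):
    g = binary_to_gray(i)
    assert gray_to_binary(g) == i   # round trip must recover the index
    print(f"{i}: {g:03b}")
```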

Calibration of 3D reconstruction system
System calibration projects the structured light coding patterns onto the calibration board and decodes the collected patterns to complete the subsystem calibration of the left camera and projector, the subsystem calibration of the right camera and projector, and the stereo calibration of the left and right cameras, providing the theoretical basis for the subsequent solving of the 3D coordinates and the rough stitching of the measured object.
The camera is calibrated with Zhang's calibration method: pictures of the calibration board illuminated by a set of structured light patterns are collected in different poses, the corner coordinates are determined, the relationship between the corner coordinates and the world coordinates of the calibration board is established, and the camera's intrinsic parameter matrix, extrinsic parameter matrix, and distortion coefficients are then computed. Projector calibration can be regarded as the reverse of camera calibration: it becomes finding the relationship between the projector's corner coordinates and the world coordinates of the checkerboard calibration board. Through structured light coding and decoding, the local homography from the camera corners to the projector plane can be estimated, the projector's corner coordinates can then be estimated through this local homography, and the projector can finally be calibrated in the same way as the camera.
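As a hedged sketch of Zhang's method, the following shows how the camera calibration step could be run with OpenCV's standard checkerboard pipeline; the board dimensions, square size, and image path are assumptions for illustration, not the paper's setup.

```python
import glob
import cv2
import numpy as np

pattern = (9, 6)   # inner corners per row/column (assumed)
square = 25.0      # checkerboard square size in mm (assumed)

# World coordinates of the corners on the planar board (Z = 0)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

obj_points, img_points = [], []
for fname in sorted(glob.glob("calib/*.png")):  # board images, assumed path
    gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Returns the intrinsic matrix K, distortion coefficients, and per-view extrinsics
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
```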
The parameters of the intrinsic matrix have the following meaning: the physical size of a pixel, dx and dy; the focal length f; and the horizontal and vertical offsets cx and cy (in pixels) of the principal point, i.e. the imaging point of the optical center, relative to the image origin. The extrinsic parameter matrix contains the rotation and translation from world coordinates to camera coordinates. The distortion coefficients cover the camera's radial and tangential distortion.
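With these parameters, the intrinsic matrix can be written in the standard pinhole form below; this is the common convention with $f_x = f/dx$ and $f_y = f/dy$, sketched here since the paper's own matrix is not reproduced:

$$
K = \begin{bmatrix} f/dx & 0 & c_x \\ 0 & f/dy & c_y \\ 0 & 0 & 1 \end{bmatrix}
$$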
Stereo calibration takes the left camera as the reference. From the rotation and translation matrices obtained by the left camera-projector subsystem calibration and those obtained by the right camera-projector subsystem calibration, the pose of the right camera relative to the left camera is derived. The two subsystem calibrations give

$$P_P = R_L P_{CL} + T_L \qquad (1)$$
$$P_P = R_R P_{CR} + T_R \qquad (2)$$

where $P_P$ is the coordinate of point P in the projector frame, $P_{CL}$ is the coordinate of point P in the left camera frame, $P_{CR}$ is the coordinate of point P in the right camera frame, $R_L$ and $T_L$ are the rotation and translation matrices of the left subsystem, and $R_R$ and $T_R$ are the rotation and translation matrices of the right subsystem.
Taking the left camera as the reference, the pose relationship between the right camera and the left camera is

$$P_{CL} = R \, P_{CR} + T \qquad (3)$$

where $R$ is the rotation matrix and $T$ is the translation matrix. Combining (1), (2) and (3), the transformation between the left and right cameras is obtained:

$$R = R_L^{-1} R_R, \qquad T = R_L^{-1}(T_R - T_L) \qquad (4)$$

Through camera calibration, projector calibration, and stereo calibration of the left subsystem, the right subsystem, and the left and right cameras, the system calibration of the binocular structured light system is completed.
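The following is a numerical sketch of equation (4), assuming the subsystem calibrations have already produced R_L, T_L, R_R, and T_R as NumPy arrays:

```python
import numpy as np

def stereo_extrinsics(R_L, T_L, R_R, T_R):
    """Right-to-left camera transform from the two subsystem calibrations,
    following equation (4): R = R_L^{-1} R_R, T = R_L^{-1} (T_R - T_L)."""
    R = R_L.T @ R_R              # for a rotation matrix, inverse == transpose
    T = R_L.T @ (T_R - T_L)
    return R, T

# Trivial check: identical subsystems imply the cameras coincide
R, T = stereo_extrinsics(np.eye(3), np.zeros(3), np.eye(3), np.zeros(3))
assert np.allclose(R, np.eye(3)) and np.allclose(T, 0.0)
```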

3D reconstruction algorithm
Through camera calibration and projector calibration, and according to the principle of 3D reconstruction (shown in Figure 3) that in space the camera's ray equation and the projector's plane equation have exactly one intersection point, which is the 3D point to be recovered, the 3D coordinates of points in space can be solved to complete the 3D reconstruction. For the reconstructed points, distortion correction is applied to the camera and projector according to the distortion coefficients obtained by calibration, improving the reconstruction accuracy [5].
From the known camera coordinates and the coordinates decoded from the projector, the parameter t can be obtained by substitution into formula (8) above; this yields the required 3D coordinate point, completing the 3D reconstruction and laying the foundation for point cloud fusion [6].
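Since formula (8) is not reproduced here, the following is only a generic sketch of the camera-ray/projector-plane intersection that the text describes; the intrinsics K and the plane parameters n and c are illustrative assumptions.

```python
import numpy as np

K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])   # camera intrinsics (assumed)

def triangulate(u, v, n, c, K):
    """Intersect the camera ray through pixel (u, v) with the projector
    plane n . X = c, both expressed in the camera frame."""
    d = np.linalg.inv(K) @ np.array([u, v, 1.0])  # ray direction: X(t) = t*d
    t = c / (n @ d)                               # solve n . (t*d) = c for t
    return t * d                                  # reconstructed 3D point

# Example: a stripe plane identified by decoding the projected pattern
n = np.array([0.8, 0.0, 0.6])   # plane normal (assumed)
c = 1.2                          # plane offset (assumed)
print(triangulate(320.0, 240.0, n, c, K))
```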

Point cloud fusion
The left camera and projector form a monocular structured light measurement system, and the right camera and projector form another [7]. Combining the left and right subsystems gives the binocular structured light measurement system. Each subsystem obtains point cloud coordinates with its own camera as the reference coordinate system; the process of superimposing the left and right point clouds is point cloud fusion, that is, stitching. Its essence is to unify all the obtained point clouds under the same world coordinate system. This can be done with the stereo calibration result: with the rotation and translation matrices between the left and right cameras, the point cloud under the right camera is converted into the left camera frame, completing the rough stitching [8].
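A minimal sketch of this coarse stitching step, assuming cloud_left and cloud_right are N x 3 NumPy arrays and (R, T) come from the stereo calibration in equation (4):

```python
import numpy as np

def coarse_stitch(cloud_right, R, T):
    """Transform the right camera's point cloud into the left camera frame,
    applying equation (3) point-wise: P_CL = R * P_CR + T."""
    return (R @ cloud_right.T).T + T

# merged = np.vstack([cloud_left, coarse_stitch(cloud_right, R, T)])
```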
The ICP algorithm is essentially an optimal registration method based on least squares. The algorithm repeatedly selects corresponding point pairs and calculates the optimal rigid body transformation from coordinate system 1 to coordinate system 2 until the convergence accuracy required for correct registration is met. A K-D tree is a data structure that partitions a k-dimensional data space, mainly used to search key data in multidimensional space, such as range searches and nearest neighbor searches. It is a special case of the binary space partitioning tree: a binary search tree with additional constraints, used to organize geometry representing points in k-dimensional space. In point cloud processing only three dimensions are typically needed, so the K-D trees used here are three-dimensional [9].
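The following is a minimal point-to-point ICP sketch that combines a K-D tree for correspondence search with an SVD-based rigid transform, using SciPy's cKDTree in place of the PCL structures referenced below; it illustrates the idea, not the paper's exact implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(src, dst):
    """Least-squares rigid transform src -> dst (Kabsch / SVD method)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # avoid a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cd - R @ cs

def icp(source, target, iters=30, tol=1e-7):
    """Minimal point-to-point ICP with a K-D tree for correspondences."""
    tree = cKDTree(target)            # built once over the fixed target cloud
    src = source.copy()
    prev = np.inf
    for _ in range(iters):
        dist, idx = tree.query(src, k=1)           # nearest-neighbor pairs
        R, T = best_rigid_transform(src, target[idx])
        src = (R @ src.T).T + T                    # apply the update
        err = (dist ** 2).mean()
        if abs(prev - err) < tol:     # stop when the error stops decreasing
            break
        prev = err
    return src
```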
For applications with high accuracy requirements, coarse stitching alone cannot meet the accuracy requirements, and fine stitching is needed. Commonly used fine stitching methods include ICP, K-D tree, SVD, and quaternion methods; this article uses ICP together with a K-D tree. When the overlapping area of the source point cloud and the target point cloud is large, the nearestKSearch interface is preferred for finding nearest neighbors; if the point cloud is sparse and the search radius is small, the radiusSearch interface can also be used. Since the coarsely stitched source and target point clouds overlap over a large area, nearestKSearch is used here to find the k nearest neighbors and improve the matching speed [10].
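nearestKSearch and radiusSearch are PCL's K-D tree interfaces; as a rough Python analogue under the same assumptions, SciPy's cKDTree offers equivalent k-nearest-neighbor and radius queries:

```python
import numpy as np
from scipy.spatial import cKDTree

cloud = np.random.rand(1000, 3)      # stand-in point cloud (illustrative)
tree = cKDTree(cloud)
query = np.array([0.5, 0.5, 0.5])

# Analogue of PCL's nearestKSearch: the k nearest neighbors of the query
dists, idx = tree.query(query, k=10)

# Analogue of PCL's radiusSearch: all neighbors within a fixed radius
idx_r = tree.query_ball_point(query, r=0.1)
```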

Result analysis
The stitching algorithm is verified on a Stanford bunny point cloud with 200,000 points. The better the initial position, the fewer iterations are needed. As shown in the figure, white is the target point cloud and green is the source point cloud. Because the target and source point clouds overlap over a large area, the two point clouds begin to coincide after the first registration iteration, with an iteration error of 0.0001497. After the second iteration the coincidence area grows further, with an iteration error of 2.21429e-06. After the third iteration the two clouds essentially coincide, with an iteration error of 2.82617e-07. This shows that the algorithm's accuracy is high.

Summary
The 3D reconstruction method combines active and passive vision, performing 3D reconstruction with binocular cameras and a projector. The 3D point cloud is obtained according to the principle of triangulation, the left and right point clouds are coarsely stitched using the stereo calibration, and fine stitching is completed with the ICP and K-D tree methods. Verification on the 200,000-point Stanford bunny cloud shows good registration accuracy; in the experiment only invalid points were removed, and the point cloud can be further simplified to increase the registration and fusion rate.