Review

A point cloud, as a collection of points, is poised to bring about a revolution in acquiring and generating three-dimensional (3D) surface information of an object in 3D reconstruction, industrial inspection, and robotic manipulation. In this revolution, the most challenging but imperative process is point cloud registration, i.e., obtaining a spatial transformation that aligns and matches two point clouds acquired in two different coordinate systems. In this survey paper, we present the overview and basic principles, give a systematic classification and comparison of various methods, and address existing technical problems in point cloud registration. This review attempts to serve as a tutorial to academic researchers and engineers outside this field and to promote discussion of a unified vision of point cloud registration. The goal is to help readers quickly get into the problems of their interest related to point cloud registration and to provide them with insights and guidance in finding appropriate strategies and solutions.


Introduction
With the advent of low-cost three-dimensional (3D) imaging devices and the development of professional 3D acquisition techniques, more and more research efforts have been attracted toward the 3D point cloud, from industry to academia, in recent years. In contrast to a 2D image, 3D point cloud data contain much more information about an object or a scene and hence can provide a better understanding and description of the real world. Specifically, a point cloud can be used to measure the distance between objects in the current frame, achieve an accurate representation of the shape of an object, and so on. Therefore, 3D point clouds have been widely produced and processed to provide accurate and fast descriptions of the 3D geometries of objects in various applications [1], such as 3D model reconstruction, geometry quality inspection, and robotic manipulation.
However, it is impossible in practice to obtain all the point cloud data of an object or a scene at once. Therefore, we usually move the 3D acquisition device around a target object to complete a thorough scan, retrieving multiple point clouds at different viewpoints for better rebuilding the 3D environment or recovering the 3D shape of an object. It is critically important to capture point clouds in such a way because (1) the field of view (FoV) of a depth camera is usually not wide enough to capture a large scene, such as a forest or a city, and (2) the quality of a single acquired point cloud is generally not acceptable. For example, due to the low resolution of a camera, we need to fuse multiple frames into a global scene. In addition, to overcome the holes in the point cloud that are mainly caused by the reflection of light on an object's surface, multiple frames at different viewpoints are required. In short, to obtain an accurate point cloud, the practical approach is to collect multiple point clouds in an appropriate manner and integrate them together. Furthermore, it is required to know the orientation of a scanned object represented by its point cloud data in its local coordinate system; we have to match a standard point cloud in the dataset with the scanned point cloud. The key technology that fuses several point clouds into a complete point cloud with a common coordinate system is called point cloud registration, also called surface registration, in which point clouds are analyzed to represent the real-world surface of the object. It is a key step for applications and tasks such as autonomous driving [2][3][4][5], 3D reconstruction [6][7][8][9][10], simultaneous localization and mapping (SLAM) [11][12][13][14], and virtual and augmented reality (VR/AR) [15][16][17]. In the application of point cloud registration, each point cloud acquired from a different viewpoint is measured within its local coordinate system.
Therefore, the objective of registration is basically to calculate the transformation (translational and rotational) between point clouds to transfer them into a common mapping coordinate system. According to the number of input data, point cloud registration can be divided into pairwise registration and multiview registration [18]. Pairwise registration focuses on two point clouds and estimates the transformation between them, while multiview registration, also called groupwise registration [19], processes more than two point clouds simultaneously, and the transformation of each scan to the reference data is calculated. According to the accuracy performance of registration, the task of point cloud registration can be divided into two stages: coarse registration and fine registration [20]. Coarse registration is intended to provide a rough transformation in terms of accuracy from an arbitrary initial position, and fine registration aims to provide a relatively more accurate result for two point clouds so that the point clouds have more common regions where points become closer. Depending on whether the object of registration is deformed, registration methods are also divided into rigid registration and nonrigid registration [21]. On the other hand, according to the types of theoretical solutions to point cloud registration, the methods can mainly be split into five categories: iterative closest point (ICP)-based methods, feature-based methods, learning-based methods, probabilistic methods, and others [22][23][24][25]. The ICP algorithm and its variants are classic solutions. They can provide an accurate and reliable transformation between two point clouds through an iterative algorithm over correspondences, given a good initial position and rotation. However, the final registration results are very sensitive to the initial position and rotation [26].
Feature-based methods aim to extract meaningful and robust local geometrical descriptors as correspondences to estimate a transformation between two point clouds. The extracted descriptors can be subdivided into local descriptors [27][28][29] and global descriptors [30][31][32][33]. Local descriptors are extracted from interest regions of the point cloud, while global descriptors are generated by encoding the geometric information of the whole point cloud. Although global descriptors perform well on data with obvious geometric attributes, they are difficult to make robust on point clouds containing outliers or flat surfaces.
Learning-based methods provide a relatively more robust transformation between two arbitrary point clouds via invariant features generated by machine-learning techniques. Thanks to machine learning, feature extraction in registration can be more invariant because the learned descriptors may contain more detailed features than geometric characteristics or other handcrafted descriptors. Probabilistic methods model the input point cloud as a density function, such as a Gaussian mixture model, and then optimize a statistical discrepancy among probabilistic correspondences to estimate the final transformation. One of the advantages of this approach is that it can be applied to both rigid and nonrigid registration. Other solutions include methods with auxiliary data, in which transformation information is provided by other modules and techniques, and the 4-points congruent sets algorithm based on random sample consensus (RANSAC). The registration algorithms discussed in this work could be used in pairwise or groupwise/multiview registration and could be applied in a fine or coarse stage with respect to accuracy. Although some of them can be used in nonrigid registration, we will primarily focus on rigid registration in this paper.
Several valuable reviews have been published on this topic in the last decade. For instance, Pomerleau et al. [34] presented a general framework based on ICP to classify the existing solutions and algorithms of the last twenty years and mainly focused on use cases for mobile robotics applications covering different kinds of platforms, environments, and tasks. It also introduced several realistic situations and applications based on point cloud registration, such as search-and-rescue tasks, automation of inspection, shoreline monitoring, and autonomous driving. Tam et al. [35] gave a comprehensive summary on rigid and nonrigid registration of point clouds and meshes. Cheng et al. [20] reviewed the existing registration methods based on laser scanning point clouds, mainly in photogrammetry and remote sensing. Li et al. [36] gave a good statement of the registration problem and presented a novel framework for globally solving the problem, with unknown point correspondences, grounded on the Lipschitz global optimization theory. Ben et al. [37] provided an overview of the most important rigid registration algorithms for estimating the optimal rigid transformation and discussed their common mathematical foundation. Zhen et al. [38] conducted a comprehensive survey on terrestrial laser scanner (TLS) point cloud registration methods and presented a benchmark dataset containing various scenes. Li et al. [39] made a comprehensive evaluation of the ICP algorithm in terms of overlap ratio, angle, distance, and noise factors. Jiang et al. [40] and Ma et al. [41] provided comprehensive and systematic reviews of image registration and briefly introduced several typical applications, in which 2D image registration was discussed in more detail. Specifically, Ma introduced the existing methods of feature-based matching, from handcrafted to trainable ones, with a comprehensive review.
Jiang presented two general frameworks for area-based and feature-based pipelines and briefly introduced and analyzed typical applications to reveal the significance of multimodal image matching.
Our contributions are to (1) split the point cloud registration problem into five main categories based on the research publications of the last decades, which is beneficial for new researchers and potential users of registration; (2) systematically summarize point cloud registration techniques and analyze the similarities and differences of the currently existing methods; and (3) introduce up-to-date achievements, challenges, and further research efforts on the applications based on point cloud registration. The structure of this paper is organized as follows: Section 2 describes the concept of point cloud registration and the issues in this area. In Section 3, the best-known solutions in ICP-based, feature-based, learning-based, probability-based methods, and others are reviewed and discussed, respectively. Finally, the conclusion and outlook on point cloud registration are given in Section 4.

Basic Concepts and Principles of Point Cloud Registration
Point cloud registration is a fundamental technology in computer vision and robotics and is considered an important prerequisite in various applications that use point cloud data as the input. In this section, we first briefly introduce the source of the point cloud, namely, point cloud acquisition. Then, the definition of point cloud data is provided, followed by an introduction of the problems in point cloud registration and other issues and constraints in this field.

Point Cloud Acquisitions.
Point clouds can be generated by a 3D/depth camera directly or calculated by photogrammetric techniques. Typically, these depth cameras can be divided into two categories [42]: passive-method based and active-method based. A passive method obtains the depth information to generate point cloud data by computing the relationship among multiple view images or a sequential set of images taken at different viewpoints. For instance, structure from motion (SfM) and multiview stereo (MVS) [43] can be integrated to produce point cloud data with high quality in terms of resolution and accuracy [44,45]. Given a series of images of an object captured at different views (image i acquired at position i), SfM can utilize these images to create a sparse point cloud based on the correspondences between pairs of images (see Figures 1(a) and 1(b)). For a dense point cloud, the MVS technique is usually considered [44,46], in which images are taken with the poses provided by SfM as the input, finally producing a point cloud nearly comparable in accuracy to those generated by laser scanners. PMVS is one of the well-known methods. It begins with feature detection and matching, and then the expansion and filtering steps repeatedly reconstruct and remove erroneous patches. The flowchart of PMVS is illustrated in Figure 1(c). In Figure 1, point clouds generated from SfM and PMVS are shown, respectively. The stereo-depth camera [47] is another type of camera, with two or more image sensors spaced a small distance apart, allowing it to simulate human binocular vision to perceive depth information by determining the correspondence between points in images and computing depth values based on those point correspondences [48]. The primary advantage of these approaches is that they function well in most outdoor environments.
Therefore, there are many outdoor applications of scanning surrounding scenes and environments, such as buildings [49,50], streets [51], and cities [52,53], using the passive method.
On the other hand, active devices, such as the time-of-flight (ToF) camera, also called LiDAR, and the structured light camera, usually emit or project light and utilize its properties to calculate depth information. Since the speed of light is known, ToF cameras can calculate the distance between the camera and target objects by measuring the round-trip time of a modulated light signal. The speed of generating the final point cloud data can reach 120 fps [54], so it is widely used in real-time applications such as autonomous self-driving [55] and gesture recognition [56]. Moreover, depending on the wavelength and power of the emitted light, the ToF camera can provide depth values at significant distances and high resolution. A structured light camera projects known patterns, such as phase lines, onto the surface of an object or scene to calculate depth information based on the deformation of these patterns [57,58]. As illustrated in Figure 2, vertical straight stripes are projected onto the object and are distorted by the surface profile. Eventually, 3D shape information can be obtained by analyzing these images. Because the structured light camera can obtain high resolution and speed at a relatively short range [59], it is widely used in fields that require a high quality of point cloud in terms of accuracy and resolution, such as the 3D shape measurement field [59] and the medical field [60]. The basic principle and applicable scenes of point cloud acquisition are summarized in Table 1.

Point Cloud Data.
As shown in Figure 3, a point cloud is a collection of points, i.e., S = {p_1, p_2, . . ., p_n}, and these points describe the shape and surface of an object or scene in 3D. Generally, a point p at least contains the x, y, and z values that define basic geometric coordinates for the target objects, so that S = {p_1, p_2, . . ., p_n | p ∈ R^3}.
However, when some properties of a single point in the point cloud are available, such as curvature k, normal vector n, and the color information (r, g, b), the point p could be represented by a longer vector. An example of the point cloud provided by the KITTI dataset [61] is illustrated in Figure 3.
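The definition above maps naturally onto an array representation. As a minimal sketch (the attribute layout here is only illustrative, not a standard), a point cloud with per-point color can be stored as an (n, 6) array, where each row is one point vector (x, y, z, r, g, b):

```python
import numpy as np

# S = {p_1, ..., p_n}: n points stored as an (n, 3) array of x, y, z values.
xyz = np.array([[0.0, 0.0, 0.0],
                [1.0, 0.0, 0.0],
                [0.0, 1.0, 0.5]])

# Optional per-point attributes, e.g. color (r, g, b), widen each point
# into a longer vector (x, y, z, r, g, b).
rgb = np.array([[255.0, 0.0, 0.0],
                [0.0, 255.0, 0.0],
                [0.0, 0.0, 255.0]])

cloud = np.hstack([xyz, rgb])  # shape (3, 6): each row is one point
```

Additional properties such as curvature or normal vectors extend each row in the same way.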

Registration of Point Cloud.
Point cloud registration consists of rigid and nonrigid registration. Rigid registration assumes that the point clouds retrieved from the 3D acquisition are related by a rigid transformation that consists only of a translation and a rotation. Nonrigid registration allows deformation in the input data, so that it provides an affine transformation describing translation, rotation, scaling, and shear.

Rigid Registration.
In rigid registration, one point cloud can be exactly mapped to the other point cloud while keeping the same shape and size after an appropriate rigid transformation. The rigid transformation is widely used in computer vision since target objects only change their location without altering their shape or size in most applications [62]. Mathematically, we consider two point clouds P = {p_1, p_2, . . ., p_n | p ∈ R^3} and Q = {q_1, q_2, . . ., q_m | q ∈ R^3}. The task of rigid registration is to find a transformation to align Q with P, where the transformation consists of a rotation matrix R and a translation vector T. The mathematical process can be briefly written as (R, T) = rReg(P, Q), where rReg denotes a function that estimates the optimal R and T of a rigid registration based on a certain relationship between the two point clouds. This function is also the focus of this paper and will be further detailed in the following sections. After the final step, the result Q′ should be in the same coordinate system as P, and the objects represented by Q and P will be more complete, which means the objects have richer information and are more detailed. For example, to compute the transformation between two models formed with points, a transformation is applied to a reading point cloud to align it with a reference point cloud, as illustrated in Figure 4. If they are put together without alignment, the point clouds will not match because their coordinate systems are different (see Figure 4(b)). To obtain a complete scene of the kitchen, multiple point clouds at various viewpoints are retrieved, as shown in Figure 5(a).
Point clouds fused into the common coordinate system without registration are shown in Figure 5(b), and point clouds with an appropriate registration solution are shown in Figure 5(c).
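The action of a rigid transformation on the reading cloud can be sketched concretely. The snippet below is a minimal illustration with a hand-picked R and T (not any particular registration algorithm): it applies q → Rq + T to every point and checks that pairwise distances, i.e., shape and size, are preserved:

```python
import numpy as np

def apply_rigid(Q, R, T):
    """Map every point q in the (m, 3) cloud Q to Rq + T."""
    return Q @ R.T + T

# Example transform: 90-degree rotation about the z-axis plus a translation.
theta = np.pi / 2.0
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
T = np.array([1.0, 2.0, 3.0])

Q = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
Q_aligned = apply_rigid(Q, R, T)

# A rigid transformation preserves pairwise distances (shape and size).
d_before = np.linalg.norm(Q[0] - Q[1])
d_after = np.linalg.norm(Q_aligned[0] - Q_aligned[1])
```

Estimating R and T from the data itself, i.e., the function rReg above, is what the methods surveyed in Section 3 provide.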

Nonrigid Registration.
In nonrigid registration, a nonrigid transformation is established to warp the scanned data to the target point cloud. The nonrigid transformation contains reflection, rotation, scaling, and translation, instead of only the translation and rotation of rigid registration. The use of nonrigid registration is mainly motivated by two reasons: (1) the nonlinearities and calibration errors of data acquisition can cause low-frequency warps on scans of rigid objects [63,64]; (2) the registration is performed for a deforming and moving scene or object that changes its shape over time. For example, in handwritten character recognition, a point cloud with warps or deformation is aligned to a model, as illustrated in Figure 6(a). As shown in Figure 6, the person and camera are moving, while the initial and incomplete data are increasingly denoised and detailed in real time [65,66]. However, most of the registration solutions for point clouds so far have focused on rigid registration, and the development of nonrigid registration has been relatively slow in comparison. To the authors' best knowledge, the estimation of correspondences and the optimization techniques are also critical in most nonrigid registration; once the correspondences are given, the nonrigid transformation is easier to estimate [67,68]. Meanwhile, these crucial steps are also essential for the rigid registration that we mainly focus on in this paper. Furthermore, in the real world, many applications that need nonrigid registration also involve more or less rigid transformation due to the great complexity and high dimensionality of nonrigid transformations. Therefore, in this article, we will primarily study rigid registration and hope it will help those who want a comprehensive picture of point cloud registration.

Issues and Constraints.
The technique of point cloud registration has been used in different fields and applications, such as 3D object scanning and 3D localization. Meanwhile, a point cloud is often combined with a traditional camera [69] to effectively address real-world problems such as object tracking, object recognition, and object detection. Therefore, the specific algorithms or solutions to the problem of point cloud registration vary across fields, because the choice of these algorithms or solutions generally depends on several characteristics, such as accuracy and computational complexity, and even on the data used and the environment itself. Point cloud registration is a challenging problem for the reasons specified below.

Varying Overlap.
The overlap between two point clouds retrieved consecutively might be large or small. However, most of the existing solutions calculate the possible transformation by determining correspondences based on the common region between point clouds. Therefore, it is a challenge to process varying overlap during point cloud registration.

Noise and Outliers.
In contrast to triangle meshes, represented by a set of triangles connected by their common edges or corners, a point cloud acquired from low-cost sensors and time-of-flight cameras [70], such as the Microsoft Kinect camera, is scattered, always suffers from noise contamination, contains outliers, and may have considerable variation in accuracy [71,72] due to theoretically unavoidable camera deviation [73], lighting or specular reflections [74], and the measurement noise of the acquisition devices [75]. A scanned raw point cloud usually contains sparse outliers and isolated or nonisolated clustered outliers, and hence may subsequently have detrimental effects on practical applications. Generally, a good filtering algorithm should be employed to eliminate the noise from the point cloud while preserving the geometric features of objects or scenes.
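As one concrete (and deliberately simple) filtering sketch, the statistical-outlier-removal idea used by several point cloud libraries can be implemented with brute-force neighbor distances. The function name and the parameters k and std_ratio below are illustrative choices, not prescribed values:

```python
import numpy as np

def remove_statistical_outliers(points, k=3, std_ratio=1.0):
    """Keep points whose mean distance to their k nearest neighbors is at
    most std_ratio standard deviations above the cloud-wide mean."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dists, np.inf)               # ignore self-distance
    knn_mean = np.sort(dists, axis=1)[:, :k].mean(axis=1)
    threshold = knn_mean.mean() + std_ratio * knn_mean.std()
    return points[knn_mean <= threshold]

# Eight points on a unit cube plus one far-away sparse outlier.
cube = np.array([[x, y, z] for x in (0.0, 1.0)
                 for y in (0.0, 1.0) for z in (0.0, 1.0)])
noisy = np.vstack([cube, [[100.0, 100.0, 100.0]]])
clean = remove_statistical_outliers(noisy)        # the outlier is dropped
```

The O(n²) distance matrix is only for clarity; in practice a kd-tree would supply the k nearest neighbors.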

High Computational Cost.
With the emergence of high-resolution datasets and cameras, massive point clouds often pose great difficulties to registration algorithms because the distribution of these point clouds is unstructured and irregular [76]. Moreover, point clouds obtained by 3D cameras cannot be processed directly in the way image pixels can be iterated over. Some procedures, such as noise removal, sampling, and normal estimation, usually first require computing the K-nearest neighbors (KNNs) for every point in the point cloud and then visiting each of them [77]. To make indexing and searching of point clouds fast, researchers have designed many mathematical models, such as traditional KNN algorithms, and improved algorithms, such as building point cloud topology for triangular mesh structures and constructing kd-trees on GPUs [78]. In the research works [77,[79][80][81], faster searching algorithms and better storage management of point clouds support the process of registration. However, even with the development of computing power and digital storage, and given the demand for high-resolution 3D cameras, the efficient operation and analysis of massive point clouds remains a challenging task for applications.

Various Indicators for the Success of Registration.
For point cloud registration, no matter what kind of solution or algorithm we use, a criterion for a good result should be defined. Since the most popular formulation of this problem is expectation-maximization (EM), the significant step is to find an effective minimization procedure. The well-known solution is the iterative closest point (ICP) algorithm, which consists of selection, matching, and minimizing errors [82]. Meanwhile, many researchers in the past decades have proposed significant modifications of ICP, such as different error metrics (i.e., point-to-point and point-to-surface metrics) and point selection strategies, in order to improve the robustness, convergence speed, and accuracy of the registration result. Although most variants of ICP achieved good results within the ICP framework and on their own datasets, because their methods based on their proposed error metric or termination criteria compared favorably with existing methods, what exactly constitutes a successful registration result was rarely discussed. To the best of our knowledge, Dirk et al. [83] suggested that we could check the percentage of the overlapping surface of the two point clouds and check whether the overlapping region has enough inliers to guarantee a successful result. Based on this suggestion, we will discuss the impact of the existing error metrics and other criteria on good registration in this paper. Liang et al. [20] gave a list of three error analysis methods: based on comparison among existing registration algorithms, based on target points in the reference data, and based on a common region between the input data. The first method compares existing methods to either improve the current solutions or incorporate the existing approaches into a complete workflow, hence achieving a better result. The second method measures the result of registration by analyzing specific points in the reference data.
The third one is a widely used method assuming that common points and regions occur in the two point clouds to be registered.

Registration Methods/Algorithms
Over the past decades, a growing number and diversity of methods have been proposed for point cloud registration, from the classic ICP algorithm to solutions integrated with deep learning techniques. In this section, five categories of point cloud registration solutions are introduced: ICP-based methods, feature-based methods, learning-based methods, probability-based methods, and others.

ICP-Based Methods.
Among existing point cloud registration methods, the iterative closest point (ICP) algorithm has received significant attention due to its simplicity and efficiency. The ICP algorithm can obtain an accurate and robust transformation from one point cloud, named the reading point cloud, to the other, called the reference data. Although it has drawbacks, such as falling into local minima, these can easily be overcome, and a better result achieved, by integrating ICP with other algorithms that provide a rough transformation as its input. Therefore, many modified ICP algorithms have been proposed and applied in applications. In this section, the ICP algorithm and its notable variants are reviewed.

ICP Algorithm.
The ICP algorithm is an iterative algorithm that can ensure the accuracy, convergence speed, and stability of registration under ideal conditions. In a sense, ICP can be regarded as an expectation-maximization (EM) problem [84,85]: it computes and updates a new transformation based on correspondences, which is then applied to the reading data, until an error metric converges. However, there is no guarantee that ICP reaches a global optimum. The ICP algorithm can be approximately split into four steps [83,86]: point selection, point matching, point rejection, and error metric minimizing, as shown in Figure 7; each of them is introduced in the following sections.

Point Selection.
This step aims to select a subset of points representing the current point cloud, because the raw point cloud is usually quite large and requires high computational complexity, resulting in unbearable computing time. Meanwhile, depending on the sensors and applications, too much data also brings redundant or unnecessary details into the task of registration. Therefore, point selection as a preprocessing step to reduce the number of points is imperative for registration with high computational efficiency and accuracy. Many downsampling strategies have thus been proposed and studied [87,88] for point selection, such as random, distance-limit, and uniform downsampling. In the random strategy, a specified number of points are picked randomly. In the distance-limit strategy, given a threshold distance, points are picked from the original cloud such that points in the output cloud keep at least the specified distance between each other; the larger this distance threshold is, the fewer points are kept, as illustrated in Figure 8. In the uniform strategy, a voxel grid is created to split the point cloud data spatially into a set of small 3D boxes, and then the spatial average of the point distribution in each voxel is selected as a representative point. An example is shown in Figure 9.
In uniform subsampling, the computational cost of partitioning and centroid calculation increases linearly when dense grids are required to achieve high resolution [89,90]. Therefore, the distance-limit and the random subsampling are widely used as a filter or data-reduction step in many applications [90]. On the other hand, these methods avoid a false-offset effect, in which generated points are not raw data and thus might not represent the surface of the object accurately. Moreover, to speed up operations on the point cloud, we can use a spatial representation [76], such as a kd-tree, as a data structure to efficiently store and manage the discrete data of the point cloud.
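The uniform (voxel grid) strategy described above can be sketched as follows; this is a minimal illustration of the idea, not the implementation of any particular library:

```python
import numpy as np

def voxel_downsample(points, voxel_size):
    """Split space into cubic voxels of edge voxel_size and replace the
    points inside each occupied voxel by their spatial average."""
    keys = np.floor(points / voxel_size).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    counts = np.bincount(inverse).astype(float)
    centroids = np.zeros((counts.size, 3))
    for dim in range(3):  # average x, y, z per occupied voxel
        centroids[:, dim] = np.bincount(inverse, weights=points[:, dim]) / counts
    return centroids

pts = np.array([[0.1, 0.1, 0.1],
                [0.3, 0.3, 0.3],   # same voxel as the point above
                [1.5, 1.5, 1.5]])  # a second voxel
down = voxel_downsample(pts, voxel_size=1.0)  # 3 points -> 2 representatives
```

Note that the output points are voxel averages rather than raw measurements, which is exactly the false-offset effect mentioned above.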

Point Matching.
The purpose of this step is to discover the matching of every point in the reading point cloud onto the reference point cloud, so that correspondences are established between the two point clouds. Once the appropriate correspondences are known, we can find the correct translation and rotation between the two point clouds.
Correspondences, i.e., point pairs, are typically generated by searching for the closest neighbor point in the reading data for each reference point with respect to the Euclidean distance. As shown in Figure 10, point clouds are approximately represented by blue points in Figure 10(a), and correspondences are established by finding closest points, marked as blue lines in Figure 10(b). However, the searching operation in matching is a greedy approximation and has a significant computational complexity of O(N × M), where N and M denote the numbers of points in the respective point clouds. To avoid exhaustive searching over all the reference points, many scholars improved the performance with the assistance of input data structures. For example, the multi-z-buffer [91], kd-tree [92,93], and octree [94] were developed to store, manage, and operate on the data in order to accelerate the matching process. Specifically, Wei et al. [95] used a kd-tree to optimize the calculation of normals in the point cloud. Meng et al. [96] used a combination of kd-tree and extrapolation to improve the efficiency of registration. Moreover, we can utilize different strategies to find correspondences instead of a closest-point-based method. For example, Yong et al. [97] introduced a point-to-projection technique to estimate correspondences by projecting reading points onto a destination surface from the point of view of the destination [98]. Pulli [99] studied a point-to-plane technique using the normal vectors of the reading point cloud, and so on. These approaches, which differ from point-to-point, i.e., the closest-point-based strategy, largely determine the further steps, including point rejection and minimization of the error metric.
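A brute-force version of this closest-point search, the O(N × M) baseline that the kd-tree and octree structures above are designed to accelerate, can be sketched as:

```python
import numpy as np

def closest_point_matching(reading, reference):
    """For every reading point, return the index of its Euclidean closest
    reference point and the corresponding distance (O(N*M) search)."""
    dists = np.linalg.norm(reading[:, None, :] - reference[None, :, :], axis=-1)
    idx = dists.argmin(axis=1)
    return idx, dists[np.arange(len(reading)), idx]

reading = np.array([[0.0, 0.0, 0.0],
                    [5.0, 5.0, 5.0]])
reference = np.array([[0.1, 0.0, 0.0],
                      [5.0, 5.0, 4.0]])
idx, d = closest_point_matching(reading, reference)
```

With a kd-tree built over the reference cloud, the same query drops to roughly O(N log M) on average.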
Additionally, the matching process can be accelerated by hardware. For example, Yong et al. [100] proposed a real-time 3D registration technique in which a point-to-plane based matching process was implemented entirely in CUDA, a GPU programming technique developed by NVIDIA for general computing on graphics processing units. Peng et al. [101] introduced a GPU-based ICP method that implemented GPU-based nearest-neighbor searching in the traditional ICP.

Point Rejection.
The purpose of this step is to reject and eliminate invalid correspondences obtained from the matching process, because erroneous pairs of points can negatively affect the estimation of the transformation between point clouds [102]. Additionally, the rejection of invalid correspondences can tremendously increase the robustness against noise and outliers [103]. The widely used methods to reject erroneous point pairs are based on the Euclidean distance. One of these criteria is a fixed maximum distance: given a maximal distance as the threshold value, point pairs with a distance greater than this value will be rejected [102]. As illustrated in Figure 11(b), the dotted pairs will be removed because their distance is larger than the specified value. This works well for filtering pairs outside the overlap region induced by noise and outliers. Similarly, based on such a distance between a pair of points, correspondences can also be rejected as the worst n% of pairs [99], or by a multiple of the standard deviation of the distances [104]. However, rejection based on a fixed percentage or threshold is usually inflexible in dealing with incomplete point clouds [105]. Therefore, other rejection strategies were introduced, such as median distance [106], dynamic threshold [99], RANSAC-based [107], normal compatibility [108], and duplicate matching [109] methods. RANSAC-based rejection randomly picks subsets of correspondences to estimate the relatively best transformation, and it is effective in avoiding convergence into local minima [83]. Duplicate matching focuses on preserving a single pair while removing redundant pairs between the reading and reference point clouds. As shown in Figure 11(c), the blue lines will be preserved because those pairs connect to the same point with the shortest distance in the reference data. Rejection based on normal compatibility rejects point pairs that have inconsistent normal information.
Specifically, a pair is rejected if the angle between the normals of the pair is larger than a given threshold, or if the normal directions of the pair are inconsistent [110]. As depicted in Figure 11(d), the arrows show the normal of each point, and the red pairs ought to be rejected because the angle between their normals is larger than a given value. Furthermore, experiments showed that weighting correspondences with a weighting function, which can be considered an adjustment of correspondences after point matching, yields highly robust performance for point clouds involving large rotations and translations [111, 112].
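To make the rejection criteria above concrete, here is a minimal pure-Python sketch (not taken from any of the surveyed papers) that combines a fixed maximum-distance test with a normal-compatibility test; the pair format `(p, q, n_p, n_q)` and the function name are hypothetical conventions for illustration.

```python
import math

def reject_pairs(pairs, max_dist, max_normal_angle_deg):
    """Keep only correspondence pairs (p, q, n_p, n_q) -- 3D points with
    unit normals -- that pass both a distance and a normal-angle test."""
    kept = []
    cos_thresh = math.cos(math.radians(max_normal_angle_deg))
    for p, q, n_p, n_q in pairs:
        d = math.dist(p, q)                           # Euclidean distance
        cos_angle = sum(a * b for a, b in zip(n_p, n_q))
        if d <= max_dist and cos_angle >= cos_thresh:
            kept.append((p, q, n_p, n_q))
    return kept
```

A practical pipeline would typically combine this with worst-n% trimming or a dynamic threshold, as discussed above.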

Error Metric Minimization.
This step aims to apply an appropriate cost function and metric that can be used to determine whether a good transformation between pairwise point clouds has been obtained, and then to minimize the error metric through iterations with a new transformation.
Three approaches [113,114], point-to-point, point-to-plane, and point-to-projection, are widely used for rigid point cloud registration.
Point-to-Point. Besl et al. [115] and Arun et al. [116] proposed a point-to-point strategy, in which each point in the reading point cloud is paired with the closest point (by Euclidean distance) in the reference point cloud to form correspondence pairs, as discussed in the point matching step. The error metric is the sum of squared differences between the matched pairs, as shown in equation (2). Then, for the entire point cloud, the goal is to find the optimal R and T that minimize equation (3), and this R, T can be taken as the final result. Eggert et al. [117] showed that this approach performs well in terms of numerical stability and accuracy:

L(R, T) = Σ_{i=1}^{N} ‖p_i − (R q_i + T)‖²₂, (3)

where ‖·‖₂ denotes the Euclidean length, p_i and q_i are the i-th point pair of the N correspondences between the reference point cloud P and the reading point cloud Q, and R ∈ SO(3) and T ∈ ℝ³ represent the rotation and translation parts of the transformation, respectively. The reading point cloud is updated with the new transformation, and the next iteration, including point matching and rejection, proceeds if L does not meet the required threshold and has not converged. The error metric in the ICP algorithm converges and the iteration terminates when the difference between the current L and the next L_next is less than a given threshold δ, i.e., |L − L_next| < δ.
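The point-to-point objective of equation (3) can be minimized in closed form. As a self-contained illustration (not the 3D SVD-based solution of [116]), the following pure-Python sketch solves the 2D analogue, where the optimal rotation angle follows directly from accumulated dot and cross terms of the centered pairs; all names are illustrative.

```python
import math

def fit_rigid_2d(P, Q):
    """Closed-form least-squares rigid fit: find theta, T minimizing
    sum ||p_i - (R q_i + T)||^2 for matched 2D pairs (a 2D analogue
    of the SVD-based solution used in point-to-point ICP)."""
    n = len(P)
    cpx = sum(p[0] for p in P) / n; cpy = sum(p[1] for p in P) / n
    cqx = sum(q[0] for q in Q) / n; cqy = sum(q[1] for q in Q) / n
    # Accumulate cross-covariance terms of the centered pairs.
    sxx = sxy = 0.0
    for (px, py), (qx, qy) in zip(P, Q):
        dpx, dpy = px - cpx, py - cpy
        dqx, dqy = qx - cqx, qy - cqy
        sxx += dqx * dpx + dqy * dpy      # "dot" part
        sxy += dqx * dpy - dqy * dpx      # "cross" part
    theta = math.atan2(sxy, sxx)          # optimal rotation angle
    c, s = math.cos(theta), math.sin(theta)
    tx = cpx - (c * cqx - s * cqy)        # T = centroid_P - R centroid_Q
    ty = cpy - (s * cqx + c * cqy)
    return theta, (tx, ty)
```

In 3D, the same centering step is used, but the rotation is recovered from the 3×3 cross-covariance matrix via SVD, orthonormal matrices, or unit quaternions, as noted below.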
Point-to-Plane. Chen et al. [118] and Bergevin et al. [119] proposed a point-to-plane ICP, which searches for the intersection with the surface of the reference point cloud along the normal vector of the reading point cloud. This method takes into account both the nearest point in the reference point cloud and the normal of the searched point. Consequently, it is less sensitive to outliers and noise and has more accurate performance and faster convergence compared with point-to-point ICP. The error metric of point-to-plane minimizes the sum of squared distances from each point to the tangent plane of its correspondence:

L(R, T) = Σ_{i=1}^{N} ((R q_i + T − p_i) · n_{p_i})²,

where q_i is transformed by R and T, n_{p_i} is the unit normal vector at p_i, and all other variables are the same as in equation (3). To obtain the optimal R and T, the linear least-squares problem can be formulated and solved with a linear method, such as singular value decomposition (SVD) [116], orthonormal matrices [120], or unit quaternions [121], or with a nonlinear solver such as the Levenberg-Marquardt method [122]. These methods perform similarly in terms of accuracy and stability when minimizing the error metric [117].
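A minimal sketch of evaluating the point-to-plane error metric for already-transformed pairs; the pair format `(p, q, n_p)` is a hypothetical convention, and solving for R and T is left to the linear or nonlinear solvers mentioned above.

```python
def point_to_plane_error(pairs):
    """Sum of squared point-to-plane residuals: for each pair
    (p, q, n_p), the signed distance of the (already transformed)
    reading point q to the tangent plane at reference point p."""
    err = 0.0
    for p, q, n_p in pairs:
        # Residual: projection of the displacement onto the unit normal.
        r = sum((qc - pc) * nc for qc, pc, nc in zip(q, p, n_p))
        err += r * r
    return err
```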
Point-to-Projection. The point-to-projection approach was proposed by Blais et al. [98] and Neugebauer et al. [123]; it is fast but less accurate than the previous two. It obtains correspondences by projecting the points of the reading point cloud onto adjacent points of the reference point cloud along the point normals (i.e., the normal-shooting approach); in this sense it uses a different strategy to establish correspondences. After rejection to obtain valid correspondences, the error metric in point-to-projection is the same as equation (3). In addition, to achieve an ICP algorithm with higher precision and fewer iterations in different applications, other error metrics were proposed, such as point-to-line ICP [124] and color- or curvature-based methods [125] that consider the compatibility of more point properties. Among these error metrics, point-to-point and point-to-plane are the most widely used. The point-to-plane approach has been shown to converge faster than the point-to-point algorithm [82], especially for applications that need to scan large planar surfaces, while point-to-point has better robustness and convergence precision [39]. Moreover, there are many variants based on point-to-plane and point-to-point that achieve good performance. For example, Rusinkiewicz et al. [82] introduced simple and computationally efficient methods based on point-to-plane. Forstner et al. [126] presented a computationally efficient solution to determine the relative transformation in real time. Segal et al. [127] introduced generalized ICP (GICP), which attaches a probabilistic model to the minimization of the error metric in order to reduce iterative computation.
3.1.6. ICP Variants and Summary. Over more than two decades, many ICP variants have been proposed with respect to two key steps: searching for valid correspondences and conducting efficient minimization of the error metric. They have grown into sophisticated tools that can be applied to custom applications. In this part, we introduce several state-of-the-art ICP variants and practices for real-world applications.
First of all, the mainly used ICP algorithms are point-to-point and point-to-plane based. Therefore, many concrete implementations are available and are often used for comparison with other approaches, for instance, Open3D [128], an open-source library for processing 3D data; the Point Cloud Library (PCL) [129], an open-source project covering a large part of large-scale point cloud processing; and Libpointmatcher [130], a highly configurable chain of ICP modules and filters. All of them are cross-platform; that is, they can run on mainstream platforms such as Windows and Linux.
Secondly, some modifications of the original ICP can also achieve attractive performance in registration tasks. For example, Go-ICP, introduced by Yang et al. [131], searches the entire transformation space to reach a globally optimal estimation via a branch-and-bound (BnB) scheme, avoiding local minima. Generalized ICP (GICP), studied by Segal et al. [127], handles the minimization of the error metric in a probabilistic model, making registration more robust to incorrect correspondences. Although these methods still need to be integrated with traditional ICP techniques for higher accuracy, they can provide good registration results. The processes presented in this section are summarized in Table 2, in which the modules and implementation details of the ICP algorithm are listed.

Feature-Based Methods.
As we can see in the ICP-based algorithms, the establishment of correspondences is critical before the transformation estimation. The final result can be guaranteed if we obtain appropriate correspondences describing the correct relationship between two point clouds. Therefore, we can stick landmarks on the scanned object or manually pick equivalent point pairs in postprocessing to calculate the transformation of the points of interest (picked points), and this transformation can finally be applied to the reading point cloud. As illustrated in Figure 12(c), the point clouds are loaded in the same coordinate system and painted in different colors. Figures 12(a) and 12(b) show two point clouds captured at different viewpoints, with point pairs selected from the reference and reading data, respectively; the registration result is shown in Figure 12(d).
However, these methods are neither suitable for measured objects to which landmarks cannot be attached, nor applicable in scenarios that require automatic registration [132]. Meanwhile, to minimize the search space for correspondences [26] and to avoid the assumption of an initial transformation in ICP-based algorithms, feature-based registration was introduced, in which keypoints designed by researchers are extracted. In general, keypoint detection and correspondence establishment are the primary steps of this approach [133].
Depending on the application, these features are generated from the geometric information of the point cloud. For instance, features are generally highly correlated with shape information, such as curvature, normals [134], colors [135], and intensity [136]. In [26], curvature and surface normal were utilized to express the characteristics of the point cloud, and correspondences were accepted only under conditions of the form

|h(p_i^s) − h(p_j^s)| < τ_curvature, |w(p_i^s) − w(p_j^s)| < τ_normal,

where p_i^s and p_j^s are features in the reading data and reference data, respectively, h and w are the surface curvature and the normal angle of the neighboring points, and τ_curvature and τ_normal are the thresholds for curvature and normal angle. For applications that primarily collect road or driving point cloud data, line and surface information is important because road networks and roof patches are likely to be in common [137]. These approaches suggest that descriptors should be descriptive and invariant, such that they are capable of encapsulating the predominant information of the point cloud while remaining insensitive to noise, i.e., disturbed points, so that correspondences can be effectively established between pairwise point clouds and a more robust rigid transformation can finally be achieved. These handcrafted features, whose properties are manually engineered by researchers, have a long history of design and study. Many 3D descriptors for point clouds have been introduced and applied to the task of point cloud registration. Comprehensive reviews and comparisons of point cloud descriptors in terms of computational efficiency and accuracy are provided in [138][139][140][141][142] as well. According to the way descriptors are extracted, they can be roughly classified into local, global, and hybrid descriptors.
Local descriptors are calculated for every point from the spatial distribution or geometric attributes of its neighbors and are produced as a histogram representing the descriptor of each point. They are less discriminative overall, but they are a good choice for cluttered scenes. On the other hand, a global descriptor is produced for the entire point cloud as a single descriptor vector by integrating a set of features. Global features are often used in 3D object recognition and 3D object categorization, while some global descriptors can also be used for object pose estimation because they contain local descriptors, such as the Viewpoint Feature Histogram (VFH) and the Local-to-Global Signature (LGS) descriptor. Moreover, global methods require less computation than local features and describe the whole point cloud better, but they are generally affected by occlusion and clutter [143]. Hybrid descriptors incorporate local and global descriptors together, combining most of the advantages of both.
In this section, two local descriptors, the Point Feature Histogram (PFH) [28,144] and the Signature of Histograms of Orientations (SHOT) [145], are introduced for point cloud registration.

Point Feature Histogram (PFH). PFH is a local feature descriptor that relies on the local neighborhood of each point, whereas global feature descriptors estimate geometrical information as a single vector over the whole point cloud. In keypoint detection, PFH exploits the geometrical properties of a point's neighborhood to estimate multivalued features. Then, the feature descriptions are created by binning these features into a histogram. After correspondences are established in the multidimensional feature space via a kd-tree, the transformation (translation and rotation) between matched pairs is computed and applied to the reading point cloud as an initial parameter for ICP.
Specifically, in the k-neighborhood around a point p_s, the PFH descriptor is defined by four features of the form

f_1 = v · n_t, f_2 = ‖p_t − p_s‖₂, f_3 = u · (p_t − p_s)/f_2, f_4 = arctan(w · n_t, u · n_t),

where u, v, and w represent the Darboux frame coordinate system, as shown in Figure 13, and f_1, f_3, and f_4 are in practice the quantities computed between p_s and p_t (the distance f_2 is excluded from the histogram). Subsequently, the PFH is formed by binning these three features into a div³ histogram, where div is the number of subdivisions in the range of feature values.
In the implementation of the Point Cloud Library (PCL), div is set to 5, so each point is described by 125 bins, and correspondences are then established through these histograms, which are represented in a high-dimensional space. The correspondences between two point clouds are depicted in Figure 14(a), where the specific histograms at p_290 and p_6916 are shown, and the final registration result by PFH is shown in Figure 14(b). The computational complexity of PFH is O(n²), where n is the number of points. Fast PFH (FPFH), which has a computational complexity of O(n), was proposed in [27]; there, a Simplified Point Feature Histogram (SPFH) is generated and weighted to construct the FPFH.
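As an illustration of the geometry behind PFH, the sketch below computes the three angular features for a single pair of oriented points, following the Darboux-frame construction described above; the function names are illustrative, and a real PFH implementation bins these values over all point pairs in each neighborhood.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pfh_features(ps, ns, pt, nt):
    """Angular PFH features (alpha, phi, theta) for one point pair,
    built on the Darboux frame u, v, w anchored at the source point."""
    d = tuple(t - s for s, t in zip(ps, pt))
    dist = math.sqrt(dot(d, d))
    dn = tuple(x / dist for x in d)     # unit direction p_s -> p_t
    u = ns                              # frame axis 1: source normal
    v = cross(u, dn)                    # frame axis 2
    w = cross(u, v)                     # frame axis 3
    alpha = dot(v, nt)
    phi = dot(u, dn)
    theta = math.atan2(dot(w, nt), dot(u, nt))
    return alpha, phi, theta
```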

Signature of Histograms of Orientations (SHOT).
SHOT is a local descriptor that combines the benefits of both signature- and histogram-based methods. Signature-based methods describe a 3D point and its neighborhood by encoding geometric information in an invariant local reference frame. Histogram-based methods describe the neighborhood by compressing this information into bins over a specific quantized domain. Specifically, SHOT builds a robust local reference frame for every point in the point cloud, computes the angle θ_q between each normal n_q and the local z axis, and then encodes the 3D descriptor as a histogram over the points in the support region. Finally, for the signature structure, an isotropic spherical grid, divided into 32 volumes in the experiments, encompasses partitions along the radial, azimuth, and elevation axes. Figure 15 provides a graphic description of the SHOT signature structure, where, for clarity, the grid has 16 volumes (2 elevation divisions, 2 radial divisions, and 4 azimuth divisions instead of 8).
After the computation of descriptors, the next step for point cloud registration is to establish correspondences between the features extracted from the reading and reference data.
A threshold is set for finding the nearest neighbor to establish point-to-point correspondences. As shown in Figure 16(a), the two point clouds overlap in a region from which correspondences can be extracted. Then, the resulting transformation serves as an initial guess for the ICP algorithm to achieve fully automatic point cloud registration, as illustrated in Figure 16(b).
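A minimal sketch of threshold-gated nearest-neighbor matching in descriptor space (brute force, pure Python, illustrative names); a practical implementation would use a kd-tree over the descriptors instead of the double loop.

```python
def match_descriptors(desc_read, desc_ref, max_dist):
    """Match each reading descriptor to its nearest reference
    descriptor; keep the pair only if the descriptor distance
    is within a threshold."""
    matches = []
    for i, d in enumerate(desc_read):
        best_j, best = None, float("inf")
        for j, e in enumerate(desc_ref):
            dist = sum((a - b) ** 2 for a, b in zip(d, e))
            if dist < best:
                best_j, best = j, dist
        if best <= max_dist ** 2:        # threshold on the distance
            matches.append((i, best_j))
    return matches
```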

Further Steps after Establishment of Correspondences.
This section introduces the further steps after obtaining a large set of correspondences. To our knowledge, given a large set of correspondences, a popular strategy [146,147] consists of two steps: (1) building a set of putative correspondences containing outliers and inliers; (2) removing the outliers and estimating the transformation using the inliers.
For some point cloud registration tasks, point-to-point correspondences are established by associating each descriptor with its nearest neighbor [141] in the descriptor space of the reference point cloud. The transformation produced by these correspondences can be directly considered a rough result and then improved by other algorithms such as ICP. However, this strategy might not be accurate enough, because the correspondences obtained from two point clouds generally include both inliers and outliers, and ICP is sensitive to noise and outliers. Additionally, real-world data contain noise, outliers, occlusion, and even flat surfaces with few apparent features, so the registration performance is severely degraded because trait extraction from the point cloud is affected. Therefore, it is important to design an algorithm that efficiently removes outliers and estimates the transformation from inliers. A key part of this task is to remove mismatches from the given putative feature correspondences, leaving the true matches to calculate the transformation. To this end, a simple way is to reject random correspondences based on the hypothesize-and-verify framework [148]. However, its notable drawback is that it requires a large number of trials, especially when the inlier ratio is low and the expected confidence of finding correct matches is high [149]. To eliminate mismatches quickly, a mismatch removal method was proposed [150] that preserves the local neighborhood structures of feature points among thousands of correspondences within a few milliseconds, producing a quick initialization for registration. Ma et al. [147] considered constructing a putative set of correspondences and rejecting matches whose feature descriptor vectors are sufficiently different.
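The hypothesize-and-verify idea can be sketched with a deliberately simple translation-only model: sample a minimal set of correspondences, hypothesize a transformation, and keep the hypothesis with the most inliers. This is an illustrative toy, not the specific RANSAC variant of [148]; all names and parameters are hypothetical.

```python
import random

def ransac_translation(pairs, iters=200, inlier_tol=0.5, seed=0):
    """Hypothesize-and-verify sketch with a translation-only model:
    sample one correspondence (p, q), hypothesize t = p - q, and
    count how many pairs agree within a tolerance."""
    rng = random.Random(seed)
    best_t, best_inliers = None, []
    for _ in range(iters):
        p, q = rng.choice(pairs)                   # minimal sample
        t = tuple(pc - qc for pc, qc in zip(p, q))
        inliers = [(pp, qq) for pp, qq in pairs
                   if all(abs(ppc - (qqc + tc)) <= inlier_tol
                          for ppc, qqc, tc in zip(pp, qq, t))]
        if len(inliers) > len(best_inliers):
            best_t, best_inliers = t, inliers
    return best_t, best_inliers
```

The required number of iterations grows quickly as the inlier ratio drops, which is exactly the drawback of the framework noted above.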
In addition, for nonrigid and rigid registration, a solution that robustly estimates the transformation from correspondences was proposed [151,152] using the L2-minimizing estimate [153], but it is severely biased by outliers. A set of putative correspondences can also be purified by semisupervised learning [154] or a mixture of densities [155]. However, these feature-based methods compute transformations solely from a set of putative correspondences instead of the entire input data.

Learning-Based Methods.
In recent years, learning-based approaches have received significant attention in many fields related to computer vision, especially object detection, classification, and semantic segmentation. In applications that use a point cloud as input, traditional strategies for estimating feature descriptors rely heavily on distinctive geometric properties of the object in the point cloud, as discussed in the last section. However, real-world data often vary from object to object and may contain flat surfaces, outliers, and noise. Moreover, the removed mismatches usually contain useful information and can be used for transformation learning.
Learning-based techniques are well suited to encoding semantic information and can generalize to specific tasks. Specifically, most registration strategies integrated with machine-learning techniques are faster and more robust than classical methods and can be flexibly extended to other tasks, such as object pose estimation and object classification. Similarly, a key challenge in learning-based point cloud registration is how to extract features that are invariant to the point cloud's spatial change and more robust to noise and outliers. In this section, we describe and discuss several state-of-the-art approaches integrated with machine learning. The PointNet architecture is illustrated in Figure 17. It should be noted that the alignment in the network is used to align both the input points and the point features, and this alignment ensures that the learned representation is invariant to geometric transformations of the point cloud. Although PointNet for 3D descriptors [157] mainly focuses on 3D classification and segmentation, it indeed provides more inspiration for research on the registration task, because the descriptors used in segmentation and 3D object classification bring a different perspective to correspondence establishment.
As a pioneering effort that directly processes raw point clouds, PointNet++ [158], developed from PointNet, can learn local features by exploiting metric space distances. PointNet++ is a hierarchical neural network architecture consisting of a set of learning layers that combine features from multiple scales, whereas the original PointNet learns from the spatial information of points and then aggregates these features into a global signature of the point cloud. The hierarchical structure includes a stack of set abstraction levels, each composed of a sampling layer, a grouping layer, and a PointNet layer. The sampling layer selects a set of points from the input data. The grouping layer constructs local region sets for each point. The PointNet layer produces feature vectors by encoding the local region pattern. The hierarchical architecture of PointNet++ is given in Figure 18, where N denotes the points with d-dimensional coordinates and C-dimensional features, and K is the number of neighbors of the centroid points.
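The sampling layer of a PointNet++-style set abstraction level is commonly implemented with farthest point sampling; the sketch below is a pure-Python illustration of that greedy procedure (not the reference implementation), with illustrative names.

```python
import math

def farthest_point_sampling(points, k):
    """Greedy farthest-point sampling: repeatedly pick the point
    farthest from the already-selected set, giving good coverage
    of the cloud with k centroids."""
    selected = [0]                                  # start from first point
    min_d = [math.dist(points[0], p) for p in points]
    while len(selected) < k:
        nxt = max(range(len(points)), key=lambda i: min_d[i])
        selected.append(nxt)
        # Update each point's distance to its nearest selected centroid.
        for i, p in enumerate(points):
            min_d[i] = min(min_d[i], math.dist(points[nxt], p))
    return selected
```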
Point Cloud Registration Network (PCRNet) [159]. PCRNet is a framework that combines PointNet features to find the transformation between point clouds. In particular, it uses five multilayer perceptrons (MLPs), similar to the PointNet architecture and arranged in a Siamese architecture, to encode each point cloud as a global feature vector that contains information on the geometry and orientation of the point cloud. The transformation is then estimated with a single forward pass through fully connected (FC) layers, unlike PointNetLK [160], which uses a traditional algorithm to calculate the transformation. The architecture of PCRNet is illustrated in Figure 19. The point clouds are first fed into five MLPs (64, 64, 64, 128, 1024), whose output is a global feature generated by a symmetric max-pooling function ϕ. These features are concatenated and provided as input to fully connected layers, and the final result, which includes translation and rotation values, is computed by FC layers of sizes (1024, 1024, 512, 512, 256, 7). Experiments showed that iterative PCRNet is similar to Go-ICP in accuracy but much faster on ModelNet40 [161].
Deep Global Registration (DGR) [24]. DGR consists of three modules: a 6D convolutional network for estimating the veracity of each correspondence, a weighted Procrustes solver for closed-form rigid registration, and an optimization module that fine-tunes the registration produced by the previous step. Specifically, it first extracts features from the pairwise point clouds using fully convolutional geometric features (FCGFs) [162], and then a translation-invariant convolutional network is used to identify the correct correspondences. Finally, the weighted mean squared error is minimized by the Procrustes method [163] to generate the rotation R and translation T, as shown in Figure 20. To improve registration accuracy, a fine-tuning module minimizes a robust loss function, in which the optimization space is smoothed by removing discontinuities. Experiments showed that DGR achieves good performance on the real-world 3DMatch dataset [164].
Deep Closest Point (DCP) [165]. To overcome the intrinsic drawback of ICP, i.e., converging to spurious local optima, DCP was introduced; it uses learned embeddings to guarantee better matching. It has three parts: a point cloud embedding network, an attention-based module that predicts a soft matching generated by a probabilistic approach, and an SVD layer that estimates the rigid transformation. In more detail, the raw point cloud is embedded into a high-dimensional space by PointNet or dynamic graph convolutional neural networks (DGCNNs) [166], followed by an attention and pointer generation module that learns co-contextual information and establishes soft correspondences. Finally, the transformation between the point clouds is calculated in a differentiable SVD layer. As shown in Figure 21, F_x and F_y are the feature embeddings of X and Y, respectively, and Φ_x and Φ_y are new embeddings carrying contextual information. m(x_i, Y) is a soft pointer from each x_i into the points of Y; finally, we obtain pairs matching x_i to y_i over all i. In particular, this describes DCP-v2, which integrates an attention module, while DCP-v1 does not. Additionally, both PointNet and DGCNN can extract features from the point cloud, but PointNet learns global descriptors while DGCNN learns local ones; in [165], the proposed method based on DGCNN empirically performs better than PointNet in terms of accuracy on ModelNet40 [161]. However, this method alone is not accurate enough.
Thus, the result can be further polished by the ICP algorithm: after the better initialization provided by DCP, ICP can converge to the global optimum.
Partial Registration Network (PRNet) [23]. PRNet tackles partial correspondence problems by learning geometric priors with self-supervised learning. It is composed of an appropriate geometric representation and a keypoint detector that focuses on finding common points and establishing correspondences. Moreover, PRNet is designed to be iterative like ICP, so it is capable of coarse-to-fine refinement. Specifically, PRNet first detects keypoints of the point clouds X and Y and then predicts a mapping and rigid transformation (m_p) from X to Y based on the keypoints, as shown in Figure 22. The obtained transformation is applied to X, and the process is repeated with the new transformation as input. Furthermore, to make the matching sharper and approximately differentiable compared with traditional methods [167], PRNet can be considered an actor-critic method (ACM), using Gumbel-Softmax [168] to sample a matching matrix for predicting a mapping of keypoints from X to Y. Experiments showed that PRNet is more robust to noise on ModelNet40 [161].
Robust Point Matching (RPM-Net) [169]. RPM-Net builds upon the earlier iterative framework RPM [170], which uses optimization techniques and soft assignment strategies to solve the point matching problem. RPM-Net has three main parts: (1) computing hybrid features that combine the spatial coordinates and geometric properties of points; (2) a differentiable Sinkhorn [171] layer and an annealing algorithm [172] applied to obtain soft assignments of correspondences from these features; and (3) a weighted SVD used to solve for the rigid transformation. In particular, features for every point of the point cloud are extracted by PointNet and FPFH [27], followed by the annealing algorithm for soft assignment and Sinkhorn normalization to produce the final correspondences M_i; during this process a secondary network is optionally used to predict and adjust the annealing parameters (α, β) across iterations. Finally, the rigid transformation is computed by SVD with each correspondence in Y weighted. An overview of RPM-Net is shown in Figure 23. Experimental results suggest that it is robust to noisy data on ModelNet40 [161] as measured by mean isotropic rotation and translation errors.
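Sinkhorn normalization, used by RPM-Net's soft-assignment layer, alternates row and column normalization of a positive score matrix until it approaches a doubly stochastic (soft-assignment) matrix. The sketch below illustrates the plain, unregularized form for a square matrix; names and the iteration count are illustrative.

```python
def sinkhorn(scores, iters=50):
    """Alternately normalize the rows and columns of a positive square
    score matrix so it approaches a doubly stochastic matrix."""
    m = [row[:] for row in scores]
    for _ in range(iters):
        m = [[v / sum(row) for v in row] for row in m]            # rows
        col_sums = [sum(m[i][j] for i in range(len(m)))
                    for j in range(len(m[0]))]
        m = [[m[i][j] / col_sums[j] for j in range(len(m[0]))]
             for i in range(len(m))]                              # columns
    return m
```

In RPM-Net the input scores come from feature distances, and extra slack rows/columns handle outliers and partial overlap; those refinements are omitted here.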
PointNetLK [173]. PointNetLK builds a deep neural network framework derived from PointNet and integrated with the Lucas-Kanade (LK) algorithm [174], a classic technique for image alignment. To this end, PointNetLK begins by producing a K-dimensional descriptor vector from the PointNet function ϕ, and then the current transform increment ΔG, represented by an exponential map, is estimated in the LK layer; this estimates the alignment without the costly computation of point correspondences. The final transformation G_est is obtained from all incremental estimates computed during the iterative loop. The model is shown in Figure 24. Evaluated on ModelNet40 [161], PointNetLK demonstrates greater robustness to noisy data and is less sensitive to the initial position than the ICP algorithm.
3DRegNet [175]. Given a set of noisy point correspondences, 3DRegNet focuses on classifying inliers and outliers, because mainly the inliers contribute to the estimation of an optimal pose. To this end, a deep neural network (DNN) for registration is designed to compute the transformation parameters (R and T). Specifically, 3DRegNet starts by passing the set of point correspondences through a fully connected layer, producing 128-dimensional features for every correspondence, which are fed into ResNets [176] to obtain correspondence weights w_i ∈ [0, 1) (see Figure 25(b)). Subsequently, the DNN extracts and processes meaningful features by pooling and context normalization. Then, the transformation is estimated at the final registration block (see Figure 25(c)). The architecture of 3DRegNet is shown in Figure 25(a). Moreover, an additional 3DRegNet can be used to compute a smaller refinement transformation for a more precise result. A series of experiments on the ICL-NUIM [177] and SUN3D [178] datasets suggests that a deep learning approach indeed performs better under conditions of lower-quality point correspondences.

Comparison and Summary.
In this section, the learning-based registration methods are summarized and compared in Table 3.
Figure 17: Architecture of PointNet; the symmetric function handles the unordered input, and the output for classification is a score over k classes.

Figure 18: Hierarchical point set feature learning in PointNet++; each set abstraction level consists of a sampling-and-grouping stage followed by a PointNet layer.
Thanks to the existence of various datasets, many learning-based methods can be compared with each other in terms of the same indicators, such as time consumption, accuracy, and robustness to noise. In Table 3, however, the performance based on the evaluation metric is our main concern, because time consumption depends strongly on the computational devices, which are very diverse. Furthermore, robustness to noise is also difficult to measure uniformly, since different noise generation distributions may affect the performance of an algorithm unless it is tested several times with different noise distributions, as in [39], where various standard deviations of Gaussian noise were applied to the point cloud.
Moreover, Network type refers to the predominant method used to solve the correspondence estimation and transformation estimation problems, though there may be secondary networks for predicting specific parameters; RPM-Net, for example, optionally uses a secondary network to adjust parameters during iteration. Need refinement indicates whether a method needs other solutions to produce a more precise result, typically because its accuracy alone is not sufficient. Finally, Tested dataset and Evaluation metric are important characteristics of learning-based methods because they provide a uniform and standardized indicator of performance.

Methods with Probability Density Function.
Point cloud registration based on a probability density function (PDF), i.e., using a statistical model for registration, is a well-studied problem [179,180]. The key idea of these methods is to represent the data by a specific probability density function, such as a Gaussian mixture model (GMM) or a normal distribution (ND). The registration task is then reformulated as the problem of aligning two corresponding distributions, with an objective function measuring and minimizing the statistical discrepancy between them [179]. Meanwhile, owing to the PDF representation, the point cloud can be regarded as a distribution rather than many individual points, so the estimation of correspondences is avoided and robustness to noise is good, though these methods are in general slower than ICP-based methods [181]. In this section, methods that use a probability density function for point cloud registration are introduced.

Gaussian Mixture. The point cloud registration task can be reinterpreted as a mixture density estimation and maximum likelihood problem. Therefore, a general framework, namely mixture point matching (MPM), was introduced by Chui et al. [25]. In particular, take a GMM under Euclidean transformation [180] as an example, given two point clouds defined by X and Y individually.
Let Ω_X = {(x_i, σ²_ix, ϕ^X_i) | 1 ≤ i ≤ n} and Ω_Y = {(y_j, σ²_jy, ϕ^Y_j) | 1 ≤ j ≤ m} be GMMs constructed from X and Y, with the mean of each component a single point x_i or y_j, variances σ²_ix and σ²_jy, mixture weights ϕ^X_i and ϕ^Y_j, and n and m components, respectively. Then, the L2 distance between the transformed Ω_X and Ω_Y is given by

d(Ω_X, Ω_Y, R, t) = ∫ [p(p | T(Ω_X, R, t)) − p(p | Ω_Y)]² dp,

where T(Ω, R, t) denotes the function that rigidly transforms Ω with R and t, and p(p|Ω) is the probability of observing a point p under a mixture model Ω = {μ_i, σ²_i, ϕ_i}, defined by

p(p | Ω) = Σ_i ϕ_i Γ(p | μ_i, σ²_i),

where Γ(p|μ_i, σ²_i) is the Gaussian density. Since ∫[p(p|T(Ω_X, R, t))]² dp is invariant under rigid transformation and ∫[p(p|Ω_Y)]² dp is independent of (R, t), the closed-form expression for the objective function is

f(R, t) = −∫_{ℝ³} p(p | T(Ω_X, R, t)) · p(p | Ω_Y) dp, (9)

which can be minimized to produce (R, t) through kernel density estimation (KDE) [182], expectation-maximization [183], or support vector machines (SVMs) [184].
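The cross term in equation (9) has a closed form for isotropic Gaussian components, because the product of two Gaussians integrates to a Gaussian of the mean difference with summed variances. The sketch below evaluates that term for two small GMMs; the tuple format `(mean, variance, weight)` and the function names are hypothetical conventions for illustration.

```python
import math

def gauss3(diff, var):
    """Isotropic 3D Gaussian density N(diff | 0, var * I)."""
    sq = sum(d * d for d in diff)
    return math.exp(-sq / (2 * var)) / (2 * math.pi * var) ** 1.5

def gmm_overlap(mix_x, mix_y):
    """Closed-form cross term -int p(p|Omega_X) p(p|Omega_Y) dp for two
    isotropic GMMs given as lists of (mean, variance, weight)."""
    total = 0.0
    for mu_i, var_i, w_i in mix_x:
        for mu_j, var_j, w_j in mix_y:
            diff = tuple(a - b for a, b in zip(mu_i, mu_j))
            total += w_i * w_j * gauss3(diff, var_i + var_j)
    return -total
```

Minimizing this quantity over (R, t) applied to the means of `mix_x` corresponds to minimizing the L2 discrepancy between the two mixtures.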

Coherent Point Drift.
Coherent point drift (CPD) was introduced by Myronenko et al. [185] and further improved in [186]. Given two point clouds, similarly to [25], CPD fits a GMM whose centroids are initialized from the points of the reading data, while the reference point cloud is treated as the data generated by this mixture. The core of the CPD method is to move the GMM centroids coherently as a group so as to preserve the topological structure of the point cloud. In CPD-based registration, a weighted GMM probability density function is defined as

p(x) = w (1/N) + (1 − w) Σ_{m=1..M} (1/M) Γ(x | T(y_m, θ), σ²I), (10)

where M is the number of points in the reading point cloud Y = {y_1, y_2, ..., y_M}, N denotes the number of points in the reference data X = {x_1, x_2, ..., x_N}, and w (0 ≤ w ≤ 1) is a weight parameter that reflects the amount of noise and outliers in the point cloud; the uniform component w/N explicitly models the outliers. The GMM centroid locations are adjusted through a set of transformation parameters θ and a variance σ², which are estimated by minimizing the negative log-likelihood function in equation (11), under the assumption of independent and identically distributed data:

E(θ, σ²) = −Σ_{n=1..N} log p(x_n). (11)

It is usually solved by the EM algorithm to find θ and σ² [183]. The EM algorithm proceeds by alternating between computing the posterior probabilities P^old in the expectation step and updating the parameters in the maximization step until convergence.
Many modifications of the original CPD algorithm have been introduced. For instance, to accelerate the CPD algorithm, an accelerated variant was presented in [187] that incorporated squared iterative expectation-maximization to converge to the global extremum while keeping the same monotonicity as the EM algorithm [188], and a Dual-Tree Improved Fast Gauss Transform (DT-IFGT) method was used to further accelerate the CPD algorithm. To avoid setting the value of w in equation (10), Peng et al. [189] presented a refined CPD algorithm using a hybrid optimization, and experiments showed that it was better than the traditional CPD algorithm in terms of robustness and accuracy. Jun et al. [190] modified the original CPD algorithm with a voxel-grid filter before processing to improve the speed of registration while maintaining the same accuracy.
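The EM iteration of rigid CPD can be sketched as follows. This is a deliberately simplified version (rigid only, no scaling term, brute-force E-step, hypothetical function name), not the authors' implementation:

```python
import numpy as np

def cpd_rigid(X, Y, w=0.1, n_iter=50):
    """Simplified rigid CPD: move GMM centroids Y onto data X by EM.
    X: (N, D) reference data, Y: (M, D) reading data; returns (R, t)."""
    N, D = X.shape
    M = Y.shape[0]
    R, t = np.eye(D), np.zeros(D)
    sigma2 = np.sum((X[None, :, :] - Y[:, None, :]) ** 2) / (D * M * N)
    for _ in range(n_iter):
        TY = Y @ R.T + t
        # E-step: posterior P[m, n] that centroid m generated data point x_n
        dist = np.sum((X[None, :, :] - TY[:, None, :]) ** 2, axis=-1)
        num = np.exp(-dist / (2.0 * sigma2))
        c = (2.0 * np.pi * sigma2) ** (D / 2.0) * (w / (1.0 - w)) * (M / N)
        P = num / (num.sum(axis=0, keepdims=True) + c + 1e-12)
        # M-step: weighted Procrustes (SVD) on posterior-weighted means
        Np = P.sum()
        mu_x = (P.sum(axis=0) @ X) / Np
        mu_y = (P.sum(axis=1) @ Y) / Np
        A = (X - mu_x).T @ P.T @ (Y - mu_y)
        U, _, Vt = np.linalg.svd(A)
        S = np.diag([1.0] * (D - 1) + [np.sign(np.linalg.det(U @ Vt))])
        R = U @ S @ Vt
        t = mu_x - R @ mu_y
        TY = Y @ R.T + t
        sigma2 = (P * np.sum((X[None, :, :] - TY[:, None, :]) ** 2, -1)).sum() / (Np * D)
        sigma2 = max(sigma2, 1e-8)
    return R, t
```

Note how no hard correspondences are ever computed; the soft posteriors P play their role, which is the property the text attributes to PDF-based methods.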

Normal Distribution Transform.
The normal distribution transform (NDT) algorithm [191, 192] represents the input data as a probability density function given by the normal distribution of every cell (i.e., voxel box, as discussed above) in the point cloud, and computes the transformation between pairwise data as a minimization problem [193]. The probability density function of a cell is defined as

p(x) = c · exp(−(x − q)ᵀ C⁻¹ (x − q) / 2),

where q and C are the mean vector and covariance matrix of the cell that contains the point x, and c is a normalizing constant that can be set to one. q and C are calculated as

q = (1/n) Σ_{k=1..n} x_k,   C = (1/(n − 1)) Σ_{k=1..n} (x_k − q)(x_k − q)ᵀ,

where x_k (k = 1, ..., n) are the points contained in the cell and n is the number of points in the cell. To formulate the optimization as a minimization problem, a score function of a pose p and transformation function T(p, x) is defined as

s(p) = −Σ_k exp(−(T(p, x_k) − q)ᵀ C⁻¹ (T(p, x_k) − q) / 2),

followed by the Newton step HΔp = −g, where H and g are the Hessian and gradient of s. The increment Δp is solved iteratively by Newton's algorithm or the Levenberg-Marquardt algorithm [191, 194, 195] and is added to the current estimate in each iteration.
NDT-based registration has several advantages: it is more robust and faster than the traditional ICP algorithm when processing large-scale data, it supports a multiresolution representation of the point cloud [196], and it requires no explicit correspondences [191]. However, it is hard to achieve high accuracy compared with ICP [197]. Additionally, a key challenge in the NDT algorithm is choosing an appropriate cell size for the point cloud, because both accuracy and speed depend on it. In particular, a small cell size requires more computation and memory, while a big cell size lowers the resolution of the cells and thus results in less accurate registration [193]. Nevertheless, owing to these attractive properties, an NDT-based solution is still a good choice for the coarse step of a coarse-to-fine strategy [198, 199, 200]. Therefore, many modifications of the NDT algorithm have been proposed. For example, Yong et al. [201] introduced an improved normal distribution transform algorithm for the precise registration step, in which the PDF is replaced by a mixed probability density function. Multilayered NDT [202], key-layered NDT [203], and variable size voxel NDT [204] were proposed to determine an appropriate cell size, converging to the optimal result and achieving high-quality registration. To avoid the discontinuity of the cost function when points lie at cell boundaries, Arun et al. [205] introduced a segmented region growing NDT that generates Gaussian clusters after removing the ground plane for accurate registration.
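The two NDT building blocks above, cell construction and score evaluation, can be sketched as follows. This is a minimal illustration under assumptions of our own (an integer voxel hash, a minimum of 4 points per cell, and a small covariance regularizer); the Newton update over poses is omitted:

```python
import numpy as np
from collections import defaultdict

def build_ndt_cells(points, cell_size):
    """Group reference points into voxel cells and store each cell's
    mean q and inverse covariance C^-1 (cells with < 4 points are skipped)."""
    buckets = defaultdict(list)
    for x in points:
        buckets[tuple(np.floor(x / cell_size).astype(int))].append(x)
    cells = {}
    for key, pts in buckets.items():
        pts = np.asarray(pts)
        if len(pts) < 4:                        # too few points for a stable C
            continue
        q = pts.mean(axis=0)
        C = np.cov(pts.T) + 1e-6 * np.eye(3)    # regularize near-singular cells
        cells[key] = (q, np.linalg.inv(C))
    return cells

def ndt_score(cells, points, cell_size):
    """Negative NDT score of already-transformed reading points (lower is
    better); points falling in empty cells contribute nothing."""
    s = 0.0
    for x in points:
        key = tuple(np.floor(x / cell_size).astype(int))
        if key in cells:
            q, Cinv = cells[key]
            d = x - q
            s -= np.exp(-0.5 * d @ Cinv @ d)
    return s
```

The cell-size trade-off discussed in the text is visible here: `cell_size` controls both how many cells survive the minimum-point check and how sharply each Gaussian localizes the score.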

Others.
Along with the ICP-based, feature-based, learning-based, and probability density function-based solutions, the problem of point cloud registration can also be solved by other methods. Since the transformation between two point clouds can be readily computed from a set of correspondences, Aiger et al. [206] focused on directly finding high-confidence correspondences that consist of only four congruent points. Furthermore, auxiliary data generated from auxiliary approaches, such as image-assisted methods and the Global Navigation Satellite System (GNSS), can also be used to provide critical information for the task of point cloud registration.

Fast Global Registration.
Fast global registration (FGR) [31] provides a fast strategy for point cloud registration in which no initialization is required. Specifically, FGR operates on candidate matches that cover the surfaces of objects, and no correspondence updates or closest-point queries are performed. The distinctive feature of this method is that a joint registration is produced directly through a single optimization of a robust objective defined densely over the surface, whereas existing approaches usually generate candidate correspondences between two point clouds and then repeatedly compute and update a global result. In fast global registration, the correspondences are established once before optimization and are not estimated again in the following steps, as shown in Figure 26. The expensive nearest-neighbor lookup is therefore avoided, keeping the computational cost low, and the line process for each correspondence together with a linear system for pose estimation makes each iteration efficient. FGR was evaluated on multiple datasets, such as the UWA benchmark [207] and the Stanford Bunny [208], against point-to-point and point-to-plane ICP as well as ICP variants such as Go-ICP. Experiments showed that FGR performs best in the presence of noise.
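The robust objective in FGR [31] is built from a scaled Geman-McClure penalty, and the line process mentioned above amounts to a per-correspondence weight derived from it. A minimal sketch (the parameter mu mirrors the control parameter of the robust penalty; the function names are ours):

```python
def geman_mcclure(r, mu):
    """Scaled Geman-McClure penalty rho(r) = mu * r^2 / (mu + r^2):
    quadratic near zero, saturating at mu for large residuals, so gross
    outlier correspondences cannot dominate the objective."""
    return mu * r ** 2 / (mu + r ** 2)

def line_process_weight(r, mu):
    """Closed-form line-process weight l = (mu / (mu + r^2))^2 that the
    alternating optimization assigns to a correspondence with residual r;
    inliers get l near 1, outliers get l near 0."""
    return (mu / (mu + r ** 2)) ** 2
```

Decreasing mu over the iterations makes the penalty progressively stricter, which is how the optimization moves from a forgiving to a tightly robust objective without ever recomputing correspondences.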

4-Points Congruent Sets Algorithm.
The 4-points congruent sets (4PCS) algorithm [206] provides an initial transformation for the reading data without any assumption about the starting position. In general, a rigid registration transformation between two point clouds can be uniquely defined by a pair of triplets, one from the reference data and one from the reading data. This method instead looks for special 4-point bases, i.e., sets of four coplanar congruent points from each point cloud, by searching in small potential sets, as illustrated in Figure 27. The best rigid transformation is then found by solving the largest common point set (LCP) problem. The algorithm performs well even when the overlap rate of the pairwise point clouds is low and outliers are present. To adapt to different applications, many researchers have extended the classic 4PCS solution [209][210][211]. For example, in contrast to the quadratic time complexity of traditional 4PCS, the work in [212] presented Super 4PCS, which achieves the same result in linear running time. Pascal et al. [213] proposed a keypoint-based 4-points congruent sets (K-4PCS) solution for coarse registration of point clouds that extracts 3D keypoints as a discriminative representation of the reading data. The semantic-keypoint-based 4PCS proposed by Xuming [214] handles massive point clouds and improves K-4PCS in terms of accuracy and computational efficiency through semantic features for urban building scenes.
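The congruence test behind 4PCS rests on two ratios that locate the intersection point of the base's two segments; these ratios are invariant under rigid (indeed affine) transformations, so matching bases can be found by comparing them. A sketch of computing the invariants (the function name and the least-squares formulation are our illustration):

```python
import numpy as np

def base_invariants(a, b, c, d):
    """Ratios (r1, r2) locating the intersection e of segments ab and cd:
    e = a + r1*(b - a) = c + r2*(d - c). For a coplanar 4-point base the
    3x2 system is solved exactly; the ratios are affine invariants."""
    u, v, w = b - a, d - c, c - a
    A = np.stack([u, -v], axis=1)              # columns: u and -v
    r = np.linalg.lstsq(A, w, rcond=None)[0]   # solve r1*u - r2*v = c - a
    return float(r[0]), float(r[1])
```

Because r1 and r2 survive any rigid motion, 4PCS can index candidate bases in the other cloud by these two scalars instead of testing full transformations.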

Methods with Auxiliary Data.
In various registration tasks, the problem of finding the transformation between pairwise or multiview point clouds can be transformed into estimating the spatial position of the acquisitions in a common coordinate system. When retrieving point cloud data, some techniques that work together with a depth camera can provide its location and orientation in real time, such as the Global Navigation Satellite System (GNSS), the inertial measurement unit (IMU), and image-based algorithms. GNSS refers to a constellation of satellites that provide positioning and timing data to GNSS receivers, which facilitates the determination of location. An IMU is an electronic device that measures the orientation, velocity, and gravitational forces of a body. Image-based algorithms have been widely used in the simultaneous localization and mapping (SLAM) technique, which aims to localize an autonomous mobile robot in an unknown environment while building a consistent and incremental map around the robot [215].
For applications related to mobile mapping systems (MMSs), in which a rapid registration process is required and a large amount of data is collected at once, a GNSS-IMU assistance system for forest mapping that can simultaneously provide a rough registration was presented [216, 217]. However, the GNSS signal is usually lost in complex areas, such as under building occlusion or under the canopy. Marek et al. [218] proposed the use of graph-SLAM [219] to generate a local map of a forest, in which a stereo camera collects the data and a GPS calculates an estimate of the global tree positions; a 3D map is first generated by visual odometry and then improved by robust graph-SLAM. For 3D city models with color information, Bernard et al. [220] estimated the relative camera parameters from aerial image sequences as a coarse transformation for ICP refinement. Xuming et al. [221] introduced an image-driven registration for urban scenes, in which a scanning network is constructed by exploiting the image information to guide the coarse pairwise registration task.

Applications.
Point cloud registration is a critical prerequisite for computer vision and robotics. In computer vision, the registration problem is usually converted into the image matching problem (i.e., image registration), which has been studied in many fields, such as the medical domain [222, 223], handwritten digit recognition [224], and image stitching [225, 226].
For point cloud registration, robotic applications are a hot topic. Pomerleau [34] described four types of applications with several good use cases covering different kinds of platforms, environments, and tasks. The first is search and rescue, which needs real-time registration: 3D maps are required to provide the users with situation awareness and to support critical decisions about risky platform motions. In this field, 3D map reconstruction is a basic and popular area of research, although it remains a single application rather than a collection of use cases. The second is the automation of inspection: some tasks require inspection or maintenance operations in environments that are difficult for humans to access due to dimensional, temperature, or air quality constraints. The third is the development of autonomous surface vessels to support environmental monitoring of freshwater bodies, where a 3D laser is installed to complement the analysis of the ecosystem with geological information. The fourth is the autonomous or self-driving car. In addition, deep learning techniques based on the point cloud, which take the raw point cloud directly as input, have recently become widely used [227].

Summary.
In this section, five categories of solutions for point cloud registration have been introduced. For the ICP-based methods, the steps, variants, and implementation details were described, and the popular open-source libraries and software were given. To use ICP-based algorithms well, note that they can produce a good alignment only if the point cloud to be aligned is given a reasonable initial position; the key is how to generate a good initial position for your application. Therefore, a popular strategy was proposed [228, 229] that consists of two steps: (1) computing a global transformation as an initial position for the point cloud and (2) calculating a local transformation that guarantees the result of the registration. These two steps constitute the coarse-to-fine strategy. In this setting, feature-based, learning-based, and probability-based methods are usually used to produce the global transformation, which is then further refined by ICP-based algorithms. The point-to-point and point-to-plane ICP methods are widely used in real-world situations [39, 90]. To support the better use of ICP algorithms, many scholars have evaluated ICP and its variants. For example, Peng et al. [39] took the overlap ratio, angle, and distance as the influence factors of ICP and evaluated the performance on the same four datasets. In addition, Martin et al. [197] and Mouna et al. [230] compared ICP and NDT in terms of accuracy and speed. Lachat [231] studied different software and tools for point cloud registration in terrestrial and airborne laser scanning. To avoid experimental bias, Pomerleau [232] provided datasets that cover a diverse range of challenging environments for registration algorithms.
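The fine step of such a coarse-to-fine pipeline can be sketched as a plain point-to-point ICP that refines a coarse initial pose. This is an illustrative brute-force version of our own (quadratic nearest-neighbor search, fixed iteration count), not production code:

```python
import numpy as np

def icp_point_to_point(src, dst, R0, t0, n_iter=30):
    """Refine a coarse pose (R0, t0) aligning src onto dst by point-to-point
    ICP. src, dst: (N, 3) arrays; returns the refined (R, t)."""
    R, t = R0, t0
    for _ in range(n_iter):
        moved = src @ R.T + t
        # 1. correspondences: closest dst point for each transformed src point
        nn = np.argmin(np.sum((moved[:, None, :] - dst[None, :, :]) ** 2, -1),
                       axis=1)
        matched = dst[nn]
        # 2. closed-form rigid update (Kabsch / SVD)
        mu_s, mu_d = src.mean(axis=0), matched.mean(axis=0)
        H = (src - mu_s).T @ (matched - mu_d)
        U, _, Vt = np.linalg.svd(H)
        S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ S @ U.T
        t = mu_d - R @ mu_s
    return R, t
```

In the two-step strategy described above, (R0, t0) would come from a feature-based, learning-based, or probability-based coarse stage; the closer that initial pose, the fewer wrong nearest-neighbor correspondences this fine stage has to escape.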
Feature-based and learning-based methods have been studied increasingly in recent years, because the point cloud registration problem is naturally divided into several modules: feature extraction, matching, outlier rejection, and motion estimation. For the feature extraction module, in addition to PFH and SHOT, the features described in this section are local features; various methods for extracting features and descriptions of point clouds have also been studied and summarized [140]. Feature-based methods usually extract pairwise or higher-order relationships into a histogram in a handcrafted way. However, many of these handcrafted descriptors rely heavily on distinctive geometric properties of the point cloud data and therefore rarely work well on real-world point clouds [233], which are noisy and of low density. Learning-based methods regress the transformation parameters by comparing features, which are generally global. The state-of-the-art methods of recent years were introduced, along with a brief comparison of their performance and modules; they are less sensitive and more robust on real-world data. To obtain a good initial transformation, the learning-based and feature-based methods can be helpful. For better use of these techniques, attention should be paid to the following points: (1) a feature detection algorithm should be chosen that attains a trade-off between accuracy and efficiency, and (2) a learning framework should be established, particularly for applications involving data collection and training. In addition, probability-based methods such as NDT and CPD are also used for initial pose estimation, and CPD in particular is often utilized for nonrigid registration.

Conclusion and Outlook
Point cloud registration is a crucial step in many applications that require building a 3D map of an environment or recovering the 3D surface of an object or scene. However, it is impossible to retrieve all surface data at once or to directly obtain the relationship between two point clouds in their own coordinate systems. To place a strong emphasis on a comprehensive review of pairwise point cloud registration with various state-of-the-art algorithms, this paper began with the basics and principles of point cloud registration and then divided the registration methods into five main categories: ICP-based, feature-based, machine learning-based, probability-based, and others. Across these categories, every method has its applicable fields and its disadvantages. For two completely identical point clouds in arbitrary positions, a valid transformation can be computed by traditional algorithms, such as ICP-based algorithms and feature-based solutions. For two acquired point clouds, machine-learning methods have been developed that experimentally achieve more robustness and accuracy than traditional algorithms. Learning-based methods have therefore attracted more and more attention, providing a new perspective and achieving great success in solving point cloud registration. However, with real-world data, the registration task faces uncertain conditions, such as varying overlap, fluctuating noise, and the huge amount of computation required by massive input data. Therefore, developing a general framework to solve point cloud registration in the real world is still a challenging task.
A brief guideline for those who want to use point cloud registration as a tool: when the captured objects have apparent, distinctive characteristics, feature-based methods are a good choice for applications that need point cloud registration. When it is unclear what the retrieved objects are, learning-based and probability-based solutions can always provide a reliable transformation as a rough result. In either case, an ICP-based algorithm should then be applied to achieve a refined registration.
With the advancement of 3D imaging technology, the resolution of point cloud data and the number of point sets will increase dramatically. For instance, a point cloud that accurately represents the details of a workpiece acquired from a structured light camera (SLC) now has more than tens of millions of points, and one frame of data roughly describing a subway station taken by a LiDAR has at least five million points. Therefore, techniques, probably integrating hardware and software optimization, that can swiftly process and operate on these unstructured point cloud data will be required in the future. Additionally, as a quantitative benchmark, a ubiquitous dataset containing ground-truth transformations and different environments, such as indoor and outdoor scenes, would be beneficial for evaluating learning-based methods and earlier strategies. As more and more public datasets are established in the future, we believe that comparisons between traditional algorithms and machine-learning methods will be essential for choosing an appropriate registration solution for a specific application.

Data Availability
No data were used to support this study.

Conflicts of Interest
The authors declare that they have no conflicts of interest.