1. Introduction
LiDAR acquires three-dimensional point clouds of the environment. However, many uncertain environmental factors can affect the accuracy of the data, including partial occlusion of targets and target motion blur. In addition, LiDAR systems often have low spatial resolution and may miss vital information about the environment [1,2,3]. As a result, raw point clouds collected by LiDAR can be sparse and incomplete, deviating significantly from the actual geometry of objects and degrading the sensor system’s perception of the environment [4,5]. To address these problems, point cloud completion techniques can be used to reconstruct and restore the missing information in sparse and incomplete point clouds. The process uses various algorithms to fill gaps in the data and create a more faithful representation of the environment. By improving the accuracy of point cloud data, sensor systems can better perceive their surroundings and make more informed decisions [6]. Traditional point cloud completion methods mainly include geometric symmetry, surface reconstruction, and template matching. Geometric symmetry methods target symmetrical objects and can hardly be adapted to the irregular objects found in nature. Surface reconstruction methods are essentially interpolation and fitting methods that infer the locations of missing points from existing points; however, missing regions are often uncertain and unevenly distributed, so these methods can hardly recover the shape accurately. Template matching methods require large databases, and the missing target must exist in the database; they are both computationally expensive and poorly generalizable. Traditional methods can hardly complete unknown shapes, and heavily incomplete targets make the completion task even more difficult. Therefore, the most popular approach to point cloud completion is to exploit the learning and optimization capability of deep networks to estimate the shape of the incomplete part [7].
Point cloud processing framework: In the completion task, point clouds are unordered and unstructured, unlike two-dimensional images, so standard convolution frameworks cannot be applied directly. Therefore, different frameworks have been proposed to handle the unstructured nature of point clouds in completion tasks. PCN [8] utilizes the multi-layer perceptron proposed by PointNet [9] and directly maps a partial point cloud to the full shape through an encoder–decoder structure. It reduces intermediate steps and losses and remains the most widely used point cloud processing framework. Since then, many other frameworks have been proposed. GRNet [10] maps the point cloud to a regular grid and applies three-dimensional convolution to the gridded point cloud to retain complete structural information. However, the regularized point cloud is effectively quantized and down-sampled and cannot retain fine point-level features. DGCNN [11] applies graph convolution on a recovered topology to extract features from point clouds, but its learning capacity is limited and local point clouds cannot be effectively restored. TopNet [12] introduces a hierarchical tree network adapted to point cloud topology, focusing mainly on the overall structure while neglecting details. ProxyFormer [13] designs a missing-part-sensitive transformer to convert a random normal distribution into plausible position information. Its predicted point proxies are more sensitive to the features and locations of missing parts and are well suited for subsequent coarse-to-fine processing. USSPA [14] proposes an unsupervised symmetric shape-preserving autoencoding network; unlike previous methods that train each class separately, it can train on multiple classes at once through a classifier-guided discriminator. ACL-SPC [15] proposes a self-supervised framework for point cloud completion: it takes a single partial input and outputs a complete point cloud using an adaptive closed-loop system that enforces the same output under variations of the input. Recovering details is an important challenge for almost all point cloud completion frameworks, as it determines the effectiveness of the completion process [16,17,18]. However, these frameworks need to transform or alter the original information, and their overall capacity is limited. Therefore, BCA-Net adopts the multi-layer perceptron framework to extract point cloud features, which offers greater advantages in efficiency and cost, and builds dedicated detail reconstruction modules on top of it.
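To illustrate why the multi-layer perceptron framework handles unordered points, the following is a minimal pure-Python sketch (toy weights, illustrative function names, not the paper's actual network) of the PointNet-style idea: the same perceptron is applied to every point independently, and a symmetric max pooling aggregates a global feature, so the result does not depend on point order.

```python
def shared_mlp(point, W, b):
    """Apply the same single-layer perceptron (with ReLU) to one 3-D point."""
    return [max(0.0, sum(w_i * x_i for w_i, x_i in zip(row, point)) + b_j)
            for row, b_j in zip(W, b)]

def global_feature(points, W, b):
    """Per-point shared MLP followed by channel-wise max pooling.

    Max pooling is symmetric, so the result is invariant to point order."""
    per_point = [shared_mlp(p, W, b) for p in points]
    return [max(f[c] for f in per_point) for c in range(len(b))]

# Toy weights: 3 input coordinates -> 4 feature channels.
W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5], [-0.1, 0.4, 0.9], [0.2, 0.2, 0.2]]
b = [0.0, 0.1, -0.1, 0.05]
cloud = [[0.0, 1.0, 2.0], [1.5, -0.3, 0.7], [-1.0, 0.2, 0.4]]

feat = global_feature(cloud, W, b)
# Reordering the points leaves the global feature unchanged.
assert global_feature(cloud[::-1], W, b) == feat
```

In the real networks the shared MLP has several layers and hundreds of channels, but the order-invariance argument is exactly this one.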
Residual structure: Residual structures can alleviate the vanishing-gradient problem caused by increasing network depth, which contributes to the overall point cloud completion effect. Notable work includes SA-Net [19], which introduces a skip-attention mechanism to generate complete point clouds at different resolutions by passing geometric information from local regions of the incomplete point cloud. ASHF-Net [20] proposes a hierarchical folding decoder with gated skip attention and multi-resolution completion targets to efficiently utilize the local structural details of the partial input. NSFA [21] proposes two feature aggregation strategies to express and reconstruct coordinates from their combination. Accordingly, BCA-Net proposes a residual deformation architecture to reduce noise and focus the feature learning process. Every path optimization may deviate from the intended learning direction, so regularization must be maintained: features are repeatedly normalized during convolution to prevent perturbations caused by excessive network depth. Many networks use this idea to achieve efficient feature extraction [22,23]. The residual deformation architecture ensures the accuracy of information and prevents structural deviations during multi-scale changes [24,25].
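The combination of a skip path with repeated normalization can be sketched as follows (pure Python, hypothetical toy layer standing in for the learned transform; not the paper's actual architecture): the identity path preserves the input signal, so stacking many such blocks cannot erase the original features, while normalization keeps magnitudes stable.

```python
def normalize(v):
    """L2-normalize a feature vector to keep magnitudes stable across layers."""
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v] if norm > 0 else v

def conv_like(v, scale=0.1):
    """Stand-in for a learned transform F(x) (hypothetical toy layer)."""
    return [scale * x for x in v]

def residual_block(v):
    """y = x + F(norm(x)): the skip path carries the input through unchanged,
    so the block can only add a bounded, normalized update."""
    return [x + f for x, f in zip(v, conv_like(normalize(v)))]

feature = [3.0, 4.0]           # ||feature|| = 5
out = residual_block(feature)  # identity path dominates; out stays near input
```

Here `out` is `[3.06, 4.08]` up to floating-point rounding: the update is small relative to the preserved input, which is the behavior that prevents deep stacks from drifting.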
Detailed reconstruction method: A point cloud processing framework alone is insufficient for completion tasks. It may produce a global shape estimate, but details are ignored. Therefore, dedicated detail reconstruction modules are needed to enhance the completion effect. PF-Net [26] proposes a multi-resolution encoder (MRE) to extract multi-layer features from partial point clouds and their low-resolution feature points, enhancing the network’s ability to extract semantic and geometric information. FoldingNet [27] proposes generating points by learning to fold a two-dimensional grid into three-dimensional space. Building on this, ECG [28] proposes an edge-aware feature expansion module that up-samples point features via graph convolution, which preserves local geometric details better than simply copying point features during up-sampling. SpareNet [29] proposes channel-attentive edge-aware convolution, which not only considers local information within the K-nearest neighbors but also exploits the global context by aggregating global features and weighting each point’s feature channels accordingly. VRCNet [30] adopts a two-subnetwork structure: one subnetwork uses the coarse completed point cloud generated by the other to enhance structural relationships by learning multi-scale local point features, and it uses the Point Self-Attention kernel (PSA) and Point Selective Kernel module (PSK) to further improve detail recovery. Cascaded-PC [31] designs a lifting module that doubles the number of points while refining their positions through feature contraction and expansion units. LAKe-Net [32] designs a refinement subnetwork that feeds a multi-scale surface skeleton to each recursive skeleton-assisted refinement module during completion. Continuous optimization of point routes is effective in restoring shape details. PMP-Net [33] proposes an RPA module based on the original GRU [34,35] to memorize and aggregate the route sequences of points and obtain exact positions by continuously refining the moving paths. Path aggregation is an efficient way to optimize point cloud details. However, the independent control of the reset gate in the aggregation unit limits the aggregation effect, so the aggregation unit can be optimized and its effect enhanced.
Therefore, to further improve the detail performance of point cloud completion, this paper proposes a bidirectional confidence aggregation unit in the network. By adding a confidence gate alongside the update gate and the reset gate, BCA-Net can continuously regularize the reliability of points, strengthening context information and guiding the network’s completion of point cloud details. Because error points and noise points easily arise throughout the completion process, not all points are equally reliable. The unit considers the confidence levels of updates and resets when predicting moving paths, rather than assuming that all points are correct. The confidence gate allows for more accurate and reliable point route refinement, resulting in better recovery of point cloud details.
In addition, this paper proposes a break and recombine refinement module to further enhance point cloud detail reconstruction. It is designed to assist the recovery of point cloud details by fusing features at a deep level. Unlike traditional feature fusion methods, which often require additional processing and optimization, the break and recombine refinement module allows for internal information processing by converting points from the optimization stage into a new dimension. An analogous practice in image processing is converting an image from the time domain to the frequency domain, deleting or adding information there, and converting it back; this enables internal information processing. In the same spirit, the module breaks and reorganizes the new points and completion points in the optimization stage and combines one-dimensional and two-dimensional convolution. Fusion takes place at a deep level, with points fused in a new dimension. In addition, an attention mechanism weights the processing so that more important parts are highlighted, and multiple branches are added to enrich the fusion levels [36,37]. The structure is inspired by CC-Net [38]. In BCA-Net, the break and recombine module enhances the fusion effect between points, making the point cloud details more precise.
The main contributions of the paper are summarized as follows:
It proposes a residual deformation architecture to regulate the learning direction of the network and reduce noise, ensuring the accuracy of the structure.
It designs the break and recombine refinement module for high-dimensional internal processing, achieving deep fusion and optimization and recovering point cloud details.
It designs a bidirectional confidence aggregation unit to guide the recovery of point details by considering the confidence levels of updates and resets during moving path prediction.
Experiments demonstrate that the network enhances details and suppresses noise, achieving effective end-to-end point cloud completion.
2. Methods
Our network is mainly divided into two phases: residual deformation architecture and break and recombine refinement.
Figure 1 shows the process of completing the input point cloud from coarse to fine. In phase (a), the input point cloud is processed and down-sampled by the set abstraction module. The residual structure is adopted in the feature propagation process, which retains the original characteristics of the network and prevents gradient explosion; it is introduced in Section 2.1. In addition, a bidirectional confidence aggregation unit is designed to enhance local details and obtain coarse point clouds, as introduced in Section 2.2. In phase (b), the break and recombine module performs further refinement to obtain the final complete point cloud, as introduced in Section 2.3.
2.1. Residual Deformation Architecture
As shown in the lower-left corner of Figure 1a, a residual deformation architecture is designed to extract the features of the fragmentary point cloud. It connects the point cloud features of the previous stage across layers. The residual structure can regulate the direction of the network and reduce noise. If the input feature is x, the output of a residual block can be expressed as y = F(x) + x, where F denotes the learned transformation.
The set abstraction (SA) layer in the network abstracts a set of points into a smaller set and constructs local regions using centroid-neighborhood grouping. Each local region is then encoded into a feature vector, which helps compensate for the loss of precision caused by max pooling. The feature propagation (FP) layer uses an inverse-distance-weighted average based on the K nearest neighbors to interpolate features point by point, and it connects the interpolated features to the original point set through a unit point network. Diagrams of the SA and FP modules are shown in Figure 2; both are from PointNet++ [39]. The output of the structure is a coarse point cloud.
The residual deformation is used in the feature extraction step. Its purpose is to prevent the network from deviating from the correct direction as the number of layers grows while extracting point cloud features. In the whole feature extraction process, the point cloud first passes through the set abstraction (SA) layer to reduce the resolution and then through the feature propagation (FP) layer to restore the resolution. During resolution recovery, the original point cloud features at the same resolution must be fused in. Our residual deformation further actively guides the direction of the convolution to optimize the features after fusion, reducing the loss caused by resolution changes.
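The FP layer's inverse-distance-weighted interpolation can be sketched as follows (pure Python, illustrative names; k = 3 neighbors as in PointNet++). A query point's feature is the weighted average of the features of its k nearest known points, with weights inversely proportional to squared distance:

```python
def interpolate_feature(query, known_pts, known_feats, k=3, eps=1e-8):
    """Inverse-distance-weighted average over the k nearest known points,
    in the style of the PointNet++ feature propagation (FP) layer."""
    d2 = [sum((a - b) ** 2 for a, b in zip(query, p)) for p in known_pts]
    nearest = sorted(range(len(known_pts)), key=lambda i: d2[i])[:k]
    weights = [1.0 / (d2[i] + eps) for i in nearest]
    total = sum(weights)
    dim = len(known_feats[0])
    return [sum(w * known_feats[i][c] for w, i in zip(weights, nearest)) / total
            for c in range(dim)]

pts = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [5.0, 5.0, 5.0]]
feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [9.0, 9.0]]
# A query right next to the first point mostly inherits its feature.
f = interpolate_feature([0.01, 0.0, 0.0], pts, feats)
```

Because the weight of a near-coincident neighbor dominates, `f` is close to `[1.0, 0.0]`; this is what lets the FP layer up-sample features smoothly back onto the dense point set.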
2.2. Bidirectional Confidence Aggregation Unit
As the network gradually replenishes the missing point cloud, the number of points multiplies, but the accuracy of the added points is easily ignored; once the optimization direction is wrong, noise points and outliers accumulate. To address this problem, PMP-Net proposes a recurrent path aggregation (RPA) module. Going further, this paper proposes the BCA module, which adds a new gate to judge the reliability of the reset gate. Point-path optimization changes the optimization strategy: the position of each point is gradually refined through a point path search, which lets the network plan learning paths for noise points and error points. It uses a coarse-to-fine route for iterative learning and is based on the GRU, a kind of recurrent neural network.
Recurrent neural networks (RNNs) perform well on sequence data. Their core concepts are the cell state and the gate structure, which combine earlier and later inputs across time; they can therefore serve as iterative units for point paths. The cell state is equivalent to a path that transmits relevant information, passing the feature vector on as memory, while the gate structure determines whether information is remembered or forgotten.
Figure 3 compares our bidirectional confidence aggregation unit with other similar modules. Each iteration retains both the output and the hidden state as the path optimization direction of the current step, and both are passed to the next iteration. The gated recurrent unit (GRU) is a type of RNN that is widely used because it is easy to train and performs well. As shown in Figure 3a, it has two gates: the reset gate and the update gate. It uses the sigmoid function as the gating signal, so the control signal lies between 0 and 1, with values closer to 0 indicating that information should be discarded and values closer to 1 indicating that it should be retained.
The reset gate selectively combines the input information with the memorized information to determine how much information from the previous moment is retained. The reset gate output at time t can be expressed as

r_t = σ(W_r · [h_{t−1}, x_t])

where σ is the sigmoid function, W_r is the weight matrix of the reset gate, h_{t−1} is the hidden state at time t−1, and x_t is the input at time t.
The update gate determines which information should be discarded or retained and selects the updated information to output. The update gate output z_t at time t can be expressed as

z_t = σ(W_z · [h_{t−1}, x_t])

where W_z is the weight matrix of the update gate.
The candidate hidden state h̃_t can be expressed as

h̃_t = tanh(W · [r_t ⊙ h_{t−1}, x_t])

where W is the weight matrix of the hidden state, tanh is the hyperbolic tangent function, and ⊙ denotes element-wise multiplication.
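One full GRU step, in the standard formulation described above, can be written as a short pure-Python sketch (scalar state and toy weights for readability, biases omitted; names are illustrative):

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def gru_step(x, h_prev, Wr, Wz, W):
    """One GRU step for scalar input/state; each W is a (w_h, w_x) pair."""
    r = sigmoid(Wr[0] * h_prev + Wr[1] * x)              # reset gate
    z = sigmoid(Wz[0] * h_prev + Wz[1] * x)              # update gate
    h_tilde = math.tanh(W[0] * (r * h_prev) + W[1] * x)  # candidate state
    return (1.0 - z) * h_prev + z * h_tilde              # gated blend

h = 0.0
for x in [1.0, -0.5, 0.3]:  # a short input sequence
    h = gru_step(x, h, Wr=(0.4, 0.6), Wz=(0.3, 0.7), W=(0.5, 0.9))
# h now carries a gated memory of the whole sequence; tanh keeps |h| < 1.
```

The gated blend in the last line is what lets the unit decide, per step, how much of the old path information to keep versus overwrite.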
PMP-Net introduced the gated recurrent unit (GRU) into the point cloud completion network as a path search optimization unit, designing the recurrent path aggregation (RPA) module shown in Figure 3b. PMP-Net shows experimentally that the module outperforms the original GRU. However, sequence processing places high semantic demands on the hidden state, whereas in point path search the main subject is the original input path and the hidden state only assists path optimization. In addition, the gradients of the sigmoid and tanh functions are close to 0 near their extreme values, so the optimization algorithm updates the network slowly; as a result, the ReLU function is better suited to deep networks.
Like the GRU, the RPA module includes a reset gate and an update gate. The difference is that the candidate hidden state h̃_t is expressed as

h̃_t = ReLU(W · [r_t ⊙ h_{t−1}, x_t])

where ReLU is the rectified linear unit. The output at time t can be expressed as

h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t
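A quick numeric check of the gradient argument (pure Python): the tanh derivative vanishes away from zero, while the ReLU derivative stays at 1 for any positive activation, so updates propagated through a ReLU candidate state do not shrink.

```python
import math

def tanh_grad(a):
    """d/da tanh(a) = 1 - tanh(a)^2: vanishes for large |a|."""
    return 1.0 - math.tanh(a) ** 2

def relu_grad(a):
    """d/da ReLU(a): exactly 1 for any positive activation, 0 otherwise."""
    return 1.0 if a > 0 else 0.0

# Near the extremes of tanh the gradient is almost zero,
# so weight updates through the candidate state slow down.
print(tanh_grad(5.0))   # ~1.8e-4
print(relu_grad(5.0))   # 1.0
```

At an activation of 5, the tanh path scales gradients by roughly 1.8 × 10⁻⁴ while the ReLU path passes them through unchanged, which is the saturation effect the text describes.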
The bidirectional confidence aggregation (BCA) unit is designed to further improve the gating performance. As shown in Figure 3c, a new confidence gate is placed after the reset gate to control how much the network trusts the previous hidden state, instead of introducing it directly. The confidence gate quantifies the uncertainty of the hidden state and measures the confidence of the path movement, which is very effective for restoring shape details. The unit mainly consists of two branches: the hidden state of the previous moment forms the top branch, and the input of the previous moment forms the bottom branch. In the RPA structure, the control signals for both the reset and update gates come from the direct fusion of the top and bottom branches, ultimately controlling the aggregation of the bottom branch; in this process the effect of the top branch is not obvious, and the hidden state is exploited directly without processing. In the BCA module, the hidden state is first further aggregated and constrained, so the top branch is used effectively. The module thus realizes bidirectional aggregation of the top and bottom branches.
The confidence gate output
at time
can be expressed as
The candidate’s hidden state
is expressed as
The output result
can be expressed as
2.3. Break and Recombine Refinement
After several iterations, the raw point cloud is completed into a coarse point cloud. As shown in Figure 4, the break and recombine refinement module is designed to further add detail points and optimize the point cloud. Breaking means increasing the dimension, and recombining means decreasing it; our fusion optimization takes place in a high-dimensional space. To better show the structure, the dimensions that represent the features are used. The coarse point cloud obtained in the previous step and the set of points to be fused are first concatenated and then fused. The concatenation result of the two sets of data can be expressed as
Next, the concatenated result is broken into smaller parts. During the breaking process, dimensions are increased and internal information is revealed, allowing a more detailed analysis of the point cloud structure. The goal of this step is to reveal the hidden relationships and patterns within the point cloud, which can then be used for further processing and analysis. The high-dimensional fusion point cloud can be represented as
where B is the batch size, C is the number of channels, and dim is the broken dimension. The expanded tensors are the result of the point cloud dimension expansion, and AD denotes the ascending-dimension operation.
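Under the assumption that "breaking" amounts to reshaping features into an extra axis (the exact tensor layout is defined in the paper's figure, not reproduced here), the lossless round trip can be sketched in pure Python with illustrative names:

```python
def ascend(flat, dim):
    """'Break' a flat feature vector into a higher-dimensional layout:
    consecutive groups of size `dim` become a new axis."""
    assert len(flat) % dim == 0
    return [flat[i:i + dim] for i in range(0, len(flat), dim)]

def descend(grouped):
    """'Recombine': flatten the extra axis back out."""
    return [v for group in grouped for v in group]

feats = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
broken = ascend(feats, dim=3)    # [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
# Processing can now mix information inside each group before recombining.
assert descend(broken) == feats  # the round trip itself loses nothing
```

The point of the sketch is that the reshape alone is information-preserving; any change to the features comes from the processing applied in the raised dimension, analogous to editing in the frequency domain before transforming back.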
The result of the dimension increase is optimized in the higher-dimensional space. To better integrate internal information, a structure of one main road and two auxiliary branches is set up, giving three routes in total.
It constructs a cross-multiplicative structure to propagate the features of the main road to the branch road, which can be obtained as
where
is a dimensional reconstruction operation.
The initial result of the branch
and
can be expressed as
where
is matrix multiplication.
Finally, it carries out the fusion of each branch and gets the feature confidence
that can be expressed as
where
is the weight matrix. Feature confidence is used to focus on the more important part of the feature to be fused. The fusion result
can be expressed as
2.4. Loss Function
It utilizes Chamfer Distance (CD) and Earth Mover’s Distance (EMD) as loss functions to train the network. Chamfer Distance measures the average distance between each point in one point cloud and its nearest point in the other point cloud; it is the most widely used measure of completion quality. For point clouds P and Q, the CD is defined as

CD(P, Q) = (1/|P|) Σ_{p∈P} min_{q∈Q} ‖p − q‖ + (1/|Q|) Σ_{q∈Q} min_{p∈P} ‖q − p‖
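A minimal pure-Python sketch of the symmetric Chamfer Distance (using squared distances, one common convention; some implementations use the plain Euclidean norm instead):

```python
def chamfer_distance(P, Q):
    """Symmetric Chamfer Distance: average nearest-neighbor squared distance
    from P to Q plus the same from Q to P."""
    def sq(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    d_pq = sum(min(sq(p, q) for q in Q) for p in P) / len(P)
    d_qp = sum(min(sq(p, q) for p in P) for q in Q) / len(Q)
    return d_pq + d_qp

P = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
Q = [[0.0, 0.0, 0.0], [1.0, 0.1, 0.0]]
assert chamfer_distance(P, P) == 0.0  # identical clouds have zero CD
d = chamfer_distance(P, Q)            # small but nonzero for a small shift
```

This brute-force version is O(|P|·|Q|); practical completion pipelines use GPU batched nearest-neighbor kernels, but the quantity computed is the same.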
EMD is a distance measure between two distributions, which can be expressed as

EMD(P, Q) = min_{φ: P→Q} (1/|P|) Σ_{p∈P} ‖p − φ(p)‖

where φ is a bijection that minimizes the average distance between corresponding points in P and Q.
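For very small, equally sized point sets the optimal bijection can be found by exhaustive search, which makes the definition concrete (pure Python, exponential cost, illustration only; real implementations use approximate assignment solvers):

```python
from itertools import permutations

def emd_brute_force(P, Q):
    """Earth Mover's Distance for tiny, equally sized point sets:
    try every bijection P -> Q and keep the one with the smallest
    average point-to-point distance."""
    assert len(P) == len(Q)
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(
        sum(dist(p, q) for p, q in zip(P, perm)) / len(P)
        for perm in permutations(Q)
    )

P = [[0.0, 0.0], [1.0, 0.0]]
Q = [[1.0, 0.0], [0.0, 0.0]]
# The optimal bijection pairs identical points, so the distance is 0.
assert emd_brute_force(P, Q) == 0.0
```

Unlike CD, which lets many points share one nearest neighbor, EMD's one-to-one matching penalizes uneven point densities, which is why the two losses are typically combined.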
The total loss function is defined as