Semantic Segmentation of In-Vehicle Point Cloud With Improved RangeNet++ Loss Function

To solve the problem of inaccurate object segmentation caused by unbalanced samples in in-vehicle point clouds, an improved semantic segmentation network based on an asymmetric loss function, AsL-RangeNet++, is proposed. It uses the asymmetric loss (AsL) function and the Adam optimizer to calculate and adjust object weights and achieve optimal point cloud segmentation. By calculating the weights of positive and negative samples separately, AsL-RangeNet++ alleviates the imbalance between positive and negative samples and the label-error problem in multi-label classification, and segments the point clouds of small targets more accurately. Extensive experiments on the widely used SemanticKITTI dataset show that the proposed method has higher segmentation accuracy and better adaptability than current mainstream methods.


I. INTRODUCTION
3D point cloud semantic segmentation based on deep neural networks can be divided into direct and indirect point cloud segmentation [1]. Direct methods feed the original point cloud into the network without any preprocessing and segment it directly, making full use of the point cloud data and requiring no preliminary transformation. However, because every point is processed by the network, the volume of data is inevitably large, which reduces the network's computing speed. Indirect methods transform the irregular point cloud obtained by sensors into a regular representation through a series of operations and then feed it into the network for segmentation. Indirect methods can be further classified into voxel-based methods, multi-view-based methods, and irregular point cloud projection-based methods.
The most classic direct point cloud segmentation method is PointNet, designed by Charles et al. in 2017 [2]. It processes the original irregular point cloud scanned by LiDAR and uses a max-pooling operation to obtain the global features of the point cloud; in this process the order of the points does not matter, and the classification probability of each point is output by a multilayer perceptron. PointNet is a unified architecture that integrates object classification, part segmentation, and scene semantic parsing. Because local feature extraction is difficult, PointNet is limited in recognizing and segmenting complex and finely detailed scenes. Subsequently, Charles et al. overcame this shortcoming by proposing a hierarchical neural network, PointNet++ [3], which learns local features adaptively and hierarchically as the spatial scale of the original point cloud increases. New algorithms based on PointNet++ have been proposed recently. ResPointNet++, with a U-Net encoder-decoder architecture, was proposed in 2021 [4]; it adopts a deep learning network with a residual mechanism built from two neural network modules, local aggregation operators and a residual bottleneck module, and combines the deep residual network with the conventional PointNet++ network to learn complex local structure. Han et al. proposed a semantic segmentation network for urban scenes consisting of an efficient point-cloud subsampling strategy, a point-based feature extraction module, and a loss function for improving overall performance [5]. Deng and Qiu put forward GA-Net, a global attention network for point cloud semantic segmentation, which consists of a point-independent global attention module and a point-dependent global attention module: the first learns semantic information about points through random cross attention, the second shares the global map of all 3D points, and a point-adaptive aggregation module clusters the features [6].
To transform the original point cloud into a regular representation with a multi-view-based segmentation method, Su et al. [7] proposed a novel CNN structure that renders the 3D shape into multiple mutually independent views and recognizes the 3D point cloud through single views; this algorithm achieved higher recognition accuracy than the most advanced shape descriptors of the time. Multi-view point cloud segmentation methods can combine the information of these views into a single, compact shape descriptor, but they cannot fully exploit the intrinsic relationships between views. To solve this problem, Jiang et al. proposed a multi-loop-view convolutional neural network [8]. This method introduces a hierarchical view-loop-shape structure to represent the 3D shape at different scales; it can also make full use of the relationships between views and improve segmentation accuracy.
Among voxel-based segmentation methods, Tchapmi et al. proposed SEGCloud [9], an end-to-end framework for semantic segmentation of point clouds that combines neural networks, trilinear interpolation, and fully connected Conditional Random Fields (FC-CRF) on coarse voxels. Wang et al. proposed O-CNN [10], a convolutional neural network for 3D classification based on an octree structure, which combines a convolutional autoencoder extreme learning machine (CAE-ELM) with the octree system: the input point cloud is voxelized into an octree, features are extracted from multiple kinds of geometric data, and feature discrimination is performed by the CAE-ELM. To mine more informative features and enrich the context representation for 3D point cloud semantic segmentation, Cheng et al. proposed PTANet, a network consisting of two parts: a triple attention block and a density scale learning strategy [11]. The triple attention block is composed of a position attention module, a channel attention module, and a local-area attention module, which together realize the connection and updating of context features. The density scale learning strategy uses adaptive-bandwidth kernel density estimation to fit the density scale of each point and complement the density information of local features.
Among irregular point cloud projection-based segmentation methods, SqueezeSeg [12] is an end-to-end point cloud segmentation network: it takes the original LiDAR point cloud as input and performs a spherical projection to obtain a 2D depth map as a more compact point cloud representation, then obtains instance-level labels with conventional clustering algorithms. SqueezeSegV2 [13] is more robust to dropout noise in the point cloud thanks to an improved model structure, training loss, batch normalization, and additional input channels; it consists of three major modules: learned intensity rendering, geodesic correlation alignment, and progressive domain calibration. On real-world data, the test accuracy of SqueezeSegV2 almost doubled that of SqueezeSeg. SqueezeSegV3 [14] uses spatially adaptive convolution (SAC) to apply different filters at different locations of the depth map, which overcomes the shortcomings of previous algorithms in network capacity utilization and segmentation performance and effectively improves segmentation accuracy. FPS-Net [15] exploits the uniqueness of and differences between the image channels of the projected point cloud: multichannel images are grouped into different modalities, the modality-specific features learned by different modules of the network are fused at the pixel level, and the fused features are decoded hierarchically into the output space for pixel-level classification by circular convolution.
RangeNet++ [16] was proposed by Milioto et al. based on the ideas of SqueezeSeg and SqueezeSegV2. It uses the Darknet framework for feature extraction, takes the range image generated by spherically projecting the original point cloud as the network input, and performs deep image segmentation with a convolutional neural network. The final segmentation results are obtained by a post-processing method based on a KNN search. However, when the training samples are unbalanced, training is dominated by the majority classes, resulting in inaccurate segmentation.
Compared with the currently popular point cloud segmentation methods, we optimize the RangeNet++ network with the AsL function, which adapts better to small targets. The probability-shifting and asymmetric-loss mechanisms of AsL-RangeNet++ make the network better suited to segmenting point clouds. The Adam optimizer is used to optimize the training process and speed up model convergence. Experiments conducted on the SemanticKITTI dataset show that our method is superior to current algorithms. The rest of this paper is organized as follows. In Section II, the principle of the RangeNet++ network, as established by its original authors, is introduced. In Section III, we analyze the problems of RangeNet++, improve it by using the AsL function to calculate the weight of each category, and train the network. In Section IV, the experimental results of the proposed algorithm are compared with those of current algorithms on the SemanticKITTI dataset. Finally, conclusions are presented in Section V.

II. PRINCIPLE OF RANGENET++
RangeNet++ is a deep convolutional neural network that can perform accurate semantic segmentation using only the LiDAR point cloud and its reflectivity [17]; the network can infer a label for every point without discarding any. The original point cloud obtained from the LiDAR is first spherically projected; the generated range image is then fed into a fully convolutional neural network (CNN) and down-sampled in the horizontal direction. Finally, the features of the range image are extracted by the Darknet53 backbone.

A. POINT CLOUD PROJECTION
LiDAR images the scene by sweeping a laser beam through the vertical plane, and this progressive exposure is equivalent to the rolling shutter of a camera. Unlike a global shutter, a rolling shutter scans and exposes the frame line by line over a short period until all pixels in the frame have been exposed. The advantage of this exposure method is a faster per-frame scanning speed, but the disadvantage is the "jelly effect" that accompanies it [18]: when there are dynamic objects in the scene or the camera is moving, different parts of a moving object are exposed at different times due to the progressive exposure, and the captured image appears "tilted", "wobbly", or "partially exposed".
To obtain the complete point cloud information scanned by the LiDAR, the point cloud is spherically projected, converting the original 3D point cloud, which may be affected by the jelly effect, into 2D depth image information, i.e., a range image. That is, each point $p_i = (x, y, z)$ is converted to spherical coordinates and finally to image coordinates $(u, v)$:

$$\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} \frac{1}{2}\left[1 - \arctan(y, x)\,\pi^{-1}\right]\omega \\ \left[1 - \left(\arcsin(z\, r^{-1}) + f_{\mathrm{down}}\right) f^{-1}\right] h \end{pmatrix} \qquad (1)$$

where $h$ is the height of the range image, $\omega$ is the width of the range image, $f = f_{\mathrm{up}} + f_{\mathrm{down}}$ is the vertical field of view of the sensor, and $r = \lVert p_i \rVert_2$ is the range of point $p_i$.
After the point cloud is projected, the resulting range image has size $h \times \omega$, and each pixel stores the point information $(x, y, z, r, f)$. Each pixel $(u, v)$ is assigned its corresponding points in descending order of $r$, which ensures that all points rendered in the range image lie in the LiDAR's field of view. Mapping the 3D point cloud to a 2D image in this way allows 2D neural networks to be applied to 3D point clouds and effectively alleviates the problem of insufficient samples during training.
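As an illustration of Eq. (1), the following is a minimal NumPy sketch of the spherical projection, assuming a Velodyne HDL-64E-style vertical field of view of +3°/−25°; the function and parameter names are ours, not from the original implementation:

```python
import numpy as np

def spherical_projection(points, h=64, w=1024, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an (N, 4) array of points (x, y, z, remission) onto an h x w range image.

    The field-of-view values are illustrative (HDL-64E-like). Returns the range
    image and the (u, v) pixel coordinates of every input point.
    """
    fov_up = np.radians(fov_up_deg)
    fov_down = np.radians(fov_down_deg)
    fov = abs(fov_up) + abs(fov_down)           # total vertical field of view f

    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points[:, :3], axis=1)   # range of each point

    # Eq. (1): normalized image coordinates scaled to pixel indices.
    u = 0.5 * (1.0 - np.arctan2(y, x) / np.pi) * w
    v = (1.0 - (np.arcsin(z / r) + abs(fov_down)) / fov) * h

    u = np.clip(np.floor(u), 0, w - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, h - 1).astype(np.int32)

    # Fill pixels in descending range order so the closest point wins each pixel.
    order = np.argsort(r)[::-1]
    range_image = np.full((h, w), -1.0, dtype=np.float32)
    range_image[v[order], u[order]] = r[order]
    return range_image, u, v
```

In practice each pixel would also store the full $(x, y, z, r, f)$ tuple; only the range channel is kept here for brevity.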

B. FULLY CONVOLUTIONAL SEMANTIC SEGMENTATION
After the range image is obtained from the point cloud projection, an hourglass-shaped encoder-decoder network performs the segmentation by 2D full convolution, as shown in Fig. 1. The characteristic of this structure is its significant downsampling in the encoder, which effectively speeds up inference compared with a network that does not downsample.
The network is based on the Darknet deep learning framework proposed in YOLOv3 [19], integrates the principle of pyramid-type feature maps, and borrows the idea of the deep residual network ResNet [20]. While ensuring accurate feature representation, the gradient problem is avoided to some extent. During training, stochastic gradient descent and a weighted cross-entropy loss function are used to optimize the network end to end.
So as not to change the information along the vertical direction of the point cloud, the encoder in the upper half of the network downsamples the range image only in the horizontal direction. For a range image of 64 × 1024 pixels, the number of pixels in the horizontal direction is reduced by a factor of 32, while the vertical direction keeps all 64 pixels. The advantage of this processing is that it reduces computational resource consumption while keeping the information in the vertical direction of the range image unchanged; the training speed is improved and the 2D-to-3D point cloud reconstruction is accelerated.

C. POINT CLOUD RECONSTRUCTION
The 2D-to-3D point cloud reconstruction is generally achieved by matching the parameters of the LiDAR, the scanned points, the measured ranges, and the coordinates of the pixels in the range image. However, when the range image is generated by spherically projecting the original point cloud, a large number of 3D points are dropped; for example, 11,808 points are dropped when a laser scan with 20,000 points is projected into a range image of 32 × 256 pixels. Therefore, to reconstruct all points in the semantic map, every pixel $(u, v)$ in the range image is matched against all original points of the laser frame corresponding to that range image, and the point information $(x, y, z, r, f)$ contained in each pixel is indexed according to the 3D coordinates of $p_i$. After this mapping relationship is established, labels are generated quickly and without loss for the points corresponding to each pixel, yielding a 3D point cloud space containing partial semantic information. The mapping relationship is

$$(x, y, z, r, f)_i \longleftrightarrow (u, v)_j \qquad (2)$$

where $(x, y, z, r, f)_i$ is the coordinate information of the original point $p_i$ and $(u, v)_j$ is the $j$-th pixel of the range image.
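A minimal sketch of this lossless label assignment follows, reusing the `spherical_projection` sketch above; the per-pixel `label_image` is assumed to be the network's semantic prediction:

```python
def backproject_labels(points, label_image, h=64, w=1024):
    """Assign each original 3D point the label of the range-image pixel it maps to.

    Points that were overwritten during projection inherit the label of the
    surviving pixel here; they are refined by the KNN post-processing step below.
    """
    _, u, v = spherical_projection(points, h=h, w=w)
    return label_image[v, u]  # one label per original point
```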

D. POST-PROCESSING
After the 2D range image is reconstructed into the 3D point cloud space, the points deleted by the spherical projection must be identified and assigned a category. To this end, a fast, GPU-based K-Nearest-Neighbor (KNN) [21] search is used directly in this 3D point cloud space to traverse all original points and determine their semantic information. The KNN search finds, for each point of the original point cloud, the k closest points in the 3D point cloud space containing partial semantic information, and computes the Euclidean distance between those k points and the query point:

$$d_i = \sqrt{(x - x_i)^2 + (y - y_i)^2 + (z - z_i)^2} \qquad (3)$$

where $(x, y, z)$ are the coordinates of the query point and $(x_i, y_i, z_i)$ are the coordinates of the $i$-th of the k neighbors.
After traversing all points, the semantic point closest to each original point is determined, and the original point is assigned the same category as that semantic point. A threshold is also set as the maximum allowable distance to a neighbor: if the Euclidean distance of a neighbor exceeds the threshold, that neighbor is not considered to belong to the same class as the original point. Through the KNN search, every point of the original point cloud can be accurately clustered and segmented.
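The following is a CPU sketch of this post-processing step using a k-d tree (the paper's implementation is GPU-based); a majority vote among the k neighbors within the distance threshold is one plausible reading of the procedure described above, and all names are illustrative:

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_refine(points, labeled_points, labels, k=5, max_dist=1.0):
    """Refine per-point labels with a thresholded nearest-neighbor vote.

    points:         (N, 3) all original points to be labeled
    labeled_points: (M, 3) points that already carry semantic labels
    labels:         (M,)   integer class labels of the labeled points
    """
    tree = cKDTree(labeled_points)
    dists, idx = tree.query(points, k=k)        # Euclidean distances, Eq. (3)

    refined = np.empty(len(points), dtype=labels.dtype)
    for i in range(len(points)):
        valid = dists[i] <= max_dist            # distance threshold
        if valid.any():
            votes = labels[idx[i][valid]]
            refined[i] = np.bincount(votes).argmax()  # majority vote among neighbors
        else:
            refined[i] = -1                     # no neighbor close enough: unlabeled
    return refined
```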

E. CEL FUNCTION
The original RangeNet++ model [22] uses a Cross-Entropy Loss (CEL) function with per-category weights over the dataset to represent the gap between the predicted and actual data, i.e.,

$$L_{\mathrm{CEL}} = -\sum_{c=1}^{C} \omega_c\, y_c \log(\hat{y}_c) \qquad (4)$$

where $c$ is the category, $\omega_c$ is the weight of the category, inversely proportional to its frequency of occurrence, $y_c$ is the ground-truth label, and $\hat{y}_c$ is the softmax probability over the unbounded logits, defined as

$$\hat{y}_c = \frac{e^{\mathrm{logit}_c}}{\sum_{c'} e^{\mathrm{logit}_{c'}}} \qquad (5)$$

where $\mathrm{logit}_c$ is the unbounded output of category $c$. This method only discriminates samples as positive or negative, computes the class prediction of each pixel vector separately, and learns all pixels in the image equally, without any differentiation. When LiDAR is used as the environment-sensing sensor, the obtained point clouds often have an uneven distribution of categories. Training is then dominated by the categories with larger numbers of points, and features of the smaller categories in the sample are ignored, which makes the learned results inaccurate. In deep neural networks, samples can be classified into hard positive samples (hard positives), hard negative samples (hard negatives), easy positive samples (easy positives), and easy negative samples (easy negatives). Because the CEL function differentiates samples only as positive/negative and not as easy/hard, it cannot solve the problem of poor final classification for categories that are hard to segment and contain few points.
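For illustration, the weighted CEL of Eqs. (4)-(5) can be sketched in PyTorch as follows; the plain inverse-frequency weighting shown is an assumption standing in for whatever weighting the original implementation uses:

```python
import torch
import torch.nn.functional as F

def class_weights(frequencies, eps=1e-3):
    """Weights inversely proportional to class frequency, as in Eq. (4)."""
    freq = torch.as_tensor(frequencies, dtype=torch.float32)
    return 1.0 / (freq + eps)

def weighted_cel(logits, target, weights):
    """Weighted cross-entropy over per-pixel logits of shape (B, C, H, W);
    target has shape (B, H, W) with integer class indices."""
    return F.cross_entropy(logits, target, weight=weights)
```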

III. IMPROVED RangeNet++ AND ITS TRAINING

A. IMPROVED METHOD
To reduce the impact of negative samples on training and improve the segmentation accuracy of small-target point clouds, in this work we use the AsL function in place of the softmax-based CEL function to calculate the weights and to segment and recognize the objects of each category in the range image. The training process of the improved RangeNet++ is optimized with the Adam optimizer. Finally, the 2D range image containing semantic information is re-projected back to the 3D point cloud space by the KNN search to obtain the final segmentation result. The flow chart of the improved AsL-RangeNet++ is shown in Fig. 2.

B. ASL FUNCTION
Focal Loss (FL) [23] can segment the point clouds of small targets more accurately by down-weighting easy negatives; however, when the positive and negative samples are unbalanced, its segmentation becomes inaccurate. To overcome this problem, we propose an asymmetric loss (AsL) function, which integrates the two mechanisms of asymmetric focusing and probability shifting into one formula:

$$L = \begin{cases} L_+ = (1 - p)^{\gamma_+} \log(p) \\ L_- = (p_m)^{\gamma_-} \log(1 - p_m) \end{cases} \qquad (6)$$

where $p$ is the network output probability, $p = \sigma(z)$ with $\sigma(z)$ the sigmoid function, $L_+$ is the loss component of the positive samples, $L_-$ is the loss component of the negative samples, and $\gamma_+$, $\gamma_-$ are the focusing parameters. In formula (6), when $\gamma_+ = 0$, $L_+$ is the normal CEL function, and $L_-$ reduces the loss caused by easy negatives through the probability margin $m$. Because the contribution of positive samples needs more emphasis, we set $\gamma_- > \gamma_+$. Through this setting, the contributions of positive and negative samples to the loss function can be controlled separately, and the network can learn valuable features from the positive samples.
When the probability of a negative sample is low (below the soft threshold), asymmetric focusing reduces its contribution to the loss. Since the degree of imbalance in multi-label classification can be very high, this alone is not always sufficient. An additional asymmetric mechanism, probability shifting, applies a hard threshold to negative samples: when the probability of a negative sample is very low, the loss caused by that sample is ignored completely. The shifted probability is defined as

$$p_m = \max(p - m, 0) \qquad (7)$$

where the margin $m \geq 0$ is a tunable hyperparameter. Objects of small volume and small number account for a low proportion of the overall point cloud; for such samples, hard-threshold probability shifting makes the model actively ignore the loss caused by the scarcest points during learning and training, yielding a better segmentation effect. Compared with FL and CEL, AsL segments the classes that occupy a small share of the point cloud more accurately, and it splits the loss $L$ into $L_+$ and $L_-$ governed by the focusing parameters $\gamma_+$ and $\gamma_-$, so that it adapts to point clouds of different sample categories. AsL reduces the contribution of easy negatives during training through the soft threshold $\gamma_-$ and the hard threshold $m$, raises the importance of easily misclassified samples and the accuracy of the model, and prevents the model from being dominated by the large number of easy negative samples during training. For important positive-sample point clouds such as pedestrians, which are difficult to segment and far fewer in number than other categories, the unbalanced samples are equalized by the characteristics of AsL, and the weight of such a category is increased during training so that the accuracy of the model improves.
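A minimal PyTorch sketch of the AsL of Eqs. (6)-(7) follows; the hyperparameter defaults ($\gamma_+ = 0$, $\gamma_- = 4$, $m = 0.05$) are illustrative, not the tuned values used in our experiments:

```python
import torch
import torch.nn as nn

class AsymmetricLoss(nn.Module):
    """Sketch of AsL: asymmetric focusing (Eq. 6) plus probability shifting (Eq. 7)."""

    def __init__(self, gamma_pos=0.0, gamma_neg=4.0, m=0.05, eps=1e-8):
        super().__init__()
        self.gamma_pos = gamma_pos
        self.gamma_neg = gamma_neg
        self.m = m
        self.eps = eps

    def forward(self, logits, targets):
        # p = sigmoid(z); targets is a {0, 1} tensor of the same shape as logits.
        p = torch.sigmoid(logits)
        p_m = (p - self.m).clamp(min=0.0)  # probability shifting, Eq. (7)

        # L+: with gamma_pos = 0 this reduces to ordinary cross entropy on positives.
        loss_pos = targets * (1.0 - p).pow(self.gamma_pos) \
                           * torch.log(p.clamp(min=self.eps))
        # L-: the soft threshold gamma_neg down-weights easy negatives, and the
        # hard threshold m zeroes their loss entirely once p <= m.
        loss_neg = (1.0 - targets) * p_m.pow(self.gamma_neg) \
                                   * torch.log((1.0 - p_m).clamp(min=self.eps))
        return -(loss_pos + loss_neg).mean()
```

Note that when $p \leq m$, $p_m = 0$ and the negative term vanishes, which is exactly the "completely ignored" behavior described above.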
The Adam optimization algorithm is used as the optimizer in our experiments on account of its optimization speed and its robustness to hyperparameters. Compared with the stochastic gradient descent (SGD) algorithm [24], Adam effectively improves the convergence speed of training and suppresses the oscillation that occurs after the samples are divided into batches. To a certain extent, Adam increases the stability of the model, so that training does not fall into local optima. In the model training process, the learning rate is initialized to 0.005 and the batch size is 8. The Adam parameters are updated as

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \qquad v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \qquad (8)$$

$$\theta_t = \theta_{t-1} - \alpha \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}, \qquad \hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \quad \hat{v}_t = \frac{v_t}{1 - \beta_2^t} \qquad (9)$$

where $\alpha$ is the step size, $\beta_1$ and $\beta_2$ are the exponential decay rates, $g_t$ is the gradient, $m_t$ is the first-moment estimate, $v_t$ is the second raw-moment estimate, and $\theta_t$ is the parameter vector.
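The corresponding training setup can be sketched as follows, using the stated learning rate of 0.005 (the betas shown are PyTorch defaults; `model` and `loader` are placeholders for the AsL-RangeNet++ network and the SemanticKITTI batch loader):

```python
import torch

def train_one_epoch(model, loader, optimizer, criterion):
    """One epoch over batches of 8 range images, as described above."""
    model.train()
    for scans, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(scans), labels)
        loss.backward()
        optimizer.step()

# Illustrative construction, assuming `model` is already built:
# optimizer = torch.optim.Adam(model.parameters(), lr=0.005, betas=(0.9, 0.999))
# criterion = AsymmetricLoss(gamma_pos=0.0, gamma_neg=4.0, m=0.05)
```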

C. MODEL TRAINING
The KITTI dataset is the largest and most widely used autonomous driving dataset; its data acquisition platform includes LiDAR, an inertial navigation system, and cameras, sampling at a frequency of 10 Hz. The dataset includes 389 pairs of stereoscopic images and optical flow maps, 39.2 km of laser point cloud sequences, and more than 200K 3D labeled objects. SemanticKITTI [25] is a newer dataset built on KITTI: it adds class labels to all points and contains all the data of the 360° scans of the Velodyne LiDAR. The dataset covers 20 classes such as Person, Road, and Car and is divided into 22 sequences, of which 00∼07 and 09∼10 are training sets, 08 is the validation set, and 11∼21 are test sets. The experiments in this paper were conducted on a DELL T640 server; the parameters of the experimental platform are shown in Table 1.
The IoU is defined as the intersection of the class prediction $P_c$ and the ground truth $G_c$ divided by their union. For class $c$,

$$IoU_c = \frac{|P_c \cap G_c|}{|P_c \cup G_c|} \qquad (10)$$

Due to the large size of the experimental dataset, the hyperparameters are fine-tuned based on each training result to determine appropriate values of $\gamma_+$ and $\gamma_-$. The comparison of IoU values for each category is shown in Fig. 3. It can be seen that the results obtained by training with the AsL function are slightly improved overall compared with the CEL function, but the difference for the Person and Bicyclist classes is not significant.
For the Person and Bicyclist classes, the weights are increased by 0.1, and for the others by 0.2, in each epoch of retraining. In the end, the weights of the Motorcycle and Truck classes are adjusted to 2.5, the weights of the Vehicle and Bicyclist classes to 1.5, and the weight of the Person class to 3.5. The adjusted IoU comparison is shown in Fig. 4.
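As a concrete summary, the tuned values above can be written as a simple configuration (the dictionary name is ours; classes not listed keep their initial inverse-frequency weight):

```python
# Final per-class loss weights after tuning (values taken from the text above).
FINAL_CLASS_WEIGHTS = {
    "Motorcycle": 2.5,
    "Truck": 2.5,
    "Vehicle": 1.5,
    "Bicyclist": 1.5,
    "Person": 3.5,
}
```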
After the weight adjustment, training is run with epoch = 50, and the decreasing trend of the final loss function value is shown in Fig. 5. It can be seen that, compared with the CEL function, the improved AsL function converges faster and its final value is smaller and flatter.
It can also be seen from Fig. 5 that the loss values of both functions decrease over the 50 training epochs, but the loss of the method in this paper decreases faster, stays lower than that of the original method, and converges at 46 epochs, while the loss of the original method is still changing. Overall, the method in this paper is superior to the original method.

IV. MODEL TRAINING AND VERIFICATION

A. ACTUAL SEGMENTATION EFFECT OF POINT CLOUDS
Semantic segmentation is performed using the trained model for continuous point cloud data in the test dataset in SemanticKITTI. Comparing the segmentation effect of the AsL function and CEL function, the two-dimensional depth map and segmentation map of Scene 1 and Scene 2 are obtained respectively, as shown in Figs. 6 and 7. In these figures, the ''Person'' class is marked as a red point cloud, the ''Road'' class is marked as a purple point cloud, and the ''Car'' class is marked as a blue point cloud.
As can be seen from Fig. 6, for Scene 1 the AsL function segments and recognizes people and objects with relatively small, distant, and dense point clouds more clearly and accurately: it not only recognizes the overall outline of pedestrians more accurately but also recognizes more objects than CEL. In addition, AsL segments different types of objects at different distances along the main road more completely and accurately, including the traffic signs, telegraph poles, and grass in Scene 1; CEL is far inferior to AsL in recognizing pedestrians and objects.
Fig. 7 compares the segmentation of the two methods on Scene 2. In this scene, the point cloud on the left side is relatively dense, with a large number of pedestrians and static objects. Compared with CEL, AsL accurately segments the outlines of pedestrians, distant plants, and so on, showing its superiority in segmenting dense point clouds.
To further analyze the segmentation characteristics of the two loss functions, we examined the 3D point cloud spaces of the two scenes above; the results are shown in Figs. 8 and 9. As shown in Fig. 8, AsL segments pedestrians better than CEL: it recognizes not only the pedestrian in the foreground but also the pedestrians in the background. In addition, AsL obtains clear contours and accurate results when segmenting the grass at the lower right and the other categories, whereas CEL segments pedestrians poorly, and its results for the other categories are also worse than those of AsL.
For the 3D point cloud space of Scene 2, it can be seen from Fig. 9 that pedestrians and bicycles are accurately segmented with the AsL function, and the high-density categories at the crossroad do not degrade the recognition accuracy of AsL. Moreover, the points on the edge of the pedestrians' shadows are also correctly segmented with the AsL function. The segmentation of the CEL function in Scene 2 is worse than that of our proposed method: for the high-density categories in Scene 2, there are partial overlaps between the Person class and other classes in the segmentation result, and its recognition accuracy is lower than that of AsL.

B. EVALUATION AND ANALYSIS
In this subsection, the mean Intersection over Union (mIoU) [26] is used to analyze and validate the improved AsL-RangeNet++. mIoU is a standard evaluation metric for semantic segmentation and is the average of the per-class IoUs over the dataset. For a dataset with C classes, mIoU is calculated by

$$mIoU = \frac{1}{C} \sum_{c=1}^{C} IoU_c \qquad (11)$$

mIoU represents the ratio of the intersection of the predicted and true results to their union and can be used to measure detection accuracy. For the SemanticKITTI dataset, C = 19.
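For reference, Eqs. (10)-(11) can be computed directly from a class confusion matrix; the following sketch assumes rows index true labels and columns index predictions:

```python
import numpy as np

def per_class_iou(conf_matrix):
    """IoU_c = TP_c / (TP_c + FP_c + FN_c) from a C x C confusion matrix, Eq. (10)."""
    tp = np.diag(conf_matrix)
    fp = conf_matrix.sum(axis=0) - tp   # predicted c but actually another class
    fn = conf_matrix.sum(axis=1) - tp   # actually c but predicted another class
    return tp / np.maximum(tp + fp + fn, 1)

def miou(conf_matrix):
    """Mean IoU over the C classes, Eq. (11)."""
    return per_class_iou(conf_matrix).mean()
```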
To illustrate the accuracy of the proposed method, we used mIoU as the segmentation metric and obtained the mIoU values of the AsL function, the CEL function [16], the FL function [23], the CE+softmax loss function [27], and the Negative Log-Likelihood (NLL) loss function [28] on the data of sequences 11 to 21 of the SemanticKITTI dataset, used as the test set. The mIoU values of these five methods are shown in Table 2. As can be seen from Table 2, for most categories, such as Person, Road, Parking, Sidewalk, Vegetation, and Trunk, the mIoU value of AsL is larger than those of the other four methods; in particular, for the Person class, its mIoU value is 10.1% higher than that of the second most accurate loss function, CEL, and 35.8% higher than that of the least accurate, CE+softmax. The average mIoU of AsL is 50.7%, also higher than that of the other four methods. Therefore, the segmentation of the AsL function is more effective and accurate, especially for categories with low occupancy, distant scenes, and high-density categories.
The confusion matrix (CM) over all categories in the dataset is also calculated for the RangeNet++ model based on the AsL and CEL functions. The associated metrics are

$$P = \frac{TP}{TP + FP}, \qquad A = \frac{TP + TN}{TP + TN + FP + FN}, \qquad R = \frac{TP}{TP + FN} \qquad (12)$$

where P is precision, A is accuracy, R is recall, TP is the number of positive predictions that are correct, TN is the number of negative predictions that are correct, FP is the number of positive predictions that are wrong, and FN is the number of negative predictions that are wrong.
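As a small companion to Eq. (12), these metrics follow directly from the four confusion-matrix counts:

```python
def precision_accuracy_recall(tp, tn, fp, fn):
    """Precision, accuracy, and recall from confusion-matrix counts, Eq. (12)."""
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)
    return precision, accuracy, recall
```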
The CM results are shown in Fig. 10; the x-axis represents the predicted labels and the y-axis the true labels. The main-diagonal elements represent the accuracy of correct classification for each true label, and each cell of the CM indicates the proportion of such classifications: the darker the blue, the larger the proportion.
Comparing (a) and (b) in Fig. 10, in the confusion matrices of the two methods the main-diagonal values of AsL are higher than those of CEL. In particular, for Car, Bicyclist, Motorcyclist, Trunk, Traffic-sign, and other categories, the prediction accuracy of AsL is 10% or more higher than that of CEL, which indicates that the proposed AsL-RangeNet++ has higher segmentation accuracy than the original RangeNet++.
To fully illustrate the accuracy of the proposed AsL-RangeNet++ on the in-vehicle point cloud, we compare the algorithm with methods currently common in the point cloud segmentation field (SqueezeSegV2, SqueezeSegV3, etc.), using mIoU to compare the three models. The experimental mIoU values are shown in Table 3.
It can be seen from Table 3 that, compared with SqueezeSegV3 and SqueezeSegV2, AsL-RangeNet++ has the highest mIoU value for each category and demonstrates the best segmentation accuracy on the in-vehicle point clouds of the SemanticKITTI dataset. The average mIoU obtained with AsL-RangeNet++ is 50.7%, greater than the values of SqueezeSegV3 and SqueezeSegV2.

V. CONCLUSION
In this paper, a new point cloud segmentation method, AsL-RangeNet++, is proposed based on RangeNet++ and its CEL function to solve the problem of inaccurate object segmentation caused by unbalanced samples in in-vehicle point clouds. Through an asymmetric loss function with both asymmetric focusing and probability-shifting mechanisms, together with the Adam optimizer, AsL-RangeNet++ adapts better to point clouds with small occupancy: the loss contribution of the scarcest negative samples can be ignored and the impact of negative samples on training reduced. By calculating the weights of positive and negative samples separately, the proposed method alleviates the imbalance between positive and negative samples and the label-error problem in multi-label classification; especially for small-target point cloud data, it shows good adaptability and high classification accuracy. Extensive experiments on the widely used SemanticKITTI dataset show that AsL-RangeNet++ adapts better to the segmentation of this dataset, segments the various objects in the scenes more accurately, and produces clearer contours than PointNet++ and SqueezeSeg. Compared with the Cross-Entropy (CE), NLL, Focal, and CE+softmax loss functions, AsL-RangeNet++ has the highest mIoU, reaching 50.7%. In follow-up work, we will explore unsupervised domain adaptation of LiDAR point clouds.