Article

Real-Time Evaluation of Perception Uncertainty and Validity Verification of Autonomous Driving

School of Vehicle and Mobility, Tsinghua University, Beijing 100084, China
* Authors to whom correspondence should be addressed.
Sensors 2023, 23(5), 2867; https://doi.org/10.3390/s23052867
Submission received: 25 January 2023 / Revised: 20 February 2023 / Accepted: 24 February 2023 / Published: 6 March 2023
(This article belongs to the Section Vehicular Sensing)

Abstract

Deep neural network algorithms have achieved impressive performance in object detection. Real-time evaluation of perception uncertainty from deep neural network algorithms is indispensable for safe driving in autonomous vehicles. However, more research is required to determine how to assess the effectiveness and uncertainty of perception results in real time. This paper proposes a novel real-time evaluation method combining multi-source perception fusion and deep ensemble. The effectiveness of single-frame perception results is evaluated in real time. Then, the spatial uncertainty of the detected objects and its influencing factors are analyzed. Finally, the accuracy of the spatial uncertainty is validated against the ground truth in the KITTI dataset. The results show that the evaluation of perception effectiveness reaches 92% accuracy, and a positive correlation is found between the extracted uncertainty and the ground-truth error. The spatial uncertainty is related to the distance and occlusion degree of the detected objects.

1. Introduction

Perception, decision-making, and control are the three core modules in autonomous driving (AD) [1,2]. Anomalies and uncertainty in perception directly affect how thoroughly AD systems comprehend the world and make driving decisions [3,4]. For AD systems to operate safely, perceptual uncertainties must be estimated accurately and in real time [5]. Perception algorithms significantly influence the effectiveness of perception outcomes [6,7,8]. AD perception algorithms have passed through the development stages of rule-based model algorithms, machine learning, and Deep Neural Network (DNN) algorithms [9].
Among these, perception algorithms based on rule models perform parameter optimization through feature extraction and manual modeling in a “top-down” manner, which has poor versatility and low efficiency [10,11]. Data-driven DNN algorithms can handle situations that rule-based model algorithms cannot. Especially when the training data are sufficient, DNN algorithms achieve more comprehensive and accurate perception [12,13]. With the development of computing power, storage, and other hardware technologies, the proportion of DNN algorithms in autonomous driving systems keeps growing. However, DNN algorithms suffer from uncertainty and poor interpretability due to dataset uncertainty, training process uncertainty, and internal network structure factors [14,15,16].
As shown in Figure 1, perception imperfection can have a crucial influence on AD, especially when missed detection and large spatial uncertainties occur around the vehicle. When the current perception has a serious error, the failure should be detected in real-time for possible emergency response due to safety requirements.
Existing research focuses on the quantitative evaluation methods, influencing factors, trigger mechanisms, and processing methods of DNN uncertainties [7,12,17]. However, most of these references focus on uncertainty analysis with only one DNN; there are few references on mutual inspection based on multiple neural networks. It is difficult for a single neural network to avoid missed detection, which is a typical long-tail scenario in perception. Many approaches have emerged for false detection [18], perception anomalies [19], and perception risk assessment [5,20,21]. Most of these methods judge based on the continuity of the perception data stream and the confidence index of the object output. They lack matching and verification against other data, so their accuracy remains to be demonstrated. In addition, some studies address perception uncertainty from the perspective of the safety of the intended functionality (SOTIF) [6,22,23]. These studies focus more on capturing uncertainty and researching its trigger mechanisms. The uncertainty results can be applied in decision-making and in adjusting the internal structure of the network [24,25,26]. Even when certain references extract uncertainty and process it inside the network or through a decision-making algorithm, there is no quantitative demonstration of the correctness of the extracted uncertainty. In addition, many methods cannot be implemented in real time.
A complete and accurate uncertainty assessment must address three core issues. How can uncertainty be extracted completely in real time to deal with long-tail scenarios involving missed detection? How can the extracted uncertainty be verified without the ground truth of the dataset? What factors influence neural network uncertainty?
Therefore, how to judge the effectiveness and uncertainty of perception results in real time based on the matching of multi-source perception results needs further research. This work proposes a novel real-time evaluation method for AD perception. We select two 3D object detection DNN algorithms as our basic perception methods: PointPillars, based on LiDAR [27], and SMOKE (Single-Stage Monocular 3D Object Detection via Keypoint Estimation), based on camera images [28]. Then, we analyze the uncertainty results to evaluate the effectiveness of the perception results in real time. Furthermore, quantitative analysis is carried out on the spatial uncertainty of the detected objects. Finally, the analyzed uncertainty results are verified against the ground truth. In addition, the factors influencing DNN uncertainty are demonstrated quantitatively.
Our contributions can be summarized as follows:
(1) A real-time perception effectiveness estimation algorithm is proposed combining multi-source perception fusion and a deep learning ensemble. The model can judge the effectiveness of perception results and capture the spatial uncertainties of detected objects. This model can handle missed detections that are intractable for a single network.
(2) While judging the perception effectiveness in real-time, the model can extract the spatial uncertainty of detected objects in real-time. Our study of the correlation of the ground truth with the uncertainty and error verifies the proposed model’s effectiveness.
(3) The perception effectiveness and spatial uncertainty obtained based on this model are verified on the KITTI dataset, demonstrating the correctness and accuracy of the model. The influencing factors of perception uncertainty are analyzed and verified.
The research in this paper can provide theoretical and practical guidance for real-time judgment of the effectiveness of AD perception results.
The remainder of the paper is structured as follows: Section 2 summarizes related works about perception uncertainties. Section 3 introduces the methodology used to evaluate uncertainty in this paper. Section 4 discusses the experimental results and demonstrates the relationship between the extracted uncertainties and the ground truth error. Finally, Section 5 summarizes our conclusions and describes possible future work.

2. Related Works

Significant studies have been devoted to capturing the perception uncertainty of DNN algorithms and to judging and monitoring perception anomalies.
Concept uncertainty, dataset uncertainty, and network training and migration uncertainty influence the uncertainty of perception based on DNN algorithms [29,30,31]. Uncertainty can be classified into epistemic uncertainty and aleatoric uncertainty. Perception uncertainty theories include Bayesian theory, sampling-based methods, Gaussian theory, evidence theory, and propagation mechanisms [6,32,33]. Among specific research methods, the Bayesian model can effectively capture uncertainty, although its computational cost is high.
Subsequently, Laplace Approximation (LA), Variational Inference (VI), and Markov Chain Monte Carlo (MCMC) have been used to optimize the Bayesian model. For epistemic uncertainty, the Monte Carlo Dropout (MCD) method [30] and Deep Ensemble (DE) sampling [34] have been proposed. Compared with MCD, DE has higher calculation accuracy and meets real-time requirements.
Regarding uncertainty evaluation metrics, Di Feng [30] summarized the precision, recall, and F1-Score, along with the mean average precision, Shannon entropy, mutual information, calibration plot, error curve, etc. Zining Wang [35] proposed a new evaluation metric, Jaccard IoU (JIoU), that incorporates label uncertainty. Stefano Gasperini [36] provided separate uncertainties for each output signal: objectness, class, location, and size, and proposed a new metric to evaluate location and size uncertainty.
In terms of uncertainty influencing factors, Di Feng [37] proved that uncertainty is related to multiple factors, such as the detection distance, occlusion, softmax score, and orientation. Hujie Pan [38] found that the corner uncertainty distribution agrees with the point cloud distribution in the bounding box, meaning that the corner with denser observed points has lower uncertainty.
Evaluated perception uncertainty can be applied in the following aspects: improving perception performance, optimizing decision-making control algorithms, and providing warning and monitoring for AD systems. Gregory P. Meyer [13] estimated the uncertainty in detection by predicting a probability distribution over object bounding boxes, and proposed a method to improve the ability to learn the probability distribution by considering the potential noise in the ground-truth labeled data. Di Feng [37] leveraged heteroscedastic aleatoric uncertainty to improve its detection performance significantly. Q.M Rahman [5] studied run-time monitoring of machine learning for robotic perceptions based on perception uncertainty. In addition, from the perspective of SOTIF, Liang Peng [6,23] proposed a trigger mechanism for perception uncertainty and optimized the decision based on the uncertainty extracted.
In all, the references mentioned above have greatly contributed to the evaluation methods and applications of perception uncertainty. However, the research object is usually a single detection network. In this situation, although the spatial uncertainty can be studied, it is difficult to deal with false detection and missed detection. This calls for mutual inspection of multiple perception algorithms. Moreover, perception failure needs to be detected in real time to ensure safety, which is seldom studied in existing research. Therefore, it is important to judge the effectiveness of the current perception results. Finally, the analyzed perception uncertainty is a learning-based estimate of the error relative to the ground truth. Therefore, studying the relationship between real-time uncertainty evaluation, the error, and the ground truth is important. Thus, we have made an effort to contribute to these three problems, as stated in the Introduction.

3. Methodology

This section first introduces the theory of the DE method for uncertainty assessment, the clustering algorithm for multi-network results, and the principle of the matching algorithm for multi-source perception results. These three algorithms are the basis for the judgment of perception effectiveness and spatial uncertainty. Then, the perception effectiveness judgment algorithm and evaluation metrics for spatial uncertainty are discussed.
The logic flow between different algorithms is shown in Figure 2. PointPillars and SMOKE, shown in Figure 2, are two DNN algorithms for object detection, and represent two sources of perception. The DNN algorithm can detect objects using multiple neural networks after DE processing. The objects detected by different networks can be classified into different objects after being processed by the clustering algorithm. After the matching algorithm matches the objects from the two perception sources, the effectiveness of the current perception scene can be judged in real-time and combined with the remaining clustering results. Based on the clustering results, the spatial uncertainty of the detected objects can be solved directly.
The schematic sequence of the perception effectiveness and spatial uncertainty evaluation is shown in Figure 3. As shown in Figure 3, the uncertainty of detection results is primarily collected with Deep Ensemble. With multiple perception inputs, mutual inspection is carried out to analyze missed/false detection. Then, through a statistical study, the perception effectiveness is judged. If the result is judged to be effective, the spatial uncertainty of the outputs is estimated. Finally, with the ground truth of the objects from the dataset, the effectiveness judgment and the spatial uncertainty estimation result can be verified.

3.1. Deep Ensemble

Ensembles consist of two kinds of methods: randomization-based approaches and boosting-based approaches [34].
Bayesian neural networks provide a natural explanation for uncertainty estimation in deep learning [39]. The posterior distribution parameters in the Bayesian formula are related to the network parameters. Randomization-based sampling of the network parameters is required to simulate the output distribution. The Monte Carlo method can be used to sample the neural network parameters randomly; the statistics then simulate this distribution [40]. The deep ensemble method is a simplified randomization-based approach. The deep ensemble randomly fine-tunes the network parameters within a certain range. This pre-parameter sampling method does not increase the amount of computation used at inference. The operation processes of the different sampled networks are the same and can be calculated in parallel [41,42].
Therefore, the present study focuses on the randomization-based approach, as it is better suited for distributed and parallel computation. We use the DE approach to capture the real-time uncertainty of classification and regression outputs. The pseudo-code of the training procedure is summarized in Algorithm 1.
Algorithm 1 Deep Ensemble
Input: Neural networks and number of networks Net_{1:N}.
Output: Associated classification probability P_{1:N}, location of detected objects L_{1:N}, rotation R_{1:N}, and dimension D_{1:N}.
1: Construction and selection of neural networks
2: Parameter W_{1:N} initialization of neural networks
3: for n = 1 to N do
4:     Randomize the data loading mode for each network
5:     Train the different networks on the same dataset
6:     Obtain classification and regression results
7: end for
8: return P_{1:N}, L_{1:N}, R_{1:N}, D_{1:N}
In the training of neural networks, deep ensemble requires random initialization of the neural network parameters, along with randomized shuffling of data points, which is sufficient to obtain good training effects in practice.
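As an illustration of the training loop in Algorithm 1, the following minimal Python/PyTorch sketch trains the ensemble members with independently seeded parameter initialization and shuffled data loading. The helpers `build_detector`, `dataset`, and `net.training_step` are hypothetical placeholders rather than the paper's actual code, and the hyperparameter values mirror the PointPillars settings in Section 4 only as an example.

```python
import torch
from torch.utils.data import DataLoader

def train_deep_ensemble(build_detector, dataset, n_networks=5, epochs=5, lr=2e-4):
    """Train N ensemble members as in Algorithm 1.

    `build_detector` and `dataset` stand in for the 3D detector constructor
    (e.g., PointPillars or SMOKE) and its training data; `net.training_step(batch)`
    is assumed to return the combined classification + regression loss."""
    ensemble = []
    for n in range(n_networks):
        torch.manual_seed(n)                                       # independent random parameter initialization
        net = build_detector()                                     # member-specific weights W_n
        loader = DataLoader(dataset, batch_size=2, shuffle=True)   # randomized data loading order
        optimizer = torch.optim.Adam(net.parameters(), lr=lr)
        for _ in range(epochs):
            for batch in loader:
                loss = net.training_step(batch)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        ensemble.append(net)                                       # members can later run in parallel at inference
    return ensemble
```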

3.2. Clustering Algorithm

The number and order of objects detected by different networks in the same frame are inconsistent; thus, accurate matching of objects is an important guarantee of the accuracy of the perception results and their uncertainty. The object-merging strategy for sampling-based uncertainty used in this study is the basic sequential algorithmic scheme with intra-sample exclusivity (BSAS_excl) [17]. The procedure is summarized in Algorithm 2.
Algorithm 2 Basic sequential algorithmic scheme with intra-sample exclusivity
Input: A set of predictions P_{1:N}.
Output: A set of clusters C_{1:N}.
1: Create a cluster for each object box in the first prediction set P_1
2: for i = 2 to N do
3:     Set excl_flag = 0_n, where n is the number of clusters
4:     for box_j in P_i, for cluster_k in C_n do
5:         if affinity(box_j, cluster_k) ≥ θ and excl_flag(k) = 0 then
6:             Put box_j in cluster_k, excl_flag(k) = 1
7:         else
8:             Create a new cluster, n = n + 1
9:         end if
10:     end for
11: end for
12: return C
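The following Python sketch is one possible reading of Algorithm 2: each box from the first ensemble member seeds a cluster, and every later box either joins the first compatible cluster not yet used in the current pass or starts a new one. The `affinity` function (e.g., 3D IoU) and the use of a cluster's first box as its representative are simplifying assumptions, not specifics of this paper.

```python
def bsas_excl(predictions, affinity, theta=0.1):
    """Cluster detections from N ensemble members (Algorithm 2).

    predictions[i] is the list of boxes from network i; `affinity` is a
    similarity measure such as 3D IoU; `theta` is the merge threshold."""
    clusters = [[box] for box in predictions[0]]      # each box of the first member seeds a cluster
    for boxes in predictions[1:]:
        excl_flag = [0] * len(clusters)               # a cluster absorbs at most one box per member
        for box in boxes:
            matched = False
            for k, cluster in enumerate(clusters):
                # compare against the cluster's first box as a simple representative
                if affinity(box, cluster[0]) >= theta and excl_flag[k] == 0:
                    cluster.append(box)
                    excl_flag[k] = 1
                    matched = True
                    break
            if not matched:                           # no compatible, unused cluster: start a new one
                clusters.append([box])
                excl_flag.append(1)
    return clusters
```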

3.3. Object-Matching Algorithm

If there are multiple sources of perception, the perception results from different sources can be matched for further mutual inspection. Multi-source perception mutual inspection is an important method to improve perception performance. The object-matching algorithm is the basis of multi-source mutual inspection. This paper uses an algorithm based on triangle similarity [43,44] matching, which can effectively deal with the matching errors caused by object position errors.
The scheme flow of the object-matching algorithm is shown in Figure 4. The steps of object matching are summarized in steps (1) to (7) below.
(1) Set object sets S_A and S_B. Calculate the number of objects N_A and N_B in each set.
(2) Calculate the minimum number of elements in the two object sets, m = min(N_A, N_B).
(3) If m = 1, use the K-Nearest Neighbor (KNN) method to calculate the distance from every point in S_A to every point in S_B. The distance between point A and point B is calculated with Equation (1):
d_{AB} = \sqrt{(x_A - x_B)^2 + (y_A - y_B)^2 + (z_A - z_B)^2}    (1)
where d_{AB} represents the 3D space distance between the two points and τ_d is the set distance threshold. If d_{AB} < τ_d, the two points are matched. Otherwise, the two points cannot be matched, indicating that the identified object may be a false negative or false positive.
(4) If m = 2, a distant point must be added to each object set to form a triangle. The detection range is set to 50 m. To achieve a better matching effect, the coordinates of the added points in this study are (500, 500) and (499, 499). These points lie farther than 50 m away and are added only so that triangles can be formed for matching; they do not belong to the objects within the perceived range and are not output. Note that the two added points cannot be identical, but the distance between them should not be too large. After adding the points, the triangle matching method described below can be used.
(5) If m ≥ 3, the triangle matching method can be used. The diagram of the triangle matching algorithm is shown in Figure 5.
(6) The principle of the triangle matching method can be described by the sub-steps (1) to (7) below. Taking objects in the BEV perspective as an example, this paper only considers the x and z coordinates in the KITTI dataset camera coordinate system.
(1) Number all of the points in S_A and S_B.
(2) Calculate the coordinate difference values x_{disA}, z_{disA}, x_{disB}, z_{disB} between the maximum and minimum values in the x and z directions of object sets S_A and S_B using Equations (2) and (3):
x_{dis} = x_{max} - x_{min},  x ∈ S_A, S_B    (2)
z_{dis} = z_{max} - z_{min},  z ∈ S_A, S_B    (3)
(3) Sort the points in object sets S_A and S_B. Calculate the maximum of the coordinate difference values. If the maximum is in the x direction, the object sets S_A and S_B are sorted by the value of x; otherwise, they are sorted by the value of z.
(4) Randomly select three points to form a triangle in each of the object sets S_A and S_B, and return the indices of the point locations.
(5) Normalize the triangles formed in object sets S_A and S_B by dividing each side length of a triangle by its shortest side length. This ensures that a uniform threshold can be set for successful matching.
(6) Calculate the error sum of the edges and points between each triangle in object set S_A and all triangles in object set S_B. Take any two points in a triangle as an example. As shown in Figure 5, point A_1 and point A_2 are corresponding points. The error of point A is expressed in Equation (4):
P_{A,error} = |x_{A1} - x_{A2}| + |z_{A1} - z_{A2}|,  A_1 ∈ S_A, A_2 ∈ S_B    (4)
The error of edge AB is expressed in Equation (5):
d_{AB,error} = |d_{A1B1} - d_{A2B2}|,  A_1, B_1 ∈ S_A;  A_2, B_2 ∈ S_B    (5)
The total error (TE) of two triangles is expressed in Equation (6):
T_{error} = d_{AB,error} + d_{AC,error} + d_{BC,error} + P_{A,error} + P_{B,error} + P_{C,error}    (6)
(7) Calculate the minimum value of T_{error} over all triangle pairs. If this minimum T_{error} is less than the triangle error threshold τ_T, the two triangles are matched, and the corresponding points of the triangles sorted in sub-step (3) are the matching objects.
(7) Finally, judge the remaining points and repeat steps (3), (4), and (5) until all points are matched.
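A compact sketch of the triangle matching error (Equations (4)–(6)) is given below. It enumerates candidate triangles exhaustively instead of sampling them randomly, normalizes only the side lengths, and assumes the vertex orders of the two triangles already correspond (sub-step (3)); these are simplifications for illustration rather than the exact procedure of this paper.

```python
import itertools
import numpy as np

def triangle_error(tri_a, tri_b):
    """Total error (Eq. (6)) between two triangles given as 3x2 arrays of (x, z) vertices,
    assuming the vertex order of the two triangles already corresponds."""
    point_err = np.abs(tri_a - tri_b).sum()           # point errors, Eq. (4)

    def normalized_sides(tri):
        d = [np.linalg.norm(tri[i] - tri[j]) for i, j in ((0, 1), (0, 2), (1, 2))]
        return np.array(d) / max(min(d), 1e-9)        # divide by the shortest side (sub-step (5))

    edge_err = np.abs(normalized_sides(tri_a) - normalized_sides(tri_b)).sum()  # edge errors, Eq. (5)
    return float(point_err + edge_err)

def match_triangles(points_a, points_b, tau_T=15.0):
    """Return (error, indices_a, indices_b) of the best triangle pair below tau_T, else None."""
    points_a, points_b = np.asarray(points_a, float), np.asarray(points_b, float)
    best = None
    for ia in itertools.combinations(range(len(points_a)), 3):
        for ib in itertools.combinations(range(len(points_b)), 3):
            err = triangle_error(points_a[list(ia)], points_b[list(ib)])
            if err < tau_T and (best is None or err < best[0]):
                best = (err, ia, ib)
    return best
```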

3.4. Perception Effectiveness Judgment

Each perception source will have missed and false detections; thus, it is necessary to judge the effectiveness of the current frame's perception information in real time. If the perception results are invalid, an early warning must be issued, or perception switching can be performed directly. If the perception results are valid, the spatial uncertainty of the detected objects is calculated to support further trajectory planning and the construction of the drivable space.
This raises three key questions: (1) What are the criteria for judging the effectiveness of perception results? (2) How can the effectiveness of the current perception results be judged in real time? (3) How can the judgment results be verified?
First, for the real-time perception effectiveness judgment, the precision, recall, and F1-score of the perception results of the current frame are applied as the criteria. The judgment thresholds are based on statistical results over the full KITTI dataset. Second, for the judgment method, this paper adopts a fusion model based on multi-source perception and deep ensemble to judge the effectiveness of the current perception results in real time. Third, to verify the judgment results, this paper matches the perception results with the ground truth of the dataset and calculates the precision, recall, and F1-score of each frame to judge the effectiveness of the perception results. The validity results of the real-time judgment are then compared with the validity obtained from ground-truth matching to verify the judgment, and finally the anomaly diagnosis rate is calculated.
The flowchart of perception effectiveness judgment is shown in Figure 6. The DE method is used to study the uncertainty of object occupation. Normal detection (True Positive, TP), missed detection (False Negative, FN), and false detection (False Positive, FP) in the perception results are studied. If there is only one perception source, the object occupation uncertainty is evaluated according to the number of detecting networks N_{net} and the average confidence p_m. If there are multiple perception sources, this paper combines deep ensemble and multi-source perception mutual inspection to research the occupancy uncertainty.
First, the steps of uncertainty research for one perception source are as follows.
(1) The perception results are processed with the DE method. After clustering and statistics, the number of networks detecting the object, N_{net}, and the average confidence level of the detected object, p_m, are calculated with Equation (7):
p_m = p(y = c | D) = \frac{1}{N_{net}} \sum_{t=1}^{T} p(y = c | x, W_t)    (7)
where c represents the classification of the detected object, D represents the dataset used to evaluate the detected objects, W_t represents the weights of the t-th network, and T denotes the total number of networks in the ensemble.
(2) Set the detection network number threshold τ_N and the average confidence thresholds τ_{P1} and τ_{P2}.
(3) Judge the uncertainty using Equation (8):
f(N_{net}, p_m) =
  TP, if N_{net} ≥ τ_N and p_m ≥ τ_{P1}
  FN, if N_{net} < τ_N and p_m ≥ τ_{P1}
  FP, if N_{net} ≥ τ_N and p_m < τ_{P2}    (8)
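The per-object judgment of Equation (8) reduces to simple threshold checks; a minimal sketch follows. The threshold values are only illustrative examples taken from the PointPillars settings in Section 4.1.3, and the fallback label for combinations not covered by Equation (8) is an assumption.

```python
def judge_object(n_net, p_m, tau_N=4, tau_P1=0.6, tau_P2=0.4):
    """Classify a clustered detection via Eq. (8).

    n_net: number of ensemble members that detected the object;
    p_m: their average confidence from Eq. (7).
    The threshold values here are only illustrative."""
    if n_net >= tau_N and p_m >= tau_P1:
        return "TP"            # confirmed detection
    if n_net < tau_N and p_m >= tau_P1:
        return "FN"            # treated as a missed detection
    if n_net >= tau_N and p_m < tau_P2:
        return "FP"            # treated as a false detection
    return "undetermined"      # combination not covered by Eq. (8)
```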
If there are multiple sources of perception, the matching of different perception results is carried out first. Then the uncertainty research is carried out by using the DE method.
(1)
First, the triangle matching method is used for object matching of multi-source perception results. Objects that are successfully matched are considered to be real objects.
(2)
For the results that cannot be matched in each perception source, the DE method is used for judgment.
(3)
The final processing results are fused.
In addition, for the uncertainty of classification and occupation of each detected object, the prediction entropy E_{pe} is selected in this study as the uncertainty evaluation index, calculated with Equation (9). During its calculation, the confidence levels of the different networks are first averaged, and then the prediction entropy is calculated:
E_{pe} = -\sum_{p_m \in C} \left( p_m \log p_m + (1 - p_m) \log (1 - p_m) \right)    (9)
Because missed detection can occur in the calculation process, the detected object must be penalized so that its uncertainty is calculated more objectively, as in Equation (10):
E_{pe}^{*} = E_{pe} \left( 1 + p (T - N_{net}) \right)    (10)
where p represents the penalty coefficient.
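A minimal sketch of Equations (9) and (10) for a single clustered object is shown below, assuming the binary-entropy form per class and a natural logarithm; the penalty coefficient value is illustrative.

```python
import numpy as np

def penalized_prediction_entropy(confidences, total_nets=5, penalty=0.1):
    """Prediction entropy of one clustered object (Eqs. (9)-(10)).

    `confidences` holds the scores of the members that detected the object;
    `penalty` is the coefficient p (value here is illustrative)."""
    p_m = float(np.clip(np.mean(confidences), 1e-6, 1 - 1e-6))     # averaged confidence, avoiding log(0)
    e_pe = -(p_m * np.log(p_m) + (1 - p_m) * np.log(1 - p_m))      # per-class entropy term of Eq. (9)
    n_net = len(confidences)
    return e_pe * (1 + penalty * (total_nets - n_net))             # missed-detection penalty of Eq. (10)
```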

3.5. Spatial Uncertainty

The spatial information in the regression output of the neural network algorithm includes the lateral position (x), longitudinal position (z), and vertical position (y) of the detected object, its length (l), width (w), and height (h), and its orientation (r). The corresponding uncertainty evaluation indicators can be represented by the variance and the total variance (TV).
The variance of each indicator can be expressed as in Equation (11):
var_x = \frac{1}{N_{net}} \sum_{i=1}^{N_{net}} \left( x_i - \frac{1}{N_{net}} \sum_{i=1}^{N_{net}} x_i \right)^2    (11)
The variances of the other indicators are calculated with the same equation. Each object has a location and a dimension, and the TV of the location and dimension can be calculated with Equations (12) and (13):
TV_{loc} = var_x + var_y + var_z    (12)
TV_{dim} = var_l + var_w + var_h    (13)
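Given the clustered boxes of one object from all ensemble members, Equations (11)–(13) amount to per-component variances and their sums, as in the following sketch; the [x, y, z, l, w, h, r] ordering of the regression vector is an assumption for illustration.

```python
import numpy as np

def spatial_uncertainty(boxes):
    """Spatial uncertainty of one object from its clustered ensemble boxes (Eqs. (11)-(13)).

    `boxes` is an (N_net, 7) array with the assumed ordering [x, y, z, l, w, h, r]."""
    boxes = np.asarray(boxes, dtype=float)
    var = boxes.var(axis=0)                     # Eq. (11) applied to every regression output
    tv_loc = var[0] + var[1] + var[2]           # Eq. (12): total variance of the location
    tv_dim = var[3] + var[4] + var[5]           # Eq. (13): total variance of the dimensions
    return {"var": var, "TV_loc": float(tv_loc), "TV_dim": float(tv_dim)}
```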

4. Experimental Results

In this section, experiments are carried out in the KITTI dataset to validate the proposed methods. First, we introduce the experiment settings, including the detection networks and the implementation details. Then, the quantitative results are introduced.

4.1. Experiment Settings

4.1.1. PointPillars 3D Object Detection Network

The network architecture of PointPillars is shown in Figure 7. The main components of the network are a Pillar Feature Network, a Backbone, and an SSD Detection Head. The point clouds are converted into a pillar index tensor and stacked pillars. The encoder uses the stacked pillars to learn a set of features that can be scattered back to a 2D pseudo-image for a convolutional neural network (CNN). The features from the backbone are used by the detection head to predict 3D bounding boxes for objects. Based on the 3D bounding boxes, detections can be matched to evaluate the uncertainty of the classification, regression, and other outputs.
The training implementation platform was CUDA 11.6, CUDNN 8.8, and PyTorch 1.10.1. We trained the network with a batch size of 2 on one GeForce TITAN V GPU for five epochs. The learning rate was set at 2 × 10^{-4} and dropped at 15. During testing, we used the top 100 detected 3D projected points and filtered them with a threshold of 0.5. No data augmentation methods or NMS (Non-Maximum Suppression) were used in the test procedure.

4.1.2. SMOKE 3D Object Detection Network

The network architecture of SMOKE is shown in Figure 8. SMOKE predicts the three-dimensional bounding box of each detected object by combining the estimation of a single key point with three-dimensional regression variables. SMOKE uses the deep layer aggregation network DLA-34 as its backbone, in which the standard convolutions in the aggregation layers are replaced by deformable convolutional networks (DCN). SMOKE proposes a multi-step disentanglement method to construct the three-dimensional bounding box, which greatly improves the training convergence and detection accuracy. In the 3D detection network, the target of SMOKE is represented by a key point. With the camera parameters, projecting the key point can completely restore the 3D position.
The training implementation platform was CUDA 10.0, CUDNN 7.5, and PyTorch 1.1. The original image resolution was padded to 1280 × 384. We trained the network with a batch size of 32 on four GeForce TITAN X GPUs for 60 epochs. The learning rate was set at 2.5 × 10^{-4} and dropped by a factor of 10 at 25 and 40 epochs. During testing, we used the top 100 detected 3D projected points and filtered them with a threshold of 0.25. No data augmentation methods or NMS (Non-Maximum Suppression) were used in the test procedure.

4.1.3. Implementation Details

The experiment used the official KITTI object detection benchmark dataset, including point cloud and image samples. The experiment trained on the lidar point clouds and compared the results with a fusion method using lidar and images. The dataset includes 7481 training samples and 7518 test samples; the experimental study divided the training set into 3712 training samples and 3769 validation samples. The classification results include cars, pedestrians, and cyclists. During training, PointPillars used one DNN for cars and another DNN for pedestrians and cyclists. In validation, the output results included three categories: car, cyclist, and pedestrian.
The perception results from PointPillars are based on lidar, and those from SMOKE are based on a camera. The two DNN algorithms are three-dimensional object detection networks, and both are trained and evaluated on the KITTI dataset, meaning that their results can be compared and matched. In the DE parameter settings, both neural networks adopt Kaiming uniform initialization of their parameters. The data loading mode uses shuffling. The number of ensemble networks is set to 5 for both PointPillars and SMOKE.
In the parameter settings of the clustering algorithm, the three-dimensional intersection over union (IOU) threshold is set to 0.1 to better match the results. In the triangle matching algorithm, the KNN distance threshold and the triangle error threshold are both set to 15. The above parameters are manually adjusted and optimized according to the results of the actual experiments.
In the perception effectiveness judgment algorithm, the key parameter thresholds include the number of object recognition networks, the mean object score, and the precision, recall, and F1-score of the current frame results. In this study, we counted and calculated the precision, recall, and F1-score of the objects detected in 3769 frames matched against the KITTI ground truth using PointPillars and SMOKE after DE, as shown in Table 1. In addition, we counted the number of detecting networks and the average object score of the detected objects in the same 3769 frames, as shown in Table 2.
Based on the above statistical results, this paper sets the thresholds for the number of detecting networks and the average object score of PointPillars to 4 and 0.6, respectively, and those of SMOKE to 2 and 0.4, respectively. The thresholds for precision, recall, and F1-score are set to 76.5%, 47.1%, and 0.583, respectively.

4.2. Results

This section quantitatively demonstrates the perception effectiveness, focusing on three typical cases. Then, the spatial uncertainty of the detected objects is extracted and the relationship between uncertainty and ground-truth error is demonstrated. Finally, this section verifies the relationship between perception uncertainty, object distance, and occlusion degree.

4.2.1. Perception Effectiveness Judgement

This section counts and displays the judgment results of perception effectiveness from macroscopic and microscopic perspectives. It first counts the effectiveness results of 1000 frames of the KITTI dataset, including the correctly judged results (including frames with no objects directly judged as invalid perception) and the wrongly judged results.
The specific statistical results are shown in Table 3.
The correctly judged results include three situations. (1) The current frame scene is judged valid, and the result is also valid after matching with the ground truth. (2) The current frame scene is judged invalid, and the result is indeed invalid after matching with the ground truth. (3) The perception has no output result and is judged invalid, while the ground truth shows that there are objects; the judgment of perception effectiveness is therefore correct. Based on the above analysis, the effectiveness judgment and verification of 1000 frames of the KITTI evaluation dataset are carried out. The statistical results show that 920 frames are judged correctly, including the perception results without outputs, while the judgment of 80 frames is wrong. Therefore, the accuracy rate of the perception effectiveness judgment reaches 92%. The statistical results of this large sample of the dataset reflect the method's effectiveness.
In addition, this paper selects example frames whose perception results are judged to be valid and frames judged to be invalid; in both cases the perception effectiveness judgments are correct.
Figure 9 shows results for which the perception effectiveness judgments are correct. The perception outputs are judged to be valid and remain valid after matching and verification with the ground truth of the KITTI dataset; hence, the judgment of perception effectiveness is correct. From a microscopic point of view, this paper selects the 25th, 542nd, 634th, and 1932nd frames of the KITTI dataset. The green boxes represent the ground truth, and the blue boxes represent the results of the fusion model matching PointPillars and SMOKE, which are considered real perception results. The red boxes are the objects that PointPillars considers detected after the DE method, based on the number of detecting networks and the average object score, and the yellow boxes are the corresponding objects for SMOKE. Likewise, the red FN and FP labels mark the missed and false detections of PointPillars after the DE method, and the blue FN and FP labels mark those of SMOKE. The figure's background is the projection of the original lidar point cloud onto a grid map, where white represents occupied space, black represents unoccupied space, and the depth of the color represents the probability of occupancy.
Figure 9 shows that the bounding boxes of the perception results and the ground truth do not match exactly. This is mainly due to the uncertainty of the perception algorithm. Therefore, there are spatial errors in the perception results, which generally do not affect AD safety. In the perception results of the 25th frame, there is no missed detection. Outside the range of the ground truth, a few objects are detected. These objects basically match the occupancy information of the original point cloud; hence, they should be considered in the subsequent drivable space construction and decision-making process, which further benefits AD safety. In the 542nd and 634th frames, although there is a missed detection at the farthest point, the update frequency of the sensor is relatively high, so a missed detection at long range does not endanger the driving safety of AD. In the perception result of the 1932nd frame, SMOKE has a missed detection relative to the ground truth, which is consistent with the occupancy information of the original point cloud in the grid map. This indicates that the dataset may contain labeling errors in certain cases. The perception result of this frame is more conservative and accurate, which helps guarantee the safety of AD.
Figure 10 shows results for which the perception effectiveness judgments are also correct. The perception outputs are judged to be invalid and remain invalid after matching and verification with the ground truth of the dataset, meaning that the judgment of perception validity is correct. From a microscopic point of view, this paper selects the 1st, 702nd, 949th, and 979th frames of the KITTI dataset. The information represented by the different colors and boxes is consistent with Figure 9.
Among these perception results, there are inaccurate cases of missed detection and false detection, with a significant gap from the ground truth. Missed detections in the perception results directly impact the safety of AD. Timely evaluation and identification of these perception failure scenarios can optimize the decision-making strategy and improve the safety of AD.
Figure 11 shows situations in which the perception algorithm has no output. This paper selects the 81st and 260th frames of the KITTI dataset. The green boxes represent the ground truth. Because the perception algorithm produces no output, these frames' perception results are concluded to be invalid, and the perception effectiveness is judged correctly.
The results above show that the perception effectiveness can be evaluated in real time based on multi-source mutual inspection and the proposed DE fusion algorithm. If the current perception result is valid, the spatial uncertainty of the detected objects can further support accurate decision-making. If the perception result is invalid, it can be handled by perception switching, manual takeover, or emergency stop to ensure the safety of AD.

4.2.2. Spatial Uncertainty

The spatial uncertainty mainly concerns the location of the perception results. The mean and variance of these perception results based on DE are calculated. In addition, by comparing the perception results with the ground truth of the dataset, the calculated error can also represent the uncertainty of the perception results.
This paper selects one frame of perception results to illustrate the spatial uncertainty. Figure 12 shows the ground truth of the dataset, the perception results, the uncertainty of the perception results, and the errors between the perception results and the ground truth. The green boxes indicate the ground truth of the dataset, and the blue boxes indicate the perception results. The red boxes represent the perception results with the standard deviation obtained from the DE method, which indicates the uncertainty. The yellow boxes represent the perception results with the errors between the perception results and the ground truth. The comparison in the figure reflects the size of the perception error. These perception errors and other uncertainties can provide a reference for decision-making and help ensure the safety of AD.
The results show that the objects detected by the DNN algorithms have uncertainty in their locations. The location uncertainty differs between objects, and in some cases this gap is very large. The influencing factors of this uncertainty are a problem worth studying and a meaningful topic for further research into the accuracy of spatial uncertainty.

4.2.3. Validation of Perception Uncertainty

In order to verify the accuracy of the spatial uncertainty of the detected objects, this paper calculates the error between the perception results and the ground truth and demonstrates the correlation between the uncertainty and the error. The correlation of the position and orientation uncertainty is studied for objects detected by PointPillars and SMOKE, respectively.
Figure 13 shows the correlation between the extracted uncertainty and the error between the perception results and the ground truth for PointPillars. The correlation coefficients for the horizontal, longitudinal, and vertical positions and the orientation relative to the ego vehicle using PointPillars on the entire KITTI dataset are 0.317, 0.299, 0.168, and 0.657, respectively.
Figure 14 shows the correlation between the extracted uncertainty and the error between the perception results and the ground truth for SMOKE. The correlation coefficients for the horizontal, longitudinal, and vertical positions and the orientation relative to the ego vehicle using SMOKE on the entire KITTI dataset are 0.184, 0.159, 0.058, and 0.569, respectively.
The results indicate a positive correlation between the uncertainty extracted from the DNN algorithms and the error between the perception results and the ground truth, which shows that the uncertainty extraction method is sound and accurate.
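For reference, the per-coordinate correlation values reported above correspond to a standard Pearson coefficient between the extracted uncertainty and the absolute ground-truth error, as in this minimal sketch; the choice of the ensemble standard deviation as the uncertainty measure is an assumption for illustration.

```python
import numpy as np

def uncertainty_error_correlation(uncertainty, error):
    """Pearson correlation between per-object uncertainty (e.g., the ensemble standard
    deviation of a coordinate) and the absolute error against the KITTI ground truth."""
    uncertainty = np.asarray(uncertainty, dtype=float)
    error = np.asarray(error, dtype=float)
    return float(np.corrcoef(uncertainty, error)[0, 1])
```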

4.2.4. Influencing Factors of Uncertainty

This paper studies the relationship of the distance and occlusion factors with the perception uncertainty, analyzing the perception results within 50 m. For the distance factor, the distance is divided into 20 intervals with a step size of 2.5 m. Objects within the same distance interval are grouped into one class, and the average uncertainty of these objects is calculated. The occlusion factor is divided into four levels (0, 1, 2, 3), and the uncertainty statistics of the detected objects are computed for each level. The relationship between the occlusion factor and the uncertainty is then studied. A sketch of the distance binning is given below.
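A minimal sketch of the distance binning described above follows; the bin edges and the averaging are straightforward, and the handling of empty bins is an assumption.

```python
import numpy as np

def mean_uncertainty_by_distance(distances, uncertainties, max_range=50.0, step=2.5):
    """Average the per-object uncertainty within 2.5 m distance bins up to 50 m."""
    distances = np.asarray(distances, dtype=float)
    uncertainties = np.asarray(uncertainties, dtype=float)
    bins = np.arange(0.0, max_range + step, step)          # 20 intervals of 2.5 m
    idx = np.digitize(distances, bins) - 1                 # 0-based bin index of each object
    means = np.array([uncertainties[idx == k].mean() if np.any(idx == k) else np.nan
                      for k in range(len(bins) - 1)])
    return bins[:-1], means
```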
The red lines in Figure 15 show the trend between the distance factor and the prediction entropy, TV, and TE of location; the blue lines are the corresponding scatter-plot lines. From a quantitative perspective, the Pearson coefficients between distance and the prediction entropy and location TE of PointPillars are 0.584 and 0.107, respectively, and the Pearson coefficients between distance and the prediction entropy and TV of SMOKE are 0.899 and 0.348, respectively. It should be noted that within a distance of 10 m, the uncertainty shows a locally decreasing trend, which is mainly related to the sensors' installation position and characteristics. If the sensor is installed on the roof of the ego vehicle, nearby objects are poorly observed, resulting in increased uncertainty. At the same time, at ultra-short range the effective sensor resolution decreases, which increases the perception uncertainty. Generally speaking, beyond a certain distance there is a positive correlation between perception uncertainty and distance; however, within the ultra-short range of the ego vehicle, the poor detection of objects may cause higher perception uncertainty.
The red lines in Figure 16 show the trend between the occlusion factor and the prediction entropy, TV, and TE of location; the blue lines are the corresponding scatter-plot lines. The Pearson coefficients between the occlusion degree and the prediction entropy, TV, and TE of PointPillars are 0.665, 0.505, and 0.987, respectively. In general, the higher the degree of occlusion, the higher the uncertainty of the detected objects. However, the prediction entropy and TV in this study show a downward trend when the occlusion degree is 3. This is related to the data distribution of the KITTI dataset: there are very few samples with an occlusion degree of 3, resulting in statistical errors. Overall, there is a positive correlation between the perception uncertainty and the degree of occlusion in the samples.
In conclusion, the experimental results demonstrate the following:
(1) The proposed real-time judgment of perception effectiveness has a high accuracy of 92%.
(2) The estimation of spatial uncertainty based on DE is positively correlated with the ground-truth error.
In addition, the influencing factors of perception uncertainty are discussed.

5. Conclusions

This paper proposes a fusion model based on multi-source perception and Deep Ensemble to judge the effectiveness of the perception results in each frame and simultaneously evaluate the spatial uncertainty of the detected objects. Based on the KITTI dataset, the results show that the accuracy of judging perception effectiveness with the proposed multi-source perception inspection and Deep Ensemble fusion model reaches 92%. In addition, a positive correlation is found between the spatial perception uncertainty and the error between the perception results and the ground truth. This grants the uncertainty evaluation a physical meaning anchored to the objective error. Furthermore, this study found that the perception uncertainty is related to the distance and the degree of occlusion of the detected objects.
Compared with previous research, the approach in this paper can effectively deal in real time with missed detections in long-tail scenarios and judge the perception effectiveness, which is very important for the safe operation of autonomous driving. The research in this paper can effectively improve the accuracy of perception and further improve the safety of autonomous driving. If a perception failure is found in real time, the failed source can be replaced with other perception sources, realizing real-time judgment and switching of autonomous driving perception sources and ensuring safety. Although we have achieved good results, we only studied the uncertainty of perception occupancy, and the method still needs to be verified on real vehicles. In the future, we hope to further study the uncertainty of the semantics and motion of perception results to improve AD's perception performance and safety. The model proposed in this article is important for assessing perception uncertainty in real time and could further benefit autonomous driving safety.

Author Contributions

All authors contributed to this work. Conceptualization, D.Y., M.Y. (Mingliang Yang), X.J., H.W. and K.J.; methodology, L.P., M.Y. (Mingliang Yang); software, J.W., Y.Y.; validation; formal analysis, M.Y. (Mingliang Yang) and X.J.; writing—original draft, review and editing, M.Y. (Mingliang Yang), M.Y. (Mengmeng Yang) and X.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in part by the National Natural Science Foundation of China under Grants U22A20104 and 52102464, and by the Beijing Municipal Science and Technology Commission (Grant No. Z221100008122011).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

KITTI dataset: https://www.cvlibs.net/datasets/kitti/ (accessed on 16 March 2022).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
DE	Deep Ensemble
DNN	Deep Neural Network
KNN	K-Nearest Neighbor
AD	Autonomous Driving
MCD	Monte Carlo Dropout
MCMC	Markov Chain Monte Carlo
IOU	Intersection over Union
JIoU	Jaccard IoU
SOTIF	Safety of the Intended Functionality
BSAS_excl	Basic Sequential Algorithmic Scheme with intra-sample exclusivity
TP	True Positive
FP	False Positive
FN	False Negative
TV	Total Variance
TE	Total Error
CNN	Convolutional Neural Network
DCN	Deformable Convolutional Network

References

1. Chen, L.; He, Y.; Wang, Q.; Pan, W.; Ming, Z. Joint optimization of sensing, decision-making and motion-controlling for autonomous vehicles: A deep reinforcement learning approach. IEEE Trans. Veh. Technol. 2022, 71, 4642–4654.
2. Cosgun, A.; Ma, L.; Chiu, J.; Huang, J.; Demir, M.; Anon, A.M.; Lian, T.; Tafish, H.; Al-Stouhi, S. Towards full automated drive in urban environments: A demonstration in gomentum station, california. In Proceedings of the 2017 IEEE Intelligent Vehicles Symposium (IV), Los Angeles, CA, USA, 11–14 June 2017; pp. 1811–1818.
3. Chen, J.; Sun, J. Platoon Separation Strategy Optimization Method Based on Deep Cognition of a Driver’s Behavior at Signalized Intersections. IEEE Access 2020, 8, 17779–17791.
4. Wang, S.; Jiang, K.; Chen, J.; Yang, M.; Fu, Z.; Wen, T.; Yang, D. Skeleton-based Traffic Command Recognition at Road Intersections for Intelligent Vehicles. Neurocomputing 2022, 501, 123–134.
5. Rahman, Q.M.; Corke, P.; Dayoub, F. Run-Time Monitoring of Machine Learning for Robotic Perception: A Survey of Emerging Trends. IEEE Access 2021, 9, 20067–20075.
6. Peng, L.; Li, B.; Yu, W.; Yang, K.; Shao, W.; Wang, H. SOTIF Entropy: Online SOTIF Risk Quantification and Mitigation for Autonomous Driving. arXiv 2022, arXiv:2211.04009.
7. Feng, D.; Harakeh, A.; Waslander, S.L.; Dietmayer, K. A Review and Comparative Study on Probabilistic Object Detection in Autonomous Driving. IEEE Trans. Intell. Transp. Syst. 2022, 23, 9961–9980.
8. Feng, D.; Haase-Schutz, C.; Rosenbaum, L.; Hertlein, H.; Glaser, C.; Timm, F.; Wiesbeck, W.; Dietmayer, K. Deep Multi-Modal Object Detection and Semantic Segmentation for Autonomous Driving: Datasets, Methods, and Challenges. IEEE Trans. Intell. Transp. Syst. 2021, 22, 1341–1360.
9. Kiran, B.R.; Sobh, I.; Talpaert, V.; Mannion, P.; Al Sallab, A.A.; Yogamani, S.; Pérez, P. Deep reinforcement learning for autonomous driving: A survey. IEEE Trans. Intell. Transp. Syst. 2021, 23, 4909–4926.
10. Junior: The Stanford entry in the Urban Challenge. J. Field Robot. 2008, 25, 569–597.
11. Held, D.; Guillory, D.; Rebsamen, B.; Thrun, S.; Savarese, S. A Probabilistic Framework for Real-time 3D Segmentation using Spatial, Temporal, and Semantic Cues. In Proceedings of the Robotics: Science and Systems 2016, Ann Arbor, MI, USA, 18–22 June 2016.
12. Van Amersfoort, J.; Smith, L.; Teh, Y.W.; Gal, Y. Uncertainty estimation using a single deep deterministic neural network. PMLR 2020, 119, 9690–9700.
13. Meyer, G.P.; Thakurdesai, N. Learning an uncertainty-aware object detector for autonomous driving. In Proceedings of the IEEE International Conference on Intelligent Robots and Systems, Las Vegas, NV, USA, 25–29 October 2020; pp. 10521–10527.
14. Liu, Y.; Yang, G.; Hosseiny, M.; Azadikhah, A.; Mirak, S.A.; Miao, Q.; Raman, S.S.; Sung, K. Exploring uncertainty measures in Bayesian deep attentive neural networks for prostate zonal segmentation. IEEE Access 2020, 8, 151817–151828.
15. Tao, Y.; Sun, H.; Cai, Y. Predictions of deep excavation responses considering model uncertainty: Integrating BiLSTM neural networks with Bayesian updating. Int. J. Geomech. 2022, 22, 04021250.
16. Zhang, X.; Chan, F.T.; Mahadevan, S. Explainable machine learning in image classification models: An uncertainty quantification perspective. Knowl. Based Syst. 2022, 243, 108418.
17. Miller, D.; Dayoub, F.; Milford, M.; Sunderhauf, N. Evaluating merging strategies for sampling-based uncertainty techniques in object detection. In Proceedings of the IEEE International Conference on Robotics and Automation, Montreal, QC, Canada, 20–24 May 2019; pp. 2348–2354.
18. Goswami, S. False Detection (Positives and Negatives) in Object Detection. arXiv 2020, arXiv:abs/2008.06986.
19. Bogdoll, D.; Nitsche, M.; Zollner, J.M. Anomaly Detection in Autonomous Driving: A Survey. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–20 June 2022; pp. 4487–4498.
20. Jiao, X.; Chen, J.; Wang, Y.; Jiang, K.; Cao, Z.; Yang, M.; Yang, D. Reliable autonomous driving environment model with unified state-extended boundary. IEEE Trans. Intell. Transp. Syst. 2022, 24, 516–527.
21. Jiao, X.; Cao, Z.; Chen, J.; Jiang, K.; Yang, D. A General Autonomous Driving Planner Adaptive to Scenario Characteristics. IEEE Trans. Intell. Transp. Syst. 2022, 23, 21228–21240.
22. Liu, J.; Wang, H.; Peng, L.; Cao, Z.; Yang, D.; Li, J. PNNUAD: Perception Neural Networks Uncertainty Aware Decision-Making for Autonomous Vehicle. IEEE Trans. Intell. Transp. Syst. 2022, 23, 24355–24368.
23. Peng, L.; Wang, H.; Li, J. Uncertainty Evaluation of Object Detection Algorithms for Autonomous Vehicles. Automot. Innov. 2021, 4, 241–252.
24. Cao, Z.; Liu, J.; Zhou, W.; Jiao, X.; Yang, D. LiDAR-based Object Detection Failure Tolerated Autonomous Driving Planning System. In Proceedings of the 2021 IEEE Intelligent Vehicles Symposium (IV), Nagoya, Japan, 11–17 July 2021; pp. 122–128.
25. Yang, L.; Zhang, X.; Wang, L.; Zhu, M.; Zhang, C.F.; Li, J. Mix-Teaching: A Simple, Unified and Effective Semi-Supervised Learning Framework for Monocular 3D Object Detection. arXiv 2022, arXiv:2207.04448.
26. Pitropov, M.; Huang, C.; Abdelzad, V.; Czarnecki, K.; Waslander, S. LiDAR-MIMO: Efficient Uncertainty Estimation for LiDAR-based 3D Object Detection. In Proceedings of the IEEE Intelligent Vehicles Symposium, Aachen, Germany, 4–9 June 2022; pp. 813–820.
27. Lang, A.H.; Vora, S.; Caesar, H.; Zhou, L.; Yang, J.; Beijbom, O. PointPillars: Fast Encoders for Object Detection From Point Clouds. arXiv 2019, arXiv:1812.05784.
28. Liu, Z.; Wu, Z.; Toth, R. SMOKE: Single-stage monocular 3D object detection via keypoint estimation. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 4289–4298.
29. Czarnecki, K.; Salay, R. Towards a framework to manage perceptual uncertainty for safe automated driving. In Proceedings of the International Conference on Computer Safety, Vasteras, Sweden, 18–21 September 2018; Volume 11094 LNCS, pp. 439–445.
30. Feng, D.; Wang, Z.; Zhou, Y.; Rosenbaum, L.; Timm, F.; Dietmayer, K.; Tomizuka, M.; Zhan, W. Labels are Not Perfect: Inferring Spatial Uncertainty in Object Detection. IEEE Trans. Intell. Transp. Syst. 2022, 23, 9981–9994.
31. Wu, P.; Chen, S.; Metaxas, D.N. MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird’s Eye View Maps. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 11382–11392.
32. Mena, J.; Pujol, O.; Vitrià, J. A Survey on Uncertainty Estimation in Deep Learning Classification Systems from a Bayesian Perspective. ACM Comput. Surv. 2022, 54, 1–36.
33. Melucci, M. Relevance Feedback Algorithms Inspired by Quantum Detection. IEEE Trans. Knowl. Data Eng. 2016, 28, 1022–1034.
34. Lakshminarayanan, B.; Pritzel, A.; Blundell, C. Simple and scalable predictive uncertainty estimation using deep ensembles. In Proceedings of the 2017 Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 6403–6414.
35. Wang, Z.; Feng, D.; Zhou, Y.; Rosenbaum, L.; Timm, F.; Dietmayer, K.; Tomizuka, M.; Zhan, W. Inferring spatial uncertainty in object detection. In Proceedings of the IEEE International Conference on Intelligent Robots and Systems, Las Vegas, NV, USA, 25–29 October 2020; pp. 5792–5799.
36. Gasperini, S.; Haug, J.; Mahani, M.A.N.; Marcos-Ramiro, A.; Navab, N.; Busam, B.; Tombari, F. CertainNet: Sampling-Free Uncertainty Estimation for Object Detection. IEEE Robot. Autom. Lett. 2022, 7, 698–705.
37. Feng, D.; Rosenbaum, L.; Timm, F.; Dietmayer, K. Leveraging heteroscedastic aleatoric uncertainties for robust real-time LiDAR 3D object detection. In Proceedings of the IEEE Intelligent Vehicles Symposium, Paris, France, 9–12 June 2019; pp. 1280–1287.
38. Pan, H.; Wang, Z.; Zhan, W.; Tomizuka, M. Towards Better Performance and More Explainable Uncertainty for 3D Object Detection of Autonomous Vehicles. In Proceedings of the 2020 IEEE 23rd International Conference on Intelligent Transportation Systems, ITSC 2020, Rhodes, Greece, 20–23 September 2020.
39. Kendall, A.; Badrinarayanan, V.; Cipolla, R. Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding. arXiv 2015, arXiv:1511.02680.
40. Michelmore, R.; Wicker, M.; Laurenti, L.; Cardelli, L.; Gal, Y.; Kwiatkowska, M. Uncertainty Quantification with Statistical Guarantees in End-to-End Autonomous Driving Control. In Proceedings of the International Conference on Robotics and Automation, Paris, France, 31 May–31 August 2020.
41. Shao, W.; Xu, Y.; Peng, L.; Li, J.; Wang, H. Failure Detection for Motion Prediction of Autonomous Driving: An Uncertainty Perspective. arXiv 2023, arXiv:2301.04421.
42. Huang, Z.; Wu, J.; Lv, C. Efficient deep reinforcement learning with imitative expert priors for autonomous driving. IEEE Trans. Neural Netw. Learn. Syst. 2022, 1–13.
43. Bern, M.; Edelsbrunner, H.; Eppstein, D.; Tan, S. Edge insertion for optimal triangulations. Discret. Comput. Geom. 1993.
44. Fekete, S.P. The Complexity of MaxMin Length Triangulation. arXiv 2012, arXiv:1208.0202.
Figure 1. Comparison between the ground truth and the perception results of DNN algorithms on the KITTI dataset. There is uncertainty in the perception results, such as missed detection, false detection, position errors, and orientation errors. Green boxes represent the ground truth of the data labels and blue boxes denote the perception results of DNN algorithms based on LiDAR and camera. The red numbers represent the order of the dataset frames.
Figure 2. The logic flow between different algorithms.
Figure 3. Schematic sequence of the perception effectiveness judgment and uncertainty evaluation.
Figure 4. Scheme flow of the object matching algorithm.
Figure 5. Object matching algorithm: diagram of triangle matching.
Figure 6. Scheme flow of the perception effectiveness judgment.
Figure 7. Scheme flow of PointPillars [27].
Figure 8. Scheme flow of SMOKE [28].
Figure 9. Judgment results of perception effectiveness: the judgment is correct. The perception result is judged valid, which is confirmed after matching and verifying against the ground truth.
Figure 10. Judgment results of perception effectiveness: the judgment is correct. The perception result is judged invalid, which is confirmed after matching and verifying against the ground truth.
Figure 11. Judgment results of perception effectiveness: the judgment is correct. The perception result is judged invalid, which is confirmed after matching and verifying against the ground truth.
Figure 12. Spatial uncertainty based on deep ensemble.
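For orientation, the sketch below shows one common way to obtain a spatial-uncertainty estimate from a deep ensemble, namely the spread of the matched box parameters across ensemble members. It is a minimal illustration, not the paper's exact aggregation; the function, array shapes, and numerical values are hypothetical.

```python
import numpy as np

def ensemble_spatial_uncertainty(member_boxes):
    """Mean pose and per-parameter spread of one object across ensemble members.

    member_boxes: array-like of shape (N, 4), one row per ensemble member,
    holding that member's matched prediction [x, y, z, yaw] for the same object.
    """
    boxes = np.asarray(member_boxes, dtype=float)
    mean = boxes.mean(axis=0)          # consensus estimate of the pose
    std = boxes.std(axis=0, ddof=1)    # spread across members = spatial uncertainty
    return mean, std

# Hypothetical example: four networks predict slightly different poses for one car.
preds = [[10.2, 1.1, -0.8, 0.03],
         [10.5, 1.0, -0.7, 0.05],
         [10.3, 1.2, -0.9, 0.02],
         [10.4, 1.1, -0.8, 0.04]]
mean, std = ensemble_spatial_uncertainty(preds)
print("mean pose:", mean)
print("spatial uncertainty (std):", std)
```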
Figure 13. Correlation between uncertainty and error for PointPillars (3769 frames). In order: the horizontal direction, longitudinal direction, vertical direction, and orientation of the car.
Figure 14. Correlation between uncertainty and error for SMOKE (3769 frames). In order: the horizontal direction, longitudinal direction, vertical direction, and orientation of the car.
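Figures 13 and 14 report how well the estimated uncertainty tracks the actual error in each dimension. A minimal sketch of such a correlation analysis, assuming hypothetical per-object arrays of uncertainties and ground-truth errors:

```python
import numpy as np

def uncertainty_error_correlation(uncertainty, error):
    """Pearson correlation between estimated uncertainty and absolute error."""
    u = np.asarray(uncertainty, dtype=float)
    e = np.abs(np.asarray(error, dtype=float))
    return float(np.corrcoef(u, e)[0, 1])

# Hypothetical longitudinal uncertainties (m) and errors (m) for a few matched objects.
sigma_x = [0.12, 0.30, 0.08, 0.45, 0.22]
err_x = [0.10, 0.25, 0.05, 0.60, 0.18]
print("Pearson r:", uncertainty_error_correlation(sigma_x, err_x))
```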
Figure 15. The relationship between object distance and perception uncertainty.
Figure 16. The relationship between object occlusion and perception uncertainty.
Table 1. Statistical mean index after matching the ground truth and perception results of PointPillars and SMOKE after DE (3769 frames).

Net            Precision   Recall   F1-Score
PointPillars   86.8%       57.8%    0.6939
SMOKE          76.5%       47.1%    0.583
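The F1-scores above are consistent with the listed precision and recall values, since F1 is their harmonic mean. A quick check in Python, using only the numbers from Table 1:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Values from Table 1.
print(round(f1_score(0.868, 0.578), 4))  # PointPillars -> 0.6939
print(round(f1_score(0.765, 0.471), 4))  # SMOKE        -> 0.583
```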
Table 2. The number of detecting networks and the average object score of PointPillars and SMOKE after DE (3769 frames).

Net            NumNET    MeanScore
PointPillars   4.375     0.6748
SMOKE          2.8659    0.4388
Table 3. Perception effectiveness judgment results (1000 frames).

Correct Judgment   Wrong Judgment   Failure Diagnosis Rate
920 frames         80 frames        92%
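The rate in Table 3 is simply the share of correctly judged frames; a one-line check using the table's numbers:

```python
correct, wrong = 920, 80            # frames from Table 3
rate = correct / (correct + wrong)  # 0.92
print(f"{rate:.0%}")                # 92%
```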