A Study of Unmanned Path Planning Based on a Double-Twin RBM-BP Deep Neural Network

Addressing the shortcomings of unmanned path planning, such as significant error and low precision, a path-planning algorithm based on a double-twin restricted Boltzmann machine-back propagation (RBM-BP) deep neural network model optimized by the whale optimization algorithm (WOA) is proposed. The model consists mainly of two twin RBMs and one BP neural network. One twin RBM is used for feature extraction of the unmanned path location, and the other is used for path similarity calculation. The model uses the WOA algorithm to optimize its parameters, which reduces the number of training sessions, shortens the training time, and reduces the training errors of the neural network. In MATLAB simulation experiments, the proposed algorithm outperforms several other neural network algorithms in terms of training error, and a comparison of the optimal paths found under simulated complex road conditions shows its superior performance. To further test the performance of the algorithm introduced in this paper, actual scenarios such as a flower bed and a computer room were chosen for unmanned path-planning experiments. The results show that the proposed algorithm has obvious advantages in path selection, reducing running time and improving running efficiency. It therefore has definite practical value in unmanned driving.


Introduction
In recent years, driverless technology has been a focus of research in the automotive industry worldwide [1][2][3]. Yi et al. [4] proposed a design framework for unmanned driving trajectory planning based on real-time maneuvering decisions, dividing the trajectory space into homotopic regions and linearizing the trajectory; their simulation experiments, which take extreme conditions as the research object, show that accidents can be effectively avoided. Rajurkar et al. [5] proposed an optimal path-planning scheme for autonomous vehicles based on a genetic algorithm that optimizes a fuzzy controller. An automatic driving model based on neural networks has been proposed by NA. Neural networks have been widely used in many fields [6][7][8][9]. Li et al. [10] introduced deep learning into unmanned driving path planning and predicted the optimal path through a deep learning algorithm to guide the vehicle forward. Sallab et al. [11], Isele et al. [12], Pan et al. [13], Xia et al. [14], Xiong et al. [15] and others applied deep reinforcement learning to unmanned driving and achieved good results. The problem with this type of research is the vast number of iterations required: obtaining good training results can generally take more than 1000 training sessions, which increases the algorithm's complexity, and the training process also requires substantial data support.
To address these problems, this paper proposes a double-twin RBM-BP deep learning neural network model based on WOA optimization. In this algorithm, one twin RBM model processes the feature map of the unmanned driving path, and the other twin RBM model processes path similarity. The WOA algorithm is used to optimize the twin RBM parameters in the deep learning network, so that the training error of the deep learning neural network is reduced and the optimal parameters are obtained, yielding better prediction results.

Restricted Boltzmann Machine
The restricted Boltzmann machine [16] is a neural network model based on statistical mechanics and an energy model. It uses energy to represent the stability of the entire system: the smaller the energy, the more stable the system; otherwise, the system is in a state of fluctuation and instability. The RBM has a fast learning rate and strong learning ability and is an important model in the field of deep learning. Its structure is shown in Fig. 1.
In Fig. 1, v refers to the visible layer, from which the data samples are imported, and h refers to the hidden layer, which represents the features extracted from the data. Suppose the RBM network has m visible units and n hidden units. The vectors v = (v_1, v_2, ..., v_m)^T and h = (h_1, h_2, ..., h_n)^T denote the states of the units at the visible layer and the hidden layer. Let a = (a_1, a_2, ..., a_m)^T be the offset vector of the visible layer, where a_i is the offset of the i-th visible unit; let b = (b_1, b_2, ..., b_n)^T be the offset vector of the hidden layer; and let W ∈ R^{m×n} be the weight matrix between the visible layer and the hidden layer, where w_{ij} is the weight between the i-th visible node and the j-th hidden node. The RBM training process learns the parameter set θ = {a_i, b_j, W_{ij}}.

The energy function of the RBM is

E(v, h) = −Σ_{i=1}^{m} a_i v_i − Σ_{j=1}^{n} b_j h_j − Σ_{i=1}^{m} Σ_{j=1}^{n} v_i w_{ij} h_j    (1)

The network assigns a probability to each pair of visible and hidden vectors through the energy function:

P(v, h) = e^{−E(v,h)} / Z    (2)

where Z is the partition function:

Z = Σ_{v,h} e^{−E(v,h)}    (3)

The logarithmic gradient of the weights follows from Eqs. (2) and (3):

∂ log P(v) / ∂w_{ij} = ⟨v_i h_j⟩_data − ⟨v_i h_j⟩_model    (4)

where ⟨v_i h_j⟩_data is the data expectation and ⟨v_i h_j⟩_model is the model expectation. The RBM learning rule is then

Δw_{ij} = ε(⟨v_i h_j⟩_data − ⟨v_i h_j⟩_model)    (5)

For the data expectation, since there are no direct connections between the hidden-layer units of the RBM, an unbiased sample of the data distribution can be obtained quickly. Given a randomly chosen training vector v, the probability that hidden unit j takes the binary state 1 is

P(h_j = 1 | v) = σ(b_j + Σ_{i=1}^{m} v_i w_{ij})    (6)

where σ(x) = 1 / (1 + e^{−x}). For the same reason, the probability that visible unit i takes the state 1 is

P(v_i = 1 | h) = σ(a_i + Σ_{j=1}^{n} h_j w_{ij})    (7)

In summary, the main objective of the RBM is to obtain θ = {a_i, b_j, W_{ij}} by maximizing the log-likelihood of the RBM on the learning set:

θ* = argmax_θ L(θ) = argmax_θ Σ_{t=1}^{T} log P(v^{(t)}; θ)    (8)

To obtain the optimal value of Eq. (8), the gradient ascent method [17] is used:

θ := θ + η ∂L(θ)/∂θ    (9)

Following the derivation in [18], the derivative of the log-likelihood for a training sample v^{(t)} is

∂ log P(v^{(t)}; θ)/∂θ = Σ_h P(h | v^{(t)}; θ) ∂(−E(v^{(t)}, h))/∂θ − Σ_{v,h} P(v, h; θ) ∂(−E(v, h))/∂θ    (10)

In Eq. (10), P(h | v^{(t)}; θ) is the probability distribution of the hidden layer when the visible units are set to the training sample v^{(t)}, and P(v, h; θ) is the joint distribution of the visible and hidden units. From these two distributions, the partial derivatives with respect to the parameters θ = {a_i, b_j, W_{ij}} are

∂ log P(v^{(t)})/∂w_{ij} = P(h_j = 1 | v^{(t)}) v_i^{(t)} − Σ_v P(v) P(h_j = 1 | v) v_i    (11)
∂ log P(v^{(t)})/∂a_i = v_i^{(t)} − Σ_v P(v) v_i    (12)
∂ log P(v^{(t)})/∂b_j = P(h_j = 1 | v^{(t)}) − Σ_v P(v) P(h_j = 1 | v)    (13)

Eqs. (11)-(13) are the RBM learning rules.
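As an illustration of the learning rules above, the following Python sketch trains a small Bernoulli RBM with one step of contrastive divergence (CD-1), the standard sampling approximation to the model expectation in Eq. (5). The class and parameter names are illustrative, and the paper's experiments use MATLAB rather than Python.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Bernoulli RBM trained with one step of contrastive divergence (CD-1)."""

    def __init__(self, m, n, eps=0.1):
        self.W = rng.normal(0, 0.01, size=(m, n))  # weights w_ij
        self.a = np.zeros(m)                       # visible offsets a_i
        self.b = np.zeros(n)                       # hidden offsets b_j
        self.eps = eps                             # learning rate

    def p_h_given_v(self, v):
        # Eq. (6): P(h_j = 1 | v) = sigma(b_j + sum_i v_i w_ij)
        return sigmoid(self.b + v @ self.W)

    def p_v_given_h(self, h):
        # Eq. (7): P(v_i = 1 | h) = sigma(a_i + sum_j h_j w_ij)
        return sigmoid(self.a + h @ self.W.T)

    def cd1_update(self, v0):
        # Positive phase: clamp the data, sample hidden states
        ph0 = self.p_h_given_v(v0)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        # Negative phase: one reconstruction step approximates <v_i h_j>_model
        pv1 = self.p_v_given_h(h0)
        ph1 = self.p_h_given_v(pv1)
        # Eq. (5): Delta w_ij = eps (<v_i h_j>_data - <v_i h_j>_model)
        self.W += self.eps * (np.outer(v0, ph0) - np.outer(pv1, ph1))
        self.a += self.eps * (v0 - pv1)
        self.b += self.eps * (ph0 - ph1)
        return np.mean((v0 - pv1) ** 2)  # reconstruction error

rbm = RBM(m=6, n=3)
pattern = np.array([1.0, 1.0, 0.0, 0.0, 1.0, 1.0])
errors = [rbm.cd1_update(pattern) for _ in range(200)]
print(errors[0], errors[-1])  # the error shrinks as training proceeds
```

CD-1 replaces the intractable sums over all configurations in Eqs. (11)-(13) with a single Gibbs-sampling step, which is why RBM training is fast in practice.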

Whale Optimization Algorithm
The whale optimization algorithm [19] uses a set of search agents to determine the global optimal solution of the optimization problem. The search process for a given problem begins with a set of random solutions, and the candidate solutions are updated through optimization rules until the end conditions are met. The whale algorithm is divided into three stages: predation, bubble attack and food search.
(1) Encircling the prey
In the initial stage of the algorithm, humpback whales do not know where the food is; they obtain its position through group cooperation. Other individuals approach this position and gradually surround the food, which is modeled as follows:

D = |C · X*(t) − X(t)|    (14)
X(t + 1) = X*(t) − A · D    (15)

In Eq. (14), D is the distance vector from the search agent to the target food and t is the current iteration number. C and A are coefficient vectors, X* is the current best (locally optimal) solution, and X is the position vector. C and A are expressed as follows:

A = 2a · r − a    (16)
C = 2r    (17)

In Eqs. (16) and (17), a is a vector that decreases linearly from 2 to 0 over the iterations, and r is a random vector between 0 and 1.
(2) Bubble-net attack
In this stage, the humpback whale's bubble-net preying behavior is modeled by two mechanisms, shrinking encirclement and spiral position updating, to achieve local optimization.

1) Shrinking and surrounding principle
When |A| < 1, the individual whale approaches the best whale from its current location, and the larger |A| is, the larger the whale's step.

2) Spiral update position
The individual humpback whale first calculates its distance from the current optimal whale and then swims toward it along a spiral. The mathematical model of the spiral is:

X(t + 1) = D′ · e^{bl} · cos(2πl) + X*(t),  where D′ = |X*(t) − X(t)|    (18)

In Eq. (18), b is a constant coefficient defining the shape of the logarithmic spiral, D′ is the distance between the individual whale and the current best whale, and l is a random number between −1 and 1.

3) Food searching stage
Humpback whales search globally by controlling the A vector. When |A| > 1, the individual whale moves away from the reference whale and updates its position toward a randomly selected humpback whale, which enforces exploration. The model is expressed as follows:

D = |C · X_rand − X|    (19)
X(t + 1) = X_rand − A · D    (20)

In Eqs. (19) and (20), X_rand is the position vector of a randomly selected reference humpback whale.
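The three stages above can be combined into a compact sketch of the WOA search loop. This is a minimal illustrative Python implementation under the usual WOA conventions (probability 0.5 of choosing the spiral branch, b = 1); treating the vector norm of A as the |A| exploration test is an assumption of this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

def woa(f_obj, dim, bounds, n_whales=20, max_iter=100):
    """Minimal whale optimization sketch: encircling (Eqs. 14-15),
    spiral update (Eq. 18) and random search (Eqs. 19-20)."""
    lo, hi = bounds
    X = rng.uniform(lo, hi, size=(n_whales, dim))   # random initial solutions
    fit = np.array([f_obj(x) for x in X])
    best = X[np.argmin(fit)].copy()                 # X*, current best whale
    best_fit = fit.min()
    b = 1.0                                         # spiral shape constant
    for t in range(max_iter):
        a = 2.0 - 2.0 * t / max_iter                # a decreases linearly 2 -> 0
        for i in range(n_whales):
            r = rng.random(dim)
            A = 2.0 * a * r - a                     # Eq. (16)
            C = 2.0 * rng.random(dim)               # Eq. (17)
            if rng.random() < 0.5:                  # encircling branch
                if np.linalg.norm(A) < 1:           # exploit: move toward X*
                    D = np.abs(C * best - X[i])     # Eq. (14)
                    X[i] = best - A * D             # Eq. (15)
                else:                               # explore: follow a random whale
                    X_rand = X[rng.integers(n_whales)]
                    D = np.abs(C * X_rand - X[i])   # Eq. (19)
                    X[i] = X_rand - A * D           # Eq. (20)
            else:                                   # spiral branch, Eq. (18)
                l = rng.uniform(-1, 1)
                D1 = np.abs(best - X[i])
                X[i] = D1 * np.exp(b * l) * np.cos(2 * np.pi * l) + best
            X[i] = np.clip(X[i], lo, hi)
            fi = f_obj(X[i])
            if fi < best_fit:                       # keep the best whale found
                best_fit, best = fi, X[i].copy()
    return best, best_fit

# Usage: minimize the sphere function, whose optimum is at the origin.
best, val = woa(lambda x: float(np.sum(x ** 2)), dim=5, bounds=(-10, 10))
print(val)
```

The spiral branch draws each agent geometrically toward the current best, while |A| > 1 scatters agents toward random peers, which is the exploration/exploitation balance the text describes.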
WOA-Optimized Double-Twin RBM-BP Unmanned Driving Path-Planning Algorithm

Theoretical Analysis of the WOA-Optimized Double-Twin RBM-BP
RBM parameter design is an extremely complicated task: there are no fixed rules to follow, and it is difficult to ensure that the network is optimal. In the double-twin RBM-BP deep learning neural networks, the learning rate ε, the number of visible units v, the number of hidden units h and the parameter set θ = {a_i, b_j, W_{ij}} of each RBM jointly determine performance, so the parameter design is more complicated than for a traditional single RBM. Optimizing these parameters reasonably is the key to the effectiveness of the model in this paper. Bionic algorithms such as genetic algorithms and particle swarm optimization can improve the performance of BP neural networks by optimizing their parameters; this paper therefore uses the more powerful WOA algorithm for parameter optimization of the twin RBM-BP deep learning neural networks. The two groups of parameters ε, v, h and θ of the twin RBMs are optimized by the WOA algorithm.

Training Model Based on the Double Twin RBM-BP Neural Network
The RBM-BP combines the advantages of RBM and BP and targets complex, high-dimensional data. RBM's strong feature learning ability and unsupervised learning of high-dimensional data are used to remove redundant features and reduce data complexity, which lowers the training complexity of the data and improves the recognition accuracy of the deep learning network. However, RBM-BP networks require huge amounts of computation and suffer when the training sample library is limited. In this paper, double-twin RBM-BP networks are used so that a small number of layers suffices and the number of training sessions is reduced. The structure of the twin RBM-BP is shown in Fig. 2.

WOA-Optimized RBM-BP Parameter
During the predation phase, X is the position vector. In this paper, the variables to be optimized are the learning rate ε, the number of visible units v, the number of hidden units h and the parameter set θ = {a_i, b_j, W_{ij}} of each twin RBM. Therefore, X in Eq. (15) can be expressed as:

X = [ε, v, h, θ]    (21)

In the spiral phase, the objective function of the optimization is the training error of the twin RBM deep learning neural networks, expressed as:

f_obj = E_train(ε, v, h, θ)    (22)

Through optimization, the parameters v_{1,opt}, h_{1,opt}, ε_{1,opt}, θ_{1,opt} of the first RBM and v_{2,opt}, h_{2,opt}, ε_{2,opt}, θ_{2,opt} of the second RBM are obtained. Applying these two groups of optimal parameters [ε, v, h, θ] to the twin RBM deep learning neural networks completes the optimization; that is, the two parameter groups of the twin RBM networks are optimized by WOA so as to minimize the training error.
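As a hypothetical illustration of how these variables could be packed into a WOA position vector, the sketch below decodes a vector of the form in Eq. (21) into per-RBM hyperparameters and evaluates an Eq. (22)-style objective through a caller-supplied training-error function. The six-element layout and the names decode and fitness are assumptions for illustration; the full parameter set θ is omitted for brevity.

```python
import numpy as np

# Hypothetical decoding of a WOA position vector into the hyperparameters
# of the two twin RBMs: [eps1, v1, h1, eps2, v2, h2]. Integer-valued
# dimensions (unit counts) are rounded to the nearest integer.

def decode(x):
    eps1, v1, h1, eps2, v2, h2 = x
    return {
        "rbm1": {"eps": float(eps1), "v": int(round(v1)), "h": int(round(h1))},
        "rbm2": {"eps": float(eps2), "v": int(round(v2)), "h": int(round(h2))},
    }

def fitness(x, train_error):
    """Eq. (22)-style objective: the training error of the twin RBM
    networks under the decoded hyperparameters."""
    params = decode(x)
    return train_error(params)

# Usage with a dummy error surface (a real one would train both RBMs):
dummy = lambda p: abs(p["rbm1"]["eps"] - 0.05) + abs(p["rbm2"]["eps"] - 0.05)
x = np.array([0.1, 20.0, 10.0, 0.02, 20.0, 10.0])
print(fitness(x, dummy))
```

A WOA agent would carry such a vector as its position, and the minimum of fitness over the swarm would give the optimal parameter groups described above.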

Path Planning
After the RBM-BP deep learning neural network parameters are optimized by the WOA algorithm, two RBM models and their corresponding learning rules are obtained, given by Eqs. (23)-(26). Eqs. (23) and (25) represent the model and learning rules of the first RBM in the twin pair, which is mainly used for feature extraction from the input map data. After the map data matrix is entered, the neural networks are trained according to the obstacle and non-obstacle areas in the map, with the shortest distance as the training goal.
Eqs. (24) and (26) represent the model and learning rules of the second RBM in the twin pair. Because there may be many different routes of the same shortest distance between a randomly given starting point and end point, the similarity of these routes must be analyzed. In route selection, routes with fewer turns should be preferred, and among routes with the same number of turns, those with smaller turning angles should be preferred. The closeness between the optimal driving decision made by the twin RBM and the actual target point is therefore fed back to the neural network system as a feedback value.
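The turn-based selection rule above can be expressed as a lexicographic score, sketched below for grid routes: routes are first compared by number of turns and then by total turning angle. The function names and the (x, y) route representation are illustrative, not the paper's implementation.

```python
import math

# Hypothetical tie-breaking score for routes of equal length: prefer fewer
# turns, and among equal turn counts, a smaller total turning angle.
# A route is a list of (x, y) grid points.

def turn_angles(route):
    angles = []
    for p, q, r in zip(route, route[1:], route[2:]):
        a1 = math.atan2(q[1] - p[1], q[0] - p[0])  # heading into q
        a2 = math.atan2(r[1] - q[1], r[0] - q[0])  # heading out of q
        d = abs(a2 - a1)
        d = min(d, 2 * math.pi - d)                # wrap to [0, pi]
        if d > 1e-9:                               # count only actual turns
            angles.append(d)
    return angles

def route_score(route):
    angles = turn_angles(route)
    # Lexicographic preference: number of turns first, then total angle.
    return (len(angles), sum(angles))

straight_then_up = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]  # one 90-degree turn
zigzag = [(0, 0), (1, 1), (2, 0), (3, 1), (4, 0)]            # three turns
print(route_score(straight_then_up) < route_score(zigzag))
```

Feeding such a score back as the similarity/feedback value lets the network break ties among equally short routes, as the text describes.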

Algorithm Process
According to the process described above, the design process of the algorithm in this paper is shown in Tab. 1.

Table 1: Design process of the algorithm in this paper
Input: RBM-BP parameters, whale algorithm parameters, route start and end points, number of iterations, map data matrix
Output: Route planned by the model
Step 1: Initialize the whale population
Step 2: Calculate the fitness value of each individual whale
Step 3: Save X* as the best whale
Step 4: For each iteration of the optimization loop:
Step 5:   For each whale, calculate its fitness value
Step 6:   If the new fitness value is smaller than the current minimum (f_obj,new < min{f_obj}):
Step 7:     Replace the best whale (X* = X_new)
          End if
        End for

Simulation Experiment
To further illustrate the route-planning effect of the algorithm in this paper, the basic RBM-BP deep neural networks, CNN networks, the Q-Learning network and the improved Q-Learning network are compared with the algorithm in this paper (as shown in Tab. 2). The comparison covers training error, optimal route length under complex road conditions and route-planning effect. In the experiments, the hardware platform is an Intel i7-9700 CPU with 32 GB DDR3 memory running Windows 10, and the simulation software is MATLAB 2019. The unmanned smart car in Fig. 3 is taken as the research object. The car consists of a Zynq-7020 FPGA chip (as shown in Fig. 4), a CCD camera, four infrared sensors, eight ultrasonic sensors, eight Hall sensors (for detecting driving speed), four angle sensors (for detecting steering), and an acceleration sensor; the circuit is shown in Fig. 5. To improve the simulation, the movement of the car is expressed in eight directions: up, down, left, right, top left, bottom left, top right, and bottom right. The motion space of the car is a two-dimensional plane, and the environment is simulated as a 20 × 20 grid (as shown in Fig. 6).
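For reference, a grid environment like the one described above (an N × N occupancy grid with eight movement directions) admits a simple baseline planner. The following Python sketch runs Dijkstra's algorithm with unit cost for straight moves and √2 for diagonal moves; it is a conventional baseline for such grids, not the twin RBM-BP planner of this paper, and the wall layout is invented for the example.

```python
import heapq
import math

# Eight movement directions: up, down, left, right and the four diagonals,
# matching the simulation environment described above.
DIRS = [(-1, 0), (1, 0), (0, -1), (0, 1), (-1, -1), (-1, 1), (1, -1), (1, 1)]

def shortest_path(grid, start, goal):
    """Dijkstra on an N x N occupancy grid (1 = obstacle, 0 = free);
    diagonal moves cost sqrt(2). A baseline, not the paper's planner."""
    n = len(grid)
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, (x, y) = heapq.heappop(heap)
        if (x, y) == goal:                        # reconstruct the route
            path = [goal]
            while path[-1] != start:
                path.append(prev[path[-1]])
            return d, path[::-1]
        if d > dist[(x, y)]:                      # stale heap entry
            continue
        for dx, dy in DIRS:
            nx, ny = x + dx, y + dy
            if 0 <= nx < n and 0 <= ny < n and grid[nx][ny] == 0:
                nd = d + (math.sqrt(2) if dx and dy else 1.0)
                if nd < dist.get((nx, ny), float("inf")):
                    dist[(nx, ny)] = nd
                    prev[(nx, ny)] = (x, y)
                    heapq.heappush(heap, (nd, (nx, ny)))
    return float("inf"), []

# Usage: a 20 x 20 grid with a vertical wall in column 10 and a gap at row 0.
grid = [[0] * 20 for _ in range(20)]
for i in range(1, 20):
    grid[i][10] = 1
length, path = shortest_path(grid, (19, 0), (19, 19))
print(round(length, 2), len(path))
```

Such a baseline gives the optimal geometric path length against which learned planners on the same grid can be compared.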

Comparison of Optimization Indexes of Neural Networks under Different Algorithms
To verify the advantages of this algorithm in optimizing the parameters of the double-twin RBM-BP neural networks, the double-twin RBM-BP neural networks are optimized with the genetic algorithm (GA), ant colony optimization (ACO), particle swarm optimization (PSO) and the WOA algorithm, and the results are contrasted. The dataset is the UCI wine quality dataset [20]. One thousand data points are randomly selected as the training dataset, and 100 of the remaining data points are selected as the test dataset. All data in both datasets are normalized to [−1, 1]. The mean square error (MSE) and mean absolute percentage error (MAPE) are adopted as indexes, where N is the total number of samples, y_n is the predicted value and t_n is the actual value:

MSE = (1/N) Σ_{n=1}^{N} (y_n − t_n)^2
MAPE = (1/N) Σ_{n=1}^{N} |(y_n − t_n) / t_n| × 100%

Figs. 7-8 compare the five algorithms on the MSE and MAPE indexes; the training results of the algorithm in this paper are better than those of the other algorithms. Fig. 9 compares the training errors under the five algorithms. As the iterations increase, the training errors of all five algorithms decrease gradually, but the error reduction rate of the algorithm in this paper is significantly better than that of the other four algorithms: its training error has already stabilized after approximately 175 iterations. This shows that training the input data with the twin RBM-BP and optimizing the neural network parameters with WOA effectively reduces the number of training sessions, reduces the error rate, and achieves better training goals. Fig. 10 shows a schematic diagram of a simulated complex path, and Fig. 11 shows the path-planning effect of the five algorithms under simulated complex road conditions. From the trajectories shown in the figure, the path length corresponding to the algorithm in this paper has obvious advantages.
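The two comparison indexes can be computed directly from their definitions above; the following sketch is an illustrative Python implementation (the experiments themselves were run in MATLAB).

```python
import numpy as np

# Mean square error (MSE) and mean absolute percentage error (MAPE),
# where y_n is the predicted value and t_n the actual value over N samples.

def mse(y, t):
    y, t = np.asarray(y, float), np.asarray(t, float)
    return float(np.mean((y - t) ** 2))

def mape(y, t):
    y, t = np.asarray(y, float), np.asarray(t, float)
    # Expressed as a percentage; every t_n must be nonzero.
    return float(np.mean(np.abs((y - t) / t)) * 100.0)

y_pred = [2.5, 0.5, 2.0, 8.0]
t_true = [3.0, 0.5, 2.0, 7.0]
print(mse(y_pred, t_true), mape(y_pred, t_true))
```

MSE penalizes large absolute deviations quadratically, while MAPE expresses error relative to the true values, which is why both are reported in the comparison.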
Compared with the RBM-BP, CNN, Q-Learning and improved Q-Learning networks, the path length is reduced by 16%, 28%, 16% and 8%, respectively. Fifty different sets of starting and ending points are randomly selected in the same scene, and the average path results are shown in Fig. 12. The algorithm in this paper has clear advantages under complex road conditions, mainly because the WOA optimization significantly improves the parameters of the double-twin RBM-BP networks; it therefore plans well under complex road conditions. Figs. 15 and 17 show that the route simulated by the algorithm in this paper almost matches the route actually traveled, which indicates a good path-planning result in actual scenarios. Figs. 16 and 18 compare the actual routes traveled by the five algorithms in the two scenarios. In Fig. 15, the algorithm of this paper circumvents the obstacle and goes directly to the end point; its trajectory fluctuates little during driving and the curve is smooth, whereas the RBM-BP trajectory fluctuates greatly, the CNN route encounters two obstacles, and the curves of Q-Learning and improved Q-Learning are slightly worse than that of the algorithm in this paper. In Fig. 17, the algorithm circumvents obstacles and proceeds to the end point in time, and its trajectory changes the least over the whole process, which shows a good planning effect; the path lengths of CNN, RBM-BP and improved Q-Learning are significantly greater than that of the algorithm in this paper. From the comparisons in the above figures, the algorithm in this paper has a better route-planning effect in actual scenarios.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.