Research on UUV Obstacle Avoiding Method Based on Recurrent Neural Networks

In this paper, we present two online obstacle avoidance planning methods for unmanned underwater vehicles (UUVs), based on the clockwork recurrent neural network (CW-RNN) and long short-term memory (LSTM), respectively. In essence, UUV online obstacle avoidance planning is a spatiotemporal sequence planning problem that takes the spatiotemporal data sequence of the sensors as input and produces control instructions for the UUV motion controller as output. Recurrent neural networks (RNNs) have been shown to give state-of-the-art performance on many sequence labeling and sequence prediction tasks. To train the networks, a UUV obstacle avoidance dataset is generated, and offline training and testing are adopted. The two types of RNN based online obstacle avoidance planners are then compared in terms of path cost, obstacle avoidance planning success rate, training time, time consumption, learning, and generalization. The good performance of the proposed methods is demonstrated with a series of simulation experiments in different environments.


Introduction
The online obstacle avoidance planner is one of the important modules of a UUV and reflects its level of intelligence: it requires the UUV to plan a collision-free trajectory autonomously when navigating over long ranges in unknown environments. At present, the main obstacle avoidance methods include traditional methods [1][2][3][4][5][6], bionic algorithms [7][8][9][10][11], and reinforcement learning methods [12][13][14][15]. Traditionally, a bottleneck restricting the development of UUV obstacle avoidance technology has been the uncertainty of underwater sensing equipment. In addition, the performance of obstacle avoidance in complex environments, and even more so in maze environments, is not satisfactory.
LSTM is an RNN architecture that employs three special gating schemes to address the vanishing and exploding gradient problems. It can process complex sequential information, learning features from long-term input data, and has been shown to give state-of-the-art performance on many challenging problems, including precipitation nowcasting, water table depth prediction, traffic forecasting, object tracking, and punctuation prediction.
The aim of precipitation nowcasting is to provide a forecast of the rainfall intensity in a local region over a relatively short period of time (e.g., 0-6 hours). Shi et al. formulated precipitation nowcasting as a spatiotemporal sequence forecasting problem, with the sequence of past radar maps as input and a sequence of a fixed number of future radar maps as output, and proposed the convolutional LSTM model, in which convolutional structures extract features and the LSTM performs the forecasting [16].
Long-term predictions of water table depth in agricultural areas face enormous challenges because of complex, heterogeneous hydrogeological characteristics, boundary conditions, and human activities. In addition, there are nonlinear interactions among these factors. Zhang et al. proposed a time series model based on LSTM as an alternative to computationally expensive physical models, especially in areas where hydrogeological data are difficult to obtain [17].
Accurate and real-time traffic flow prediction, especially of short-term traffic flow, is an important part of intelligent transportation systems. However, due to the stochastic and nonlinear nature of traffic flow, accurately predicting the traffic state is a challenging task. LSTM is able to learn time series with long time dependencies and automatically determine the optimal time lags. Ma et al. found this feature especially desirable for traffic prediction problems, where future traffic conditions are commonly related to previous events over long time spans, and proposed an LSTM-based traffic flow prediction method to capture nonlinear traffic dynamics effectively [18]. [20].
Object tracking is a fundamental problem in computer vision with a wide range of applications. The goal of a tracking system is to estimate the state sequence of an object from an observation sequence. LSTM has been introduced in object tracking to build object representations via sequence learning. Li et al. employed LSTM units to directly learn temporally correlated representations of objects in long sequences [21]. Zhou et al. introduced a bidirectional LSTM-based appearance model to learn the spatial contextual dependency [22]. Wang et al. proposed a 3D fish tracking method and a multifish tracking method in which an LSTM network is employed to model the fish's motion process [23,24].
In other fields, Chen et al. modeled and predicted China stock returns using LSTM and greatly improved the accuracy of stock return prediction [25]. Sak et al. demonstrated the state-of-the-art performance of LSTM networks on speech recognition tasks compared with RNN and deep neural network (DNN) models [26]. Wu et al. used LSTM to solve the remaining useful life estimation problem and obtained good prediction accuracy [27]. Chherawala et al. presented a handwriting recognition model based on an LSTM network that automatically learns features from the input image in a supervised fashion [28].
In 2014, Koutník et al. introduced CW-RNN, which simplifies the RNN architecture, improves the performance of the network, and speeds up network evaluation [29]. Achanta et al. showed that CW-RNN is equivalent to the standard RNN architecture with a time-varying leaky integration [30].
Two end-to-end online obstacle avoidance planners, based on LSTM and CW-RNN, respectively, are presented in this paper. The obstacle avoidance planners take the information obtained by a multibeam forward looking sonar (FLS) as input and directly output control instructions to the motion controller of the UUV. The RNN based obstacle avoidance planners remain robust even when the effects of measurement noise are considered. Owing to the strong learning ability of RNNs, the planners are capable of obstacle avoidance in environments far more complex than those in the training samples.

UUV System Modeling
Obstacle avoidance planning in the vertical plane is usually achieved through depth adjustment, but a depth adjustment strategy often brings large pitch adjustments, which affect the attitude control of the UUV. Therefore, this paper gives priority to obstacle avoidance in the horizontal plane and defines a horizontal 3 degrees-of-freedom (DOF) control model for the UUV, which not only guarantees the safety of UUV collision avoidance planning but also facilitates UUV motion control.
The North East-fixed reference frame and body-fixed reference frame are shown in Figure 1. The 3 DOF control model of UUV is described as follows [31]:

\[ \dot{\eta} = R(\psi)\nu, \qquad M\dot{\nu} + C(\nu)\nu + D(\nu)\nu = \tau, \]

where $\eta = [x, y, \psi]^{T}$ is the position vector, whose components correspond to the position of the UUV in the North East-fixed reference frame and the heading of the UUV, respectively; $R(\psi)$ is the transformation matrix between the North East-fixed reference frame and the body-fixed reference frame; $\nu = [u, v, r]^{T}$ denotes the velocity vector comprising the surge, sway, and yaw velocities of the UUV in the body-fixed reference frame; the actuator input is denoted by $\tau = [\tau_u, 0, \tau_r]^{T}$; and $M = M_{RB} + M_{A}$, $C(\nu) = C_{RB}(\nu) + C_{A}(\nu)$, and $D(\nu) = D_{L} + D_{NL}(\nu)$ denote the system inertia matrix, Coriolis-centripetal matrix, and damping matrix, respectively.

Specifically, a constant current is assumed in this paper, expressed as a vector $[u_c, v_c]^{T}$ in the body-fixed reference frame. And then the kinematic and dynamic equations of UUV can be described as where  = − 22 V  + ( 23 −    23 −   ),  = ( 32 −    32 )V  −  33  +   , and

Assume there are two propellers distributed in the horizontal plane of the UUV. The force vector $\tau$ is modeled as where   and   denote the speeds of propellers of UUV, respectively,   is the distance between the propeller and central axis of UUV, and (  ) and (  ) denote propeller coefficients.
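The kinematic part of a horizontal 3 DOF model can be illustrated with a minimal sketch. It assumes the common form in which the rate of the position vector (north, east, heading) equals a heading-dependent rotation of the body-fixed velocity vector (surge, sway, yaw rate). The function names, the forward-Euler integration, and the time step are illustrative assumptions, not part of the original model description.

```python
import math

def rotation(psi):
    """Rotation matrix R(psi) mapping body-fixed velocities to North East rates."""
    c, s = math.cos(psi), math.sin(psi)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]

def kinematics_step(eta, nu, dt):
    """One forward-Euler step of eta_dot = R(psi) * nu (assumed integrator).

    eta = [x, y, psi]: north position, east position, heading in radians.
    nu  = [u, v, r]: surge, sway, and yaw rate in the body-fixed frame.
    """
    R = rotation(eta[2])
    eta_dot = [sum(R[i][j] * nu[j] for j in range(3)) for i in range(3)]
    return [eta[i] + dt * eta_dot[i] for i in range(3)]
```

For example, a vehicle heading east (heading of 90 degrees) with pure surge velocity moves along the east axis while its north coordinate stays unchanged.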

Simulation Model of Sonar
The input data of the obstacle avoidance planners proposed in this paper are obtained by a multibeam forward looking sonar. A 2D simulation model of the multibeam FLS, based on the SeaBat 8125, is established in this section. The SeaBat 8125 is a state-of-the-art high-resolution multibeam echosounder [32]. It has a 120° field-of-view sector, 80 beams with a width of 1.5°, and a maximum scan radius of 120. To simplify the input information of the network, define the distance vector

\[ D_t = [d_t^{1}, d_t^{2}, \ldots, d_t^{80}], \]

where $d_t^{i}$ is the distance detected by the $i$th ray of the sonar at time step $t$; if $d_t^{i} > 120$, then $d_t^{i}$ is set to 120. The precision of the sonar is set as 5. Taking the uncertainty of sonar detection into account, this paper sets the false alarm rate as 10%.
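The range clipping and false alarm injection described for the simulated sonar might be sketched as follows. The 80 beams, the maximum range of 120, and the 10% false alarm rate come from the text. The false alarm model itself (replacing a return with a spurious echo at a uniformly random range) is a hypothetical choice made only for illustration, since the text does not specify how false alarms are generated.

```python
import random

MAX_RANGE = 120.0  # maximum scan radius stated in the text
NUM_BEAMS = 80     # number of sonar beams stated in the text

def distance_vector(raw_ranges, false_alarm_rate=0.10, rng=None):
    """Build the clipped distance vector from one scan of raw beam ranges."""
    rng = rng or random.Random()
    out = []
    for d in raw_ranges:
        if rng.random() < false_alarm_rate:
            # Hypothetical false-alarm model: spurious echo at a random range.
            d = rng.uniform(0.0, MAX_RANGE)
        # Readings beyond the maximum scan radius are set to the maximum.
        out.append(min(d, MAX_RANGE))
    return out
```

Passing a seeded `random.Random` instance as `rng` makes a noisy scan reproducible, which is convenient when comparing planners on the same map.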

The Structures of Obstacle Avoidance Planners
The Structure of CW-RNN.
The forward propagation of a standard RNN is as follows:

\[ h_t = f_H(W_I x_t + W_H h_{t-1} + b_H), \qquad o_t = f_O(W_O h_t + b_O), \]

where $W_I$ and $W_H$ denote the weight matrices from the input layer and the hidden layer to the hidden layer, respectively; $W_O$ is the weight matrix between the hidden layer and the output layer; $x_t$, $h_t$, and $o_t$ are the input vector, hidden state vector, and output vector at time step $t$, respectively; $b_H$ and $b_O$ are the biases of the hidden layer and output layer, respectively; and $f_H$ and $f_O$ are the activation functions of the hidden layer and output layer.

Figure 2: The network structure of CW-RNN. Bias units are omitted to simplify the visualization.

As shown in Figure 2, in the forward propagation of CW-RNN the neurons in the hidden layer are grouped into $g$ modules of size $k$. Each module $i$ is assigned an explicit clock period $T_i = 2^{i-1}$. A recurrent connection from module $j$ to module $i$ exists only if $T_i \le T_j$, and the state of module $i$ is updated at time step $t$ only if $(t \bmod T_i) = 0$. Long-term memory is retained by the modules with long periods, while local information in the input data is processed by the modules with short periods.
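The clock rule of CW-RNN can be made concrete with a short sketch. It assumes exponential periods of 1, 2, 4, and so on for modules 1 to g, an update of a module only when the time step is divisible by its period, and recurrent connections only from modules with equal or longer periods. The function names are illustrative and do not come from the original paper.

```python
def clock_periods(g):
    """Clock periods T_i = 2**(i-1) for modules i = 1..g."""
    return [2 ** (i - 1) for i in range(1, g + 1)]

def active_modules(t, periods):
    """1-based indices of the modules whose state is updated at time step t."""
    return [i + 1 for i, T in enumerate(periods) if t % T == 0]

def connection_mask(periods):
    """mask[i][j] is True when module j feeds module i, that is, T_i <= T_j."""
    return [[Ti <= Tj for Tj in periods] for Ti in periods]
```

With four modules, only the fastest module updates at odd time steps, while all four update at time steps that are multiples of 8. The connection mask is upper triangular, which is the module-level counterpart of a block-upper triangular recurrent weight matrix.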
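Therefore, $W_H$ and $W_I$ are partitioned into $g$ block-rows corresponding to the $g$ modules, and $W_H$ is a block-upper triangular matrix:

\[ W_H = \begin{bmatrix} W_{H_1} \\ \vdots \\ W_{H_g} \end{bmatrix}, \qquad W_I = \begin{bmatrix} W_{I_1} \\ \vdots \\ W_{I_g} \end{bmatrix}, \]

where each block-row $W_{H_i}$ is partitioned into block-columns $\{0, \cdots, 0, W_{H_{i,i}}, \cdots, W_{H_{i,g}}\}$ and

\[ W_{H_i} = \begin{cases} W_{H_i}, & (t \bmod T_i) = 0, \\ 0, & \text{otherwise.} \end{cases} \]

The Structure of LSTM.
In LSTM, memory blocks replace the hidden units of the RNN. As shown in Figure 3, a memory block consists of a cell, an input gate, an output gate, and a forget gate. The current state of the hidden layer is stored in the cell, and the three gate units control the input, output, and forgetting of the cell, respectively. The forward propagation of LSTM is as follows:

\[ \begin{aligned} i_t &= \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i), \\ f_t &= \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f), \\ c_t &= f_t \bullet c_{t-1} + i_t \bullet \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c), \\ o_t &= \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o), \\ h_t &= o_t \bullet \tanh(c_t). \end{aligned} \]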
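A single forward step of an LSTM memory block might be sketched in plain Python as follows. The sketch assumes the common formulation without peephole connections, sigmoid gates, and a tanh nonlinearity on the cell input and cell output. The tanh choice and the parameter layout (a dictionary keyed by gate name) are assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(x, h, params):
    """Compute W_x @ x + W_h @ h + b for list-of-rows weight matrices."""
    W_x, W_h, b = params
    return [sum(w * v for w, v in zip(row_x, x)) +
            sum(w * v for w, v in zip(row_h, h)) + bias
            for row_x, row_h, bias in zip(W_x, W_h, b)]

def lstm_step(x, h_prev, c_prev, p):
    """One memory-block update; p maps 'i', 'f', 'c', 'o' to (W_x, W_h, b)."""
    i = [sigmoid(z) for z in affine(x, h_prev, p["i"])]    # input gate
    f = [sigmoid(z) for z in affine(x, h_prev, p["f"])]    # forget gate
    g = [math.tanh(z) for z in affine(x, h_prev, p["c"])]  # candidate cell input
    o = [sigmoid(z) for z in affine(x, h_prev, p["o"])]    # output gate
    # Element-wise products: keep part of the old cell state, add gated input.
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c
```

With all weights and biases set to zero, every gate outputs 0.5 and the candidate input is 0, so the cell state is simply halved at each step. This gives a quick sanity check of the gating arithmetic.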

Construction of UUV Autonomous Obstacle Avoidance Planning Learning System
The principle framework of the UUV autonomous obstacle avoidance planning learning system is shown in Figure 4. At first, the RNN based obstacle avoidance planners are trained offline. Then these fully trained planners are used to perform obstacle avoidance planning for the UUV in real time, according to the environmental information obtained by the FLS and the information about the UUV obtained by the motion and attitude sensors. The motion controller controls the UUV based on the control commands output by the online obstacle avoidance planners.
The flowchart of RNN based online obstacle avoidance planning system is as follows.
Step 1. Initialize the start position and target position of UUV, and deploy UUV in the start position.
Step 2. Acquire data from sonar, motion, and attitude sensors.
Step 3. The online RNN obstacle avoidance planner outputs the desired yaw and velocity of the UUV according to the sensor data.
Step 4. UUV adjusts its heading and velocity according to the output instruction of online RNN obstacle avoidance planner.
Step 5. Determine whether the UUV has reached the target position. If so, the obstacle avoidance planning algorithm stops; otherwise, jump to Step 2.
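The five steps above can be sketched as a simple control loop. The function names, the arrival tolerance, the step limit, and the point-mass motion model `simple_step` are hypothetical stand-ins for the trained planner, the sensors, and the UUV motion controller. They are included only so that the loop can be run in isolation.

```python
import math

def simple_step(state, yaw_cmd, speed_cmd):
    """Hypothetical motion model: the UUV instantly tracks yaw and speed."""
    return {"x": state["x"] + speed_cmd * math.cos(yaw_cmd),
            "y": state["y"] + speed_cmd * math.sin(yaw_cmd),
            "yaw": yaw_cmd}

def run_planner(start, target, read_sensors, planner, step=simple_step,
                tol=5.0, max_steps=10000):
    """Run the online planning loop; return (reached_target, steps_used)."""
    state = {"x": start[0], "y": start[1], "yaw": 0.0}        # Step 1
    for k in range(max_steps):
        dist = math.hypot(target[0] - state["x"], target[1] - state["y"])
        if dist <= tol:                                       # Step 5
            return True, k
        observation = read_sensors(state)                     # Step 2
        yaw_cmd, speed_cmd = planner(observation)             # Step 3
        state = step(state, yaw_cmd, speed_cmd)               # Step 4
    return False, max_steps
```

In a real system, `planner` would wrap the trained CW-RNN or LSTM network and `read_sensors` would return the sonar distance vector together with the motion and attitude readings.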

Data Processing and Network Training
The input of the obstacle avoidance planners at time step t consists of the distance vector and the angle between the UUV and the target in the North East-fixed reference frame. The output vector of the obstacle avoidance planners at time step t consists of the heading adjustment and the velocity of the UUV. The dataset consists of 120,000 training samples and 4810 test samples. In the dataset, the start point, target point, and obstacles are generated randomly. Min-Max normalization is used to preprocess the input and output data.
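Min-Max normalization rescales each feature to the interval [0, 1] using the minimum and maximum observed for that feature. A minimal sketch is given below. The per-column treatment, the handling of constant columns, and the inverse transform for mapping network outputs back to physical units are assumptions, since the text does not give these details.

```python
def fit_min_max(rows):
    """Return per-column minima and maxima of a list of equal-length rows."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]

def min_max_scale(row, lo, hi):
    """Scale one row to [0, 1]; constant columns map to 0.0 (assumed)."""
    return [(v - l) / (h - l) if h > l else 0.0
            for v, l, h in zip(row, lo, hi)]

def min_max_invert(row, lo, hi):
    """Map a scaled row back to the original units."""
    return [l + v * (h - l) for v, l, h in zip(row, lo, hi)]
```

The minima and maxima would normally be fitted on the training samples only and then reused unchanged for the test samples and for online planning.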
The only difference between the two types of obstacle avoidance planners is the structure of the hidden layer, which is composed of CW-RNN or LSTM units, respectively. This setting makes it convenient to compare the performance of CW-RNN and LSTM on obstacle avoidance for UUV. Both types of obstacle avoidance planners consist of an input layer, a hidden layer, a middle layer, and an output layer. There are 81 neurons in the input layer, 23 neurons in the middle layer, and 2 neurons in the output layer. To overcome the problem of overfitting, dropout with a keep probability of 0.6 is used during training. The loss function is the mean squared error (MSE); the weights are updated by minibatch gradient descent with backpropagation through time to minimize the MSE, with the batch size set to 10000. The optimizer is Adam, and the maximum number of iterations is 20000. All networks are trained on a Core i3 CPU (2.00 GHz × 4). The parameters of the four networks are shown in Table 1, and the MSE of the four networks on the test dataset is shown in Figure 5. Table 1 and Figure 5 show that, for the same type of network, the offline training time increases and the convergence slows down as the number of parameters rises, but the best MSE decreases. In the early stage of training, the network with fewer parameters converges faster, while in the later stage the opposite happens. Compared with CW-RNN, LSTM converges faster and obtains better results.
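Two of the training ingredients named above, the MSE loss and dropout with a keep probability of 0.6, can be sketched in a few lines. The inverted-dropout scaling (dividing the kept activations by the keep probability) is a common variant assumed here. The text does not state which dropout variant was used.

```python
import random

def mse(predictions, targets):
    """Mean squared error over all outputs of all samples in a minibatch."""
    total, n = 0.0, 0
    for p_row, t_row in zip(predictions, targets):
        for p, t in zip(p_row, t_row):
            total += (p - t) ** 2
            n += 1
    return total / n

def dropout(activations, keep_prob=0.6, rng=None):
    """Inverted dropout (assumed variant): keep each unit with keep_prob."""
    rng = rng or random.Random()
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]
```

Dropout is applied only during training. At test time and during online planning the activations are passed through unchanged.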

Results and Analysis
In this section, a statistical experiment and several illustrative examples are presented to validate the ability of the obstacle avoidance algorithms. The size of the map is set to 800 × 1200, and the velocity of the UUV is set as a constant 8. Taking environmental factors into consideration, a 10% false alarm rate is added to the sonar data in the simulation test cases.

Statistical Experiment.
In order to verify the obstacle avoidance planning effect of each network under different environmental disturbances, a statistical experiment is designed. The experiment recorded the performance of the different networks on 100 random maps at false alarm rates of 5%, 10%, and 15%, respectively. The experimental results are shown in Table 2. Table 2 shows that, for the same type of network, more parameters yield a higher planning success rate and a lower path cost, but the algorithm takes more time. Compared with CW-RNN, LSTM has advantages in path cost, success rate, and stability. The reasons for the planning failures of each network are shown in Table 3. Among them, 'nonarrival' means that the UUV stops near the target point rather than at it. 'Disorientation' means that the UUV drifts through the map after dodging obstacles, rather than moving toward the target; it occurs when the obstacle avoidance planner cannot extract the target information. It can be seen from Table 3 that an increase in the false alarm rate raises the probability of collision and disorientation in the paths planned by CW-RNN but has little effect on LSTM. This indicates that LSTM is superior to CW-RNN in the processing of long-term memory. As shown in Table 3, CW-RNNs have a higher probability than LSTMs of both collision and disorientation, which indicates that LSTM has a better ability to learn and extract detailed features than CW-RNN.

Simulation Test Case 1.
For further analysis of the learning ability of the proposed obstacle avoidance algorithms, this simulation test case tests the obstacle avoidance performance of the four structures in two maps with the same complexity as the maps in the training set. The tracks, yaw, and propeller speed of the UUV are shown in Figures 6-9, respectively. As shown in the simulation results, in maps with the same complexity as the training environment, the four proposed obstacle avoidance algorithms can quickly generate collision-free paths, and the planning results satisfy the UUV kinematics. In this simulation test case, all four obstacle avoidance algorithms show strong learning ability. Compared with the other structures, the path planned by LSTM45 has fewer oscillations.

Simulation Test Case 2.
Assume that the start point is (156, 39) and the target position of UUV is (630, 1070).
Figure 10 shows the tracks of the UUV planned by the four obstacle avoidance algorithms. The simulation results show that all methods effectively control the UUV to avoid the obstacles and reach the target position. All RNN based obstacle avoidance planners have learned to adjust the UUV's heading to navigate toward the target position quickly after avoiding obstacles. As Figures 11 and 12 show, the yaw and propeller speed of the UUV planned by the RNN based obstacle avoidance planners conform to actual practice. In the map with a discrete distribution of obstacles, even though the environmental complexity of the map is increased, the four obstacle avoidance algorithms can generate collision-free paths. The simulation results indicate that all four algorithms have a degree of generalization ability and adaptive capability.

Simulation Test Case 3.
A more complex map than those included in the training and test datasets is adopted in this simulation test case. The tracks, yaw, and propeller speed of the UUV are shown in Figures 13, 14, and 15, respectively. As shown in the simulation results, in the complex environment with a continuous distribution of obstacles, the path planned by CW-RNN96 makes the UUV avoid obstacles in a roaming mode. This is because CW-RNN96 cannot extract the target point information, which is more detailed than the obstacle information. LSTM18, CW-RNN180, and LSTM45 are still capable of obstacle avoidance and exhibit satisfactory learning and generalization abilities in this problem. Although LSTM18 has fewer parameters than CW-RNN96, it performs better in the complex environment.

Simulation Test Case 4.
In order to test the generalization and exploration ability of the various methods, this simulation test case adopts a maze map of continuous obstacles, shown in Figure 16. In the training set, the target is always set on the east side of the map, and the UUV always moves on the west side of the target point, which means that the angle between the UUV and the target in the North East-fixed reference frame lies between 180° and 360°. In this map, the target point is in the middle of the map, and the UUV must move around the target to reach it, which means that this angle ranges from 0° to 360°. The simulation results are shown in Figures 16-18. It can be seen from the simulation results that all the methods performed well in the early stage of planning, while the angle remained between 180° and 360°. As the UUV moves and the angle enters the range (0°, 180°), a collision occurs in the path planned by CW-RNN96, disorientation occurs in the path planned by CW-RNN180, and nonarrival occurs in the path planned by LSTM18. Only LSTM45 is capable of generating a path in this maze environment, and it shows excellent performance. The simulation results show the strong generalization and exploration ability of LSTM45.

Simulation Test Case 5.
The results of the statistical experiment and simulation test cases 2-4 show that, compared with the other three methods, LSTM45 is the best method for UUV obstacle avoidance planning. In this test case, a series of simulations in dynamic environments is used to further test LSTM45's obstacle avoidance ability. Figures 19 and 20 show the simulation results of LSTM45 in several dynamic environments in which the obstacles move in different ways. Figures 21 and 22 show the simulation results of LSTM45 in complex environments with many static and moving obstacles. Assume that the dynamic obstacles always travel in straight lines with constant velocity. The directions of motion of the obstacles are indicated by the arrows in the obstacles. The velocities of the obstacles are set as 8 kn in Figure 19(a) and 4 kn in the other cases. The simulation results show that LSTM45 drives the UUV toward the target until a collision threat is found. After obstacle avoidance, the UUV is planned to continue moving toward the target. Although the training set does not contain any dynamic obstacles, LSTM45 still finds a strategy to avoid dynamic obstacles.
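The motion assumed for the dynamic obstacles (straight lines at constant velocity) is easy to simulate. The sketch below propagates an obstacle and, as an illustrative analysis tool, computes the closest point of approach between the UUV and an obstacle when both hold constant velocity. The closest-point-of-approach check is a standard geometric technique added here for illustration. It is not part of the LSTM planner, which reacts only to sonar data.

```python
import math

def obstacle_position(p0, heading, speed, t):
    """Position after time t of straight-line, constant-speed motion."""
    return (p0[0] + speed * math.cos(heading) * t,
            p0[1] + speed * math.sin(heading) * t)

def closest_approach(p_uuv, v_uuv, p_obs, v_obs):
    """Time and distance of closest approach of two constant-velocity points."""
    rx, ry = p_obs[0] - p_uuv[0], p_obs[1] - p_uuv[1]  # relative position
    vx, vy = v_obs[0] - v_uuv[0], v_obs[1] - v_uuv[1]  # relative velocity
    vv = vx * vx + vy * vy
    # If the relative velocity is zero, the separation never changes.
    t = 0.0 if vv == 0.0 else max(0.0, -(rx * vx + ry * vy) / vv)
    return t, math.hypot(rx + vx * t, ry + vy * t)
```

Positions and speeds must be expressed in consistent units (for example, map units per time step), so speeds given in knots would need to be converted before use.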
It can be seen from the above experiments that (1) when the number of parameters is similar, CW-RNN and LSTM show similar performance in terms of training time, the best loss, and time consumption; (2) in terms of path cost, obstacle avoidance planning success rate, generalization ability, and robustness, LSTM performs better than CW-RNN; and (3) of the four algorithms, LSTM45 has the best learning ability, generalization and exploration ability, and robustness. It is able to solve the problem of obstacle avoidance for UUV in a dynamic or even complex dynamic environment after being trained in simple and static environments.

Conclusion
Inspired by the state-of-the-art performance of CW-RNN and LSTM on many sequence prediction tasks, this paper presented two types of obstacle avoidance algorithms, based on CW-RNN and LSTM, respectively, and compared their performance on the obstacle avoidance task. The proposed obstacle avoidance algorithms achieved robust performance on the online obstacle avoidance problem of UUV in unknown environments and remained robust even when the effects of measurement noise were considered. Owing to their strong learning and generalization abilities, the obstacle avoidance algorithms are capable of obstacle avoidance in environments much more complex than those in the training samples. When the number of parameters is similar, CW-RNN and LSTM show similar performance in terms of training time, the best loss, and time consumption, but in terms of path cost, obstacle avoidance planning success rate, generalization ability, and robustness, LSTM performs better. Of the four proposed methods, LSTM45 obtained the best performance in terms of learning ability, generalization and exploration ability, and robustness. The simulations in dynamic environments further verified its excellent ability in obstacle avoidance planning.

Figure 1: Global and local coordinate systems.

Figure 4: Principle framework of RNN based obstacle avoidance planning learning system.

Figure 5: The mean squared error of all structures on test set.

Figure 6: Online planning results of four obstacle avoidance algorithms of UUV in simulation test cases 1(a) and (b).

Figure 8: Curves of left propeller speed controlling feedback corresponding with the different obstacle avoidance algorithms in simulation test cases 1(a) and (b).

Figure 10: Online planning results of four obstacle avoidance algorithms of UUV in simulation test case 2.
In the forward propagation of LSTM, $i_t$, $f_t$, $c_t$, $o_t$, and $h_t$ are the outputs of the input gate, forget gate, cell, output gate, and memory block at time step $t$, respectively; $x_t$ is the input vector of the memory block at time step $t$; $h_{t-1}$ is the output vector of the memory block at time step $t-1$; $W_{xi}$, $W_{xf}$, $W_{xc}$, and $W_{xo}$ are the weight matrices from the input vector to the input gate, forget gate, cell, and output gate, respectively; $W_{hi}$, $W_{hf}$, $W_{hc}$, and $W_{ho}$ are the weight matrices from the output of the memory block at the previous time step to the input gate, forget gate, cell, and output gate, respectively; $b_i$, $b_f$, $b_c$, and $b_o$ are the biases of the input gate, forget gate, cell, and output gate, respectively; $\sigma(\cdot)$ is the activation function of the gate units, which is set as the logistic sigmoid function in this paper; and $\bullet$ represents the element-wise product.

Table 1: The performance of all structures in training.

Table 2: The performance of all structures in statistical experiment.

Table 3: The reasons for failure of obstacle avoidance planning in statistical experiment.