Article

Hyperparameter Optimization for the LSTM Method of AUV Model Identification Based on Q-Learning

1 School of Information Science and Engineering, Ocean University of China, Qingdao 266000, China
2 Institute of Oceanographic Instrumentation, Qilu University of Technology (Shandong Academy of Sciences), Qingdao 266000, China
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2022, 10(8), 1002; https://doi.org/10.3390/jmse10081002
Submission received: 12 May 2022 / Revised: 16 July 2022 / Accepted: 16 July 2022 / Published: 22 July 2022
(This article belongs to the Section Ocean Engineering)

Abstract

An accurate mathematical model is the basis for controlling an autonomous underwater vehicle (AUV) system and estimating its state, so improving model accuracy is a fundamental problem in the field of automatic control. However, AUV systems are complex, uncertain, and highly non-linear, and an accurate model is not easy to obtain through traditional modeling methods. In this study, we fit an accurate dynamic AUV model using the long short-term memory (LSTM) neural network approach. As hyperparameter values have a significant impact on LSTM performance, it is important to select the optimal combination of hyperparameters. We use an improved Q-learning reinforcement learning algorithm to select this combination by improving the identification accuracy on the validation dataset. To improve the efficiency of action exploration, we modify the Q-learning algorithm so that the optimal initial state is chosen according to the Q table in each round of learning. This effectively avoids ineffective exploration by the reinforcement learning agent among poorly performing hyperparameter combinations. Finally, experiments based on simulated and actual trial data demonstrate that the proposed model identification method can effectively predict kinematic motion data and, more importantly, that the modified Q-learning approach can optimize the network hyperparameters of the LSTM.

1. Introduction

The ocean occupies most of the earth’s surface and is an important area for commercial activities, scientific research, and resource extraction [1,2]. Over the past half-century, oceanographic research has shown that the oceans and seafloor are critical to understanding the planet, and exploring the marine environment provides valuable knowledge for many areas of science and engineering. Autonomous underwater vehicles (AUVs) have been widely used in marine engineering due to their unique advantages. Equipped with various sensors, they have broad applications in scientific, military, and commercial missions, such as deep-sea exploration, cable/pipe tracking, and feature tracking [3].
AUVs are complex, nonlinear, coupled systems, and it is challenging to model them accurately [4,5]. Therefore, studying the non-linear mechanism of AUV systems and finding mathematical methods that can accurately express them have become a focus of AUV research, with essential educational and economic value [6,7,8,9]. Neural network models have recently been widely used in AUV system model identification. In the identification networks of [10], recursive structures are adopted to acquire dynamic information and improve the communication between neurons. The experimental results show that the method can fit the AUV system. In [11], the weight values of the network were tuned using a hybrid of the genetic algorithm (GA) and the error backpropagation (BP) algorithm. The neural network with this hybrid learning algorithm improves the convergence speed of learning and the identification accuracy. In [12], a dynamic neural network is used to approximate the lumped disturbance and estimate the parametric uncertainty. A Bayesian network is used to construct a state observer for robot control when both the actuator and sensor models are used [13]. A multi-scale attention-based long short-term memory (LSTM) model is adopted to identify the ship’s non-linear model under different ocean conditions, and the experimental results show that the LSTM model has higher prediction accuracy than the traditional support vector regression (SVR) and radial basis function (RBF) models [14]. Ref. [15] also shows that the LSTM method produces high-quality time-series predictions and has important practical applications. Therefore, this study uses an LSTM neural network approach for AUV system identification.
Neural networks have good non-linear mapping ability and have been widely used in system identification, especially for non-linear systems [16,17,18]. The values of a neural network’s hyperparameters can significantly affect its performance [19]. However, there is no theoretical guidance for choosing these values, and different systems often require different values. Therefore, it is of great practical significance to study methods for automatically optimizing the hyperparameters of system identification algorithms [20,21].
In response to the above problems, some methods for optimizing the neural network model structure based on reinforcement learning have been proposed. Meta-modeling based on reinforcement learning enables automated generation of high-performing convolutional neural network (CNN) architectures for a given learning task [22]. A novel multi-objective reinforcement learning method is proposed for hyperparameter optimization to handle limitations of the actual environment, such as latency and CPU utilization [23]. A context-based meta-RL approach is used to maximize the accuracy on the validation set and to tackle the data-inefficiency problem of hyperparameter optimization [24]. These methods have proved the effectiveness of reinforcement learning in hyperparameter optimization. However, if the hyperparameter optimization problem of a neural network is regarded as a reinforcement learning problem with a continuous state-action space and solved with methods such as deep Q network (DQN) [25] or proximal policy optimization (PPO) [26], the algorithm needs hundreds or even thousands of learning rounds to ensure convergence. Such methods are therefore unsuitable when computing resources are limited and the parameter search space is large or the performance evaluation of the algorithm is very expensive. For this reason, the Q-learning [27] algorithm is selected in this paper to solve the hyperparameter optimization problem of the LSTM neural network.
Our main objective in this paper is to identify the AUV model with an LSTM neural network whose hyperparameters are optimized using an improved Q-learning method. The main contributions of this paper are the following three points: (1) We adopt the LSTM method to identify the AUV dynamic model and conduct an in-depth analysis of its principle; (2) We propose a reinforcement learning framework to solve the hyperparameter optimization problem of the LSTM method; (3) We verify the effectiveness of the method on an actual AUV dataset.
The rest of the paper is organized as follows: Section 2 provides the AUV’s hydrodynamic model and force analysis. Section 3 describes the proposed method in detail. The experimental results are presented and discussed in Section 4. Finally, Section 5 gives the main conclusions and discusses future work.

2. AUV Model Analysis

2.1. AUV Hydrodynamic Model

This research focuses on the "Sailfish" 210 AUV developed by the Underwater Vehicle Laboratory of Ocean University of China (Qingdao, China), as shown in Figure 1. The cabin structure comprises four sections: the bow, the navigation cabin, the electronic energy cabin, and the propulsion system cabin. A unified electrical and mechanical interface is used between the cabins, which makes it easy to configure different loads for the AUV according to different task requirements. The bow is generally equipped with an underwater acoustic communication machine. The navigation cabin includes an attitude and heading reference system (AHRS), a global positioning system (GPS), a Doppler velocity log (DVL), etc. The electronic energy cabin includes batteries, industrial computer systems, etc. The propulsion system cabin mainly includes the steering gear and thruster motors. The experiments carried out in this paper are based on the "Sailfish" 210 AUV platform, which has a maximum speed of 5 kn (2.5 m/s). The parameters are shown in Table 1.
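The general motion of the AUV is described with two coordinate systems, the body-fixed reference frame $(G\text{-}xyz)$ and the earth-fixed reference frame $(E\text{-}\xi\eta\zeta)$ [28]. AUV translational and rotational motions are described in six degrees of freedom (6DOF) as follows:
$$
\begin{aligned}
\boldsymbol{\eta} &= [\boldsymbol{\eta}_1^T \ \boldsymbol{\eta}_2^T]^T, & \boldsymbol{\eta}_1 &= [\xi \ \eta \ \zeta]^T, & \boldsymbol{\eta}_2 &= [\phi \ \theta \ \psi]^T \\
\mathbf{v} &= [\mathbf{v}_1^T \ \mathbf{v}_2^T]^T, & \mathbf{v}_1 &= [u \ v \ w]^T, & \mathbf{v}_2 &= [p \ q \ r]^T \\
\boldsymbol{\tau} &= [\boldsymbol{\tau}_1^T \ \boldsymbol{\tau}_2^T]^T, & \boldsymbol{\tau}_1 &= [X \ Y \ Z]^T, & \boldsymbol{\tau}_2 &= [K \ M \ N]^T
\end{aligned}
\tag{1}
$$
where $\boldsymbol{\eta}_1$ and $\boldsymbol{\eta}_2$ refer to the position and orientation of the AUV with respect to the earth-fixed reference frame, $\mathbf{v}$ denotes the translational and rotational speeds with respect to the body-fixed reference frame, and $\boldsymbol{\tau}_1$ and $\boldsymbol{\tau}_2$ refer to the external forces and moments with respect to the body-fixed reference frame. A diagram of the AUV coordinate system is shown in Figure 1.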
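The body-frame translational velocities $[u \ v \ w]^T$ are converted to the earth-frame linear velocities $[\dot\xi \ \dot\eta \ \dot\zeta]^T$ by $T_1$:
$$
T_1 = \begin{bmatrix}
\cos\psi\cos\theta & \cos\psi\sin\theta\sin\phi - \sin\psi\cos\phi & \cos\psi\sin\theta\cos\phi + \sin\psi\sin\phi \\
\sin\psi\cos\theta & \sin\psi\sin\theta\sin\phi + \cos\psi\cos\phi & \sin\psi\sin\theta\cos\phi - \cos\psi\sin\phi \\
-\sin\theta & \cos\theta\sin\phi & \cos\theta\cos\phi
\end{bmatrix}
\tag{2}
$$
Then,
$$
[\dot\xi \ \dot\eta \ \dot\zeta]^T = T_1 \, [u \ v \ w]^T
\tag{3}
$$
The body-frame rotational velocities $[p \ q \ r]^T$ are converted to the Euler angle rates $[\dot\phi \ \dot\theta \ \dot\psi]^T$ by $T_2$:
$$
T_2 = \begin{bmatrix}
1 & \sin\phi\tan\theta & \cos\phi\tan\theta \\
0 & \cos\phi & -\sin\phi \\
0 & \sin\phi/\cos\theta & \cos\phi/\cos\theta
\end{bmatrix}
\tag{4}
$$
Thus,
$$
[\dot\phi \ \dot\theta \ \dot\psi]^T = T_2 \, [p \ q \ r]^T
\tag{5}
$$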
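The two kinematic transformations can be illustrated with a short sketch. It assumes the standard roll-pitch-yaw convention that the entries of $T_1$ and $T_2$ correspond to; the function names and the sample state are illustrative choices, not part of the original study.

```python
import math

def t1_matrix(phi, theta, psi):
    """Rotation matrix T1: body-frame linear velocities -> earth-frame rates."""
    cphi, sphi = math.cos(phi), math.sin(phi)
    cth, sth = math.cos(theta), math.sin(theta)
    cpsi, spsi = math.cos(psi), math.sin(psi)
    return [
        [cpsi * cth, cpsi * sth * sphi - spsi * cphi, cpsi * sth * cphi + spsi * sphi],
        [spsi * cth, spsi * sth * sphi + cpsi * cphi, spsi * sth * cphi - cpsi * sphi],
        [-sth, cth * sphi, cth * cphi],
    ]

def t2_matrix(phi, theta):
    """Matrix T2: body angular rates -> Euler angle rates.

    Singular at theta = +/-90 degrees, where cos(theta) = 0.
    """
    cphi, sphi = math.cos(phi), math.sin(phi)
    cth, tth = math.cos(theta), math.tan(theta)
    return [
        [1.0, sphi * tth, cphi * tth],
        [0.0, cphi, -sphi],
        [0.0, sphi / cth, cphi / cth],
    ]

def mat_vec(matrix, vector):
    """Multiply a 3x3 matrix (list of rows) by a 3-element vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

# Hypothetical state: level attitude, 90 degree heading, 1 m/s surge speed.
earth_velocity = mat_vec(t1_matrix(0.0, 0.0, math.pi / 2), [1.0, 0.0, 0.0])
# Hypothetical body rates at zero roll and pitch.
euler_rates = mat_vec(t2_matrix(0.0, 0.0), [0.1, 0.2, 0.3])
```

With a 90 degree heading, the surge velocity maps entirely onto the second earth-frame axis, and at zero roll and pitch the Euler angle rates equal the body rates.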
The general motion is described in $(G\text{-}xyz)$ by Equation (6), where the first three equations describe the translation and the last three describe the rotation:
$$
\begin{aligned}
m[(\dot u - vr + wq) - x_G(q^2 + r^2) + y_G(pq - \dot r) + z_G(pr + \dot q)] &= X \\
m[(\dot v - wp + ur) - y_G(r^2 + p^2) + z_G(qr - \dot p) + x_G(qp + \dot r)] &= Y \\
m[(\dot w - uq + vp) - z_G(p^2 + q^2) + x_G(rp - \dot q) + y_G(rq + \dot p)] &= Z \\
I_x \dot p + (I_z - I_y) qr + m[y_G(\dot w + pv - qu) - z_G(\dot v + ru - pw)] &= K \\
I_y \dot q + (I_x - I_z) rp + m[z_G(\dot u + wq - vr) - x_G(\dot w + pv - uq)] &= M \\
I_z \dot r + (I_y - I_x) pq + m[x_G(\dot v + ur - pw) - y_G(\dot u + qw - vr)] &= N
\end{aligned}
\tag{6}
$$
where:
$m$: AUV mass.
$x_G, y_G, z_G$: the position of the centre of gravity of the AUV.
$I_x, I_y, I_z$: the moments of inertia of the AUV.
$u, v, w$: velocities along the x-axis, y-axis, and z-axis of the AUV.
$p, q, r$: roll, pitch, and yaw angular velocities.
$\dot u, \dot v, \dot w, \dot p, \dot q, \dot r$: linear and angular accelerations.
$X, Y, Z, K, M, N$: external forces and moments.

2.2. The Dynamical Principles of AUV Model Identification

The dynamic model mathematically describes the essential law of the interaction between the AUV and the environment, which can well reflect the state transition of the AUV under the action of force. This paper will realize the AUV model identification based on the dynamic model. The external force (moment) exerted on the AUV mainly includes gravity and buoyancy, hydrodynamic force, rudder force, thrust force, etc. [29]. The force analysis and model identification principle are detailed below.

2.2.1. The Static Force

The static force of the AUV is generated by the gravity $P$ and the buoyancy $B$. The coordinates of the center of buoyancy and the center of gravity in the body-fixed frame are $(0, 0, 0)$ and $(0, 0, z_e)$, respectively. The components of the static force in the earth-fixed frame are $(0, 0, P - B)$; converting them to the body-fixed frame gives Equation (7):
$$
\begin{aligned}
X &= -(P - B)\sin\theta \\
Y &= (P - B)\cos\theta\sin\phi \\
Z &= (P - B)\cos\theta\cos\phi \\
K &= -p h \cos\theta\sin\phi \\
M &= -p h \sin\theta \\
N &= 0
\end{aligned}
\tag{7}
$$
where $p$ is the underwater full displacement of the AUV and $h$ represents the depth of the AUV.

2.2.2. Hydrodynamic Force

The hydrodynamic forces of the AUV are usually divided into inertial hydrodynamic forces and viscous hydrodynamic forces, and the interaction between the two is ignored. In infinitely deep, wide, and still water, the hydrodynamic forces of the AUV depend only on its motion and are functions of the motion parameters $u, v, w, p, q, r, \dot u, \dot v, \dot w, \dot p, \dot q, \dot r$. Following the idea of a Taylor expansion, the hydrodynamic forces $(X_H, Y_H, Z_H)$ and moments $(K_H, M_H, N_H)$ are expanded as:
$$
\begin{aligned}
X_H ={}& [X_{qq} q^2 + X_{rr} r^2 + X_{rp} rp] + [X_{\dot u}\dot u + X_{vr} vr + X_{wq} wq] + [X_{u|u|} u|u| + X_{vv} v^2 + X_{ww} w^2] \\
Y_H ={}& [Y_{\dot r}\dot r + Y_{qr} qr + Y_{\dot p}\dot p + Y_{pq} pq + Y_{p|p|} p|p|] + [Y_{\dot v}\dot v + Y_{vq} vq + Y_{wr} wr + Y_{wp} wp] \\
& + \left[Y_r ur + Y_{v|r|} \tfrac{v}{|v|} (v^2 + w^2)^{1/2} |r| + Y_p up\right] + [Y_0 u^2 + Y_v uv + Y_{v|v|} v (v^2 + w^2)^{1/2}] + Y_{vw} vw \\
Z_H ={}& [Z_{\dot q}\dot q + Z_{rr} r^2 + Z_{pp} p^2 + Z_{rp} rp] + [Z_{\dot w}\dot w + Z_{vr} vr + Z_{vp} vp] \\
& + \left[Z_q uq + Z_{w|q|} \tfrac{w}{|w|} (v^2 + w^2)^{1/2} |q|\right] + [Z_0 u^2 + Z_w uw + Z_{w|w|} w (v^2 + w^2)^{1/2}] \\
& + [Z_{|w|} u|w| + Z_{ww} |w| (v^2 + w^2)^{1/2}] + Z_{vv} v^2 \\
K_H ={}& [K_{\dot p}\dot p + K_{\dot r}\dot r + K_{qr} qr + K_{pq} pq + K_{p|p|} p|p|] + [K_p up + K_r ur + K_{\dot v}\dot v] \\
& + [K_{vq} vq + K_{wp} wp + K_{wr} wr] + [K_0 u^2 + K_v uv + K_{v|v|} v (v^2 + w^2)^{1/2}] + K_{vw} vw \\
M_H ={}& [M_{\dot q}\dot q + M_{rr} r^2 + M_{q|q|} q|q| + M_{pp} p^2 + M_{rp} rp] + [M_{\dot w}\dot w + M_{vr} vr + M_{vp} vp] \\
& + [M_q uq + M_{|w|q} (v^2 + w^2)^{1/2} q] + [M_0 u^2 + M_w uw + M_{w|w|} w (v^2 + w^2)^{1/2}] \\
& + [M_{|w|} u|w| + M_{ww} |w| (v^2 + w^2)^{1/2}] + M_{vv} v^2 \\
N_H ={}& [N_{\dot r}\dot r + N_{qr} qr + N_{r|r|} r|r| + N_{\dot p}\dot p + N_{pq} pq] + [N_{\dot v}\dot v + N_{wr} wr + N_{vq} vq + N_{wp} wp] \\
& + [N_r ur + N_{|v|r} (v^2 + w^2)^{1/2} r + N_p up] + [N_0 u^2 + N_v uv + N_{v|v|} v (v^2 + w^2)^{1/2}] + N_{vw} vw
\end{aligned}
\tag{8}
$$

2.2.3. Thrust

The thrust generated by the propeller is calculated as follows:
$$
X_T = (1 - t)\,\rho\, n^2 D^4 K_T
\tag{9}
$$
where $n$ is the propeller rotational speed, $D$ is the propeller diameter, $t$ is the thrust deduction factor, $\rho$ is the water density, and $K_T$ is the dimensionless thrust coefficient. $K_T$ is a function of the advance ratio $J = \frac{u(1 - w)}{nD}$, where $w$ here denotes the wake fraction, and can be approximated as:
$$
K_T = k_0 + k_1 J + k_2 J^2
\tag{10}
$$
where $k_0, k_1, k_2$ are constant coefficients.
From the advance ratio formula, $n = \frac{u(1 - w)}{DJ}$. Substituting this and Equation (10) into Equation (9) gives the functional relationship between thrust and speed:
$$
X_T = \frac{1}{2}\rho L^2 u^2 (a_T + b_T + c_T)
\tag{11}
$$
where $L$ is the body length, $a_T = \mu k_2$, $b_T = \mu k_1 / J$, $c_T = \mu k_0 / J^2$, and $\mu = 2(1 - t)(1 - w)^2 D^2 / L^2$.
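As a quick numerical check of this substitution, the sketch below evaluates the thrust both directly from the propeller speed and from the speed-based form, reading the coefficients as $b_T = \mu k_1 / J$, $c_T = \mu k_0 / J^2$, and $\mu = 2(1 - t)(1 - w)^2 D^2 / L^2$. All numerical values are hypothetical and chosen only for illustration; `wake` stands for the wake fraction $w$.

```python
import math

def thrust_direct(n, u, D, t, wake, rho, k0, k1, k2):
    """Thrust from propeller speed n: X_T = (1 - t) * rho * n^2 * D^4 * K_T."""
    J = u * (1.0 - wake) / (n * D)          # advance ratio
    K_T = k0 + k1 * J + k2 * J ** 2         # quadratic thrust coefficient fit
    return (1.0 - t) * rho * n ** 2 * D ** 4 * K_T

def thrust_speed_form(J, u, D, L, t, wake, rho, k0, k1, k2):
    """Thrust from forward speed u: X_T = 0.5 * rho * L^2 * u^2 * (a_T + b_T + c_T)."""
    mu = 2.0 * (1.0 - t) * (1.0 - wake) ** 2 * D ** 2 / L ** 2
    a_T = mu * k2
    b_T = mu * k1 / J
    c_T = mu * k0 / J ** 2
    return 0.5 * rho * L ** 2 * u ** 2 * (a_T + b_T + c_T)

# Hypothetical operating point (not taken from the Sailfish AUV).
params = dict(D=0.2, t=0.1, wake=0.2, rho=1025.0, k0=0.4, k1=-0.3, k2=-0.1)
n, u, L = 10.0, 1.5, 2.0
J = u * (1.0 - params["wake"]) / (n * params["D"])

x_t_direct = thrust_direct(n, u, **params)
x_t_speed = thrust_speed_form(J, u, L=L, **params)
```

Both expressions return the same thrust, which confirms that the two forms are algebraically equivalent for a given advance ratio.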

2.2.4. Rudder Force

This paper discusses an underactuated underwater robot, whose rudder force comes from a pair of horizontal rudders and a pair of vertical rudders mounted on its tail. When a rudder moves at speed $V$ with angle of attack $\alpha$, it is subjected to two force components: the lift force, perpendicular to the direction of the water flow, and the drag force, along the direction of the water flow. They are calculated as follows:
$$
L = \frac{1}{2} C_L \rho A_R V^2, \qquad D = \frac{1}{2} C_D \rho A_R V^2
\tag{12}
$$
where $C_L$ is the lift coefficient, $C_D$ is the drag coefficient, and $A_R$ is the cross-sectional area of the rudder.
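A minimal sketch of the lift and drag calculation follows. The coefficient values, rudder area, and flow speed are hypothetical; in practice $C_L$ and $C_D$ depend on the angle of attack and would come from rudder data that the text does not give.

```python
import math

def rudder_forces(c_l, c_d, rho, area, speed):
    """Return (lift, drag) for a rudder of the given area in a flow of the given speed."""
    half_rho_area_v2 = 0.5 * rho * area * speed ** 2
    return c_l * half_rho_area_v2, c_d * half_rho_area_v2

# Hypothetical rudder: 0.02 m^2 area, 2 m/s inflow, sea water density.
lift, drag = rudder_forces(c_l=0.8, c_d=0.1, rho=1025.0, area=0.02, speed=2.0)
```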

2.2.5. The Principle of Identification

$F_{else} = [X_{else}, Y_{else}, Z_{else}, K_{else}, M_{else}, N_{else}]^T$ is used to represent the static force, thrust force, rudder force, and disturbance force. Then, using the superposition principle, we can obtain the AUV force expression:
$$
F = F_H + F_{else}
\tag{13}
$$
Substituting Equation (13) into Equation (6) and simplifying the equation of motion, we obtain:
$$
E \dot X = F_{vis} + F_{else}
\tag{14}
$$
where:
$$
X = [u, v, w, p, q, r]^T
\tag{15}
$$
$$
E = \begin{bmatrix}
m - X_{\dot u} & 0 & 0 & 0 & m z_G & -m y_G \\
0 & m - Y_{\dot v} & 0 & -m z_G & 0 & m x_G - Y_{\dot r} \\
0 & 0 & m - Z_{\dot w} & m y_G & -m x_G - Z_{\dot q} & 0 \\
0 & -m z_G & m y_G & I_x - K_{\dot p} & 0 & 0 \\
m z_G & 0 & -m x_G - M_{\dot w} & 0 & I_y - M_{\dot q} & 0 \\
-m y_G & m x_G - N_{\dot v} & 0 & 0 & 0 & I_z - N_{\dot r}
\end{bmatrix}
\tag{16}
$$
$F_{vis} = [X_{vis}, Y_{vis}, Z_{vis}, K_{vis}, M_{vis}, N_{vis}]^T$ is the non-inertial hydrodynamic force.
It can be seen from the above analysis that the acceleration of the AUV results from the combined action of the non-inertial hydrodynamic force and the forces other than the hydrodynamic force, and can be written as $\dot X = f(u, v, w, p, q, r, n, \delta_r, \delta_s)$, where $n$ is the propeller speed and $\delta_r$, $\delta_s$ are the rudder angles. Specifically, the acceleration is determined by the non-inertial hydrodynamic force $F_{vis} = f_H(u, v, w, p, q, r)$, the thrust force $F_T = f_T(u, n)$, and the rudder force $F_{L/D} = f_\delta(u, \delta_r, \delta_s)$.
Based on the above analysis of the AUV model, the AUV state is denoted as $X_t = [x, y, z, \phi, \theta, \psi, u, v, w, p, q, r]$, and the state change can be expressed as $\Delta X_t = f_1(\phi, \theta, \psi, u, v, w, p, q, r, n, \delta_r, \delta_s)$. The historical data of the AUV implicitly contains the causal relationship of its dynamic model and has the characteristics of a hidden Markov model, so it can be used to build a data-driven AUV model.

3. AUV Model Identification Method

AUV system identification is essentially a mathematical modeling method. Its primary purpose is to build a mathematical model of the AUV, which can be used for controller design, system prediction, and system simulation. As shown above, the AUV system is complex, uncertain, and highly nonlinear, and it is not easy to obtain an accurate dynamic model. To address this problem, we adopt a data-driven model identification method based on the LSTM neural network. Further, the Q-learning method is used to optimize its hyperparameters to improve the learning efficiency of the LSTM neural network.

3.1. Fundamentals of System Identification

Identification based on a neural network means that the neural network is directly used to learn the mapping relationship between input and output. The learning criterion minimizes the error between the network’s output and the system’s actual output. From the above, the goal of learning is to minimize the objective function of error [30], which is as follows:
$$
E = \frac{1}{2}\left(y(t) - y_n(t)\right)^2
\tag{17}
$$
where $y_n(t)$ is the output of the neural network at time $t$, and $y(t)$ is the actual output of the system at time $t$. Neural networks can, in principle, approximate any continuous function to arbitrary precision, so the desired output can be obtained as long as there is enough training data.
The environment is full of time-series information, in which earlier and later samples are related to a certain extent. For example, the AUV’s position and attitude data are sequences that change with time. The LSTM [31,32] is well suited to processing this kind of data. Its basic structure is shown in Figure 2.
The network structure introduces a cell state that contains the information from the previous moment. When new information arrives, a series of operations chooses between the old and new information. Coupled with the introduced “memory-forgetting” mechanism, this allows long time-series data to be processed. The structure mainly includes input, forget, and output gates. The inputs are $h_{t-1}$ and $x_t$, the output is $h_t$, and the cell states are $C_{t-1}$ and $C_t$.
The forget gate controls the time dependence and effects of previous inputs and determines which states are remembered or forgotten. With the gate activation $f_t$, the output of the forget gate is:
$$
f_t = \sigma(w_f x_t + u_f h_{t-1}), \qquad o_f = f_t \odot C_{t-1}
\tag{18}
$$
The input gate is also called the selective memory stage. It decides how much of the current input is taken into account. The calculations for each part are as follows:
$$
\begin{aligned}
i_t &= \sigma(w_i x_t + u_i h_{t-1}) \\
\tilde C_t &= \tanh(w_c x_t + u_c h_{t-1}) \\
C_t &= f_t \odot C_{t-1} + \tilde C_t \odot i_t
\end{aligned}
\tag{19}
$$
The output gate determines the final output information. The computational procedure is summarized as follows:
$$
\begin{aligned}
O_t &= \sigma(w_o x_t + u_o h_{t-1}) \\
h_t &= O_t \odot \tanh(C_t)
\end{aligned}
\tag{20}
$$
The above is the basic principle of the LSTM neural network, which uses gates to selectively memorize important input information and forget unimportant information, so that long-term sequence information can be handled.
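The gate computations can be traced with a scalar sketch of a single LSTM step (hidden size 1, no bias terms, matching the form of the equations above). This is an illustration of the standard LSTM cell, not the network implementation used in the experiments; the parameter dictionary and its keys are hypothetical names.

```python
import math

def sigmoid(z):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step for scalar input, hidden state, and cell state."""
    f_t = sigmoid(p["w_f"] * x_t + p["u_f"] * h_prev)        # forget gate
    i_t = sigmoid(p["w_i"] * x_t + p["u_i"] * h_prev)        # input gate
    c_tilde = math.tanh(p["w_c"] * x_t + p["u_c"] * h_prev)  # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde                       # new cell state
    o_t = sigmoid(p["w_o"] * x_t + p["u_o"] * h_prev)        # output gate
    h_t = o_t * math.tanh(c_t)                               # new hidden state
    return h_t, c_t

# With all weights zero every gate equals 0.5 and the candidate state is 0,
# so the cell state is simply halved.
zero_params = {key: 0.0 for key in
               ("w_f", "u_f", "w_i", "u_i", "w_c", "u_c", "w_o", "u_o")}
h_t, c_t = lstm_step(x_t=1.0, h_prev=0.3, c_prev=2.0, p=zero_params)
```

In a real network the weights are matrices and the products between gates and states are element-wise, but the order of operations is the same.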

3.2. Identification Principle and Process

After the actual tests of the AUV, we can obtain the dataset required for model identification. After removing invalid data, an input–output model is established according to the navigation control commands of the AUV and its posture information. According to Equation (14), we can obtain:
$$
\dot x = \frac{dx}{dt} = [x_4, \ x_5, \ x_6, \ E_1 F_1, \ E_2 F_2, \ E_3 F_3, \ x_{10}, \ x_{11}, \ x_{12}, \ E_4 F_4, \ E_5 F_5, \ E_6 F_6]^T
\tag{21}
$$
where $x_1 = x$, $x_2 = y$, $x_3 = z$, $x_4 = u$, $x_5 = v$, $x_6 = w$, $x_7 = \phi$, $x_8 = \theta$, $x_9 = \psi$, $x_{10} = p$, $x_{11} = q$, $x_{12} = r$, and $F = F(\phi, \theta, \psi, u, v, w, p, q, r, n, \delta_r, \delta_s)$. The subscripts of the matrices $E_n$ and $F_n$ represent the row and column of the matrix, respectively. Next, the learned AUV dynamic model can be expressed as:
$$
P(x_t, u_t): (x_t, u_t) \rightarrow (y_{t+1} - y_t) = \Delta y_t
\tag{22}
$$
where $u_t$ comprises the thruster command $n_t$ and the rudder angle commands $(\delta_r^t, \delta_s^t)$.
The structure of the LSTM-based AUV model is shown in Figure 3. The input of the input layer is $x_{inp}^t = [x_t, u_t]$, and the output of the neural network is $x_{out}^t = [\Delta y_t]$. The attitude and speed at the next moment are obtained from the control commands, attitude, and speed information of 20 sets of time-series data. The learned model is optimized according to the chosen loss function. More details of the network architecture are described below.
To avoid inconsistencies due to the different relative scales of different features, we normalize the data as follows:
$$
f: x \rightarrow x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}
\tag{23}
$$
where $x, x' \in \mathbb{R}$, $x_{\min} = \min(x)$, and $x_{\max} = \max(x)$.
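The normalization can be sketched as follows for a single feature column. The handling of a constant feature (all values equal) is an assumption added here to avoid division by zero; the text does not say how that case is treated.

```python
def min_max_normalize(values):
    """Scale one feature column to [0, 1] using its own minimum and maximum."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        # Assumption: a constant feature carries no information, map it to 0.
        return [0.0 for _ in values]
    return [(x - x_min) / span for x in values]

# Hypothetical depth readings in metres.
normalized = min_max_normalize([2.0, 4.0, 6.0])
```

Each feature (for example each velocity or angle channel) would be normalized separately, and the same minimum and maximum should be reused when scaling validation data.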
Model evaluation criteria are used to evaluate the accuracy of the identified model. In this study, the mean square error between the output of the neural network and the actual output is used as the evaluation index, as follows:
$$
MSE = \frac{1}{N}\sum_{n=1}^{N}\left[d(n) - y(n)\right]^2
\tag{24}
$$
where $d(n)$ is the output of the neural network, $y(n)$ is the system’s actual output, and $N$ is the number of samples evaluated at one time.
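A minimal sketch of this index, together with the MAE and RMSE values that are reported alongside it in the results, is given below. The function names and sample values are illustrative.

```python
import math

def mse(predicted, actual):
    """Mean square error between network outputs d(n) and actual outputs y(n)."""
    return sum((d - y) ** 2 for d, y in zip(predicted, actual)) / len(actual)

def mae(predicted, actual):
    """Mean absolute error."""
    return sum(abs(d - y) for d, y in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean square error, the square root of the MSE."""
    return math.sqrt(mse(predicted, actual))

# Hypothetical outputs: only the last sample differs, by 2.
pred, real = [1.0, 2.0, 3.0], [1.0, 2.0, 5.0]
errors = (mse(pred, real), mae(pred, real), rmse(pred, real))
```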
For the system identification of the AUV in this study, the model is trained by minimizing the error $M = \frac{1}{|D|}\sum_{(I, \Delta y) \in D} \frac{1}{2}\|\Delta y_t - O\|^2$, where $O = P(I)$ and the training dataset $D$ consists of input–output training pairs $I: (x, u) \rightarrow O: (\Delta y)$. The multi-step error is adopted to test the effectiveness of the learned dynamic model, as shown in Equation (25).
$$
M_H = \frac{1}{|D_H|}\sum_{(O, u_t, \Delta y) \in D_H} \frac{1}{H}\sum_{h=1}^{H} \frac{1}{2}\left\|\Delta y - P\left((O_{4\text{-}12}, u_t)\right)\right\|^2
\tag{25}
$$
where $O_{4\text{-}12}$ represents the required input data $(\phi, \theta, \psi, u, v, w, p, q, r)$, taken from the fourth to twelfth columns of the output data $O$ obtained in the previous step. Additionally, the training dataset $D_H$ consists of input–output data pairs $I: (O_{4\text{-}12}, u) \rightarrow O: (\Delta y)$. To sum up, the learned multi-step model can update the AUV state $(x, y, z, \phi, \theta, \psi, u, v, w, p, q, r)$ in a cyclic manner using only the action commands $u$.
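The cyclic update can be sketched as a rollout loop. This is a simplified illustration: `predict_delta` is a hypothetical stand-in for the trained LSTM, it sees a single time step rather than a window of past samples, and the toy model used in the example is not a real AUV model.

```python
def rollout(predict_delta, initial_state, commands):
    """Propagate the 12-element AUV state using only the command sequence.

    predict_delta: callable mapping (phi, theta, psi, u, v, w, p, q, r, n,
    delta_r, delta_s) to a 12-element state increment (hypothetical model).
    """
    state = list(initial_state)
    trajectory = [list(state)]
    for u_t in commands:
        model_input = state[3:12] + list(u_t)   # columns 4 to 12 plus the commands
        delta = predict_delta(model_input)      # predicted state change
        state = [s + d for s, d in zip(state, delta)]
        trajectory.append(list(state))
    return trajectory

def toy_model(model_input):
    """Stand-in model: x advances by the surge velocity u, nothing else changes."""
    surge = model_input[3]
    return [surge] + [0.0] * 11

start = [0.0] * 12
start[6] = 1.0                                  # surge velocity u = 1 (per step)
path = rollout(toy_model, start, commands=[(10.0, 0.0, 0.0)] * 3)
```

Because each predicted state is fed back as the next input, prediction errors accumulate over the horizon, which is why the multi-step error is a stricter test than the one-step training error.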

3.3. Hyperparameter Optimization for Identification Algorithm

3.3.1. MDP Modeling of LSTM Hyperparameter Optimization Problems

The performance of the LSTM algorithm is highly dependent on hyperparameters. Moreover, different tasks often require different hyperparameter configurations. To achieve high-precision identification of the AUV system, we adopt reinforcement learning to optimize the hyperparameter configuration of the LSTM network.
Reinforcement learning is a category of machine learning methods in which the optimal or near-optimal strategy is determined by interacting with a dynamic environment [33]. A Markov Decision Process (MDP) includes five elements, $M = \langle S, A, R, T, \gamma \rangle$, where:
S: the set of possible states.
A: the set of actions generated by the policy.
R: reward model.
T: dynamics model, the probability of reaching the next state with the current state and action.
γ : discount factor (between 0 and 1).
Figure 4 shows the basic structure of reinforcement learning. In this way, the agent learns how to map states to actions. At time t, the agent receives state s t and produces action a t , then transitions to the next state s t + 1 and obtains reward r t . The process does not stop until the final condition is reached. To improve the optimization efficiency, the hyperparameter optimization of the LSTM neural network is regarded as a reinforcement learning problem with discrete state space and discrete action space. Then, the above elements are designed for this problem.
Action space: The hyperparameters to be optimized and the candidate values of each hyperparameter are determined according to the LSTM network model structure. In this study, we concentrate on the numbers of neurons $N_{n1}$ and $N_{n2}$ in the hidden layers, the batch size $N_b$, and the time step $N_{ts}$, while other settings are determined empirically. The hyperparameters to be optimized and their candidate values are shown in Table 2.
State space: In this paper, the current hyperparameter configuration $a$ of the network is taken as the state $s$ at time $t$. The state space $S$ is therefore the same as the action space $A$, that is, $(N_{n1}, N_{n2}, N_b, N_{ts})$.
Reward: It can be seen from the previous analysis that the identification error decreases as the output error decreases. We therefore define the immediate reward in terms of the root mean square error on the test set, as in Equation (26).
$$
r = \sqrt{\frac{1}{N_t}\sum_{n=1}^{N_t}\left[d_t(n) - y_t(n)\right]^2}
\tag{26}
$$
where $N_t$ is the number of test samples, and $d_t$ and $y_t$ denote the output of the network and the real output on the testing set, respectively.

3.3.2. Hyperparameter Optimization Method

The hyperparameter optimization problem of the LSTM neural network is defined as a reinforcement learning problem with discrete state and action spaces. Reinforcement learning methods that approximate the value function or policy through neural networks, such as DQN, can also solve reinforcement learning problems with discrete action spaces. However, these algorithms need a large amount of learning to ensure convergence and are not suitable when computing resources are very limited. Thus, the Q-Learning algorithm is selected in this paper to solve the problem.
Q-Learning is a temporal difference algorithm designed to solve the reinforcement learning problem. The optimal action policy $\pi: s_t \rightarrow a_t$ can be obtained by maximizing the action value function $Q(s_t, a_t)$, which reflects the long-term impact of an action. The Q function is updated according to Equation (27):
$$
Q(s_t, a_t) = (1 - \alpha)\,Q(s_t, a_t) + \alpha\left[r_t + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1})\right]
\tag{27}
$$
where $0 < \alpha < 1$ is the learning rate and $0 < \gamma < 1$ is the discount factor.
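The update rule can be written as a small tabular function. The dictionary-based Q table, the state and action labels, and the values of $\alpha$ and $\gamma$ below are illustrative assumptions.

```python
import math

def q_update(q_table, state, action, reward, next_state, actions, alpha, gamma):
    """Apply one tabular Q-learning update and return the new Q value.

    q_table maps (state, action) pairs to values; missing entries count as 0.
    """
    best_next = max(q_table.get((next_state, a), 0.0) for a in actions)
    old_value = q_table.get((state, action), 0.0)
    new_value = (1.0 - alpha) * old_value + alpha * (reward + gamma * best_next)
    q_table[(state, action)] = new_value
    return new_value

q = {}
actions = ["a0", "a1"]
# First update: the table is empty, so only the reward contributes.
first = q_update(q, "s0", "a0", 1.0, "s1", actions, alpha=0.5, gamma=0.9)
# Give the next state a known value, then update the same pair again.
q[("s1", "a1")] = 2.0
second = q_update(q, "s0", "a0", 1.0, "s1", actions, alpha=0.5, gamma=0.9)
```

The first update yields 0.5 and the second yields 1.65, since the discounted value of the best next action (0.9 × 2.0) is now added to the reward.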
The structure of the LSTM hyperparameter optimization method based on Q-Learning is shown in Figure 5. The hyperparameter optimization process is as follows: first, the Q-value table is initialized with zeros, and the initial hyperparameter configuration (that is, the initial state $s_0$) is randomly selected from the action space. Then, the action $a_t$ is chosen according to the ε-greedy selection rule. Next, the action $a_t$ is performed, and the system acquires a new state $s_{t+1}$ and a reward $r_t$. Finally, the Q-value table is updated according to Equation (27). The round ends when the termination conditions are met.
To improve the efficiency of hyperparameter optimization, we modify the above method. The initial state is selected randomly only in the first round of learning; in every later round, the agent starts the optimization from the current best hyperparameter configuration. This effectively avoids ineffective exploration by the reinforcement learning agent among poorly performing hyperparameter combinations.
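The modified procedure can be sketched as follows. Several details are assumptions made for illustration and are not stated in the text: the reward is taken as the negative validation error so that a larger reward is better, the "current best" configuration is the one with the lowest error evaluated so far, and `evaluate` is a hypothetical callback that would train the LSTM with a given configuration and return its error. The candidate values are placeholders, not those of Table 2.

```python
import itertools
import random

def optimize(candidates, evaluate, episodes=20, steps=5,
             alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning over hyperparameter configurations.

    State = current configuration, action = next configuration to try.
    """
    rng = random.Random(seed)
    configs = list(itertools.product(*candidates.values()))
    q = {}        # Q[(state, action)], missing entries count as 0
    cache = {}    # configuration -> error, so each one is trained only once

    def error(cfg):
        if cfg not in cache:
            cache[cfg] = evaluate(dict(zip(candidates, cfg)))
        return cache[cfg]

    best = None
    for _ in range(episodes):
        # Random start in the first round, best configuration afterwards.
        state = rng.choice(configs) if best is None else best
        error(state)
        for _ in range(steps):
            if rng.random() < epsilon:            # explore
                action = rng.choice(configs)
            else:                                 # exploit the Q table
                action = max(configs, key=lambda a: q.get((state, a), 0.0))
            reward = -error(action)               # assumption: negative error
            best_next = max(q.get((action, a), 0.0) for a in configs)
            old = q.get((state, action), 0.0)
            q[(state, action)] = (1 - alpha) * old + alpha * (reward + gamma * best_next)
            state = action
        best = min(cache, key=cache.get)
    return dict(zip(candidates, best)), cache[best], cache

# Toy objective standing in for "train the LSTM and return the validation error".
def toy_error(cfg):
    return abs(cfg["N_n1"] - 64) / 64 + abs(cfg["N_b"] - 16) / 16

space = {"N_n1": [32, 64], "N_b": [16, 32]}
best_cfg, best_err, evaluated = optimize(space, toy_error, episodes=10, steps=4, seed=1)
```

Caching the evaluated configurations matters in practice because each evaluation corresponds to a full LSTM training run, which dominates the cost of the search.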

3.4. Identification Algorithm

After introducing the model identification method based on LSTM and the hyperparameter optimization method based on improved Q-learning, we can obtain the complete process and framework of the method in this paper. The algorithm flow of hyperparameter optimization for the LSTM method of AUV model identification based on Q-Learning is shown in Table 3.
Figure 6 shows a detailed block diagram of the system model identification. Offline models are obtained by offline training on real historical data. During actual sailing, the system regularly checks the accuracy of the learned dynamics model. If the error exceeds an empirically pre-determined threshold σ, the model is retrained on new real-time data.
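The periodic accuracy check can be sketched as a simple threshold test. The choice of the mean square error as the monitored quantity and the sample values are assumptions; the text only states that the error is compared with σ.

```python
def needs_retraining(predicted, measured, sigma):
    """Return True when the model error on recent data exceeds the threshold sigma."""
    error = sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(measured)
    return error > sigma

# Hypothetical recent predictions versus sensor measurements.
model_ok = needs_retraining([1.0, 2.0], [1.0, 2.0], sigma=0.01)
model_drifted = needs_retraining([1.0, 2.0], [1.5, 2.5], sigma=0.01)
```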

4. Results

In order to verify the effectiveness of the proposed method, numerous experiments have been performed. The experiments were divided into two parts based on simulation data (see Section 4.1) and real data (see Section 4.2). Simulations were run on the model described in Section 2. The real data were acquired by the Sailfish AUV.

4.1. Results on Simulation Data

First, the validity of AUV system identification based on the LSTM neural network is verified. A set of hyperparameters is randomly chosen for the LSTM neural network, and a system identification experiment is carried out on the AUV simulation system described above. In total, 4990 input–output data pairs are used for model identification, of which 4000 are used as the training set to train the LSTM neural network and the other 990 are used as the validation set to test the identification performance of the model. The number of training iterations is set to 100.
The change curve of the loss function during training is shown in Figure 7. As shown in the figure, the loss curve converges during training, and the value of the loss function decreases with iterations. From the perspective of the convergence of the loss curve, the application of the LSTM neural network can effectively realize the identification of the AUV system.
After 100 training iterations, the network model is tested on the validation dataset. The sum of squared output errors is 8.490711, and the mean and variance of the absolute error are 0.1761263 and 0.0145471, respectively. The output of the LSTM network model follows the actual output of the system after 100 training iterations, but the error between the network output and the real output is still large. The fitted curves of the network output and the real output are shown later in the comparison of different methods.
The large output fitting error of the above LSTM neural network model is because the hyperparameter settings of the model are not suitable. The identification accuracy is greatly affected by the network hyperparameter settings. Inappropriate hyperparameter settings may even lead to non-convergence. In order to achieve high-precision system identification, a reinforcement learning algorithm is used to optimize the selection of the hyperparameters of the above neural network.
Data were divided into training and test sets during each test, yielding a training set of 4000 and a test set of 990. The proposed method was performed for 100 episodes. To demonstrate the optimization performance of the method, we compare the results of five different stages (proc1-0th episode, proc2-25th episode, proc3-50th episode, proc4-75th episode, and proc5-100th episode).
As shown in Figure 8, the loss curves of all five optimized LSTM neural network models converge quickly. The convergence speed also increases with the number of episodes, which shows that the method optimizes the hyperparameters effectively.
The statistical results of the output errors of the five groups of LSTM neural network models on the validation set are recorded in Table 4. In order to make the trend of MSE more apparent, it is enlarged 1000 times and displayed in Figure 9. As can be seen from the graph, the MAE, MSE and RMSE of the error of making predictions on the validation set decrease as the number of episodes increases. The above results also demonstrate the performance of the method from a statistical point of view.
To further illustrate the advantages of this method, we compared the LSTM model before and after optimization with the commonly used dead reckoning (DR) algorithm. Specifically, we compare the predictions on the validation set after 0 episodes and after 100 episodes of optimization with the predictions of the DR algorithm. The fitted curves of the network output and the real output for the AUV’s position, linear velocity, angular velocity, and attitude angle are shown in Figure 10, Figure 11, Figure 12 and Figure 13, respectively. The figures show that the optimized LSTM model brings the 12 output variables of the identification model into closer agreement with the actual system output. To quantify this, the MAE, MSE, and RMSE of the prediction error are listed in Table 5. Compared with the unoptimized model and DR, respectively, our method reduces the MAE by 28.42% and 29.13%, the MSE by 38.66% and 38.96%, and the RMSE by 21.70% and 25.81%. The deviation between the output of our method and the actual system is therefore smaller than that of the other two methods.
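The percentages quoted above follow from the Table 5 values if each one is read as a relative reduction, (baseline − optimized) / baseline. The short sketch below reproduces that arithmetic; the helper name `percent_reduction` and the dictionary layout are illustrative choices, while the numbers are copied from Table 5.

```python
def percent_reduction(baseline, improved):
    """Relative reduction of an error measure, in percent."""
    return 100.0 * (baseline - improved) / baseline


# Values copied from Table 5 (MSE entries are in units of 1e-6).
table5 = {
    "Unoptimized": {"MAE": 0.001784, "MSE": 7.167763e-6, "RMSE": 0.002677},
    "DR": {"MAE": 0.001802, "MSE": 7.203546e-6, "RMSE": 0.002825},
    "Optimized": {"MAE": 0.001277, "MSE": 4.396948e-6, "RMSE": 0.002096},
}

reductions = {}
for baseline in ("Unoptimized", "DR"):
    for metric in ("MAE", "MSE", "RMSE"):
        value = percent_reduction(table5[baseline][metric],
                                  table5["Optimized"][metric])
        reductions[(baseline, metric)] = value
        print(f"{metric} vs {baseline}: {value:.2f}%")
```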

4.2. Results on Real Data

Training and validating the LSTM-based model identification method requires an AUV dataset, so data collection experiments were carried out first. The experiments used the Sailfish AUV, shown in Figure 14.
The actual data include sensor noise from the acquisition process, so an accurate model is harder to obtain than in simulation. The method was therefore run for 200 episodes. The experimental dataset is divided into a training set of 4000 samples and a test set of 990 samples. The loss curve of the LSTM neural network during training, shown in Figure 15, indicates that the method learns effectively.
The fitted curves of the network output and the real output are shown in Figure 16, Figure 17, Figure 18 and Figure 19. The calculated MSE of the output error is 550.830181, and the MAE and RMSE values are 6.250734 and 23.469771, respectively. The LSTM neural network obtained by optimization fits the system’s actual output and identifies the AUV model with high precision. To illustrate the advantage of the method, it is compared with the commonly used dead reckoning (DR) method. The results are shown in Table 6: the proposed method achieved a 64.90% lower MAE, a 64.20% lower MSE, and a 37.76% lower RMSE than the DR method.
Because the prediction error for the AUV motion state is small, the proposed method again shows high predictive power. The proposed identification method is therefore of practical value for the navigation control of AUVs.
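For readers unfamiliar with the dead reckoning baseline, the sketch below shows a generic planar version: body-frame surge and sway velocities are rotated by the heading and integrated over time. It is not taken from the paper, which does not give its DR formulation in this section; the function name `dead_reckon`, the restriction to two dimensions, and the simple Euler integration are all assumptions made for illustration.

```python
import math


def dead_reckon(x, y, headings, surge, sway, dt):
    """Integrate body-frame velocities (surge u, sway v) into earth-frame
    positions using the heading psi at each step (planar Euler integration)."""
    track = [(x, y)]
    for psi, u, v in zip(headings, surge, sway):
        x += (u * math.cos(psi) - v * math.sin(psi)) * dt
        y += (u * math.sin(psi) + v * math.cos(psi)) * dt
        track.append((x, y))
    return track


# Toy run: 1 m/s straight ahead along the x axis for three 1 s steps.
track = dead_reckon(0.0, 0.0, [0.0, 0.0, 0.0], [1.0, 1.0, 1.0],
                    [0.0, 0.0, 0.0], dt=1.0)
print(track[-1])
```

Since every step adds the current velocity estimate to the previous position, sensor errors accumulate over time, which is one reason a learned model can outperform this baseline on noisy field data.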

5. Conclusions

To address the identification problem of the AUV system, this paper adopts a neural network hyperparameter optimization method based on Q-Learning. The method has been verified experimentally, and the conclusions can be summarized as follows:
1. The LSTM framework has a naturally Markovian structure and can model time series data with high precision. The historical data of the AUV implicitly encode the causal relationships of its dynamic model and have the characteristics of a hidden Markov model. The experimental results also show that the adopted method predicts the AUV model well.
2. Selecting hyperparameters well can significantly improve the efficiency of LSTMs on specific tasks. The improved Q-learning method enables the LSTM neural network to identify the AUV system with high precision.
3. Offline training in the system model identification framework reduces online learning time and ensures safety during initial online use. Online learning, in turn, keeps the model valid.
4. The proposed method has high model identification accuracy and promising applications, such as AUV fault diagnosis and model-based controller design.
However, the hyperparameter optimization method considers only identification accuracy, and its convergence speed can potentially be improved. Moreover, the performance of the LSTM and the improved Q-learning method used in this paper still has limitations. We will improve both the identification method and the optimization method in future work.

Author Contributions

Conceptualization, D.W.; methodology, D.W.; software, D.W.; validation, D.W.; formal analysis, B.H.; investigation, D.W.; resources, Y.S.; data curation, Y.S.; writing—original draft preparation, D.W.; writing—review and editing, D.W.; visualization, D.W.; supervision, J.W.; project administration, B.H.; funding acquisition, P.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China (Project No.2016YFC0301400), the Fundamental Research Funds for the Central Universities (Project No.201961005), the National Natural Science Foundation of China (Project No.51379198), the National Natural Science Foundation of China (under grant No.51809246), and the National Natural Science Foundation of Shandong Province (under grant No.ZR2018QF003).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Fang, Y.; Huang, Z.; Pu, J.; Zhang, J. AUV position tracking and trajectory control based on fast-deployed deep reinforcement learning method. Ocean Eng. 2022, 245, 110452. [Google Scholar] [CrossRef]
  2. Praczyk, T. Using Neuro—Evolutionary Techniques to Tune Odometric Navigational System of Small Biomimetic Autonomous Underwater Vehicle—Preliminary Report. J. Intell. Robot. Syst. 2020, 100, 363–376. [Google Scholar] [CrossRef] [Green Version]
  3. Yuan, C.; Licht, S.; He, H. Formation Learning Control of Multiple Autonomous Underwater Vehicles With Heterogeneous Nonlinear Uncertain Dynamics. IEEE Trans. Cybern. 2018, 48, 2920–2934. [Google Scholar] [CrossRef] [PubMed]
  4. Qiao, L.; Zhang, W. Adaptive Second-Order Fast Nonsingular Terminal Sliding Mode Tracking Control for Fully Actuated Autonomous Underwater Vehicles. IEEE J. Ocean. Eng. 2019, 44, 363–385. [Google Scholar] [CrossRef]
  5. Min, F.; Pan, G.; Xu, X. Modeling of Autonomous Underwater Vehicles with Multi-Propellers Based on Maximum Likelihood Method. J. Mar. Sci. Eng. 2020, 8, 407. [Google Scholar] [CrossRef]
  6. Deng, F.; Levi, C.; Yin, H.; Duan, M. Identification of an Autonomous Underwater Vehicle hydrodynamic model using three Kalman filters. Ocean Eng. 2021, 229, 108962. [Google Scholar] [CrossRef]
  7. Wu, B.; Han, X.; Hui, N. System Identification and Controller Design of a Novel Autonomous Underwater Vehicle. Machines 2021, 9, 109. [Google Scholar] [CrossRef]
  8. Wang, D.; He, B.; Shen, Y.; Li, G.; Chen, G. A Modified ALOS Method of Path Tracking for AUVs with Reinforcement Learning Accelerated by Dynamic Data-Driven AUV Model. J. Intell. Robot. Syst. 2022, 104, 1–23. [Google Scholar] [CrossRef]
  9. Bresciani, M.; Costanzi, R.; Manzari, V.; Peralta, G.; Terracciano, D.S.; Caiti, A. Dynamic parameters identification for a longitudinal model of an AUV exploiting experimental data. In Proceedings of the Global Oceans 2020: Singapore—U.S. Gulf Coast, Biloxi, MS, USA, 5–30 October 2020; pp. 1–6. [Google Scholar]
  10. Jiang, C.m.; Wan, L.; Sun, Y.S. Design of motion control system of pipeline detection AUV. J. Cent. South Univ. 2017, 24, 637–646. [Google Scholar] [CrossRef]
  11. Wang, J.G.; Jiang, C.M.; Sun, Y.S.; He, B.; Li, J.Q. Neural network identification of underwater vehicle by hybrid learning algorithm. Zhongnan Daxue Xuebao (Ziran Kexue Ban)/J. Cent. South Univ. (Sci. Technol.) 2011, 42, 427–431. [Google Scholar]
  12. Muñoz Palacios, F.; Cervantes Rojas, J.S.; Valdovinos, J.; Sandre Hernandez, O.; Salazar, S.; Romero, H. Dynamic Neural Network-Based Adaptive Tracking Control for an Autonomous Underwater Vehicle Subject to Modeling and Parametric Uncertainties. Appl. Sci. 2021, 11, 2797. [Google Scholar] [CrossRef]
  13. Kim, D.; Park, M.; Park, Y.L. Probabilistic Modeling and Bayesian Filtering for Improved State Estimation for Soft Robots. IEEE Trans. Robot. 2021, 37, 1728–1741. [Google Scholar] [CrossRef]
  14. Zhang, T.; Zheng, X.Q.; Liu, M.X. Multiscale attention-based LSTM for ship motion prediction. Ocean Eng. 2021, 230, 109066. [Google Scholar] [CrossRef]
  15. Shahi, T.B.; Shrestha, A.; Neupane, A.; Guo, W. Stock Price Forecasting with Deep Learning: A Comparative Study. Mathematics 2020, 8, 1441. [Google Scholar] [CrossRef]
  16. Yam, L.; Yan, Y.; Jiang, J. Vibration-based damage detection for composite structures using wavelet transform and neural network identification. Compos. Struct. 2003, 60, 403–412. [Google Scholar] [CrossRef]
  17. Dahunsi, O.; Pedro, J. Neural Network-Based Identification and Approximate Predictive Control of a Servo-Hydraulic Vehicle Suspension System. Eng. Lett. 2010, 18, 357. [Google Scholar]
  18. Liu, J.; Du, J. Composite learning tracking control for underactuated autonomous underwater vehicle with unknown dynamics and disturbances in three-dimension space. Appl. Ocean Res. 2021, 112, 102686. [Google Scholar] [CrossRef]
  19. Dong, X.; Shen, J.; Wang, W.; Shao, L.; Ling, H.; Porikli, F. Dynamical Hyperparameter Optimization via Deep Reinforcement Learning in Tracking. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 43, 1515–1529. [Google Scholar] [CrossRef]
  20. Mercangöz, M.; Cortinovis, A.; Schönborn, S. Autonomous Process Model Identification using Recurrent Neural Networks and Hyperparameter Optimization. IFAC-PapersOnLine 2020, 53, 11614–11619. [Google Scholar] [CrossRef]
  21. Sena, M.; Erkilinc, M.S.; Dippon, T.; Shariati, B.; Emmerich, R.; Fischer, J.K.; Freund, R. Bayesian Optimization for Nonlinear System Identification and Pre-Distortion in Cognitive Transmitters. J. Light. Technol. 2021, 39, 5008–5020. [Google Scholar] [CrossRef]
  22. Baker, B.; Gupta, O.; Naik, N.; Raskar, R. Designing Neural Network Architectures using Reinforcement Learning. arXiv 2016, arXiv:1611.02167. [Google Scholar]
  23. Chen, S.; Wu, J.; Liu, X. EMORL: Effective multi-objective reinforcement learning method for hyperparameter optimization. Eng. Appl. Artif. Intell. 2021, 104, 104315. [Google Scholar] [CrossRef]
  24. Liu, X.; Wu, J.; Chen, S. A context-based meta-reinforcement learning approach to efficient hyperparameter optimization. Neurocomputing 2022, 478, 89–103. [Google Scholar] [CrossRef]
  25. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; Riedmiller, M.A. Playing Atari with Deep Reinforcement Learning. arXiv 2013, arXiv:1312.5602. [Google Scholar]
  26. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal Policy Optimization Algorithms. arXiv 2017, arXiv:1707.06347. [Google Scholar]
  27. Watkins, C.; Dayan, P. Technical Note: Q-Learning. Mach. Learn. 1992, 8, 279–292. [Google Scholar] [CrossRef]
  28. Nouri, N.M.; Valadi, M.; Asgharian, J. Optimal input design for hydrodynamic derivatives estimation of nonlinear dynamic model of AUV. Nonlinear Dyn. 2018, 92, 139–151. [Google Scholar] [CrossRef]
  29. Prestero, T. Verification of a Six-Degree of Freedom Simulation Model for the REMUS Autonomous Underwater Vehicle. Ph.D. Thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, 2011. [Google Scholar]
  30. Rashid, T.; Hassan, M.; Mohammadi, M.; Fraser, K. Improvement of Variant Adaptable LSTM Trained With Metaheuristic Algorithms for Healthcare Analysis. In Research Anthology on Artificial Intelligence Applications in Security; Information Resources Management Association: Hershey, PA, USA, 2021; pp. 1031–1051. [Google Scholar]
  31. Rashid, T.A.; Fattah, P.; Awla, D.K. Using Accuracy Measure for Improving the Training of LSTM with Metaheuristic Algorithms. Procedia Comput. Sci. 2018, 140, 324–333. [Google Scholar] [CrossRef]
  32. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
  33. Sutton, R.; Barto, A. Reinforcement Learning: An Introduction. IEEE Trans. Neural Netw. 1998, 9, 1054. [Google Scholar] [CrossRef]
Figure 1. AUV body-fixed and earth-fixed coordinate systems.
Figure 2. The structure of the LSTM neural network.
Figure 3. One-step AUV dynamic model based on LSTM.
Figure 4. The basic structure of reinforcement learning.
Figure 5. The structure of the hyperparameter optimization method based on Q-Learning.
Figure 6. Block diagram of AUV model identification.
Figure 7. The result of loss before optimization.
Figure 8. Loss results at different stages.
Figure 9. Error results at different stages.
Figure 10. Identification result of AUV position.
Figure 11. Identification result of AUV velocity.
Figure 12. Identification result of AUV angular velocity.
Figure 13. Identification result of AUV attitude angle.
Figure 14. Sailfish AUV platform during the experiment.
Figure 15. The training loss curve of the proposed LSTM neural network.
Figure 16. Identification result of AUV position.
Figure 17. Identification result of AUV velocity.
Figure 18. Identification result of AUV angular velocity.
Figure 19. Identification result of AUV attitude angle.
Table 1. The parameters of the AUV model.

| Definition | Symbol | Unit | Numerical Value |
| --- | --- | --- | --- |
| Mass | $m$ | kg | 73 |
| Diameter | $d$ | mm | 210 |
| Length | $l$ | m | 2.2 |
| Center of gravity | $x_G, y_G, z_G$ | mm | 0, 0, 0.02 |
| Center of buoyancy | $x_B, y_B, z_B$ | mm | 0, 0, 0 |
| Moment of inertia | $I_x, I_y, I_z$ | kg·m² | 0.4, 24.5, 24.5 |
Table 2. The optimized hyperparameters and variation ranges.

| Hyperparameter | Variation Range |
| --- | --- |
| The number of neurons (1) | (50, 100, 150, 200) |
| The number of neurons (2) | (50, 100, 150, 200) |
| Batch size | (8, 16, 32, 64) |
| Time step | (5, 10, 15, 20) |
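To give a sense of the size of this search space, the sketch below enumerates every combination of the Table 2 values. The key names are hypothetical labels; in particular, reading "neurons (1)" and "neurons (2)" as the sizes of a first and a second layer is an assumption.

```python
from itertools import product

# Candidate values copied from Table 2; the key names are illustrative.
SEARCH_SPACE = {
    "neurons_1": (50, 100, 150, 200),
    "neurons_2": (50, 100, 150, 200),
    "batch_size": (8, 16, 32, 64),
    "time_step": (5, 10, 15, 20),
}

# Every combination of one value per hyperparameter: 4 ** 4 of them.
combinations = list(product(*SEARCH_SPACE.values()))
print(len(combinations))
print(combinations[0], combinations[-1])
```

With four candidates for each of four hyperparameters there are 256 combinations, so an exhaustive grid search would need 256 full training runs; this is the cost that a guided search tries to avoid.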
Table 3. AUV Model Identification Algorithm.
Given a training dataset, including input and output datasets
for episode in 1 to count do
      Network initialization: initialize neural network hyperparameters
      for t in 1 to T do
            Choose optimized hyperparameters
            Calculate network output
            Calculate the error between the true output and the output of the network
            Update neural network
            Update Q table
      end for
end for
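The loop in Table 3 can be illustrated with a small tabular Q-learning sketch. It is not the authors' implementation, and several parts are assumptions labeled in the comments: the state is a tuple of indices into the Table 2 ranges, an action sets one hyperparameter to one candidate value, the reward is 1 / (1 + error), and each episode starts from the visited state with the highest Q value (one possible reading of choosing the initial state from the Q table). The function `toy_validation_error` is a hypothetical stand-in for training the LSTM and measuring its validation error.

```python
import random

# Candidate values from Table 2, in the order
# (neurons 1, neurons 2, batch size, time step).
SPACE = ((50, 100, 150, 200), (50, 100, 150, 200),
         (8, 16, 32, 64), (5, 10, 15, 20))
# Assumed action encoding: set hyperparameter `dim` to its `idx`-th candidate.
ACTIONS = [(dim, idx) for dim in range(len(SPACE)) for idx in range(4)]


def decode(state):
    """Map a tuple of indices to the actual hyperparameter values."""
    return tuple(SPACE[dim][idx] for dim, idx in enumerate(state))


def toy_validation_error(state):
    """Hypothetical stand-in for 'train the LSTM, return validation error'."""
    optimum = (2, 1, 1, 0)  # arbitrary, chosen only for this illustration
    return float(sum((s - o) ** 2 for s, o in zip(state, optimum)))


def q_learning_search(evaluate, episodes=40, steps=8,
                      alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {}  # Q table: (state, action) -> value; missing entries count as 0

    def best_q(state):
        return max(q.get((state, a), 0.0) for a in ACTIONS)

    start = (0, 0, 0, 0)
    best_state, best_err = start, evaluate(start)
    for _ in range(episodes):
        # Assumed form of the modified initialization: begin each episode in
        # the visited state that the Q table currently rates highest.
        visited = {s for s, _ in q}
        state = max(visited, key=best_q) if visited else start
        for _ in range(steps):
            if rng.random() < epsilon:   # explore a random action
                action = rng.choice(ACTIONS)
            else:                        # exploit the current Q table
                action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
            dim, idx = action
            nxt = state[:dim] + (idx,) + state[dim + 1:]
            err = evaluate(nxt)
            reward = 1.0 / (1.0 + err)   # assumed reward shaping
            old = q.get((state, action), 0.0)
            # Standard Q-learning update.
            q[(state, action)] = old + alpha * (
                reward + gamma * best_q(nxt) - old)
            if err < best_err:
                best_state, best_err = nxt, err
            state = nxt
    return best_state, best_err


state, err = q_learning_search(toy_validation_error)
print(decode(state), err)
```

In a real run, `evaluate` would train the network with the decoded hyperparameters and return its validation error, which makes each step expensive; caching the error of combinations that have already been evaluated is a natural extension.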
Table 4. Results of LSTM neural network hyperparameter optimization.

| Episode | Hyperparameter Combination | Mean Absolute Error | Mean Squared Error | Root Mean Square Error |
| --- | --- | --- | --- | --- |
| 0 | (50, 50, 8, 10) | 0.001784 | $7.167763 \times 10^{-6}$ | 0.002677 |
| 25 | (100, 100, 32, 15) | 0.001693 | $7.037789 \times 10^{-6}$ | 0.002652 |
| 50 | (100, 150, 8, 20) | 0.001503 | $6.737789 \times 10^{-6}$ | 0.002302 |
| 75 | (150, 50, 32, 15) | 0.001304 | $6.397751 \times 10^{-6}$ | 0.002129 |
| 100 | (50, 100, 8, 10) | 0.001277 | $4.396948 \times 10^{-6}$ | 0.002096 |
Table 5. Error comparison of different methods.

| Method | MAE | MSE | RMSE |
| --- | --- | --- | --- |
| Unoptimized | 0.001784 | $7.167763 \times 10^{-6}$ | 0.002677 |
| DR | 0.001802 | $7.203546 \times 10^{-6}$ | 0.002825 |
| Optimized method | 0.001277 | $4.396948 \times 10^{-6}$ | 0.002096 |
Table 6. Error comparison of different methods.

| Method | MAE | MSE | RMSE |
| --- | --- | --- | --- |
| Proposed method | 6.250734 | 206.830181 | 23.469771 |
| DR | 17.808035 | 550.071027 | 37.705304 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Wang, D.; Wan, J.; Shen, Y.; Qin, P.; He, B. Hyperparameter Optimization for the LSTM Method of AUV Model Identification Based on Q-Learning. J. Mar. Sci. Eng. 2022, 10, 1002. https://doi.org/10.3390/jmse10081002
