NN-Based Parallel Model Predictive Control for a Quadrotor UAV

Abstract: A novel neural network (NN)-based parallel model predictive control (PMPC) method is proposed in this article to solve the tracking problem of quadrotor unmanned aerial vehicle (Q-UAV) systems. The dynamics of Q-UAVs can change while the system operates in some specific environments. In this case, traditional NN-based MPC methods are not applicable because their model networks are pre-trained and kept constant throughout the process. To solve this problem, we propose the PMPC algorithm, which introduces a parallel control structure and experience replay into the MPC method. In this algorithm, an NN-based artificial system runs in parallel with the UAV system to reconstruct its dynamics model. Experience replay is used to maintain the accuracy of the reconstructed model and thereby ensure the effectiveness of the model prediction. A convergence proof of the artificial system is also given. Finally, numerical results and analysis are given to demonstrate the effectiveness of the PMPC algorithm.


Introduction
In the past decade, with the development of sensor and network technologies, Q-UAVs have attracted increasing attention from researchers. Owing to their efficient deployment and flexible mobility, Q-UAVs are widely used in nuclear power plant inspection [1,2], forest firefighting [3,4], and land surveying [5,6]. However, designing an effective controller for a Q-UAV system is challenging because the system is underactuated, strongly non-linear, and coupled. Numerous algorithms have recently been developed within the framework of traditional control, such as feedback linearization control [7,8], fuzzy logic-based control [9,10], and sliding mode control (SMC) [11,12]. In [13], a non-linear PID controller that takes the system energy into account is designed for the tracking control problem of the Q-UAV system. In [14], a fuzzy-based backstepping SMC algorithm is developed for the tracking problem of UAVs with parameter uncertainties and external disturbances. In [15], a second-order SMC algorithm is used to design the Q-UAV controller. However, optimality, an essential consideration in control design, is not addressed by the above-mentioned traditional control algorithms.
It is well known that MPC is an effective optimal control method that is widely used in fields such as robot control [16,17], autonomous driving [18,19], and energy management [20,21]. Under the MPC scheme, future system states and behaviors are predicted with a model of the plant, and the control law is optimized based on the analysis and evaluation of the predicted data. Because the optimal control is obtained by minimizing a performance cost function subject to constraints, MPC also performs well in robust control. Many researchers have also introduced MPC into the field of Q-UAV control. In [22], a novel non-linear MPC is proposed for the navigation problem with obstacle avoidance. In [23], MPC is applied to find optimal paths through the atmosphere that maximize the UAV's energy. In [24], a novel adaptive MPC scheme is established for the angular rate and thrust control of a Q-UAV, where thrust bounds and disturbances are considered. As the MPC scheme makes clear, however, the performance of MPC relies on the accuracy of the prediction model, and it is difficult to model the dynamics of Q-UAVs accurately because of the high complexity of the system.
Neural networks are widely used to approximate unknown functions because of their universal approximation property [25][26][27][28]. Under the MPC scheme, NN technology has been introduced to approximate the dynamics model of Q-UAVs, and many researchers have contributed to this field. In [29], a novel reinforcement learning-based MPC algorithm is developed for the tracking control problem of a Q-UAV with thrust-vectoring capabilities. In [30], an offline learning approach is applied to learn the dynamics model of a hybrid Q-UAV. In [31], a Q-UAV controller is designed based on a learning-based MPC algorithm that provides guarantees of safety, robustness, and convergence. Alessandro developed an active learning algorithm for the control problem in which the Q-UAV model is uncertainty-aware [32]. Several novel meta-learning approaches have been developed to approximate the distribution over different "tasks" [33,34]. In the research mentioned above, the NN model is pre-trained before the system runs, and its weight matrices are kept constant during operation. However, in some specific environments, such as nuclear radiation and forest fire environments, the dynamics model of a Q-UAV changes as the system operates, and the algorithms mentioned above are unreliable in this case.
Motivated by the above-mentioned problems, we propose a novel NN-based PMPC algorithm. The main contributions are summarized as follows. (1) Different from the existing results [29], the model NN is used to approximate not only the position model of the Q-UAV but also its whole dynamics model. In the PMPC algorithm, the dynamics model of the Q-UAV is fully unknown, which makes the optimal control problem more difficult to solve. (2) Different from the existing results [30], the model NN in the PMPC algorithm runs in parallel with the system and is continuously updated as the system runs. Compared with the fixed NNs in [30,33,34], our algorithm can be applied when the dynamics model changes while the system is running. (3) Experience replay is introduced into the PMPC algorithm to maintain the accuracy of the reconstructed model and thus ensure the effectiveness of the model prediction.
The remainder of this article is organized as follows. The problem is formulated in Section 2. In Section 3, the parallel structure and the details of the PMPC algorithm are introduced. Numerical results and analysis are given to demonstrate the effectiveness of the PMPC algorithm in Section 4. Finally, the conclusion is presented in Section 5.

Dynamic Models of Q-UAVs
The dynamics of Q-UAVs are usually studied as a system with six degrees of freedom. Under the low-speed assumption, a simple rigid-body model [35] of Q-UAVs is adopted as (1), in which $X$ and $U = [U_1, U_2, U_3, U_4]^T$ indicate the state vector and control input of the system, respectively. Furthermore, as shown in Figure 1, $P = [x, y, z]$ and $\Theta = [\phi, \theta, \psi]$ denote the position and the Euler angles in the inertial frame $\Gamma_E$, respectively; $V = [\dot x, \dot y, \dot z]$ and $V_a = [\dot\phi, \dot\theta, \dot\psi]$ denote the velocities along the three axes and the angular rates of the three Euler angles, respectively; $I_x$, $I_y$, and $I_z$ are the moments of inertia of the Q-UAV about the three axes; $m$ is the mass of the Q-UAV; $L$ is the distance from the rotors to the center of mass; and $J_R$ and $\Omega_R$ are the moment of inertia and the angular velocity of the propeller blades. Moreover, $U_1$, $U_2$, $U_3$, and $U_4$ are the forces generated by the four propellers.
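For reference, a rigid-body model that is widely used in the quadrotor literature, written with the symbols defined above, is reproduced below. It is an assumption that the model of [35] takes exactly this form: the state ordering shown is illustrative, and sign conventions for the gyroscopic terms involving $J_R$ and $\Omega_R$ differ between sources.

$$
\begin{aligned}
\ddot{x} &= (\cos\phi\sin\theta\cos\psi + \sin\phi\sin\psi)\,\frac{U_1}{m},\\
\ddot{y} &= (\cos\phi\sin\theta\sin\psi - \sin\phi\cos\psi)\,\frac{U_1}{m},\\
\ddot{z} &= \cos\phi\cos\theta\,\frac{U_1}{m} - g,\\
\ddot{\phi} &= \dot{\theta}\dot{\psi}\,\frac{I_y - I_z}{I_x} - \frac{J_R}{I_x}\,\dot{\theta}\,\Omega_R + \frac{L}{I_x}\,U_2,\\
\ddot{\theta} &= \dot{\phi}\dot{\psi}\,\frac{I_z - I_x}{I_y} + \frac{J_R}{I_y}\,\dot{\phi}\,\Omega_R + \frac{L}{I_y}\,U_3,\\
\ddot{\psi} &= \dot{\phi}\dot{\theta}\,\frac{I_x - I_y}{I_z} + \frac{1}{I_z}\,U_4,
\end{aligned}
$$

with, for example, $X = [x, y, z, \phi, \theta, \psi, \dot x, \dot y, \dot z, \dot\phi, \dot\theta, \dot\psi]^T$ and $U = [U_1, U_2, U_3, U_4]^T$.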

Model Predictive Control
Consider a non-linear system defined as
$$x_{k+1} = f(x_k, U_k), \qquad (2)$$
where $x_k$ and $U_k$ denote the state and the control input of the system at time step $k$, respectively. With $n_p$ selected as the finite prediction horizon, MPC minimizes the performance cost over the $n_p$ steps at each time step $k$. The optimized control sequence $\mathbf{U} = [U_k, U_{k+1}, \cdots, U_{k+n_p-1}]$ is obtained by solving the optimization problem. Only the first element of the optimized sequence is applied to the process; the remaining elements are carried over to warm-start the optimization at the next time step $k+1$, and the procedure repeats.
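The receding-horizon procedure can be illustrated with a minimal sketch. This is a toy example, not the controller used in this article: the scalar plant `plant`, the weights `q` and `r`, the control grid, and the brute-force search over that grid are all assumptions chosen so that the example runs with NumPy alone.

```python
import itertools
import numpy as np


def plant(x, u):
    """Hypothetical scalar plant x_{k+1} = f(x_k, u_k), used only for illustration."""
    return 0.9 * x + 0.5 * u


def toy_cost(x0, u_seq, x_ref, q=10.0, r=1.0):
    """Quadratic cost accumulated over the prediction horizon."""
    x, cost = x0, 0.0
    for u in u_seq:
        x = plant(x, u)
        cost += q * (x - x_ref) ** 2 + r * u ** 2
    return cost


def mpc_step(x0, x_ref, n_p=3, grid=np.linspace(-2.0, 2.0, 21)):
    """Return the optimized sequence [U_k, ..., U_{k+n_p-1}] found by exhaustive search."""
    best = min(itertools.product(grid, repeat=n_p),
               key=lambda seq: toy_cost(x0, seq, x_ref))
    return np.array(best)


def run_mpc(x0, x_ref, steps=20):
    """Receding horizon: re-optimize at every step and apply only the first input."""
    x, xs = x0, [x0]
    for _ in range(steps):
        u_seq = mpc_step(x, x_ref)
        x = plant(x, u_seq[0])
        xs.append(x)
    return np.array(xs)
```

For example, `run_mpc(0.0, 1.0)` drives the toy state from 0 toward the reference 1. Exhaustive search is used only because it is transparent; practical MPC implementations use a numerical optimizer instead.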

Problem Statement
Under the traditional MPC scheme, the performance of the MPC algorithm depends heavily on the accuracy of the model. However, the above model is constructed without considering other effects, such as air drag, gyroscopic effects, and hub forces. To overcome the inaccuracy of the model, a novel NN-based parallel control method is proposed in this paper.

Parallel MPC Method
In this section, the PMPC algorithm is introduced to obtain the optimal controller for the tracking control problem of Q-UAVs with a time-varying dynamics model.

Algorithm Structure
In the PMPC algorithm, an artificial system that runs in parallel with the real system is introduced to expand the real problem. The artificial space, between the artificial and real systems, is introduced to solve the expanded real problem. The structure of the parallel MPC method is shown in Figure 2.
As Figure 2 shows, the parallel MPC method can be divided into three steps. First, an artificial system is introduced to rebuild the dynamic model of the real system by learning from the data observed on the actual system. The artificial system maintains the accuracy of the model by periodically learning from the data in the experience pool, which ensures the accuracy of the predictive experiments throughout the operation of the actual system. Second, based on the artificial system, predictive experiments are performed to analyze the behavior of the Q-UAV system and evaluate the performance of the control laws, and the optimal control law is updated based on the evaluation results. Third, the appropriate control is applied to the real system through the interactive execution of the artificial and real systems.

Artificial System
Because accurate dynamic models of Q-UAVs are difficult to obtain, an artificial system based on neural networks is introduced to rebuild the dynamic model of the real system.
The discrete-time dynamic model of Q-UAVs can generally be written as [36]
$$X_{k+1} = F(X_k, U_k), \qquad (3)$$
where $X_k$ and $U_k$ are the system state and control input at instant $k$, and $F(\cdot)$ is an unknown non-linear function. According to the universal approximation property of neural networks, the system (3) has an ideal neural network representation (4), where $T_k = [S_k, U_k]^T$, $W_m^*$, and $\varepsilon_{mk}$ are the neural network input, the ideal weight matrix, and the reconstruction error, respectively. Furthermore, $\Psi(\cdot)$ and $\Phi(\cdot)$ are the activation functions.
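One common way to obtain a discrete-time map of the form $X_{k+1} = F(X_k, U_k)$ (the name $F$ is introduced here for illustration) is to integrate a continuous rigid-body model with a fixed step. The sketch below assumes the widely used rigid-body quadrotor equations with gyroscopic terms, a 12-dimensional state ordering, forward-Euler integration with a hypothetical step `dt`, and the parameter values reported in the simulation section; none of these choices is confirmed to be the model used in this article.

```python
import numpy as np

# Physical parameters: values quoted in the simulation section of this article.
M, IX, IY, IZ = 6.98e-2, 3.4e-3, 3.4e-3, 3.4e-3   # kg, kg*m^2
JR, L_ARM, G = 1.302e-6, 1.17e-1, 9.8             # kg*m^2, m, m/s^2


def continuous_dynamics(X, U, omega_r=0.0):
    """Assumed rigid-body model: returns dX/dt for the 12-D state
    X = [x, y, z, phi, theta, psi, x_dot, y_dot, z_dot, phi_dot, theta_dot, psi_dot].

    omega_r is the propeller angular velocity Omega_R (hypothetical default of 0).
    """
    phi, theta, psi = X[3:6]
    dphi, dtheta, dpsi = X[9:12]
    u1, u2, u3, u4 = U
    ddx = (np.cos(phi) * np.sin(theta) * np.cos(psi) + np.sin(phi) * np.sin(psi)) * u1 / M
    ddy = (np.cos(phi) * np.sin(theta) * np.sin(psi) - np.sin(phi) * np.cos(psi)) * u1 / M
    ddz = np.cos(phi) * np.cos(theta) * u1 / M - G
    ddphi = dtheta * dpsi * (IY - IZ) / IX - JR / IX * dtheta * omega_r + L_ARM / IX * u2
    ddtheta = dphi * dpsi * (IZ - IX) / IY + JR / IY * dphi * omega_r + L_ARM / IY * u3
    ddpsi = dphi * dtheta * (IX - IY) / IZ + u4 / IZ
    # Positions/angles integrate their rates; rates integrate the accelerations.
    return np.concatenate([X[6:12], [ddx, ddy, ddz, ddphi, ddtheta, ddpsi]])


def F(X, U, dt=0.01):
    """Discrete-time map X_{k+1} = F(X_k, U_k) by one forward-Euler step."""
    X = np.asarray(X, dtype=float)
    return X + dt * continuous_dynamics(X, U)
```

A quick sanity check of the sign conventions: from a level hover state with $U_1 = mg$ and zero moments, one step of `F` leaves the state unchanged.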
The artificial system is defined as (5), which can also be written in the form of a neural network as (6), where $\hat W_m$ is the approximation of the weight matrix $W_m^*$. The training process of the artificial system is given in Algorithm 1.

Algorithm 1 Neural Network-Based Artificial System
Initialization:
1: Collect the data set (7), where N is a large positive integer.
2: Create a neural network as in (6).
3: Select the modeling accuracy ξ.
Training:
4: Calculate the training error (8).
5: Adjust the weights to minimize the error (9).
6: Update the weight matrix according to (10), with $l_m$ denoting the learning rate.
7: Repeat steps 4 to 6 until condition (11) is satisfied; then return the weight matrix $\hat W_m$.
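A minimal NumPy sketch of the training loop of Algorithm 1 follows. It assumes a single-hidden-layer network $S_{k+1} = \hat W_m^T \sigma(V^T T_k)$ with a fixed, randomly initialized input layer `V` and sigmoid hidden units, so that only $\hat W_m$ is trained by gradient descent on the one-step prediction error. The exact structure of (6) and of the update law (10) in the article may differ; the function names, `V`, and the stopping rule on the mean squared error are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_artificial_system(X, U, X_next, n_hidden=32, lr=0.01, xi=1e-3,
                            max_epochs=5000, seed=0):
    """Fit S_{k+1} = W^T sigmoid(V^T T_k), T_k = [X_k, U_k], updating only W.

    X, U, X_next have shapes (N, n_x), (N, n_u), (N, n_x).
    Returns the fixed input layer V, the trained weights W, and the final MSE.
    """
    rng = np.random.default_rng(seed)
    T = np.hstack([X, U])                          # NN inputs T_k
    V = rng.normal(size=(T.shape[1], n_hidden))    # fixed random input layer
    H = sigmoid(T @ V)                             # hidden activations, (N, n_hidden)
    W = np.zeros((n_hidden, X_next.shape[1]))      # trainable output weights

    def mse(W):
        return np.mean(np.sum((H @ W - X_next) ** 2, axis=1))

    for _ in range(max_epochs):
        if mse(W) <= xi:                           # modeling accuracy reached
            break
        err = H @ W - X_next                       # one-step training error
        W = W - lr * H.T @ err / len(T)            # gradient step on 0.5 * MSE
    return V, W, mse(W)


def predict(V, W, x, u):
    """One-step prediction of the artificial system."""
    return sigmoid(np.concatenate([x, u]) @ V) @ W
```

The default learning rate 0.01 mirrors the value reported in the simulation section. Freezing `V` keeps the model linear in the trained weights, which makes the training problem convex in `W`.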
Next, we prove the convergence of the artificial system. Before proceeding, the following assumptions are necessary.
Theorem 1. Let the artificial system be defined as (6), and let the weight matrix $\hat W_m$ be updated according to (10). If Assumptions 1 and 2 hold, then the system identification error $\tilde X_k$ is asymptotically stable, and the error matrices, including the weight error $\tilde W_m$, converge to zero as $k \to \infty$.
Proof. Consider the artificial system (6), and select the Lyapunov function (13). The first difference of (13) is given by (14). Substituting (6) and (10) into (14) yields (15). According to the Cauchy–Schwarz inequality, we have (16), and based on Assumption 2, we obtain (17). Select $l_m$ to satisfy condition (18). Then $\Delta L_m < 0$, which means that the system identification error is asymptotically stable and the error matrices, including $\tilde W_m$, converge to zero as $k \to \infty$.
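The mechanism behind such a proof can be checked numerically in a simplified setting. Assume a model that is linear in its trained weights, $S_{k+1} = \hat W^T h_k$ with fixed features $h_k$, zero reconstruction error, and the gradient update $\hat W \leftarrow \hat W - l\,h_k e_k^T$ with $e_k = \hat W^T h_k - W^{*T} h_k = \tilde W^T h_k$. Expanding $L = \|\tilde W\|_F^2$ gives $\Delta L = -2l\|e_k\|^2 + l^2\|h_k\|^2\|e_k\|^2 = -l(2 - l\|h_k\|^2)\|e_k\|^2$, so $L$ does not increase whenever $0 < l < 2/\|h_k\|^2$. This textbook condition is for the simplified case only and is not necessarily the condition on $l_m$ used in the proof above; the function name is hypothetical.

```python
import numpy as np


def lyapunov_step(W_hat, W_star, h, lr):
    """One gradient update on a single sample of a model linear in its weights.

    Returns the updated weights, the actual change of L = ||W_hat - W_star||_F^2,
    and the closed-form change -lr * (2 - lr * ||h||^2) * ||e||^2.
    """
    e = W_hat.T @ h - W_star.T @ h           # identification error (no reconstruction error)
    W_new = W_hat - lr * np.outer(h, e)      # gradient step on 0.5 * ||e||^2
    L_old = np.sum((W_hat - W_star) ** 2)
    L_new = np.sum((W_new - W_star) ** 2)
    predicted = -lr * (2.0 - lr * (h @ h)) * (e @ e)
    return W_new, L_new - L_old, predicted
```

Choosing, for instance, `lr = 1 / (h @ h)` at each step makes every difference strictly negative while the error is nonzero; a step size above `2 / (h @ h)` makes it positive.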

Predictive Experiments
After the actual system has been reconstructed, predictive experiments are executed to obtain the optimal control of the artificial system (5). In the predictive experiments, the neural networks are used to predict the state of the artificial system over the next N steps, so the system (5) can be rewritten as (19), where $Z_k$ denotes the predicted state of the system (5). Let $\mathcal{U}_k^{N-1} = \{U_k, U_{k+1}, \cdots, U_{k+N-1}\}$ denote the predictive control sequence over the next N steps. Furthermore, the utility function is defined as (20), where $e_k = S_k - S_k^d$, with $S_k^d$ denoting the target state at instant $k$, and Q and R are positive semi-definite matrices of suitable dimensions. The cost function is then defined as (21), and the optimal control sequence is obtained from (22). However, to maintain the safety and stability of the flight process, the following constraints are applied in the optimization.

Input Constraint
In the parallel control scheme, the limits on the Q-UAV input are considered for flight safety. The input constraint is defined as
$$U_{\min} \le U_k \le U_{\max}, \qquad (23)$$
where $U_{\min}$ and $U_{\max}$ denote the minimum and maximum of the input vector at instant $k$.

Velocity Constraint
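In the proposed algorithm, the velocity of the Q-UAV is limited as
$$\dot x_{\min} \le \dot x \le \dot x_{\max}, \quad \dot y_{\min} \le \dot y \le \dot y_{\max}, \quad \dot z_{\min} \le \dot z \le \dot z_{\max}, \qquad (24)$$
where $\dot\varsigma_{\min}$ and $\dot\varsigma_{\max}$ denote the minimum and maximum velocity of the Q-UAV along the $\varsigma$ axis, $\varsigma \in \{x, y, z\}$.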

Angular Velocity Constraint
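The limits on the angular velocities are also considered in the proposed algorithm:
$$\dot\phi_{\min} \le \dot\phi \le \dot\phi_{\max}, \quad \dot\theta_{\min} \le \dot\theta \le \dot\theta_{\max}, \quad \dot\psi_{\min} \le \dot\psi \le \dot\psi_{\max}. \qquad (25)$$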
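One simple way to respect these box constraints in a sampling-based optimizer is to clip candidate inputs to their bounds and reject predicted states that violate the velocity or angular-velocity limits. The sketch below is an assumption about how this could be done, not the mechanism used in this article: the state layout (linear velocities in positions 6 to 8, Euler-angle rates in 9 to 11), the symmetric bounds, and the function names are hypothetical, and the default bounds are the values quoted in the simulation section (1 m/s and π/15 rad/s).

```python
import numpy as np

V_MAX = 1.0            # assumed symmetric bound on each linear velocity, m/s
W_MAX = np.pi / 15.0   # assumed symmetric bound on each Euler-angle rate, rad/s


def clip_input(u, u_min, u_max):
    """Project a candidate input onto the box u_min <= u <= u_max."""
    return np.clip(u, u_min, u_max)


def satisfies_state_constraints(X, v_max=V_MAX, w_max=W_MAX):
    """Check the velocity and angular-velocity bounds for a 12-D state laid out as
    [position, Euler angles, linear velocities, Euler-angle rates]."""
    vel, rates = np.asarray(X[6:9]), np.asarray(X[9:12])
    return bool(np.all(np.abs(vel) <= v_max) and np.all(np.abs(rates) <= w_max))
```

Candidate control sequences whose predicted states fail the check can simply be discarded or penalized; a gradient-based solver would instead receive these bounds directly as constraints.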
Then, the parallel predictive experiment method under the constraints can be expressed as Algorithm 2.

Algorithm 2 Parallel Predictive Experiments
Initialization:
1: Select the positive semi-definite matrices Q and R to construct the cost function (21).
2: Select the positive integer N as the control horizon.
Rolling Prediction:
3: Collect data with the artificial system (6) under the current control law.
4: Calculate the cost function (21).
5: Obtain the optimized control sequence according to (22) subject to the constraints (23)-(25).
6: Apply the first element of the control sequence to the artificial system (6).
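The cost evaluation in the rolling-prediction steps can be sketched as follows, assuming the quadratic utility $e^T Q e + U^T R U$ summed over the N predicted steps, which is the usual form for weighting matrices Q and R (the exact expressions of the utility and cost functions in this article are not reproduced here). `model` stands for any one-step predictor, such as the trained artificial system, and `horizon_cost` is a hypothetical name. Pairing each control with the state it produces is a choice of this sketch.

```python
import numpy as np


def horizon_cost(model, s0, U_seq, S_ref, Q, R):
    """Accumulate sum_i (e_i^T Q e_i + U_i^T R U_i) over the predicted horizon.

    model(s, u) returns the next predicted state; U_seq has shape (N, n_u) and
    S_ref has shape (N, n_s), holding the target state for each predicted step.
    """
    z, J = np.asarray(s0, dtype=float), 0.0
    for u, s_d in zip(np.asarray(U_seq, dtype=float), np.asarray(S_ref, dtype=float)):
        z = model(z, u)              # predicted state Z
        e = z - s_d                  # tracking error e = Z - S^d
        J += e @ Q @ e + u @ R @ u   # quadratic utility
    return float(J)
```

With the weights reported in the simulation section, `Q = np.diag([10.0, 10.0])` and `R = np.eye(2)`, a candidate sequence that reaches the targets exactly is charged only for its control effort.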

Parallel Execution
In the above section, the predictive experiments are executed to obtain the optimal control of the artificial system (5), which aims to control the discrete-time dynamic model (3) of the Q-UAVs. However, for the real Q-UAV system, it is difficult to rebuild the system dynamics over the whole time horizon with a single fixed artificial system, because the system function is complex and unknown. Therefore, to maintain the accuracy of the model, the data set (7) is updated over time so that it always stores the latest N data pairs.
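The sliding-window data set described above behaves like a fixed-length first-in, first-out buffer. A minimal sketch uses Python's `collections.deque` with `maxlen`, which discards the oldest entry automatically when a new one is appended to a full buffer; the class name `ExperiencePool` is hypothetical.

```python
from collections import deque

import numpy as np


class ExperiencePool:
    """Fixed-length FIFO buffer of (X_{k-1}, U_{k-1}, X_k) data pairs."""

    def __init__(self, max_len):
        self.buffer = deque(maxlen=max_len)   # oldest pair is dropped automatically

    def store(self, x_prev, u_prev, x_next):
        self.buffer.append((np.asarray(x_prev), np.asarray(u_prev), np.asarray(x_next)))

    def __len__(self):
        return len(self.buffer)

    def as_arrays(self):
        """Stack the stored pairs into (X, U, X_next) training arrays."""
        X, U, X_next = zip(*self.buffer)
        return np.stack(X), np.stack(U), np.stack(X_next)
```

The arrays returned by `as_arrays` can be passed directly to a training routine, so the model is always refitted on the latest N pairs.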
Combining the above sections, the Parallel MPC method can be expressed as Algorithm 3.

Algorithm 3 Parallel MPC
Initialization:
1: Select the positive integer N as the maximum length of the experience pool.
Store the last state $X_{k-1}$, the control input $U_{k-1}$, and the system state $X_k$ as a data pair in the experience pool.
9: else
10: Remove the first data pair from the experience pool.
11: Store the newest data pair in the experience pool.
12: end if
13: Train the artificial system with the experience pool until $(\hat W_{m,k-1} - \hat W_{m,k})^T (\hat W_{m,k-1} - \hat W_{m,k}) + \tilde X_k^T \tilde X_k \le \zeta$ is satisfied.
14: Predict the system state with the artificial system over the control horizon N.
15: Calculate the cost value with (21) at every predictive instant.
16: Calculate the optimal control based on the cost value.
17: Apply the first element of the optimal control sequence to the actual system.
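The loop of Algorithm 3 can be illustrated end to end on a toy problem. To keep the sketch short, several assumptions replace the components of this article: the "real" system is a hypothetical scalar plant whose input gain halves at k = 30, the artificial system is a linear least-squares model refitted on the experience pool at every step (instead of the NN of Algorithm 1), and the predictive experiments use a brute-force search over a control grid. All names are illustrative.

```python
from collections import deque
import itertools

import numpy as np


def real_plant(x, u, k):
    """Hypothetical 'real' system whose input gain halves at k = 30."""
    b = 0.5 if k < 30 else 0.25
    return 0.9 * x + b * u


def fit_model(pool):
    """Refit the artificial system x_{k+1} ~ a*x_k + b*u_k on the experience pool."""
    A = np.array([[x, u] for x, u, _ in pool])
    y = np.array([x_next for _, _, x_next in pool])
    theta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return theta                                  # [a_hat, b_hat]


def plan(theta, x, x_ref, n_p=2, grid=np.linspace(-3.0, 3.0, 31), q=10.0, r=0.1):
    """Predictive experiment: grid search on the identified model, first input returned."""
    def cost(seq):
        z, J = x, 0.0
        for u in seq:
            z = theta[0] * z + theta[1] * u
            J += q * (z - x_ref) ** 2 + r * u ** 2
        return J
    return min(itertools.product(grid, repeat=n_p), key=cost)[0]


def parallel_mpc(x0=0.0, x_ref=1.0, steps=60, pool_len=10, seed=0):
    rng = np.random.default_rng(seed)
    pool = deque(maxlen=pool_len)                 # experience pool; oldest pair dropped
    x, x_prev, u_prev, xs = x0, None, None, [x0]
    for k in range(steps):
        if u_prev is not None:
            pool.append((x_prev, u_prev, x))      # newest pair (X_{k-1}, U_{k-1}, X_k)
        if len(pool) >= 2:
            u = plan(fit_model(pool), x, x_ref)   # retrain, then predict and optimize
        else:
            u = rng.uniform(-1.0, 1.0)            # excitation until the pool has data
        x_prev, u_prev = x, u
        x = real_plant(x, u, k)                   # apply the first input to the real system
        xs.append(x)
    return np.array(xs)
```

Because the pool only holds the latest `pool_len` pairs, data collected before the change in plant gain is flushed out within `pool_len` steps, after which the refitted model again matches the plant and tracking recovers.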

Simulation Results
In this section, to demonstrate the effectiveness of the PMPC algorithm for UAVs, it is compared with traditional MPC. The initial mass of the Q-UAV is $m = 6.98 \times 10^{-2}$ kg; the moments of inertia of the Q-UAV about the three axes are $I_x = I_y = I_z = 3.4 \times 10^{-3}$ kg·m²; the moment of inertia of the propeller blades is $J_R = 1.302 \times 10^{-6}$ kg·m²; the distance from the rotors to the center of mass is $L = 1.17 \times 10^{-1}$ m; and the gravitational acceleration is 9.8 m/s².
The weighting matrices in this simulation are chosen as
$$Q = \begin{bmatrix} 10 & 0 \\ 0 & 10 \end{bmatrix}, \qquad R = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad (26)$$
the learning rate of the model NN is chosen as $l_m = 0.01$, and both activation functions $\Psi(\cdot)$ and $\Phi(\cdot)$ are sigmoid functions. To reflect internal uncertainties in the UAV dynamic model, 5% parameter inaccuracies are assumed during the model prediction process. Some of the constraints in this simulation are shown in Figure 3.
The target trajectory is chosen as (27). It is worth mentioning that in this simulation the target Euler angles are defined so as to keep track of the target trajectory. Furthermore, the linear velocity bound is set to 1 m/s and the angular velocity bound to π/15 rad/s. The rotor speed Ω is bounded in the range [110, 860] r/s. As shown in Figure 4, under the PMPC algorithm the UAV tracks the target trajectory within 15 s, whereas under traditional MPC it cannot follow the target trajectory when the dynamics model changes. This indicates that the PMPC algorithm performs better on the tracking control problem with a time-varying dynamics model. To demonstrate the robustness of the PMPC algorithm, external disturbances are added at 20 s; the results, shown in Figure 5, indicate that under the PMPC algorithm the system recovers tracking of the target trajectory within several seconds.

Conclusions
In this article, a novel NN-based PMPC optimal tracking control algorithm is proposed for UAVs with a time-varying dynamics model. Because the UAV dynamics model can change, which traditional MPC algorithms have difficulty handling, a neural network with experience replay is applied to approximate the dynamic model as the system runs. A parallel structure is then formed by the NN-based artificial system and the real system. Under this parallel structure, the MPC algorithm is applied to predict the future states of the artificial system and obtain the optimal control for the tracking control problem. Finally, simulation results demonstrate the effectiveness of the PMPC algorithm.

Conflicts of Interest:
The authors declare no conflict of interest.

Nomenclature
All symbols and their meanings are shown in the following table.
Symbol | Meaning
x | The position of the UAVs on the x axis
y | The position of the UAVs on the y axis
z | The position of the UAVs on the z axis
$X_k$ | The actual state of the real system
$U_k$ | The actual control input of the real system
$S_k$ | The state of the artificial system
$T_k$ | The input of the artificial system
$\Psi(\cdot)$, $\Phi(\cdot)$ | The activation functions of the NN
$Z_k$ | The predicted state of the artificial system
N | The length of the experience pool