Robust Control of Underwater Vehicle-Manipulator System Using Grey Wolf Optimizer-Based Nonlinear Disturbance Observer and H-Infinity Controller

This paper proposes a new trajectory tracking scheme for the constrained nonlinear underwater vehicle-manipulator system (UVMS). To overcome the unmodeled uncertainties, external disturbances, and control input constraints in the operation of the UVMS, a modified constrained H∞ controller is proposed together with a basic computed-torque controller (CTC) and a newly designed nonlinear disturbance observer (NDO). The CTC gives the nominal model-based control. The NDO is designed based on the system dynamics and provides an online estimate of the lumped disturbances. However, the designed NDO is a biased estimator, i.e., it has a blind domain of disturbance estimation that it cannot reject. To reject this estimation bias, the modified constrained H∞ controller is designed with new features. To the best of our knowledge, the conventional H∞ robust controller is generally designed by solving the Riccati equation offline and ignoring the control input constraints imposed by the physical actuators, so it handles time-varying environments poorly. To solve these issues, the modified constrained H∞ robust controller is optimized online by the grey wolf optimizer (GWO), so that the control system compensates the biased estimation, satisfies the control input constraints, and computes quickly. In this paper, we replace the offline Riccati calculation of the conventional H∞ robust controller with an online optimization scheme and propose a new constrained evaluation function. The new constrained evaluation function is optimized online by the GWO, which both finds the constrained suboptimal control actions and compensates the biased estimation of the NDO for the UVMS. The stability of the whole system is proved. The fast online calculation, tracking accuracy, and lumped disturbance rejection are demonstrated by a series of UVMS simulations.


Introduction
Nowadays, the UVMS is widely used in various scenarios, especially in dangerous environments such as the deep sea, where its capacity for independent and autonomous operation is an advantage. Because the UVMS carries a manipulator arm, it can perform tasks such as underwater grasping, transportation, and salvage [1]. These applications, however, depend on the control scheme of the UVMS. Therefore, a robust underwater controller for the UVMS is urgently needed. Modeling is one of the most difficult problems in control system design, and an exact model is impossible to obtain. In addition, external disturbances such as underwater currents are always present in the working environment, which makes robust controller design more challenging [2][3][4]. All these factors can seriously degrade the overall stability of the control system. To overcome these unmodeled uncertainties and disturbances, it is necessary to design an adaptive robust control scheme for the UVMS.
Many scholars have proposed robust control methods for underwater robots. In [5][6][7], improved robust nonlinear proportional-integral-derivative (PID) methods are used to control underwater robots with unknown system dynamics. In [8][9][10][11][12][13], schemes based on improved chattering-free sliding mode control (SMC) are used to control underwater robot systems with unmodeled uncertainties and external disturbances. In [14][15][16][17][18][19][20][21][22][23], adaptive control schemes designed for the particular structure of the underwater robot are used to realize trajectory tracking control under system disturbances. In [24,25], adaptive backstepping control schemes are designed by an inverse procedure that makes the system robustly stable, so that the UVMS tracks the desired trajectory. In [26][27][28][29], neural network control schemes based on deep learning are used to robustly control underwater robots under environmental disturbances. Overall, however, most of the above approaches ignore that the system needs to reach its optimal or suboptimal control when facing disturbances. At the same time, the constraints on the control inputs need to be considered because of the limited capacity of the actuators in practice.
Robust control is a kind of optimal control method developed in the 1980s [30] and designed especially for control systems subject to disturbances. There are many studies on the robust control of robots: Rigatos et al. [31] proposed an adaptive H∞ controller for robotic manipulators, Zhang et al. [32] applied an H∞ controller to drive an underwater vehicle, Makarov et al. [33] proposed an H∞ control scheme for the motion control of multiple-link elastic-joint robots with motor sensors in the presence of model uncertainties, Alfia et al. [34] designed a robust H∞ controller to control a container ship in way-point tracking, and Chen et al. [35] applied a robust H∞ controller to micro-electro-mechanical systems (MEMS). However, in the underwater environment, an H∞ controller alone cannot cope with the time-varying disturbances. Therefore, an NDO is necessary to assist the H∞ controller. In practical applications, all NDOs give a biased estimation [36], i.e., a bias in the disturbance compensation always exists. Meanwhile, prior H∞ control methods have difficulty computing the Riccati equation online [37]. Since the Riccati equation is nonlinear, it is usually difficult to solve directly, especially for large matrices. To overcome this computational complexity, many efficient algorithms have been developed to approximate the solution of the Riccati equation numerically offline, such as [32][33][34]. One typical algorithm of this kind was developed by Kleinman [38]. However, the Kleinman approximation of the Riccati solution is based on offline reinforcement learning and suffers from the slow convergence of policy iteration [39]. Moreover, constraints on the control inputs are rarely considered in the design of the control scheme, although practical actuators are often physically limited.
Overall, to the best of our knowledge, there are few approaches to the trajectory tracking control of the UVMS that consider both an online-calculated constrained H∞ controller and the biased estimation of the NDO in the whole system. This differs from our prior work [4], in which the conventional H∞ robust controller was simply designed by solving the Riccati equation offline, ignoring the control input constraints imposed by the physical actuators. Inspired by the above works, a novel adaptive robust control scheme is proposed, which consists of a computed-torque controller (CTC), an online modified constrained H∞ controller, and a designed nonlinear disturbance observer (NDO). The main contributions of this paper are summarized as follows:
(1) A nominal dynamic model-based CTC is used to give the basic control of the UVMS.
(2) An NDO is designed based on the internal dynamics of the UVMS and is used to estimate the time-varying disturbances online.
(3) As the estimation of the NDO has a bias from the real disturbances, we modify the evaluation function of the conventional H∞ controller into a new one (i.e., a modified H∞ controller), whose optimization rejects the compensation bias of the disturbances. At the same time, the constraints on the control inputs are included in the new evaluation function.
(4) The new constrained evaluation function of the modified H∞ controller is optimized online by the recently developed GWO algorithm [40][41][42]. The motivation for using the GWO is to provide a fast online calculation and stable convergence of the proposed control scheme, so that the UVMS resists the disturbances in the working environment.
This paper is organized as follows. In Section 2, the notation used in the following sections is defined. In Section 3, the dynamic control system of the UVMS is established. In Section 4, we design an NDO for the UVMS. In Section 5, we formulate the proposed GWO-based online optimization control scheme for the UVMS. In Section 6, the stability of the proposed control scheme is proved. In Section 7, the GWO algorithm is introduced in detail. In Section 8, the detailed algorithm of the online optimizing robust control scheme for the UVMS is given, and the structure of the control system optimized online by the GWO algorithm is depicted. In Section 9, the scheme is verified by simulations. Finally, the conclusions are summarized in Section 10.

Notations
Some notation used in the following sections is summarized here. The identity matrix of arbitrary dimension is denoted by $I$. A block diagonal matrix with matrices $X_1, X_2, \ldots, X_n$ on its main diagonal is denoted by $\mathrm{diag}\{X_1, X_2, \ldots, X_n\}$. Denote the Euclidean norm by $\|x\|^2 := x^T x$ and the weighted norm by $\|x\|_W^2 := x^T W x$. $\lambda_{\min}$ and $\lambda_{\max}$ denote the smallest and largest eigenvalues of the corresponding matrix, respectively.

Construction of the Dynamic Control System of UVMS
In the dynamic equation (1) of the UVMS, $M(q)$ is the symmetric and positive-definite inertia matrix including added mass terms, $C(q, \dot{q})$ represents the Coriolis and centripetal forces, $D(q, \dot{q})$ represents the damping forces, $G(q)$ represents the gravity and buoyancy forces, and $F(q, \dot{q})$ is the force of interaction between the vehicle and the manipulator. Also, we have the following property.
Moreover, $\tau_d$ is the lumped disturbance, including the unmodeled uncertainties and external disturbances, and $\tau$ is the vector of forces/moments/torques acting on the vehicle and the joints, which is always physically bounded by $\tau := \{\tau \mid |\tau_j| < T_{\max},\ j = 0, 1, \ldots, 6 + n\}$ ($T_{\max}$ is a constant). The general notation of the positions $q$ in the UVMS is described by the following vectors: where each vector consists of the underwater vehicle part $q_v$ and the underwater manipulator part $q_m$, and their velocities and accelerations are $\dot{q}$ and $\ddot{q}$, respectively.

The Design of the NDO for UVMS
The designed NDO is based on the results in [43]. To avoid using the acceleration measurement $\ddot{q}$, which is generally not available from sensors, the auxiliary variable $z$ is defined as
where the vector $p(q)$ is obtained from the observer gain matrix $L(q)$:
Considering (1), (5), and (6) and letting $H(q, \dot{q})$ refer to
Therefore, instead of using the acceleration measurement, the modified disturbance observer is
Overall, the use of the NDO (8) mainly relies on the determination of the observer gain matrix $L(q)$. In [43], the following disturbance observer gain matrix is given:
where $X$ is a constant invertible $n \times n$ matrix to be determined. Define the disturbance tracking error $\Delta\tau_d = \tau_d - \tau_c$; then, according to (1), (5), (6), and (8), we have the relationship

Figure 1: Frames of a UVMS.
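To make the role of the auxiliary variable concrete, the following Python sketch applies a disturbance observer of this type to a hypothetical 1-DOF plant m q'' + h(q, q') = tau + tau_d. It assumes the commonly used structure tau_c = z + p with z' = -L z + L (h - tau - p) and p = L m q' for a constant scalar gain L. These expressions, the plant parameters, and the name `simulate_ndo` are illustrative assumptions and are not taken from equations (5)-(8).

```python
def simulate_ndo(d_true=2.0, gain=10.0, m=1.5, c=0.8, k=2.0,
                 dt=1e-3, steps=5000):
    """Estimate a constant disturbance on a 1-DOF plant without measuring acceleration.

    Assumed plant (illustrative): m*ddq + h = tau + d_true, with h = c*dq + k*q.
    Assumed observer: d_hat = z + p, dz/dt = -gain*z + gain*(h - tau - p), p = gain*m*dq.
    """
    q, dq = 0.0, 0.0
    tau = 0.0                      # no control input in this open-loop sketch
    p = gain * m * dq
    z = -p                         # chosen so that the initial estimate z + p is zero
    for _ in range(steps):
        h = c * dq + k * q         # known nominal dynamics
        ddq = (tau + d_true - h) / m   # used only to advance the simulated plant
        # Observer update: uses q, dq, and tau only, never ddq.
        z += dt * (-gain * z + gain * (h - tau - p))
        # Explicit Euler step of the plant.
        q += dt * dq
        dq += dt * ddq
        p = gain * m * dq
    return z + p                   # disturbance estimate


d_hat = simulate_ndo()
```

Under these assumptions the estimation error shrinks by the factor (1 - gain*dt) at every step when the disturbance is constant, so `d_hat` approaches `d_true` while only position, velocity, and torque are used.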

Complexity
In this paper, we apply the NDO to the UVMS and provide a stability analysis of the disturbance estimation.

Theorem 1. Using the NDO in (8) for the UVMS in (1) subjected to unknown disturbances, the disturbance tracking error $\Delta\tau_d = \tau_d - \tau_c$ of the UVMS converges to zero if the following conditions hold:
(1) The matrix $X$ is invertible.
(2) There exists a positive definite and symmetric matrix $\Pi$ such that
(3) The change rate of the unknown disturbances is bounded by a constant $\kappa$.
Proof. Considering the following candidate Lyapunov function, then, we have and we obtain where $\theta \in (0, 1)$. Therefore, □

Remark 1. It is clear that the NDO is a biased estimator, i.e., it has a blind domain (16) of disturbance estimation which cannot be rejected. In practical applications, all NDOs give a biased estimate, i.e., a bias in the disturbance compensation always exists. Section 5 is introduced to solve this problem of biased estimation in the NDO.
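The bias discussed in Remark 1 can be illustrated with a scalar sketch. It assumes first-order estimate dynamics of the form d(tau_c)/dt = L (tau_d - tau_c), which is an assumption standing in for the error relationship of the NDO rather than a reproduction of it, and a disturbance that grows at a constant rate kappa. The name `residual_bias` and all numerical values are hypothetical.

```python
def residual_bias(kappa, gain, dt=1e-3, steps=10000):
    """Return the final estimation error for a ramp disturbance d(t) = kappa * t.

    Assumed scalar estimate dynamics (illustrative): d_hat' = gain * (d - d_hat).
    """
    d_hat, t = 0.0, 0.0
    for _ in range(steps):
        d = kappa * t
        d_hat += dt * gain * (d - d_hat)   # explicit Euler step of the estimator
        t += dt
    return kappa * t - d_hat               # error at the final time


bias_low_gain = residual_bias(kappa=1.0, gain=5.0)
bias_high_gain = residual_bias(kappa=1.0, gain=10.0)
```

In this simplified setting the error settles at kappa / gain instead of zero: a larger observer gain shrinks the residual but does not remove it, which is the kind of remaining bias that the modified H∞ controller is meant to compensate.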

Formulation of an Online Optimizing Robust Control Scheme for the UVMS
Given the desired position $q_d$ generated by the inverse kinematics (IK) algorithm (see [1]) for the UVMS, the trajectory tracking errors of positions, velocities, and accelerations can be defined as respectively. Let the unknown maximum possible biased estimation of the NDO be $\|M(q)^{-1}\,(2\kappa\lambda_{\max}(M(q))\|X\|)/(\theta\lambda_{\min}(\Pi))\|^2 = w_{\max}^2$. Considering that there still exist an incomplete compensation error of the unmodeled uncertainties and underwater environment disturbances, as well as constrained dynamical control inputs, the proposed GWO-based online optimization control scheme is
where it consists of the CTC, the NDO in (8), and a modified H∞ controller. The online tuning matrices $P, V, X, W$ come from the following per-sampling-period real-time optimization:
subject to constraints (19)-(22), in which the state $x$ stacks the tracking error $e$ and its derivative $\dot{e}$.

The reasons for selecting the evaluation function (18) are as follows: first, optimization problem (18) shows that the smaller $c^2$ is, the better the disturbance attenuation performance of the proposed GWO-based online optimization control scheme; second, the optimization includes constraint equation (21), which keeps the controlled system (1) within the physical/mechanical limits of each DOF.

Remark 3. As the estimation of the NDO has a bias from the real disturbances, the new evaluation function in the modified H∞ controller is proposed, whose optimization rejects the compensation bias of the disturbances. At the same time, the constraints on the control inputs are included in the new evaluation function. The proposed GWO-based online optimization control scheme is used to eliminate the reference trajectory tracking error under optimal control, and its advantage is that the receding horizon optimization compensates the noises/bias with unknown distribution (not handled by the NDO) while keeping the system inputs within the actuator constraints.
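The way the three parts of the scheme combine can be sketched on a hypothetical 1-DOF plant. The code below is only an illustration under stated assumptions: the torque is taken as the sum of a computed-torque term, minus the NDO estimate, plus a linear feedback u = K x that stands in for the modified H∞ term, and the result is clipped to an assumed actuator limit. The signs, gains, plant parameters, and the names `clip` and `simulate_tracking` are assumptions and do not reproduce control law (17).

```python
def clip(value, limit):
    """Saturate a torque command to the symmetric actuator range [-limit, limit]."""
    return max(-limit, min(limit, value))


def simulate_tracking(use_ndo=True, d_true=3.0, torque_limit=50.0,
                      m=2.0, c=1.0, k=4.0, kp=25.0, kd=10.0,
                      k_rob=(5.0, 2.0), gain=20.0, dt=1e-3, steps=8000):
    """Return the final tracking error q_des - q for a constant set point."""
    q, dq, q_des = 0.0, 0.0, 1.0
    z = -gain * m * dq                     # observer state, initial estimate is zero
    for _ in range(steps):
        e, de = q_des - q, -dq             # desired velocity and acceleration are zero
        h = c * dq + k * q                 # known nominal dynamics
        p = gain * m * dq
        tau_c = (z + p) if use_ndo else 0.0        # NDO disturbance estimate
        tau_ctc = m * (kp * e + kd * de) + h       # computed-torque term
        u = m * (k_rob[0] * e + k_rob[1] * de)     # linear robust feedback u = K x
        tau = clip(tau_ctc - tau_c + u, torque_limit)
        ddq = (tau + d_true - h) / m               # plant with the true disturbance
        # Observer uses the torque actually applied after saturation.
        z += dt * (-gain * z + gain * (h - tau - p))
        q += dt * dq
        dq += dt * ddq
    return q_des - q


err_with_ndo = simulate_tracking(use_ndo=True)
err_without_ndo = simulate_tracking(use_ndo=False)
```

In this toy setting the initial command exceeds the assumed limit and is clipped, the error with the observer decays to zero, and the error without it settles at a constant offset of -d_true / (m * (kp + k_rob[0])). The fixed gain K used here is a placeholder for the matrices that the paper tunes online.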

Stability Analysis of Our Proposed Control Scheme
Theorem 2. The suboptimal GWO-based online optimization control scheme (17) can drive dynamic equation (1) of the UVMS with unknown disturbances $\tau_d$ to be asymptotically stable.
Proof. Substituting the proposed torque control law into equation (1), we obtain the closed-loop system: Given by then equation (19) can be and multiplying by According to the Schur complement theorem [30], we have Let the Lyapunov candidate function Substituted by $u = Kx$, then □
Remark 4. Multiplying equation (28) by $x$ on both sides, we have Then, according to Remark 4 and (20), we can finally reformulate equation (30) to clearly, and dynamic equation (1)

Grey Wolf Algorithm
Biomimetic swarm intelligence has become a focus of interdisciplinary research in recent years. It provides new ideas for solving large-scale complex problems and has been widely used in robot control because of its self-organization, parallelism, distribution, flexibility, and robustness. Many swarm intelligence algorithms have been developed by imitating biological groups and their genetic evolution processes in nature, such as the particle swarm optimization (PSO) algorithm, the ant colony optimization (ACO) algorithm, the shuffled frog-leaping algorithm, the artificial fish-swarm algorithm, and the cuckoo search algorithm [44][45][46][47][48]. Although many scholars have improved the theory and made progress, the improved methods still suffer from slow convergence, computational complexity, and a tendency to fall into local optima.
The grey wolf algorithm is a new kind of metaheuristic biological intelligence algorithm proposed in [40], imitating wolves' hierarchical leadership and hunting discipline in nature. The wolves' hunting process is shown in Figure 2. The wolves are divided into 4 types in Figure 3. Alpha wolves are the leaders, whose main task is choosing habitats and scheduling hunting, rest, and so on. Beta wolves, as subordinates, help the alpha wolves make decisions and direct the others. Delta wolves obey the alpha and beta wolves, but they rule the omega wolves, which are the lowest in the hierarchy. Mirjalili et al. have shown that the search performance of the basic GWO is better than that of PSO, ACO, and so on. The GWO has a simple principle, a fast searching speed, and good precision, and it is easy to implement in practical engineering. Therefore, the GWO is applied to optimize the online optimization control scheme (17). The basic grey wolves' hunting system consists of one leader wolf, a group of searching wolves, and a group of encircling wolves. The leader wolf commands the other wolves, the searching wolves look for prey, and the encircling wolves attack the prey. The wolves' hunting can be abstracted into 3 kinds of intelligent behavior (searching behavior, calling behavior, and encircling behavior); the wolves' production rule is "winner takes all," and the updating mechanism is "strong survived." The specific algorithm is as follows:
(1) The leader wolf production rule: in the search space, the wolf with the currently optimal evaluation function value is called the leader wolf and denoted by $Y_{lead}$. The leader wolf only performs the calling behavior and goes directly into the next iteration until it is replaced by a stronger wolf.
(2) Searching behavior: let $i$ be one of the searching wolves; $i$ records the odor concentration $Y_i$ (the evaluation function value) perceived at each step $\mathrm{step}_a^d$ towards the prey. In the searching process, for the $t$ ($t = 1, 2, \ldots, T_{\max}$) directions that $i$ has walked, the successor $d$-th dimensional position $x_{i,d}$ of each $i$ is updated by where the searching continues until the $Y_i$ perceived by a wolf satisfies $Y_i > Y_{lead}$ or the maximum number of searching iterations $T_{\max}$ is reached.

(3) Calling behavior: let $j$ be one of the encircling wolves. On hearing the call of the leader wolf, the encircling wolf $j$ runs to the location of the leader wolf with a relatively larger step $\mathrm{step}_b^d$. In the running process, for the $k$ ($k = 1, 2, \ldots, K_{\max}$) steps that $j$ has run, the successor $d$-th dimensional position $x_{j,d}$ of each $j$ is updated by where $g_d^k$ is the leader wolf position and $K_{\max}$ is the maximum number of running iterations. On the way, if the $Y_j$ perceived by a wolf satisfies $Y_j > Y_{lead}$, then $Y_{lead} = Y_j$, and the encircling wolf becomes the leader wolf and starts the calling behavior. Otherwise, it continues running until $K_{\max}$ is reached or the distance $d_{near}$ between itself and the leader wolf is in the range $[d_{\min}, d_{\max}]$. $d_{near}$ can be defined by where $\omega$ is the distance determinant factor and $D$ is a positive constant.
(4) Encircling behavior: after the running process of the encircling wolves, the attack begins. Let the position of the prey in the $d$-th dimensional space be $G_d^e$; for the $e$ ($e = 1, 2, \ldots, E_{\max}$) steps that $j$ has attacked, the successor $d$-th dimensional position of each $j$ is updated by where $E_{\max}$ is the maximum number of attacking iterations, $\lambda$ is a uniformly distributed random number in $[-1, 1]$, and $\mathrm{step}_c^d$ is the attacking step. If the odor concentration of the prey perceived by a wolf satisfies $Y_j > Y_{lead}$, then $Y_{lead} = Y_j$, and the attacking wolf becomes the leader wolf and starts the calling behavior. Otherwise, it continues attacking until $E_{\max}$ is reached. Generally, $\mathrm{step}_a^d$, $\mathrm{step}_b^d$, and $\mathrm{step}_c^d$ satisfy the following relationship: where $S$ is the step length factor.
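As a point of reference for the behaviors listed above, the following Python sketch implements the widely used canonical grey wolf optimizer update (three leading wolves and a coefficient that decreases linearly from 2 to 0). It is a swapped-in textbook form, not the searching, calling, and encircling formulas (32)-(36) of this section, whose expressions are not reproduced here. The test function `shifted_sphere`, the bounds, the population size, and the iteration count are hypothetical choices.

```python
import random


def grey_wolf_optimize(cost, bounds, n_wolves=20, n_iter=200, seed=0):
    """Minimize cost over box bounds with the canonical GWO update (n_wolves >= 3)."""
    rng = random.Random(seed)
    dim = len(bounds)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    # Leaders (alpha, beta, delta) are the three best solutions found so far.
    leaders = sorted(((cost(w), list(w)) for w in wolves), key=lambda s: s[0])[:3]
    for it in range(n_iter):
        a = 2.0 - 2.0 * it / n_iter        # exploration coefficient, decreasing to 0
        for w in wolves:
            for d in range(dim):
                lo, hi = bounds[d]
                new = 0.0
                for _, lead in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    dist = abs(C * lead[d] - w[d])
                    new += lead[d] - A * dist
                # Average of the three leader-guided moves, kept inside the bounds.
                w[d] = min(hi, max(lo, new / 3.0))
        scored = [(cost(w), list(w)) for w in wolves] + leaders
        leaders = sorted(scored, key=lambda s: s[0])[:3]
    return leaders[0][1], leaders[0][0]


def shifted_sphere(x):
    """Hypothetical test cost with its minimum at (1, -2)."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


best, value = grey_wolf_optimize(shifted_sphere, [(-5.0, 5.0), (-5.0, 5.0)])
```

Keeping the three best solutions across iterations means the returned cost never gets worse from one iteration to the next. In an online setting the cost function would be the constrained evaluation function, and the iteration budget would be chosen to fit the sampling period.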

The Detailed Algorithm of the Online
subject to (19)-(22). Clearly, the GWO is used to find online the optimal matrices $P$, $V$, $X$, $W$, and $c$ that make evaluation function (37) as small as possible, realizing suboptimal H∞ robust control performance of the UVMS. The matrices to be selected are $P = \mathrm{diag}\{P_1, P_2, \ldots, P_n\}$, To further reduce the complexity of the GWO optimization, we combine the matrices into one independent variable $\{P, V, X, W\}$, which means that a wolf position $x_n$ ($n = 1, 2, \ldots, N$, where $N$ is the number of wolves) corresponds to a group of candidate matrices $P$, $V$, $X$, and $W$. The flowchart of Eval function (37) optimized by the GWO is shown in Figure 4. The detailed algorithm is as follows:
Step 1. Parameter initialization: initialize the maximum number of global iterations $NUM_{\max}$, the number of wolves $N$ (consisting of the number of searching wolves $N_s$ and the number of encircling wolves $N_e$) and their initial random positions $x_n$ ($n = 1, 2, \ldots, N$), the maximum numbers of local iterations ($K_{\max}$, $T_{\max}$, $E_{\max}$), the searching wolves scaling factor $\alpha$, the distance determinant factor $\omega$, and the step length factor $S$.
Step 2. The optimal wolf (optimal means the present minimum value of Eval function (37)) is selected as the leader $Y_{lead}$, and the searching wolves look for the prey by formula (32) until one of the $N_s$ searching wolves, $i$, detects a prey odor concentration $Y_i$ that is bigger than the leader's $Y_{lead}$, or $T_{\max}$ is reached; then go to Step 3.
Step 3. On hearing the call, the encircling wolves run to the prey according to formula (33). If $j$, one of the $N_e$ encircling wolves, perceives a prey odor concentration $Y_j > Y_{lead}$ while running, then $Y_{lead} = Y_j$ and $j$ replaces the leader wolf and continues the calling behavior; if $Y_j \le Y_{lead}$, the encircling wolves continue to run to the prey until $d_{near} \in [d_{\min}, d_{\max}]$ or $K_{\max}$ is reached; then go to Step 4.
Step 4. According to formula (35), update the positions of the wolves that participate in the attack. If the odor concentration of the prey perceived by a wolf is bigger than $Y_{lead}$ or $E_{\max}$ is reached, update the position of the leader wolf; otherwise, leave it unchanged.
Step 5. Check whether the optimization goal or the maximum number of iterations $NUM_{\max}$ has been reached; if so, output the leader wolf position, which is the optimal solution of Eval function (37); if not, go to Step 2.
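The mapping from one wolf position to a group of candidate matrices, and the handling of the input bound during evaluation, can be sketched as follows. The sketch assumes that each of P, V, X, and W is parameterized by n diagonal entries and that a violated torque bound is handled by adding a large penalty to the evaluation value. Both are illustrative assumptions, since Eval function (37) and the full matrix structure are not reproduced here. The names `decode_wolf` and `penalized_eval` are hypothetical.

```python
def decode_wolf(position, n):
    """Split a flat wolf position of length 4*n into diagonal entries of P, V, X, W."""
    if len(position) != 4 * n:
        raise ValueError("expected a position with 4*n entries")
    names = ("P", "V", "X", "W")
    return {name: list(position[i * n:(i + 1) * n]) for i, name in enumerate(names)}


def penalized_eval(base_cost, torques, t_max, penalty=1e6):
    """Return base_cost, plus a large penalty if any |tau_j| reaches the limit t_max."""
    violated = any(abs(t) >= t_max for t in torques)
    if not violated:
        return base_cost
    excess = sum(max(0.0, abs(t) - t_max) for t in torques)
    return base_cost + penalty * (1.0 + excess)


candidate = decode_wolf([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], n=2)
feasible_value = penalized_eval(1.0, torques=[0.5, -0.2], t_max=1.0)
infeasible_value = penalized_eval(1.0, torques=[1.5, -0.2], t_max=1.0)
```

With a penalty of this kind, any wolf whose candidate matrices produce an inadmissible torque is ranked behind every feasible wolf, so the leader selected in Step 2 respects the actuator limits whenever a feasible candidate exists.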

The Whole Structure of the Proposed GWO-Based Online Optimization Control Scheme for the UVMS.
Figure 5 shows the working flow of the proposed control scheme, which consists of three parts. The first part is the CTC, which is computed from the internal system dynamics to give the basic control actions. The second part is the NDO, which works online to provide the compensative control actions that reject the unmodeled uncertainties and external disturbances. The third part is the modified H∞ controller, which overcomes the biased estimation of the NDO and makes the whole system convergent.

Simulation Results
Simulations are carried out with the control simulator [1] of a 6DOF vehicle-6DOF manipulator. The structure and physical parameters of the UVMS are shown in the Appendix (Tables 1-3). Before the GWO algorithm is run to find online the optimal matrices $\{P, V, X, W\}$ for the online robust control scheme of the UVMS, we randomly initialize the artificial wolf positions $x_n$ ($n = 1, 2, \ldots, N$) and set the GWO parameters listed in Table 4. System disturbances are given by the step signal in Figure 6. The modeled

Case 1: Comparative Performances of Antidisturbances with and without the NDO or H∞ Controller.
In this case, the proposed control scheme is run with and without the NDO or the H∞ controller to track the desired trajectory, and the tracking errors in X, Y, and Z are recorded. Figure 7 shows that the proposed control scheme with both the NDO and the H∞ controller rejects the external disturbances well and achieves convergent trajectory tracking. However, the control scheme with the NDO but without the H∞ controller fails to track the desired trajectory with a small tracking error, and the control scheme without the NDO but with the H∞ controller does not converge. Therefore, the NDO is indispensable for assisting the H∞ controller of the UVMS. In addition, the NDO has a biased estimation, which is eliminated by the H∞ controller, so trajectory tracking with both the NDO and the H∞ controller succeeds in Figure 7.

Case 2: Comparative Computational Efficiency with Previous Optimization Methods.
To evaluate how quickly the GWO algorithm tunes the online robust control scheme, we compare it with other online iterative algorithms: particle swarm optimization (PSO) [46], ant colony optimization (ACO) [47], the genetic algorithm (GA) [48], and the Kleinman method [38]. The comparison is shown in Figure 8. The GWO algorithm obtains a better optimization result and completes its optimization in less than 100 ms, which means that it can provide real-time online optimization of evaluation function (18) of the modified H∞ controller. In contrast, traditional methods such as PSO, ACO, and GA do not handle the complex evaluation function (18) well, and all of them take longer to optimize than the GWO. The Kleinman method cannot work efficiently in online optimization either.

Case 3: Comparing the Robustness Performance of Trajectory Tracking with Previous Methods.
To evaluate the robustness of the proposed control scheme in controlling the UVMS, we compare it with other traditional controllers: SMC [10], PID [7], and the conventional H∞ controller [4] based on the Kleinman method. The SMC controller [10] used for comparison in this paper is given by where $K_{s1}$ and $K_{s2}$ are the gain matrices of the SMC, selected as positive diagonal matrices in this paper, $s = \dot{e} + K_s e$ is the vector of the first-order sliding surface, and $K_s$ is a positive diagonal matrix. The PID controller [7] used for comparison in this paper is given by $\tau(n) = \tau(n-1) + K_p (e(n) - e(n-1)) + K_i e(n) + K_d (e(n) - 2e(n-1) + e(n-2))$, where $\tau(n)$ is the control signal, $e(n) = q_d(n) - q(n)$ is the position tracking error, and $K_p$, $K_i$, and $K_d$ are the proportional, integral, and derivative gains, respectively. Here, $n$ is the sample index.
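The incremental PID law used as a baseline can be sketched in Python as below. The sketch assumes the standard velocity-form derivative term K_d (e(n) - 2 e(n-1) + e(n-2)), scalar gains, and zero initial errors and torque. The class name `IncrementalPID` and the numerical gains are hypothetical and are not the values used in the comparison.

```python
class IncrementalPID:
    """Velocity-form (incremental) PID: the torque is updated by an increment each sample."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.tau = 0.0   # tau(n-1)
        self.e1 = 0.0    # e(n-1)
        self.e2 = 0.0    # e(n-2)

    def update(self, q_des, q):
        """Return tau(n) for the current desired position q_des and measured position q."""
        e = q_des - q
        self.tau += (self.kp * (e - self.e1)
                     + self.ki * e
                     + self.kd * (e - 2.0 * self.e1 + self.e2))
        self.e2, self.e1 = self.e1, e
        return self.tau


pid = IncrementalPID(kp=2.0, ki=0.5, kd=0.1)
outputs = [pid.update(1.0, 0.0) for _ in range(3)]
```

For a constant unit error the three outputs are approximately 2.6, 3.0, and 3.5: after the first two samples the proportional and derivative increments vanish and only the integral term keeps raising the torque. For the UVMS, one such controller per degree of freedom, or matrix-valued gains, would be needed.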
The conventional H∞ controller [4] used for comparison in this paper is given by where its control law is similar to that of the controller proposed in this paper; note, however, that the control law in [4] is optimized by the Kleinman method [38], whereas our proposed method is based on the GWO. Given the vehicle initial position (0, 0, 0) and the manipulator initial configuration $q_m = [0, -45, -45, 0, 0, 0] \cdot \pi/180$, the position of the object to be grasped is set to (−0.6, 1.2, 4), as shown in Figure 9. From the simulations of grasping the object with these methods, shown in Figures 10 and 11, we find that only our proposed control scheme meets the design requirements of the UVMS in rapid response, tracking accuracy, and disturbance attenuation, and it is more robust than the others. In Figure 11, the green lines show the target optimal paths for the end effector to grasp the object, and the black lines represent the actual paths of the end effector under these controllers. The detailed evolution of the speed, acceleration, and generalized forces of the UVMS under our proposed method is recorded in Figures 12-14, which show that the control inputs (forces/moments/torques) remain bounded within constraints (40).

Conclusions
For controlling the nonlinear UVMS in the presence of unmodeled uncertainties and external disturbances, an adaptive robust control scheme consisting of a CTC, a modified constrained H∞ controller, and a newly designed NDO is proposed and successfully applied to control the UVMS with online GWO optimization. In the simulations, the GWO converges faster than the conventional PSO, ACO, GA, and Kleinman methods. The GWO-optimized H∞ controller also overcomes the biased estimation of the NDO. Furthermore, the proposed control scheme tracks the desired trajectory of the end effector better than prior control methods such as SMC, PID, and the conventional H∞ controller based on the Kleinman method. Overall, the proposed control scheme provides a feasible method for online robust suboptimal control of the nonlinear UVMS.

Data Availability
The data that support the findings of this study are from the book "G. Antonelli, Underwater Robots, Springer Tracts," which provides the 6DOF vehicle-6DOF manipulator open-source simulation tool Simurv4.0 (http://www.eng.docente.unicas.it/gianluca_antonelli/simurv).

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Authors' Contributions
Y. Dai designed the main idea and wrote the article. S. Yu supervised the whole work. D. Wu and Y. Yan helped revise the structure of the article and proofread the manuscript.