Composite learning tracking control for underactuated marine surface vessels with output constraints

In this paper, a composite learning control scheme is proposed for underactuated marine surface vessels (MSVs) subject to unknown dynamics, time-varying external disturbances and output constraints. The underactuation problem of the MSVs is addressed using the line-of-sight (LOS) approach. To handle the output constraints, a barrier Lyapunov function-based method is used to ensure that the output errors never violate the constraints. Composite neural networks (NNs) are employed to approximate the unknown dynamics. The prediction errors are obtained using a serial-parallel estimation model (SPEM), and both the prediction errors and the tracking errors are used to construct the NN weight update laws. Using the approximation information, disturbance observers are designed to estimate the unknown time-varying disturbances. Stability analysis via the Lyapunov approach indicates that all signals of the closed-loop MSV system are uniformly ultimately bounded. Simulation results verify the effectiveness of the proposed control scheme.


INTRODUCTION
In recent years, with the development of the marine economy, marine transport vehicles have attracted considerable attention (Shen et al., 2020; Yu, Guo & Yan, 2019). Marine surface vessels (MSVs) have been widely used in marine exploration, marine transportation, marine surveying and other fields (Liu et al., 2016; Shao et al., 2019). To accomplish these tasks, trajectory tracking control of MSVs plays a significant role. Owing to the influence of the external environment, MSVs are inevitably subject to unknown dynamics and unknown time-varying environmental disturbances.
To address the underactuation problem of MSVs, several control methods have been introduced, such as the additional control method (Do, 2010; Park, Kwon & Kim, 2017; Chen et al., 2020), output redefinition control (Shojaei & Arefi, 2015; Shojaei, 2017) and the line-of-sight (LOS) method (Shojaei, 2015; Gao et al., 2016; Jia, Hu & Zhang, 2019; Liu, 2019). Three additional control terms were adopted to address the underactuation problem of MSVs in Do (2010), Park, Kwon & Kim (2017) and Chen et al. (2020). In Shojaei & Arefi (2015) and Shojaei (2017), the output redefinition approach was introduced to handle the underactuation problem in the design of trajectory tracking control laws, while an adaptive technique, NNs and a saturation function were combined to deal with the unknown disturbances, unknown dynamics and input saturation, respectively. In Shojaei (2015), Gao et al. (2016), Jia, Hu & Zhang (2019) and Liu (2019), the LOS method was used to solve the underactuation problem of MSVs, and parameter adaptation was combined with NN approximation to handle the time-varying external disturbances and parameter uncertainty.
For the sake of navigation safety, the output constraint problem is inevitable in practice. Navigable water areas are restricted, and surface vessels must stay within them. When the position error is too large, it may lead to a collision. When the yaw angle error becomes excessive, the actuator may be damaged by overload. Therefore, it is necessary to further study MSV trajectory tracking systems with output constraints. Several methods have been presented to solve the output constraint problem, such as moving-horizon optimal control (Mayne & Michalska, 1990), the artificial potential field (Sun & Ge, 2014), the barrier Lyapunov function (BLF) (Tee et al., 2011) and the output error transformation method (Zheng et al., 2020; Zhu, Du & Kao, 2020). In Zheng et al. (2020) and Zhu, Du & Kao (2020), the output constraint problem is transformed into a tracking error constraint problem by a coordinate transformation, which ensures that the tracking error always stays within predefined boundaries. Because a Lyapunov function can be constructed from a barrier function, the BLF-based approach can solve the trajectory tracking control problem for MSVs under output constraints (Zhu, Du & Kao, 2020; Zhao, He & Ge, 2014). Considering unknown dynamics and time-varying disturbances simultaneously, Zhu, Du & Kao (2020) use a log-type BLF to handle constant symmetric output constraints, whereas Zhao, He & Ge (2014) use an asymmetric BLF to deal with asymmetric output constraints.
The literature mentioned above concentrates on the tracking and stability of the system, and most of it does not address the accuracy of the identified model. In practice, the model uncertainty should be approximated as precisely as possible. In general, the unknown dynamics of the system can be compensated by adaptive control techniques. To achieve better control performance, a composite adaptive control scheme was developed in Patre & Bhasin (2010). It yields faster parameter convergence as well as smaller tracking errors, and has been applied in various fields (Sun, Pan & Yang, 2017; Pan, Sun & Yu, 2016). To approximate the unknown dynamic terms faster and more accurately, prediction errors can be constructed with the serial-parallel estimation model (SPEM) (Peng, Wang & Wang, 2017); the NN update law is then designed using the prediction error, which effectively improves the transient performance. Yucelen & Haddad (2013) presented a modification of the adaptive update laws to improve the transient performance of the system. An error feedback term was included in the reference model in Pan, Sun & Yu (2016) and Stepanyan & Krishnakumar (2010) to improve the transient performance. In Xu & Sun (2018), both the prediction errors and the tracking errors were used to construct the update law of the NN weights. In addition, some studies introduce an index of learning performance into the update law and construct composite learning laws by means of auxiliary filters (Na et al., 2015; Huang et al., 2018) or time-interval data (Xu et al., 2019; Xu et al., 2018).
Motivated by the discussion above, this paper proposes a composite learning control strategy for underactuated MSVs subject to unknown dynamics, ocean environmental disturbances and output constraints. The main contributions can be summarized as follows.
• Position error and yaw angle error constraints are addressed by employing the BLF-based method. The dynamic surface control approach is used to reduce the computational burden caused by the explosion-of-complexity problem inherent in the backstepping method.
• Composite NNs are employed to approximate the unknown dynamics of MSVs. Unlike the traditional NN approach, in which only the tracking errors are used to update the NN weights, both the tracking errors and the prediction errors are used here. Therefore, the unknown dynamics can be approximated faster and more accurately.
• Using the approximation of the unknown dynamics of MSVs, nonlinear disturbance observers (NDOs) are constructed to estimate the time-varying disturbances. By combining the dynamic surface control technique with the disturbance observers and composite NNs, a trajectory tracking control system is developed. Compared with a control scheme based on NNs updated by the tracking errors only, the proposed scheme can effectively improve the transient and steady-state performance of MSV trajectory tracking control.
The rest of this paper is organized as follows. In Section 2, the mathematical model of MSVs and the problem formulation are introduced. In Section 3, the principle of intelligent approximation using NNs is presented. In Section 4, the details of the controller design procedure are given. In Section 5, simulation results are given to show the effectiveness of the controller. In Section 6, the entire work is summarized.

PROBLEM FORMULATION AND PRELIMINARIES
MSV kinematic and dynamic models
The mathematical model of underactuated MSVs with three degrees of freedom is described by the kinematic and dynamic equations of the vessel, where [x, y, ϕ]^T denotes the position and heading angle in the inertial reference frame, and [u, v, r]^T denotes the surge, sway and yaw (angular) velocities in the body-fixed frame. The m_ii, i = 1, 2, 3, represent the inertia including added mass. The d_ii, i = 1, 2, 3, stand for the hydrodynamic damping in surge, sway and yaw. The d_j, j = u, v, r, denote the unknown environmental disturbances. f_u, f_v and f_r represent the unknown dynamics of the MSVs. τ_u and τ_r are the control force and moment in the surge and yaw directions, respectively.
Assumption 1: The environmental disturbances d_j, j = u, v, r, are unknown but bounded, and their rates of change satisfy |ḋ_j| ≤ d̄_j, where d̄_j are unknown positive constants.
Remark 1: The ocean disturbances include slowly varying disturbances caused by second-order waves, currents, winds and unknown dynamics, as well as norm-bounded disturbances caused by ocean uncertainties. Because the energy in the marine environment is finite, the rate of change of the ocean disturbances is bounded, although the bound is unknown.
Remark 2: The parameters of MSVs are affected by operational conditions and the marine environment. Because these factors change frequently, the parameters of MSVs are uncertain. Here, m_ii and d_ii, i = 1, 2, 3, represent the nominal values of the inertia (including added mass) and the hydrodynamic damping, respectively, and f_j, j = u, v, r, represent the unknown dynamics, which include the uncertain parts of the model parameters.
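For orientation, a standard 3-DOF underactuated surface vessel model that matches the symbols defined above can be sketched as follows. This is a reconstruction under assumptions, not a quotation of the paper's equations: the kinematics are the usual rotation from the body-fixed frame to the inertial frame, the surge and yaw equations are written so that their unknown terms agree with the expressions m_22 v r − d_11 u + f_u and (m_11 − m_22) u v − d_33 r + f_r used in the controller design, and the sway equation is assumed by analogy.

```latex
\begin{aligned}
\dot{x} &= u\cos\phi - v\sin\phi, \\
\dot{y} &= u\sin\phi + v\cos\phi, \\
\dot{\phi} &= r, \\
m_{11}\dot{u} &= m_{22} v r - d_{11} u + f_u + \tau_u + d_u, \\
m_{22}\dot{v} &= -m_{11} u r - d_{22} v + f_v + d_v, \\
m_{33}\dot{r} &= (m_{11} - m_{22}) u v - d_{33} r + f_r + \tau_r + d_r.
\end{aligned}
```

In this form the sway equation has no control input, which is what makes the vessel underactuated: only τ_u and τ_r are available to steer three degrees of freedom.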
Assumption 2: The desired reference signals x_d and y_d are smooth, and they and their first two time derivatives are bounded.
The position and orientation tracking errors are defined in the body-fixed frame in Eqs. (3a) and (3b), and their time derivatives can then be computed. In engineering practice, the MSV position, heading, velocities in surge and sway, and yaw rate can be measured by the global positioning system, the gyro compass, the Doppler log, and the rate gyro, respectively. The tracking position error ρ_s and the yaw angle error θ are then defined in Eqs. (5a) and (5b), and combining Eqs. (3a)-(3b) with Eqs. (5a)-(5b) yields Eqs. (6a) and (6b). To avoid a possible singularity of the virtual control law, a positive constant ρ_0 is introduced. Under Assumptions 1 and 2, the control objective is to construct composite learning control laws τ_u and τ_r for the MSV such that ρ_s and θ converge to arbitrarily small neighborhoods of zero in the presence of unknown dynamics, time-varying disturbances and output constraints.
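The error variables described above can be illustrated with a short sketch. The exact definitions in Eqs. (3a)-(5b) are not reproduced here, so the code assumes one common LOS-style convention: the inertial position error is rotated into the body-fixed frame, ρ_e is its length, ρ_s = ρ_e − ρ_0, and θ is the bearing of the reference point seen from the vessel. The function name `los_errors` and this convention are illustrative assumptions.

```python
import math

def los_errors(x, y, phi, x_d, y_d, rho_0=0.0):
    """Return (x_e, y_e, rho_s, theta) under an assumed LOS convention."""
    dx, dy = x_d - x, y_d - y
    # Rotate the inertial position error into the body-fixed frame.
    x_e = math.cos(phi) * dx + math.sin(phi) * dy
    y_e = -math.sin(phi) * dx + math.cos(phi) * dy
    rho_e = math.hypot(x_e, y_e)   # distance to the reference point
    rho_s = rho_e - rho_0          # offset by rho_0 so rho_e is not driven to zero
    theta = math.atan2(y_e, x_e)   # bearing of the reference in the body frame
    return x_e, y_e, rho_s, theta
```

With this convention, driving ρ_s and θ to zero keeps the vessel at distance ρ_0 from the reference point while pointing at it.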

Radial basis function neural network (RBFNN) approximation
In this paper, RBF NNs are employed for approximation. For an arbitrary continuous function f(ς) over a compact set Ω_ς, there exists an RBF NN of the form f(ς) = ω^T ψ(ς) + ξ_w, where f(ς) ∈ R^p denotes the output vector of the RBF NN and ς ∈ R^q denotes its input vector. ψ(ς) is the Gaussian basis function vector, c_j is the center of the j-th basis function and b_j is its width. ξ_w is the approximation error, which satisfies |ξ_w| ≤ ξ̄, where ξ̄ is an unknown positive constant. According to Eq. (43), ω is the ideal NN weight parameter, which satisfies ω = arg min_ω { sup_{ς ∈ Ω_ς} |f(ς) − ω^T ψ(ς)| }. However, it is very difficult to determine the ideal weight parameter. In practice, the estimate ω̂ of the NN weight parameter is therefore used to approximate the unknown nonlinear term, i.e., f̂ = ω̂^T ψ.
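A minimal sketch of the approximator f̂ = ω̂^T ψ(ς) for a scalar output is given below. The Gaussian normalization exp(−‖ς − c_j‖² / b_j²) is an assumption (some authors use 2 b_j² in the denominator), and the function names are illustrative.

```python
import math

def gaussian_basis(s, centers, widths):
    """Evaluate psi_j(s) = exp(-||s - c_j||^2 / b_j^2) for every node j."""
    psi = []
    for c, b in zip(centers, widths):
        dist2 = sum((si - ci) ** 2 for si, ci in zip(s, c))
        psi.append(math.exp(-dist2 / (b * b)))
    return psi

def rbf_output(w_hat, s, centers, widths):
    """NN estimate f_hat = w_hat^T psi(s) for a scalar output."""
    psi = gaussian_basis(s, centers, widths)
    return sum(w * p for w, p in zip(w_hat, psi))
```

For example, with three nodes centered at −1, 0 and 1 on a one-dimensional input, unit widths and weights (0.5, 1.0, −0.5), the contributions of the two outer nodes cancel at ς = 0 and the output equals the weight of the central node.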

CONTROL LAW DESIGN
In this section, the control law for the MSVs is designed under Assumptions 1 and 2. The block diagram of the trajectory tracking control system of MSVs is presented in Fig. 1. Combining Eqs. (5a) and (5b) with Eqs. (6a) and (6b), the time derivative of ρ_s can be expressed in terms of two auxiliary functions ζ_1 and ζ_2. When an MSV passes through a narrow passage, it is necessary to limit the position error ρ_s to prevent collisions. The BLF is therefore selected in the logarithmic form of Eq. (10), where log(*) is the natural logarithm of (*) and k_a is the constraint on ρ_s, i.e., |ρ_s| < k_a. Taking the time derivative of Eq. (10), the virtual control law α_u is designed, where k_ρ is a positive constant.
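The role of the logarithmic barrier can be made concrete with a small sketch. It assumes the widely used log-type BLF V = ½ log(k² / (k² − e²)), where e stands for the constrained error (ρ_s or θ) and k for its bound (k_a or k_b); whether Eq. (10) has exactly this scaling is an assumption, and the function names are illustrative.

```python
import math

def log_blf(e, k):
    """Log-type barrier Lyapunov function, defined only for |e| < k."""
    if abs(e) >= k:
        raise ValueError("the error must satisfy |e| < k")
    return 0.5 * math.log(k * k / (k * k - e * e))

def log_blf_grad(e, k):
    """Partial derivative dV/de = e / (k^2 - e^2)."""
    return e / (k * k - e * e)
```

Because V grows without bound as |e| approaches k, keeping V bounded along the closed-loop trajectories keeps the error strictly inside the constraint. The factor e / (k² − e²) is what multiplies the error rate in the time derivative of V.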
In the surge direction, let α_u pass through a first-order filter with a time constant T_u > 0 to obtain a new state variable β_u.
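The filtering step of dynamic surface control can be sketched as follows, assuming the usual first-order form T β̇ + β = α with β(0) = α(0) and a forward-Euler discretization. The step size `dt` and the function names are illustrative assumptions.

```python
def filter_step(beta, alpha, T, dt):
    """One Euler step of the first-order filter T * beta_dot + beta = alpha."""
    return beta + dt * (alpha - beta) / T

def run_filter(alphas, T, dt):
    """Filter a sampled virtual control signal, starting from beta(0) = alpha(0)."""
    beta = alphas[0]
    out = [beta]
    for alpha in alphas[1:]:
        beta = filter_step(beta, alpha, T, dt)
        out.append(beta)
    return out
```

The filtered signal β replaces α in the next design step, so the derivative of the virtual control law is obtained from the filter state, (α − β) / T, instead of by analytic differentiation. A smaller T makes β follow α more closely at the cost of a larger filter bandwidth.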
Then, the filter error λ_u and the velocity error u_e are defined. The time derivative of λ_u involves a continuous function B_u, which has a maximum value H_u. The Lyapunov function V_2 is then chosen as in Eq. (16), and its time derivative is computed. According to Eqs. (2a) and (12), the time derivative of u_e can be obtained. The unknown term can be approximated using an NN, i.e., m_22 v r − d_11 u + f_u = ω_u^T ψ_u + ξ_u. Let D_u = ξ_u + d_u, where ξ_u is the approximation error, whose time derivative is bounded. With Assumption 1, D_u satisfies bounds expressed in terms of χ_u0 and χ_u, which are unknown positive constants. The time derivative of V_2 can therefore be rewritten, and the control law τ_u is designed, where k_u is a positive constant, ω̂_u is the estimate of ω_u, and D̂_u is the estimate of D_u.
Combining Eq. (21) with Eq. (20), the resulting time derivative can be obtained. Next, define z_u as the prediction error with respect to û, the estimate of u generated by the SPEM. The prediction error is employed to construct the weight update law, where γ_u, γ_zu and ϑ_u are positive constants to be designed. The approximation information is then employed to construct the NDO of Eqs. (27a) and (27b). According to Eqs. (2a), (27a) and (27b), the derivative of D̂_u can be expressed, from which the dynamics of the disturbance estimation error follow.
Combining Eqs. (5a)-(5b) with Eqs. (6a)-(6b), the time derivative of θ can be obtained. In practice it is also necessary to restrict θ such that |θ| < k_b. Similar to the above, the BLF candidate is selected as in Eq. (31). Taking the time derivative of Eq. (31) gives Eq. (32), from which the virtual control law α_r for the yaw direction is obtained in Eq. (33), where k_θ is a positive constant.
Remark 3: From Eq. (33), it can be seen that α_r is undefined when ρ_e = 0. The positive constant ρ_0 is introduced so that ρ_e − ρ_0 converges to a neighborhood of zero, which means that ρ_e converges to a neighborhood of ρ_0. Therefore, the singularity of α_r can be avoided.
Let α_r pass through a first-order filter with a time constant T_r > 0 to obtain a new state variable β_r.
Then, the filter error λ_r and the velocity error r_e are defined. The time derivative of λ_r involves a continuous function B_r, which has a maximum value H_r. The Lyapunov function V_4 is then chosen as in Eq. (37), and its time derivative is computed. According to Eqs. (2c) and (35), the derivative of r_e can be obtained. The unknown term can be approximated using an NN, i.e., (m_11 − m_22) u v − d_33 r + f_r = ω_r^T ψ_r + ξ_r. Let D_r = ξ_r + d_r, where ξ_r is the approximation error, whose time derivative is bounded. With Assumption 1, D_r satisfies bounds expressed in terms of χ_r0 and χ_r, which are unknown positive constants. The time derivative of V_4 can then be rewritten, and the control law τ_r is designed, where k_r is a positive constant, ω̂_r is the estimate of ω_r, and D̂_r is the estimate of D_r.
Combining Eq. (41) with Eq. (40), the resulting time derivative can be obtained. Next, define z_r as the prediction error with respect to r̂, the estimate of r generated by the SPEM, where r̂(0) = r(0) and φ_r is a positive constant. The prediction error is employed to construct the weight update law, where γ_r, γ_zr and ϑ_r are positive constants to be designed. The approximation information is then employed to construct the NDO of Eqs. (48a) and (48b). According to Eqs. (2c), (48a) and (48b), the derivative of D̂_r can be expressed, from which the dynamics of the disturbance estimation error follow.
Remark 4: From Eqs. (26) and (47), it can be seen that the weight update law of the composite NN is designed using both the tracking error and the prediction error. The prediction error provides extra information for learning the NN weights. Thus, better tracking performance can be achieved.
Remark 5: In Eqs. (26) and (47), γ_u and γ_r are positive constants used to adjust the learning rate. If γ_zu and γ_zr are chosen larger, ω̂_u and ω̂_r are mainly tuned by the prediction errors, whereas if γ_zu and γ_zr are chosen smaller, ω̂_u and ω̂_r are mainly tuned by the tracking errors.
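The interplay described in Remarks 4 and 5 can be illustrated with a sketch of one channel. The forms below are assumptions borrowed from the general composite learning literature, not the paper's Eqs. (26) and (47): the SPEM is taken as m x̂̇ = ω̂^T ψ + τ + D̂ + φ z with z = x − x̂, and the weight update as ω̂̇ = γ [(e + γ_z z) ψ − ϑ ω̂], where e is the tracking error. Signs depend on how the errors are defined, and all function and argument names are illustrative.

```python
def spem_step(x_hat, x, w_hat, psi, tau, d_hat, m, phi_gain, dt):
    """One Euler step of the assumed serial-parallel estimation model.

    Returns the next predicted state and the current prediction error z.
    """
    z = x - x_hat
    f_hat = sum(w * p for w, p in zip(w_hat, psi))
    x_hat_next = x_hat + dt * (f_hat + tau + d_hat + phi_gain * z) / m
    return x_hat_next, z

def composite_update(w_hat, psi, e_track, z_pred, gamma, gamma_z, vartheta, dt):
    """One Euler step of the assumed composite weight update law.

    gamma_z weights the prediction error against the tracking error;
    vartheta is a leakage term that keeps the weights bounded.
    """
    drive = e_track + gamma_z * z_pred
    return [w + dt * gamma * (drive * p - vartheta * w) for w, p in zip(w_hat, psi)]
```

Setting `gamma_z = 0` recovers a tracking-error-only update of the kind used by the comparison controller, while a larger `gamma_z` lets the prediction error dominate the adaptation, in line with Remark 5.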
The compound unknown terms, which consist of the unknown dynamics and the time-varying disturbances, are defined for the surge and yaw directions, respectively.
Remark 6: The disturbance observer and the neural network contain each other's information. If the compound unknown terms are closely tracked by ω̂_u^T ψ_u + D̂_u and ω̂_r^T ψ_r + D̂_r, the system's estimate of the unknown information becomes more accurate. In this way, the objective of composite learning that combines the NN and the NDO is accomplished.
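How a disturbance observer can reuse the NN output is sketched below for one channel of the form m ẋ = f + τ + D. The observer structure D̂ = q + p m x, q̇ = −p (f̂ + τ + D̂) is a generic nonlinear disturbance observer assumed for illustration; it is not claimed to match Eqs. (27a)-(27b) or (48a)-(48b). The gain `p`, the toy plant f(x) = −x and the function names are assumptions as well.

```python
def ndo_estimate(q, x, m, p):
    """Disturbance estimate D_hat = q + p * m * x."""
    return q + p * m * x

def ndo_step(q, x, f_hat, tau, m, p, dt):
    """One Euler step of the internal state: q_dot = -p * (f_hat + tau + D_hat)."""
    d_hat = ndo_estimate(q, x, m, p)
    return q + dt * (-p * (f_hat + tau + d_hat))

def simulate(d_true=0.7, m=2.0, p=5.0, dt=1e-3, steps=5000):
    """Toy run: constant disturbance and perfectly known dynamics f(x) = -x."""
    x = 0.0
    q = -p * m * x  # chosen so that the initial estimate D_hat(0) is zero
    for _ in range(steps):
        f = -x       # in the paper's setting this would be the NN estimate
        tau = 0.0
        q_next = ndo_step(q, x, f, tau, m, p, dt)
        x = x + dt * (f + tau + d_true) / m
        q = q_next
    return ndo_estimate(q, x, m, p)
```

With this structure the estimation error obeys D̃̇ = Ḋ − p D̃ − p (f − f̂), so a better NN approximation f̂ directly reduces the observer error, which is the mutual dependence that Remark 6 points to.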
Remark 7: The design parameters are selected by trial and error. First, appropriate values of k_ρ, k_θ, k_u and k_r are chosen to ensure that the system is stable. Then, the other design parameters γ_u, γ_zu, γ_r, γ_zr, ϑ_u, ϑ_r, φ_u and φ_r are tuned to obtain satisfactory control performance. A large number of simulations in many cases show that the larger k_ρ, k_θ, k_u, k_r, γ_zu, γ_zr, φ_u and φ_r are, the higher the tracking accuracy of the MSVs.

SIMULATION RESULTS
In this section, to demonstrate the effectiveness of the proposed control system, the dynamic model of an MSV in Do, Jiang & Pan (2004) is considered.
The proposed control scheme is denoted as τ_CL. The control strategy that does not use the prediction error is denoted as τ_NN.
The simulation results are depicted in Figs

CONCLUSIONS
In this paper, a composite learning trajectory tracking control scheme is proposed for underactuated MSVs in the presence of unknown dynamics, time-varying disturbances and output constraints. The underactuation problem of the MSVs is addressed by the LOS approach. The barrier Lyapunov function is introduced to deal with the output constraints. The composite learning control scheme is used to approximate the unknown dynamics, with both the prediction errors and the tracking errors adopted to construct the NN weight update laws. Using the approximation information, disturbance observers are designed to estimate the unknown time-varying disturbances. The Lyapunov method is used to demonstrate the stability of the closed-loop system. The simulation results demonstrate the effectiveness and superiority of the proposed control scheme.
In future work, finite-time control can be considered. The control scheme in this paper can also be readily combined with event-triggered control.