Adaptive Finite-Time-Based Neural Optimal Control of Time-Delayed Wheeled Mobile Robotics Systems

For nonlinear systems with uncertain state time delays, an adaptive neural optimal tracking control method based on finite time is designed. With the help of appropriate LKFs, the time-delay problem is handled. A novel nonquadratic Hamilton–Jacobi–Bellman (HJB) function is defined, in which finite time is selected as the upper limit of integration. This function contains information on the state time delay while retaining the basic system information. To meet specific requirements, the integral reinforcement learning method is employed to solve the ideal HJB function. Then, a tracking controller is designed to ensure finite-time convergence and optimization of the controlled system; within a reinforcement learning architecture, the critic and action neural-network weights are updated by gradient descent. The semi-global practical finite-time stability of the controlled system and the finite-time convergence of the tracking error are guaranteed.


Introduction
Adaptive intelligent control algorithms have developed rapidly with the advancement of intelligent approximation technology, especially neural networks (NNs) and fuzzy logic systems (FLSs), and have achieved a series of excellent research results [1][2][3][4][5][6][7][8][9]. This has also significantly motivated many scholars to explore adaptive control algorithms, laying a solid foundation for applying the corresponding control theory in practical engineering.
Considering that control and decision-making problems are essentially optimization problems, and that optimal control plays a key role in engineering applications, the research on intelligent optimization control algorithms in this paper helps promote practical engineering applications. In view of the importance of optimal control, many scholars have conducted extensive research on optimal control algorithms and obtained notable achievements, mainly along two lines [10,11]: adaptive dynamic programming (ADP) methods [12] and reinforcement learning (RL) methods [13].
The ADP approach can realize online approximation of the optimal target by recursive numerical methods, without relying on a model-based control algorithm [14][15][16][17][18][19]. Using NNs, the performance function, the designed control laws, and the uncertain part of the nonlinear system can be approximated, which helps solve the HJB function; the optimal stability is then guaranteed. Similar to the learning mechanism of mammals, the reinforcement learning mechanism regulates both the critic and action adaptive laws in order to control the long-term cost of interaction with the environment. The action NNs modify the action laws, while the critic NNs reduce the virtual energy of the long-term storage function. Thanks to the interoperability of this operating mechanism, refs. [20][21][22][23][24] have made outstanding contributions to online optimization control and model-free optimization control.
Although previous ADP-based methods perform well for nonlinear systems without time delays, achieving the ideal control effect on time-delayed nonlinear systems is often challenging. Research on this topic has therefore generated interest among experts and scholars and has achieved preliminary results. However, time delay in the form of nonlinear interference remains a major obstacle to applications of control theory algorithms. Some scholars have paid attention to this and achieved certain results. In existing methods, system delays take two main forms: state delays and input delays [25].
State time delays are mainly found in intricate engineering systems, for example, wheeled mobile robot (WMR) systems and chemical engineering processes, where hysteresis is induced by the internal propagation of signals during system motion. With the assistance of the Lyapunov–Krasovskii functional (LKF) [25][26][27], the influence of the state time delay is overcome, and superior control algorithms are designed.
The contradiction between the infinite-time convergence characteristics of existing optimization algorithms and the fast convergence requirements of actual engineering systems has greatly inhibited the practical application and promotion of intelligent optimization algorithms. Therefore, in recent years, some scholars have focused on balancing convergence speed against the convergence domain. The existing breakthrough theoretical results on infinite-time convergence algorithms [28][29][30][31][32][33] have promoted the research on finite-time convergence control algorithms to a certain extent and also indirectly reflect the necessity of studying such algorithms. At the same time, they point out the key and difficult issues faced by finite-time convergence control research.
To meet the finite-time or finite-horizon convergence characteristics required in actual engineering, some scholars have begun relevant research. For nonlinear discrete systems, researchers use ADP-based approaches to solve the finite-horizon convergence problem [34,35], which greatly stimulated the authors' interest in finite-time convergence optimization control algorithms.
Different from finite-horizon convergence, finite-time convergence not only guarantees the time domain of system convergence but also increases the speed and accuracy of convergence. The existing research is not yet mature and is still in its infancy, although some studies with excellent performance have been reported [36][37][38][39][40][41][42][43][44][45]. To date, finite-time optimization algorithms that consider convergence speed and precision together with energy consumption are basically absent. Therefore, building on previous research, this paper considers not only the state delay but also the input delay, and uses the ADP method to effectively resolve the finite-time optimal tracking control problem of the controlled target.
An adaptive finite-time online optimal tracking control method based on neural networks is designed for uncertain nonlinear systems with state time delays. Firstly, the initial nonlinear system is extended to an augmented system, which contains the tracking error and target expectation information, and a novel discounted performance function is presented. Secondly, a Hamiltonian function is constructed, and appropriate LKFs are used to resolve the state-delay problem. Then, for the solution of the ideal HJB function, this paper introduces the integral reinforcement learning (IRL) method. Finally, by designing the optimal control strategy and the control adaptive laws under the semi-global practical finite-time stability (SGPFS) lemma, not only is the influence of time delays eliminated, but the stability of uncertain nonlinear systems is also guaranteed. The main innovations include: (1) the time-delay effect is incorporated into the strategy design process to address the finite-time convergence issues; (2) the problem caused by the state time delay is solved simultaneously in the optimal control process; and (3) the optimal control policy guarantees that the target control system achieves optimal control within a finite time.

System Description and Preliminaries
Consider the state time-delayed nonlinear system

β̇(t) = p(β(t)) + g(t)u(t) + h(β(t − t₁)) + ω(t),  (1)
where the delayed dynamics h(β(t − t₁)) is a known function vector with an unknown time delay t₁. For simplicity in subsequent expressions, the time argument is omitted for all variables except the hysteresis term β(t − t₁). g(t) denotes the input function, p(t) denotes the state function, u(t) denotes the system control input, and ω(t) denotes the external perturbation function.
Considering the state and input time delays in system (1), appropriate LKFs are introduced to deal with the respective time-delay problems. According to Remark 1 in [26], the corresponding result can be obtained only when the delayed dynamics α(t) are known. The following assumptions are made, and corresponding lemmas are given, to ensure that the subsequent design process achieves the expected control objectives.

Assumption 1. Both functions p(t) and g(t) are continuously differentiable. For the time-delay function p(·), its Jacobian matrix ∂p(β)/∂β satisfies the Lipschitz condition ∥∂p(β)/∂β∥ ≤ η with η ≥ 0.

Assumption 2. The unknown input transfer function g(t) is bounded, g_min ≤ ∥g(t)∥ ≤ g_max. Similarly, σ_min ≤ ∥σ(·)∥ ≤ σ_max and φ_min ≤ ∥φ(·)∥ ≤ φ_max express the boundedness of the hidden-layer activation functions φ(·) of the NNs and of the functional approximation error σ(·).

Lemma 2 ([39]). For the nonlinear system, where L(x) is a smooth positive definite function and ι > 0, 0 < b < 1, σ > 0, one can further conclude that the nonlinear system is practically finite-time stable.

In this paper, an adaptive NN-based optimal controller u(t) is designed such that β(t), the output of the system, tracks β_d(t) well in a finite time. Two main types of neural networks are used in this paper: critic neural networks and action neural networks. The critic neural network estimates the long-term utility function, while the action neural network ensures the stability of the system and solves for the optimal control inputs.
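Based on the constants ι, b, and σ named in Lemma 2, the practical finite-time criterion being invoked plausibly takes the standard form below; this is a reconstruction under that assumption, and the exact statement in [39] may differ:

```latex
\dot{L}(x) \le -\iota\, L(x) - \sigma\, L^{b}(x), \qquad \iota>0,\; 0<b<1,\; \sigma>0,
\quad\Longrightarrow\quad
T_r \le \frac{1}{\iota\,(1-b)}\,
        \ln\!\frac{\iota\, L^{1-b}\big(x(0)\big) + \sigma}{\sigma},
```

so the trajectory reaches the origin (or, in the SGPFS setting with a residual term, a small neighborhood of it) within the finite settling time T_r.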

Controller Design and Stability Analysis
Depicted in Figure 1, in this section we design an optimal controller that ensures the optimal control of the system and converges within a finite time. By transforming the initial system into an augmented system, which contains the tracking error and the expected target information, a novel discounted performance function is presented. Furthermore, a Hamiltonian function is constructed, and the time-delay problem is solved by using appropriate LKFs. Then, by introducing the IRL method into the Hamiltonian function, a finite-time optimal tracking controller based on neural networks is designed. Finally, the adaptive laws of the critic NN and the action NN are designed, so that the target system's SGPFS can be ensured.

System Transformation
Considering the nonstrict nonlinear system (1), we develop a neural-network-based controller to enable the system to follow the desired trajectory. Firstly, the tracking error system can be designed as in (4). Then, differentiating (4), we obtain the error dynamics.

Assumption 3. The given target trajectory β_d, with initial state β_d(0) = 0, is bounded.
β_d(t) can be rewritten into the form of (6) by a command generator function that satisfies the Lipschitz continuity property.
The algorithm adopts a new type of discounted performance function, which includes tracking error terms, expected trajectories, and time-delay terms. Therefore, we construct the following augmented system (7). Furthermore, the novel discounted performance function is given in (8), where Γ = [ψ(t), ψ(t − t₁)]^T, the discount factor χ > 0 is a constant, Q_i is a positive definite matrix, and t₁ satisfies t ≥ t₁; with t ≥ t₁, the semi-global uniform convergence in (7) can be ensured.

Virtual Control
In this part, the virtual optimal controller u*(t) is designed based on the Hamiltonian function, which is established from the discounted performance function.
To obtain the tracking Bellman equation, we apply the Leibniz rule to (9) and obtain (10).
Then, we move the right-hand side of Equation (10) to the left-hand side and substitute it into Equation (8), finally obtaining Equation (11). In addition, we design the optimal cost function in (12), for which the following conditions should be guaranteed. Based on (11), ref. [53], and the finite-time convergence theory [34], the optimal control input is defined below. According to (13) and [44,53], the ideal optimal control input is abbreviated as (14). Then, together with (14), (9) can be written in the following form. The Hamiltonian function can then be written in the following form, and (17) can be rewritten accordingly. To deal with the challenges brought by online tracking control, the optimal value L₁* should be solved from (17); the corresponding optimal control policy u(L₁*) is shown in (14).
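Although (13) and (14) are not displayed above, for a Hamiltonian with a quadratic control cost u^T R u and input matrix g, the stationarity condition ∂H/∂u = 0 would give the familiar form below (a sketch under these assumptions, using the symbols R, g, L₁*, and ψ already introduced in the text):

```latex
\frac{\partial H}{\partial u} = 2Ru + g^{\top}\,\frac{\partial L_1^{*}}{\partial \psi} = 0
\;\;\Longrightarrow\;\;
u^{*} = -\tfrac{1}{2}\,R^{-1} g^{\top}\,\frac{\partial L_1^{*}}{\partial \psi}.
```

This is why solving (17) for L₁* immediately yields the control policy u(L₁*) in (14).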

State Time Delay
Choosing appropriate LKFs solves the problem caused by the state time delay, which lays the foundation for the application of the IRL algorithm.
According to Assumption 2 in [36] and Remark 5 in [15], the IRL method can be used to solve for L₁* only when the function satisfies ∥G∥ ≤ b₃ (21), where b₁, b₂,θ, and b₃ are positive constants, with 0 < θ ≤ n and Δt = t₁/n. Since the function F(ψ) and the known function H(ψ) satisfy the Lipschitz condition, and by Assumption 2, (19) and (21) can be guaranteed. However, the state time delay t₁ is uncertain, so the boundedness of (20) cannot be established. In addition, because of the uncertain state time delay, the uncertain function H(ψ(t − t₁)) cannot be approximated by an NN.
To better complete the controller design, the problem caused by the state time delay is handled first. Defining the new function Θ₂(t) as (22), (22) can be written in the form (23), where Δt = t₁/n, and both i and n are positive integers.
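The role of the partition Δt = t₁/n can be illustrated numerically: a quantity accumulated over the delay window [t − t₁, t] is split into n subintervals, and the partition error shrinks as n grows. The signal ψ and the Lipschitz function H below are illustrative stand-ins, not the paper's:

```python
import math

def delayed_integral(H, psi, t, t1, n):
    """Approximate the integral of H(psi(s)) over [t - t1, t] by
    splitting the delay window into n subintervals of width
    dt = t1 / n (left-endpoint Riemann sum)."""
    dt = t1 / n
    return sum(H(psi(t - t1 + i * dt)) * dt for i in range(n))

# Illustrative signal and Lipschitz function (not from the paper).
psi = math.sin
H = math.tanh

t, t1 = 2.0, 0.5
coarse = delayed_integral(H, psi, t, t1, n=4)
fine = delayed_integral(H, psi, t, t1, n=4096)
# The partition error shrinks as n grows (dt -> 0).
print(abs(coarse - fine))
```

For a Lipschitz integrand, the left-endpoint error is O(Δt), which is exactly why choosing n large enough in the delay decomposition keeps the residual term small.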
By Assumption 1, the mean-value theorem is applied to H(ψ); therefore, one obtains (24). The error function caused by Δt can then be obtained accordingly. Defining the augmented system states, we can write system (24) in the form (27). To guarantee that system (27) is uniformly ultimately bounded (UUB), the following lemma is proposed.

Lemma 3. If the dimension of the state vector matches that of the function Π(Δψ), where Π(Δψ(0)) = 0, then Δψ converges to a compact set exponentially, and the Lyapunov function satisfies the bound below.

Proof.
Inspired by the research in [26], the following proof is given. Defining the initial state of (28) as Δψ(0) = Δψ(t), we obtain the following. If the system exponentially converges to a compact set, then the following holds, where the positive constants satisfy a₁ > 0, 0 < a₂ < 1, and a₃ ≥ 0.
Based on (35) and (36), one has the following. Furthermore, we can obtain the Lyapunov function (23) and guarantee the UUB of system (23), which is composed of n subsystems similar to (28).
Similarly, one has .
When n and a₁ are selected large enough, the ultimate boundedness of L₀(Δψ(t)) can be assured for any initial condition L₀(Δψ(t − t₁)) within a bounded set, guaranteeing the UUB of the system states. The proof is completed. □

Critic NN and Value Function Approximation
In summary, the boundedness of (19)–(21) can be obtained. The IRL method is now extended to the solution of L₁*. When the IRL interval is chosen as T > 0, (8) can be written in the following form. Assuming that (8) is a continuous smooth function, L₁ and its gradient ∂L₁/∂ψ are approximated as follows, where ω_c ∈ R^{l_c} is the constant target parameter vector to be estimated online, l_c is the number of neurons in the network, and φ_c and ε_c are the activation function of the critic NN and the approximation error, respectively.
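Using the discount factor χ from (8) and the interval T, the interval-IRL Bellman relation being referred to presumably takes the standard discounted form (a reconstruction under that assumption):

```latex
L_1\big(\psi(t-T)\big)
= \int_{t-T}^{t} e^{-\chi\,(\tau-(t-T))}\, r\big(\Gamma(\tau),u(\tau)\big)\,\mathrm{d}\tau
+ e^{-\chi T}\, L_1\big(\psi(t)\big),
```

which replaces model knowledge of the drift dynamics with measured trajectory data over each interval of length T.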
When the IRL interval is T > 0, the Bellman equation induced by the critic NN estimate can be expressed as (46). The constraint on (46) can be derived based on Assumption 4, i.e., ∥z_B∥ ≤ z̄_B.
To derive the approximate tracking Bellman function, the neural-network approximation is evaluated to obtain the following, where ω̂_c is the estimate of the critic weight vector ω_c. Therefore, the estimate of (46) is given below, where the reinforcement learning reward is denoted accordingly. To reduce the approximation error, we define the objective function in (51). The following update law is then obtained using the gradient-descent method,
where α_c represents the learning rate of the critic neural network.
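A gradient-descent critic update of this kind can be sketched numerically. The residual form below, with a discounted regressor Δφ = e^(−χT)φ(t) − φ(t − T) and a normalized step size, is a common construction for interval-IRL critics and is an assumption here, not the paper's exact law:

```python
import numpy as np

def critic_update(w_c, phi_now, phi_prev, reward_integral, chi, T, alpha_c):
    """One gradient-descent step on E = 0.5 * e_b**2, where the interval
    Bellman residual uses the discounted regressor
    d_phi = exp(-chi*T) * phi(t) - phi(t - T)."""
    d_phi = np.exp(-chi * T) * phi_now - phi_prev
    e_b = reward_integral + w_c @ d_phi            # Bellman residual
    grad = e_b * d_phi                             # dE/dw_c
    # Normalized step keeps the update well scaled (a standard choice).
    return w_c - alpha_c * grad / (1.0 + d_phi @ d_phi) ** 2

# Synthetic check (illustrative data, not the WMR system): with fixed
# regressors, repeated updates drive the residual toward zero.
rng = np.random.default_rng(0)
w = np.zeros(4)
phi_a, phi_b = rng.normal(size=4), rng.normal(size=4)
chi, T, alpha = 0.1, 0.05, 0.5
for _ in range(2000):
    w = critic_update(w, phi_a, phi_b, reward_integral=1.0,
                      chi=chi, T=T, alpha_c=alpha)
residual = 1.0 + w @ (np.exp(-chi * T) * phi_a - phi_b)
print(residual)
```

The normalization (1 + Δφ^T Δφ)² bounds the effective step so the residual contracts for any regressor magnitude, which is the usual reason such terms appear in critic adaptive laws.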

Action NN and Controller Design
According to (45), the optimal control input, i.e., (14), is given as follows. To address the term ∂ε_c/∂Ψ induced in the tracking HJB equation, we obtain the following; in addition, Equation (16) becomes the form below. Then, (46) can be rewritten accordingly. The limitation of the HJB approximation error can be obtained from the bounded approximation error. Moreover, once an NN is selected, its structure cannot be changed; the problem can only be addressed through the uncertain NN weights.
Approximating the control input (55) by the critic NN, we have (61), where ω̂_c is the estimated value of ω_c. However, (61) only estimates the current critic NN weights, which fails to keep system (1) stable. Hence, to guarantee the stability of the system and to solve for the optimal control strategy, we introduce another NN as the action NN,
where ω̂_a represents the weight vector of the action neural network, denoting the present estimate of ω_c. Then, the interval IRL Bellman equation error is estimated as follows. Therefore, (52) can be rewritten by defining the input assessment error as (66). To minimize (66), we use the following formula, and the gradient-descent method yields the update law below, where Ξ′ = RΞ and η is a positive design variable.
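In the same spirit, the action-NN step can be sketched as gradient descent on ½∥e_u∥², where e_u is the gap between the actor output and the critic-implied control; the shapes, data, and normalization below are illustrative assumptions:

```python
import numpy as np

def actor_update(w_a, phi_a, u_target, eta):
    """One gradient-descent step on E = 0.5 * ||e_u||**2 with
    e_u = w_a.T @ phi_a - u_target (actor output vs. critic-implied
    control); dE/dw_a = outer(phi_a, e_u) by the chain rule."""
    e_u = w_a.T @ phi_a - u_target
    grad = np.outer(phi_a, e_u)
    return w_a - eta * grad / (1.0 + phi_a @ phi_a)

# Illustrative fixed features and target control (not the WMR system).
rng = np.random.default_rng(1)
w_a = np.zeros((5, 2))                   # 5 hidden features, 2 inputs
phi = rng.normal(size=5)                 # fixed feature vector
u_star = np.array([0.3, -0.7])           # stand-in critic-implied control
for _ in range(500):
    w_a = actor_update(w_a, phi, u_star, eta=0.4)
err = np.linalg.norm(w_a.T @ phi - u_star)
print(err)
```

With a fixed regressor, each step contracts the output error by the factor 1 − η∥φ∥²/(1 + ∥φ∥²), so the actor output converges to the critic-implied control.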

Stability Analysis
According to the proposed lemmas and assumptions, the following theorem is given to analyze the effectiveness of the proposed algorithm.
Theorem 1. Based on the definition in [43], Lemmas 1–4, Assumptions 1–4, and the design of the proposed control policy (62) together with the adaptive laws (65) and (68), the proposed optimal tracking control algorithm ensures that the partially uncertain nonlinear system (1) is SGPFS.
Proof of Theorem 1. The Lyapunov function candidate is designed as (69), which contains the optimal value function. Then, the provided expressions are applicable. Based on (24) and (31), the first derivative of L₂ can be given as (76), and (76) can be rewritten accordingly. Based on (54), we have the following, and the approximation of (54) is then obtained. The first difference of (71) follows, and then the first derivative of L₃ is obtained; by using Cauchy's mean-value theorem, we derive the corresponding bound. Combining all of the above, the first difference of (69) is obtained. To achieve finite-time convergence, we manipulate the equation by adding and subtracting several terms on the right-hand side. To render system (1) stable in finite time, we invoke Lemma 1; therefore, the constant must be greater than zero.
Furthermore, by using (62), we obtain the optimal control strategy, which guarantees the stability of the target nonlinear system under state and input delays, and ensures that the tracking error converges to a sufficiently small neighborhood of zero.
The proof is completed.□

Results of Simulation Example
The WMR system [54] shown in Figure 2 is used to illustrate the effectiveness of the proposed algorithm.
where m represents the robot mass and I denotes the rotational inertia about the motion center. β_w is the angle between the robot velocity and the x_m axis, and o is the inclination angle of the ground on which the robot is located. Then, (102) is rewritten in vector form, where, according to [55], M is a diagonal mass matrix whose first entry is m cos β_w. Considering the symmetry of the mass matrix and incorporating the state time delay, the dynamic system of the WMR can be represented in state-space form. According to the actual WMR system, the initial values in this simulation are β_w(0) = [0, 0]^T, the NN weights are initialized as rand(1, 4) and rand(1, 4), and α_c = 0.13, α_a = 0.12, λ = 0.05, γ = 0.10, R = 1, and Q = [1, 0; 0, 1]. The following simulation results are then presented. With the state time delay handled by appropriate LKFs, the impact of the delay is successfully suppressed. From Figures 3 and 4, we can see that the proposed algorithm achieves good tracking performance. In addition, the adaptive updates of the critic and action NNs are reflected in Figures 5 and 6, which confirm the boundedness of the adaptive laws. Moreover, the tracking trajectory of the WMR is shown in Figure 7. Throughout the process, the signals in the wheeled mobile robotic system are SGPFS. Compared with previous work [56] with similar control effects, this paper additionally considers finite-time control, and the final simulation results achieve finite-time convergence, reflecting the control advantages of the proposed algorithm.
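A simulation of this kind can be sketched as follows. The fragment Euler-integrates an illustrative scalar system with a state delay and applies a simple proportional tracking law as a stand-in for the paper's optimal controller; the dynamics, gains, and reference below are assumptions for illustration only:

```python
import math

def simulate(t1=0.2, dt=0.001, t_end=10.0, k=8.0):
    """Euler-integrate an illustrative delayed system
        beta_dot = -beta + 0.5*tanh(beta(t - t1)) + u
    with a proportional tracking law u = k*(beta_d - beta)
    (a stand-in controller, not the paper's optimal policy)."""
    n_delay = int(t1 / dt)
    hist = [0.0] * (n_delay + 1)          # beta(0) = 0, zero pre-history
    beta, errs = 0.0, []
    steps = int(t_end / dt)
    for i in range(steps):
        t = i * dt
        beta_d = math.sin(t)              # illustrative reference
        delayed = hist[-(n_delay + 1)]    # approximates beta(t - t1)
        u = k * (beta_d - beta)
        beta += dt * (-beta + 0.5 * math.tanh(delayed) + u)
        hist.append(beta)
        hist = hist[-(n_delay + 2):]      # keep only the delay window
        errs.append(abs(beta_d - beta))
    return max(errs[-steps // 5:])        # peak error over the final 20%

final_err = simulate()
print(final_err)
```

Storing the last t₁/dt samples of the state is the discrete analogue of the delay window that the LKF analysis handles in continuous time; with a higher gain or a delay-compensating controller, the residual tracking error shrinks further.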

Conclusions
A finite-time adaptive online optimization tracking control algorithm was proposed for nonlinear systems incorporating state time delays. By using appropriate LKFs, the issues arising from time delays in both the state and input variables were resolved. Then, a novel nonquadratic HJB function was defined, with finite time selected as the upper limit of integration; this function contains the state time-delay information while retaining the basic system information. On the premise of meeting specific requirements, the ideal HJB function was solved using the IRL method. Furthermore, SGPFS was guaranteed through the definition of the optimal control policy and the updates of the adaptive laws of the critic and action NNs.

Figure 1 .
Figure 1. Structure of the finite-time-convergence adaptive optimal tracking control algorithm.

Figure 2 .
Figure 2. The structure of the wheeled mobile robot.

Figure 3 .
Figure 3. Tracking trajectories of the states.

Figure 5 .
Figure 5. The adaptive laws of the action NNs.

Figure 6 .
Figure 6. The adaptive laws of the critic NNs.

Figure 7 .
Figure 7. Tracking trajectories of the position.