Online approximate solution of HJI equation for unknown constrained-input nonlinear continuous-time systems☆
Introduction
Over the past several decades, H∞ optimal control problems for nonlinear systems have attracted intensive attention. Many remarkable results have been obtained in this field [3], [4], [5], [10], [30], [31], especially the results reported in [4], [31]. In [4], Basar and Bernhard showed that the H∞ optimal control problem is equivalent to a minimax optimization problem, which is formulated as a two-player zero-sum game in which the controller is the minimizing player and the exogenous disturbance is the maximizing one. In [31], by using the theory of dissipative systems, van der Schaft transformed the H∞ optimal control problem into the L2-gain optimal control problem. Nevertheless, a bottleneck remains in applying H∞ optimal control theory in practice: solving two-player zero-sum games and L2-gain optimal control problems often requires solving the Hamilton–Jacobi–Isaacs (HJI) equations. It is well known that HJI equations for nonlinear systems are nonlinear first-order partial differential equations (PDEs), which are difficult or impossible to solve analytically.
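For reference, the L2-gain formulation mentioned above leads, in the standard unconstrained quadratic-cost case, to an HJI equation of the following textbook form (stated here under the assumed affine dynamics ẋ = f(x) + g(x)u + k(x)ω with output z = h(x) and cost weight R; this is a generic statement, not an equation reproduced from this paper):

```latex
% L2-gain condition for a prescribed level \gamma > 0:
%   \int_0^\infty \|z\|^2 \, dt \;\le\; \gamma^2 \int_0^\infty \|\omega\|^2 \, dt .
% Associated HJI equation for the value function V(x), with V(0) = 0:
0 = \nabla V^{\top} f(x) + h^{\top}(x)\, h(x)
    - \tfrac{1}{4}\, \nabla V^{\top} g(x) R^{-1} g^{\top}(x)\, \nabla V
    + \tfrac{1}{4\gamma^{2}}\, \nabla V^{\top} k(x) k^{\top}(x)\, \nabla V ,
% which arises from the saddle-point policies
%   u^{*} = -\tfrac{1}{2} R^{-1} g^{\top}(x) \nabla V , \qquad
%   \omega^{*} = \tfrac{1}{2\gamma^{2}} k^{\top}(x) \nabla V .
```

The nonlinearity in ∇V is what makes this PDE intractable analytically and motivates the approximate approaches surveyed below.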
Since accurate solutions of HJI equations are intractable to obtain, an increasing number of researchers have turned their attention to deriving approximate solutions of this kind of equation. In the past few years, adaptive dynamic programming (ADP) methods have been successfully used to solve HJI equations. The ADP approach was first introduced by Werbos [36]. After that, various ADP methods were proposed (see the surveys [15], [35]). A distinct feature of the ADP approach is that it employs neural networks (NNs) to derive the approximate optimal control forward in time. Owing to this feature, the curse of dimensionality can be avoided when applying ADP approaches to solve the Hamilton–Jacobi–Bellman/HJI equations [27]. In light of this advantage, ADP methods have been extensively utilized to solve HJI equations.
For discrete-time nonlinear systems, Mehraeen et al. [20] presented an offline ADP-based iterative approach to solve the HJI equation for two-player zero-sum games. By using the proposed method and a Taylor series expansion, a sufficient condition for convergence to the saddle point was obtained. After that, Liu et al. [16] developed a greedy iterative ADP algorithm to solve the HJI equations associated with two-player zero-sum games. Based on this algorithm, three NNs, referred to as the action NN, the critic NN, and the disturbance NN, approximate the optimal control, the optimal value function, and the worst-case disturbance, respectively. Later, Zhang et al. [45] proposed an online ADP-based algorithm to learn the solution of the HJI equation for a class of H∞ control problems. With the algorithm given in [45], prior knowledge of the nonlinear system is not required.
For continuous-time (CT) nonlinear systems, the HJI equations are often approximately solved by using reinforcement learning (RL), which Werbos [37] regards as a special case of ADP. Abu-Khalaf et al. [2] introduced an offline RL-based algorithm to obtain the approximate solution of the HJI equation for constrained-input nonlinear systems. After that, Luo et al. [19] proposed an off-policy RL method to solve the HJI equation of H∞ control problems. Unlike [2], the algorithm given in [19] generates the system data by arbitrary policies rather than by the policies being evaluated. Recently, Vamvoudakis and Lewis [33] introduced an online RL-based algorithm to solve the HJI equation for two-player zero-sum games, in which the actor, critic, and disturbance NNs are tuned simultaneously. Distinct from this online RL-based algorithm, Dierks and Jagannathan [7] developed a single online approximator-based scheme to solve the HJI equation, in which only a single critic NN is employed to learn the solution of the HJI equation and no initial stabilizing control is required. It should be mentioned that prior knowledge of the system dynamics is required in [7], [33]. After that, Luo and Wu [38] presented a simultaneous policy update algorithm to solve the HJI equation arising in nonlinear H∞ control problems, in which the internal dynamics of the nonlinear system are not required. Later, Liu et al. [17] employed the simultaneous policy update algorithm to obtain the approximate solution of the HJI equations for multi-player nonzero-sum games with completely unknown dynamics. More recently, Johnson et al. [12] developed a projection algorithm to obtain the approximate solution of the coupled HJI equations for uncertain nonlinear CT systems.
To the best of the authors’ knowledge, no ADP-based algorithms have been proposed to solve the HJI equation for constrained-input nonlinear CT systems with unknown dynamics. In this paper, we develop a novel online ADP-based algorithm to learn the solution of the HJI equation for unknown constrained-input nonlinear CT systems. The present algorithm is implemented via an identifier-critic architecture, which consists of two neural networks (NNs): an identifier NN is utilized to estimate the unknown system dynamics, and a critic NN is constructed to obtain the approximate solution of the HJI equation. An advantage of the present architecture is that the identifier NN and the critic NN are tuned simultaneously. By introducing two additional terms, namely, a stabilizing term and a robustifying term, into the critic update, no initial stabilizing control is required. Meanwhile, the developed critic tuning rule not only ensures convergence of the critic to the optimal saddle point but also guarantees stability of the closed-loop system. In addition, Lyapunov’s direct method is utilized to demonstrate the uniform ultimate boundedness of the weights of the identifier NN and the critic NN.
It is worth pointing out that, although our methodology in this work is similar in spirit to [7], this paper extends the work of [7] to obtain an online approximate solution of the HJI equation for constrained-input nonlinear CT systems with unknown dynamics. Solving the HJI equation for unknown constrained-input nonlinear CT systems is considerably more intractable than solving it when the system dynamics are known, with or without control constraints.
The rest of the paper is organized as follows. Section 2 provides preliminaries of H∞ optimal control problems for constrained-input nonlinear CT systems. Section 3 presents the design of identifier NNs for unknown controlled systems with stability proof. Section 4 develops a single critic NN to approximate the solution of the HJI equation. Section 5 shows the stability analysis. Section 6 presents two numerical examples to verify the effectiveness of the developed method. Finally, Section 7 gives several concluding remarks and potential future extensions.
Notations: ℝ represents the set of all real numbers. ℝᵐ denotes the Euclidean space of all real m-vectors. ℝⁿ×ᵐ denotes the space of all n × m real matrices. Iₙ represents the n × n identity matrix. (·)ᵀ is the transposition symbol. Cᵐ represents the class of functions having continuous mth derivatives. When x ∈ ℝⁿ, ‖x‖ denotes the Euclidean norm of x. When A ∈ ℝⁿ×ᵐ, ‖A‖ = √(λmax(AᵀA)) denotes the 2-norm of A, where λmax(·) represents the maximum eigenvalue of its matrix argument.
Preliminaries and problem statement
Consider the nonlinear CT system described by

ẋ(t) = f(x(t)) + g(x(t))u(t) + k(x(t))ω(t),
z(t) = h(x(t)),

where x(t) ∈ ℝⁿ is the state, u(t) ∈ {u ∈ ℝᵐ : |uᵢ| ≤ κ, i = 1, …, m} is the control input, and κ > 0 is the saturating bound. ω(t) is the exogenous disturbance, and z(t) is the fictitious output, with h(0) = 0, and x = 0 is the equilibrium point of the system.
Assumption 1 f(x), g(x), and k(x) are unknown smooth functions defined on . ω(t) ∈ L2[0, ∞), and it implies that there
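A common way to handle the saturating bound κ in constrained-input ADP (used, e.g., in [2]) is a nonquadratic control cost built from the inverse of a bounding function such as tanh. The sketch below evaluates that cost in closed form for a scalar control; the function name and the unit weight are illustrative assumptions, not definitions taken from this paper:

```python
import numpy as np

def saturated_cost(v, kappa):
    """Nonquadratic control cost W(v) = 2*kappa * integral_0^v atanh(s/kappa) ds,
    evaluated in closed form for a scalar control v with |v| < kappa:
        W(v) = 2*kappa*v*atanh(v/kappa) + kappa^2 * ln(1 - (v/kappa)^2).
    W(0) = 0, W(v) >= 0, and W(v) -> infinity as |v| -> kappa."""
    r = v / kappa
    return 2.0 * kappa * v * np.arctanh(r) + kappa**2 * np.log(1.0 - r**2)
```

The cost is symmetric in v and blows up at the saturation bound, which is what forces the resulting optimal control to respect |u| ≤ κ.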
Identifier design via dynamic NNs
According to [42], the first equation of system (1) can be represented by a dynamic NN as

ẋ(t) = Ax(t) + W₁ᵀσ(x(t)) + W₂ᵀρ(x(t))u(t) + ε(t),

where A is a Hurwitz matrix, W₁ and W₂ are ideal NN weight matrices, and ε(t) is the NN function reconstruction error. The vector function σ(·) is assumed to be n-dimensional with its elements increasing monotonically. The matrix function ρ(·) is assumed to be structured so that Λ is a constant matrix and ρi( · )
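The identifier structure described here (a Hurwitz matrix, ideal weight matrices, and monotone activations) can be sketched numerically as a simple Euler-discretized estimator. All names below (A, W1_hat, W2_hat, the tanh activations, and the gradient-type tuning law) are generic placeholders standing in for the paper's identifier, not its exact update rules:

```python
import numpy as np

def identifier_step(x_hat, x, u, A, W1_hat, W2_hat, dt, gamma=1.0):
    """One Euler step of a dynamic-NN identifier of the form
        x_hat_dot = A x_hat + W1_hat^T sigma(x_hat) + W2_hat^T rho(x_hat) u
    with gradient-type weight updates driven by the state estimation
    error e = x - x_hat (a generic sketch, not the paper's tuning law)."""
    sigma = np.tanh(x_hat)          # monotone n-dimensional activation
    rho = np.tanh(x_hat)            # placeholder input-gain activation
    e = x - x_hat                   # state estimation error
    x_hat_dot = A @ x_hat + W1_hat.T @ sigma + (W2_hat.T @ rho) * u
    # weight updates proportional to activation times error
    W1_new = W1_hat + dt * gamma * np.outer(sigma, e)
    W2_new = W2_hat + dt * gamma * np.outer(rho, e) * u
    x_hat_new = x_hat + dt * x_hat_dot
    return x_hat_new, W1_new, W2_new
```

Choosing A Hurwitz (e.g., A = −I) makes the error dynamics contractive when the weight estimates are near their ideal values, which is the role the Hurwitz assumption plays in the stability proof.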
Approximate solution of the HJI equation via a single critic NN
According to the universal approximation property of NNs, the value function given in (30) can be represented by a single-layer NN on a compact set Ω as

V(x) = Wcᵀσc(x) + εc(x),

where Wc ∈ ℝ^N0 is the ideal NN weight vector, σc(x) is the activation function with σc(0) = 0, and the set of activation functions is often selected to be linearly independent, N0 is the number of neurons, and εc(x) is the NN function reconstruction error. Meanwhile, the derivative of
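The single-critic approximation V(x) ≈ Wcᵀσc(x) can be made concrete with an assumed activation basis. The quadratic polynomial basis below is a common illustrative choice satisfying σc(0) = 0 and linear independence; it is not the basis used in this paper:

```python
import numpy as np

def critic_value_and_gradient(Wc, x):
    """Approximate value V_hat(x) = Wc^T sigma(x) and its gradient
    for an assumed 2-D polynomial basis sigma(x) = [x1^2, x1*x2, x2^2]."""
    x1, x2 = x
    sigma = np.array([x1**2, x1 * x2, x2**2])
    # Jacobian d(sigma)/dx, shape (3, 2)
    dsigma = np.array([[2 * x1, 0.0],
                       [x2, x1],
                       [0.0, 2 * x2]])
    V = Wc @ sigma                  # scalar value estimate
    gradV = dsigma.T @ Wc           # gradient used in the HJI residual
    return V, gradV
```

For example, Wc = [1, 0, 1] recovers V(x) = x1² + x2² with gradient [2x1, 2x2]; in the actual algorithm Wc is tuned online rather than fixed.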
Stability analysis
Before demonstrating the main theorems, we present several required assumptions. These assumptions have been used in [18], [21], [22], [24], [32], [34].
Assumption 5 The ideal NN weight Wc is bounded by a known positive constant WcM, i.e., ‖Wc‖ ≤ WcM. There exist known constants σcM > 0 and σcdM > 0 such that ‖σc(x)‖ ≤ σcM and ‖∇σc(x)‖ ≤ σcdM for every x ∈ Ω. In addition, there exist known constants εcM > 0 and εcdM > 0 such that ‖εc(x)‖ ≤ εcM and ‖∇εc(x)‖ ≤ εcdM for every x ∈ Ω. Assumption 6 There exist known constants bσ > 0 and such
Simulation results
In this section, two examples are provided to illustrate the effectiveness of the developed theoretical results.
Conclusions
In this paper, we have presented a new ADP-based algorithm which solves the HJI equation for constrained-input affine nonlinear CT systems in the presence of unknown dynamics. The algorithm employs an identifier-critic architecture. Based on the present algorithm, the identifier NN and the critic NN are tuned simultaneously. Meanwhile, no initial stabilizing control is required. A limitation of the present algorithm is that the system state is required to be available. In our future work, we
References (46)
- et al., A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems, Automatica, 2013.
- et al., Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks, Neural Netw., 1989.
- et al., Neural-network-based zero-sum game for discrete-time nonlinear systems via iterative adaptive dynamic programming algorithm, Neurocomputing, 2013.
- et al., An iterative adaptive dynamic programming algorithm for optimal control of unknown discrete-time nonlinear systems with constrained inputs, Inf. Sci., 2013.
- et al., Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems, Automatica, 2014.
- et al., A single network adaptive critic (SNAC) architecture for optimal control synthesis for a class of nonlinear systems, Neural Netw., 2006.
- et al., Multi-player non-zero-sum games: online adaptive learning solution of coupled Hamilton–Jacobi equations, Automatica, 2011.
- et al., Neural-network-based robust optimal control design for a class of uncertain nonlinear systems via adaptive dynamic programming, Inf. Sci., 2014.
- Intelligence in the brain: a theory of how it works and how to build it, Neural Netw., 2009.
- et al., An iterative adaptive dynamic programming method for solving a class of nonlinear zero-sum differential games, Automatica, 2011.
- Policy iterations on the Hamilton–Jacobi–Isaacs equation for state feedback control with input saturation, IEEE Trans. Autom. Control.
- Neurodynamic programming and zero-sum games for constrained control systems, IEEE Trans. Neural Netw.
- Nonlinear H∞ Control, Hamiltonian Systems and Hamilton–Jacobi Equations.
- H∞ Optimal Control and Related Minimax Design Problems.
- Successive Galerkin approximation algorithms for nonlinear optimal and robust control, Int. J. Control.
- Optimal control of affine nonlinear continuous-time systems using an online Hamilton–Jacobi–Isaacs formulation, Proceedings of the 49th IEEE Conference on Decision and Control, Atlanta, GA, USA.
- The Method of Weighted Residuals and Variational Principles.
- An algorithm to solve the discrete HJI equation arising in the L2 gain optimization problem, Int. J. Control.
- Nonlinear two-player zero-sum game approximate solution using a policy iteration algorithm, Proceedings of the IEEE Conference on Decision and Control and European Control Conference, Orlando, FL, USA.
- Approximate N-player nonzero-sum game solution for an uncertain continuous nonlinear system, IEEE Trans. Neural Netw. Learn. Syst.
- Neural Network Control of Robot Manipulators and Nonlinear Systems.
- Reinforcement Learning and Approximate Dynamic Programming for Feedback Control.
- Reinforcement learning and adaptive dynamic programming for feedback control, IEEE Circuits Syst. Mag.
☆ This work was supported in part by the National Natural Science Foundation of China under Grants 61034002, 61233001, 61273140, 61304086, and 61374105, in part by the Beijing Natural Science Foundation under Grant 4132078, and in part by the Early Career Development Award of the State Key Laboratory of Management and Control for Complex Systems (SKLMCCS).