Event-triggered constrained control using explainable global dual heuristic programming for nonlinear discrete-time systems

This paper develops an event-triggered optimal control method that can deal with asymmetric input constraints for nonlinear discrete-time systems. The implementation is based on an explainable global dual heuristic programming (XGDHP) technique. Different from traditional GDHP, the required derivatives of the cost function are computed by explicit analytical calculations, which makes XGDHP more explainable. Besides, the challenge caused by the input constraints is overcome by combining a piece-wise utility function with a bounding layer in the actor network. Furthermore, an event-triggered mechanism is introduced to decrease the amount of computation, and the stability analysis is provided with fewer assumptions than most existing studies on event-triggered discrete-time control using adaptive dynamic programming. Two simulation studies are carried out to demonstrate the applicability of the constructed approach. The results show that the developed event-triggered XGDHP algorithm can substantially reduce the computational load while maintaining performance comparable to the time-based approach.


Introduction
Optimality is one of the most significant properties of a control system. The optimal control problem can be solved using the Hamilton-Jacobi-Bellman (HJB) equation. However, until now, there is no effective way to analytically solve the HJB equation for nonlinear systems [1,2]. Nevertheless, adaptive dynamic programming (ADP) offers a promising tool to attain satisfying numerical solutions by incorporating artificial neural networks (ANNs), and has been applied to a wide range of nonlinear industrial applications [3][4][5][6][7]. As a branch of reinforcement learning (RL), ADP approximately addresses optimal control problems by iterating between policy improvement and policy evaluation [8,9]. When dealing with the discrete-time (DT) optimal control problem, obtaining the current control policy usually relies on the control performance at the next time step [7,9]. This bootstrapping property [9] can be addressed by the actor-critic scheme, which uses two separate ANNs that respectively improve and evaluate the policy.
When multiple ANNs are involved, ADP is often called adaptive critic design (ACD) [7]. Based on the information utilized by the critic network, ACDs can be categorized into heuristic dynamic programming (HDP), dual HDP (DHP), and global DHP (GDHP) [4,7]. GDHP combines the information of the cost function and its derivatives, and has recently attracted much attention [10,11,4,12]. The most common architecture of GDHP is the straightforward form that approximates the cost function and its derivatives simultaneously [10,11]. However, as claimed in [4], in this structure the two kinds of outputs of the critic network share the same input and hidden layers, making them strongly coupled. Without analytical calculations, the approximated cost function and its derivatives can suffer from inconsistent errors. With the development of artificial intelligence (AI), there is an emerging need for understanding how strategies are made by AI methods, which has given rise to explainable AI (XAI) [13]. Following this idea, [4] introduces explicit analytical calculations to the GDHP technique, which makes it more explainable to designers because the approximate cost derivatives are explicitly computed from the approximate cost function. This explainable GDHP (XGDHP) algorithm has shown its applicability in aerospace control systems [4,12,14]. However, matrix dimensionality transformations, a.k.a. tensor operations, are involved in these studies, making them complicated to implement.
Besides, in practical applications, due to physical limitations or safety considerations, handling input constraints is a common demand for control systems [15]. A classic approach is to design a non-quadratic cost function, such that the control inputs obtained by solving the HJB equation are limited by a symmetric bounded function [16]. However, although there are many studies aiming at dealing with symmetric input constraints in nonlinear optimal control problems [15,17,12,16,18], little attention has been paid to the situation subject to asymmetric input constraints. Motivated by this industrial need, Yang et al. [19] managed to cope with asymmetric input constraints by adjusting the cost function with the mean and range of the control input constraints. However, there are two limitations in their proposed method: 1) when the system states go to zero, the control inputs are still non-zero values, specifically, the mean values of the constraint range; 2) when the control inputs go to zero, the cost caused by the control inputs is not zero. Consequently, this approach is not applicable to the stabilization problem with an origin equilibrium point, which inspires our study.
Furthermore, in order to maintain system stability, a significant number of iterations within a sampling interval are normally required for ACDs, which results in a high computational cost [20]. To enhance resource utilization and reduce the computational burden, event-triggered control (ETC) has evolved as an alternative control paradigm and has attracted increasing attention in recent years [2,20]. ETC was originally proposed for networked systems to deal with the limitation of communication bandwidth [21][22][23][24]. These studies target communication issues such as synchronization, time delays, and disturbances, rather than pursuing optimality or tackling control input constraints. A cross-fertilization of ETC and ACD leads to event-triggered ACD, which targets solving the optimal control problem in an event-driven manner [2,8]. Most event-triggered studies focus on continuous-time systems [25][26][27] and only a few articles discuss the DT system. The HDP algorithm is combined with ETC in [20,28,29,30], while [18,2] describe the event-triggered DHP algorithm. Although [11] applies the event-triggered GDHP algorithm to a network control scenario, till now there is no related research on event-triggered XGDHP. Among them, only [18] attempts to deal with symmetric input constraints, merely using the non-quadratic cost function, which however is not rigorous because the control input is directly generated by an actor network that is not bounded. Furthermore, the essence of the ETC scheme lies in that a task is executed only if a predefined triggering condition is satisfied. Therefore, defining a sound triggering condition is always the primary task for the ETC scheme. For the nonaffine system, the same triggering condition is employed in [2,28,18,11], and among them [2,28,18] provide the stability analysis regarding the triggering condition.
However, in [2] an extra assumption that the state norm is bounded by the supremum of the control input norm is required, whereas in [28,18] the input-to-state stability (ISS) Lyapunov function is directly assumed to exist without its specific form being pointed out, and additional hyperparameters are involved in [18]. These limitations prevent the proposed triggering condition from wider applications.
Motivated to tackle the limitations existing in the literature, we conduct this research by concentrating on the event-triggered XGDHP algorithm subject to asymmetric input constraints. The contributions are summarized as follows:
1. XGDHP is developed to solve optimal control problems online. Compared to [4], the XGDHP approach developed in this paper simplifies the calculation by eliminating matrix dimensionality transformations.
2. To the best of our knowledge, it is the first time that asymmetric control input constraints are overcome for zero-equilibrium-point stabilization problems. The combination of a novel segmented utility function and the bounding layer of the actor network guarantees strictly bounded inputs without affecting stability.
3. An event-triggered mechanism is introduced to save computational and communication load. It is the first time that ETC is combined with XGDHP for DT systems. Compared to existing literature, fewer assumptions are required to guarantee the stability of the triggering condition and a more specific proof is provided, which broadens its applicability.
The remainder of this paper is organized as follows: Section 2 states the event-triggered optimal control problem with asymmetric input constraints for the general nonlinear DT system. The triggering condition and the stability analysis of the system are provided in Section 3. Section 4 introduces the iterative XGDHP algorithm with the facilitation of three ANNs. The simulation verification is presented in Section 5 by applying the proposed approach to two nonlinear DT systems and Section 6 summarizes this paper and discusses further research.

Problem description
Consider a general nonlinear DT system described by
$x_{t+1} = f(x_t, u_t)$, (1)
where $t$ denotes the time instant, $x_t \in \Omega \subseteq \mathbb{R}^n$ is the state vector, and $u_t \in \Omega_u$ is the control input vector. The admissible input set is $\Omega_u = \{u \in \mathbb{R}^m : u_{\min} \le u_i \le u_{\max},\ i = 1, \ldots, m\}$, with $u_{\min} < 0$ and $u_{\max} > 0$ denoting the minimum and maximum constraint of $u_i$, respectively, and $|u_{\min}| \neq |u_{\max}|$, i.e., the input constraints are asymmetric.
Assumption 1. System (1) is controllable and observable. $f : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n$ is a Lipschitz continuous function and assumed unknown. The origin $x_t = 0$ is the unique equilibrium point of system (1) under $u_t$, i.e., $f(0, 0) = 0$.
Assumption 1 implies that there exists a continuous state feedback control policy $u_t = \mu(x_t)$, $\mu : \Omega \to \Omega_u$, that can stabilize system (1) to the equilibrium point.
Considering the event-triggered scheme, we define a sequence of triggering instants $\{\tau_k\}_{k=0}^{\infty}$, with $\tau_k$ satisfying $\tau_k < \tau_{k+1}$, $k \in \mathbb{N}$. The control input is only updated at a triggering instant, when a certain triggering condition is satisfied, and remains constant during the interval $[\tau_k, \tau_{k+1})$ by involving a zero-order hold (ZOH) [28,29]. Therefore, a gap function can be defined using the event error
$e_t = x_{\tau_k} - x_t$, (2)
where $x_t$ is the current state and $x_{\tau_k}$ is the triggering state held by the ZOH. Subsequently, the feedback control policy can be represented as
$u_t = \mu(x_{\tau_k})$. (3)
Accordingly, system (1) takes the form
$x_{t+1} = f(x_t, \mu(x_{\tau_k}))$. (4)
Considering the characteristics of system (4), we introduce a discounted cost formulated as (5), where $\gamma \in (0, 1]$ is the discount factor and $U(x_t, \mu(x_{\tau_k}))$ is the utility function. For the regulation task, $U(x_t, \mu(x_{\tau_k}))$ is supposed to satisfy $U(x, \mu) \ge 0$ and $U(0, 0) = 0$. Therefore, we define $U(x_t, \mu(x_{\tau_k}))$ as
$U(x_t, \mu(x_{\tau_k})) = x_t^{T} Q x_t + Y(\mu(x_{\tau_k}))$, (6)
where $Q \in \mathbb{R}^{n \times n}$ is a symmetric positive definite matrix, and $Y(\mu(x_{\tau_k}))$ is a positive semi-definite function.
Remark 1. The discount factor $\gamma$ indicates the extent to which the short-term or long-term cost is emphasized [4,12]. For the regulation task, given Assumption 1, $\gamma \le 1$ can hold because of the origin equilibrium point, whereas for tasks where the equilibrium point is not the origin, $\gamma < 1$ must be satisfied to guarantee that the cost function is finite [31,32].
The input constraints are asymmetric, which cannot be handled by the integrand function utilized in [15,33,16], and, for the regulation task, the modified function proposed in [19] is not applicable.
Inspired by these studies, we design $Y(\mu(x_{\tau_k}))$ as the novel piece-wise integrand function (7), where $\tanh^{-T}(\cdot)$ stands for $(\tanh^{-1}(\cdot))^{T}$, and $\tanh^{-1}(\cdot)$ is the inverse of the hyperbolic tangent function $\tanh(\cdot)$, both of which are monotonic odd functions.
The function $Y$ is designed such that $Y(\mu(x_{\tau_k})) \ge 0$, vanishing only when $\mu(x_{\tau_k}) = 0$. Our target is to search for a feedback control law $\mu$ that minimizes the designed discounted cost function (5). On the basis of Bellman's principle of optimality [34], the optimal cost function $J^{*}(x_t)$ conforms to the DT HJB equation (8). The optimal control law $\mu^{*}(x_{\tau_k})$ at time instant $t$ is accordingly defined by (9). It is worth mentioning that $\mu^{*}(x_{\tau_k})$ is the optimal feedback control law for the sampled state $x_{\tau_k}$ at the triggering instant $\tau_k$, instead of the current state $x_t$. To obtain appropriate triggering instants for system (4), we define the triggering condition as
$\|e_t\| > e_{Thr}$, (10)
where $e_{Thr}$ is the threshold to be determined and an event is triggered once the condition holds. Therefore, designing a sound threshold is the primary task for event-triggered control, which will be discussed in the next section.
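For concreteness, the hold-and-trigger mechanism described above can be sketched in a few lines. The names below (`f`, `mu`, `threshold`) are illustrative stand-ins supplied by the reader, not objects defined by the paper:

```python
import numpy as np

def event_triggered_rollout(f, mu, threshold, x0, steps):
    """Sketch of the ZOH event-triggered loop: `f` plays the role of the
    dynamics x_{t+1} = f(x_t, u_t), `mu` the feedback law, and `threshold`
    a stand-in for e_Thr in (10). All three are illustrative assumptions."""
    x = np.asarray(x0, dtype=float)
    x_trig = x.copy()              # triggering state held by the zero-order hold
    u = mu(x_trig)                 # control computed at the triggering instant
    n_events = 1
    for _ in range(steps):
        e = x_trig - x             # event error (2)
        if np.linalg.norm(e) > threshold(x_trig):
            x_trig = x.copy()      # event: resample the state ...
            u = mu(x_trig)         # ... and update the control input
            n_events += 1
        x = f(x, u)                # plant evolves every step; input held between events
    return x, n_events
```

Between events the plant keeps evolving under the held input, so the controller only pays the cost of recomputing `mu` at the `n_events` triggering instants.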

Event-triggered system analysis
In this section, the triggering condition for the DT system is developed and the ISS analysis is carried out. First of all, the following assumption, adopted from [28,2], is necessary:
Assumption 2. The closed-loop system satisfies the bound (11) with a designed constant $C$, and the event error satisfies $\|e_t\| \le \|x_t\|$.
Lemma 1. If Assumption 2 holds, the triggering condition can be defined as in (12).
Proof. Regard $\tau_k$ as the last triggered instant. According to Assumption 2, for each $t \in [\tau_k, \tau_{k+1})$, we have (13). Substituting (11) into (13) yields (14). With (2), (14) can be rewritten as (15). Therefore, by backward recursion, we obtain the inequality (16). Solving (16) with the initial condition $e_{\tau_k} = 0$, we attain the threshold (17). If (17) is violated, i.e., (12) is satisfied, the event is triggered. This completes the proof.
It is noted that the threshold value $e_{Thr}$ is not unique, since it is influenced by the triggered state $x_{\tau_k}$ and the designed constant $C$, which is usually chosen experimentally. Subsequently, inspired by [30], we proceed to prove that the system (4) is asymptotically stable under the triggering condition (12).

Definition 1 [2]. A continuous function $V : \mathbb{R}^n \to \mathbb{R}_{+}$ is called an ISS Lyapunov function for system (4) if there exist $\mathcal{K}_{\infty}$ functions $\alpha_1$, $\alpha_2$, and $\alpha_3$, and a $\mathcal{K}$ function $\rho$, such that (18) and (19) hold for all $x_t \in \mathbb{R}^n$ and $e_t \in \mathbb{R}^n$.
Theorem 1. With Assumption 2 and the triggering condition (12), the event-triggered system is input-to-state stable and is asymptotically stable.
Proof. The following proof only considers the situation that the event is not triggered at the time instant $t + 1$, because when the event is triggered, the control input is updated and the scheme becomes equivalent to time-based control at $t + 1$; according to optimal control theory, stability is guaranteed at that single instant. We first define a Lyapunov function as in (20), and then define a series of functions (21)-(24), where $Q_1$ and $Q_2$ can be selected to satisfy (18), and the vector $q$ in (23) and (24) can be determined from (6) as $Q = qq^{T}$. Subsequently, the proof proceeds by showing that (20) is an ISS Lyapunov function and that it is non-increasing. For all $t \in [\tau_k, \tau_{k+1})$, according to the ETC mechanism the control input is held constant, and therefore we have (25). Substituting (11) into (25) yields (26). According to the Cauchy-Schwarz inequality, (26) becomes (27). Consequently, referring to Definition 1, (20) is an ISS Lyapunov function. According to [18,2], a system is input-to-state stable if it admits a smooth ISS Lyapunov function.
Then, considering (2) and (17), (27) continues as (28). Since $C < 0.5$ and $4C^{2} - 1 = (2C - 1)(2C + 1)$, the last inequality in (28) can be rewritten as (29), where $\Delta V = 0$ if and only if $\|x_{\tau_k}\| = 0$, which implies that the system has already been stabilized since the time instant $\tau_k$. Overall, we conclude that the event-triggered system (4) is input-to-state stable and asymptotically stable under the triggering condition (12), which completes the proof.
Remark 2. The triggering condition (12) has the same form as those in some existing literature [28,18,2,11]. Nevertheless, different from them, fewer assumptions are required to guarantee asymptotic stability. Furthermore, [28,18] assume the existence of an ISS Lyapunov function without providing its specific formula, whereas in this paper the ISS Lyapunov function is explicitly defined by (20)-(24).
The simple diagram of the ETC scheme is illustrated in Fig. 1.
Only when an event is triggered will the XGDHP algorithm be activated and the control input be updated. In the next section, the detailed implementation of the XGDHP algorithm will be presented.

Event-triggered iterative ACD with the XGDHP technique
In this section, according to the universal approximation property of ANNs [20], we first construct a model network, represented by subscript m, to identify the system dynamics. Then, the event-triggered iterative adaptive critic algorithm is introduced, and the actor and critic networks, respectively represented by subscripts a and c, are built to facilitate the implementation. The XGDHP technique is developed based on explicit analytical computations in the critic network, and the asymmetric input constraints are addressed by modifying the output layer of the actor network.
All ANNs are constructed with the fully-connected feed-forward architecture, and their hidden layers respectively have $l_m$, $l_a$, and $l_c$ neurons, all of which adopt a sigmoid function $\sigma(\cdot)$ as the activation function, whose derivative is $\sigma'(s) = 0.5\,(1 - \sigma^{2}(s))$.
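Since only the activation and its derivative enter the weight updates that follow, the pair can be sketched directly. The bipolar sigmoid form assumed below is the one common in the ADP literature (an assumption, as the paper's display equation is not reproduced here); it matches the stated derivative identity:

```python
import numpy as np

# Bipolar sigmoid assumed for sigma (a common choice in ADP works; the
# exact form in (30) is taken here as an assumption), together with the
# derivative identity sigma'(s) = 0.5 * (1 - sigma(s)^2).
def sigma(s):
    return (1.0 - np.exp(-s)) / (1.0 + np.exp(-s))

def sigma_prime(s):
    return 0.5 * (1.0 - sigma(s) ** 2)
```

The identity can be validated against a central finite difference of `sigma`, which is a useful sanity check before wiring it into the network updates.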

The model network
Since the system dynamics is unknown, a model network is built and trained in advance, before implementing the XGDHP technique. The model network is constructed offline to identify the dynamics and predict the next state as in (31), in which $w_{m2} \in \mathbb{R}^{l_m \times n}$, $w_{m1,x} \in \mathbb{R}^{n \times l_m}$, and $w_{m1,u} \in \mathbb{R}^{m \times l_m}$ are the ideal weight matrices of the model network, and $\varepsilon_{m,t} \in \mathbb{R}^{n}$ is the reconstruction error. Subsequently, the identification scheme is described by
$\hat{x}_{t+1} = \hat{w}_{m2}^{T}\, \sigma(\hat{w}_{m1,x}^{T} x_t + \hat{w}_{m1,u}^{T} u_t)$, (32)
where $\hat{x}_{t+1}$, $\hat{w}_{m1}$, and $\hat{w}_{m2}$ are the estimations of $x_{t+1}$, $w_{m1}$, and $w_{m2}$, respectively. The model network is supposed to minimize the identification error $\tilde{x}_{m,t+1} = \hat{x}_{t+1} - x_{t+1}$, and therefore the target performance measure is defined as
$E_{m,t} = \tfrac{1}{2}\, \tilde{x}_{m,t+1}^{T} \tilde{x}_{m,t+1}$. (33)
The weight tuning law follows a gradient-descent algorithm (34)-(35), where $\eta_m > 0$ is the learning rate, and $\Delta\hat{w}_{m1}$ and $\Delta\hat{w}_{m2}$ are the differences between two subsequent updating steps. After a sufficient training session, the model network can achieve a satisfying precision, with the weight matrices converging to constant values. It is important to note that after training, the model weights are kept unchanged for the controller design. With the model network, the necessary partial-derivative information can be obtained for training the critic and actor networks.
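As a minimal sketch of this identification stage, one gradient-descent update of the performance measure $E_m$ can be written as follows. The dimensions `n, m, l_m`, the learning rate, and the stand-in `plant` dynamics are all illustrative assumptions, not the paper's systems:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigma(s):                              # bipolar sigmoid, assumed form of (30)
    return (1.0 - np.exp(-s)) / (1.0 + np.exp(-s))

# Illustrative sizes and learning rate (assumptions for this sketch).
n, m, l_m, eta_m = 2, 1, 8, 0.05

w1_x = rng.uniform(-1, 1, (n, l_m))        # \hat w_{m1,x}
w1_u = rng.uniform(-1, 1, (m, l_m))        # \hat w_{m1,u}
w2 = rng.uniform(-0.01, 0.01, (l_m, n))    # \hat w_{m2}

def plant(x, u):                           # hypothetical stand-in for the unknown f
    return np.array([0.9 * x[0] + 0.1 * x[1],
                     -0.2 * np.sin(x[0]) + 0.8 * x[1] + 0.1 * u[0]])

def train_step(x, u):
    """One gradient-descent update of E_m = 0.5 ||x_hat - x_next||^2, cf. (33)-(35)."""
    global w1_x, w1_u, w2
    z = w1_x.T @ x + w1_u.T @ u            # hidden pre-activation
    h = sigma(z)
    x_hat = w2.T @ h                       # predicted next state, cf. (32)
    err = x_hat - plant(x, u)              # identification error
    delta = (w2 @ err) * 0.5 * (1.0 - h ** 2)   # back-propagated hidden error
    w2 -= eta_m * np.outer(h, err)
    w1_x -= eta_m * np.outer(x, delta)
    w1_u -= eta_m * np.outer(u, delta)
    return 0.5 * float(err @ err)
```

Repeating `train_step` over a batch of sampled state-input pairs drives the identification error down, after which the weights are frozen, as described above.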
Considering the event-triggered framework, by replacing $u_t$ with $\mu(x_{\tau_k})$ in (32) and taking the partial derivatives with respect to $x_t$ and $\mu(x_{\tau_k})$, respectively, we obtain (36). By denoting $F_t = \partial \hat{x}_{t+1}^{T} / \partial x_t$ and $G_t = \partial \hat{x}_{t+1}^{T} / \partial \mu(x_{\tau_k})$, we can approximate (4) as a new affine system (37). With (37), the optimal event-triggered control law $\mu^{*}(x_{\tau_k})$ can accordingly be approximated as (38), where $\phi(\cdot)$ is a one-to-one piece-wise function defined in (39), and $\hat{D}^{*}(x_{\tau_k})$ is described by (40), in which $\hat{\lambda}^{*}$ is the costate function. It can be found that $\phi(\cdot)$ is at least second-order continuous. Accordingly, the approximate DT HJB equation takes the form of (41).

Iterative adaptive critic algorithm
Through (38) and (41), it can be found that the computation of $\hat{J}^{*}(x_t)$ and $\hat{\mu}^{*}(x_{\tau_k})$ requires future information. Clearly, although the system dynamics has been identified, this bootstrapping phenomenon [9] makes it intractable to obtain the analytical solution of the DT HJB equation for nonlinear systems. Consequently, we introduce an iterative adaptive critic algorithm with the XGDHP technique to solve it iteratively.
The procedure of the DT iterative adaptive critic algorithm is briefly depicted in Algorithm 1, where $\Delta\hat{J} > 0$ is a designed threshold and $i \in \mathbb{N}$ denotes the iteration index. It is worth mentioning that only the situation $\|e_t\| > e_{Thr}$, i.e., $t = \tau_k$, is considered, because the control input is updated only at the triggered instants.
The main idea is to construct two iterative sequences $\{J^{(i)}(x_{\tau_k})\}$ and $\{\mu^{(i)}(x_{\tau_k})\}$ to perform the value iteration process so as to achieve approximately optimal values [2,9].

Algorithm 1: Iterative Adaptive Critic Algorithm
The convergence analysis of the DT iterative adaptive critic algorithm has been carried out in [34,17,35] and is thus omitted here. The core step is to prove that $\{J^{(i)}(x_{\tau_k})\}$ is a non-decreasing sequence with an upper bound. Accordingly, we can further derive that the iteration between the sequences (42) and (43) guarantees convergence to the optimal values for both sequences. Nevertheless, in practical implementation, satisfying convergent results can already be observed when the iteration index $i$ is sufficiently large, rather than infinite.
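For intuition about why the value-iteration sequence is non-decreasing and convergent, the recursion of Algorithm 1 can be instantiated on a discounted linear-quadratic special case, where $J^{(i)}(x) = x^{T} P_i x$ and each sweep reduces to a closed-form Riccati update. The matrices and discount factor below are illustrative assumptions, not systems from the paper:

```python
import numpy as np

# Discounted LQR analogue of the value iteration: starting from J^{(0)} = 0,
# each sweep minimizes x'Qx + u'Ru + gamma * J^{(i)}(Ax + Bu) in closed form.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R, gamma = np.eye(2), np.array([[1.0]]), 0.95

def vi_sweep(P):
    """One value-iteration step (policy improvement + evaluation in one shot)."""
    Ag, Bg = np.sqrt(gamma) * A, np.sqrt(gamma) * B   # absorb the discount
    K = np.linalg.solve(R + Bg.T @ P @ Bg, Bg.T @ P @ Ag)  # greedy policy gain
    Acl = Ag - Bg @ K
    return Q + K.T @ R @ K + Acl.T @ P @ Acl

P = np.zeros((2, 2))
traces = []
for i in range(600):
    P = vi_sweep(P)
    traces.append(np.trace(P))
```

The trace of $P_i$ rises monotonically from zero and settles at a fixed point, mirroring the non-decreasing, upper-bounded behaviour of $\{J^{(i)}\}$ claimed above.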
Subsequently, for carrying out the iterative adaptive critic algorithm, the actor and critic networks are constructed to respectively approximate the control law and the cost function in the following subsections. The derivation presents the calculations in one iteration step and therefore the superscript is omitted for simplicity.

The actor network
To build a directly differentiable mapping from the state to the control input, the actor network is constructed, whose output is directly fed to the model network and the real system. Inspired by [31,12,14], to guarantee the asymmetric input constraints, a bounding layer is connected to the original output layer of the three-layer network. In this bounding layer, the aforementioned function $\phi(\cdot)$ defined in (39) is adopted as the activation function. Fig. 2 illustrates the architecture of the actor network, and its output is presented as
$\hat{\mu}(x_{\tau_k}) = \phi(\hat{w}_{a2}^{T}\, \sigma(\hat{w}_{a1}^{T} x_{\tau_k}))$, (45)
where $\hat{w}_{a1} \in \mathbb{R}^{n \times l_a}$ and $\hat{w}_{a2} \in \mathbb{R}^{l_a \times m}$ are the estimations of the ideal weight matrices $w_{a1} \in \mathbb{R}^{n \times l_a}$ and $w_{a2} \in \mathbb{R}^{l_a \times m}$, respectively. Based on (42) and (45), the performance measure to be minimized for the actor network is defined in (46). Similarly, with a learning rate $\eta_a > 0$, the weight matrices are updated by (47).
Remark 4. The combination of the segmented utility function and the bounding layer of the actor network is one of the highlights of this paper. With the segmented utility function, a target policy within the designed asymmetric range is provided for the actor network to learn. Besides, the bounding layer is necessary because the signal $\hat{\mu}(x_{\tau_k})$, as the output of the actor network, is directly utilized to control the system.
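The exact piece-wise bounding function $\phi(\cdot)$ of (39) is not restated here; the following stand-in, an asymmetric tanh squashing chosen purely for illustration, shows how such a layer can map an unbounded actor output one-to-one into an asymmetric range such as $(-0.5, 0.2)$ while preserving $\phi(0) = 0$:

```python
import numpy as np

def phi(v, u_min=-0.5, u_max=0.2):
    """Hypothetical piece-wise bounding layer: for v >= 0 squash into
    (0, u_max), for v < 0 squash into (u_min, 0). The true phi in (39)
    differs; this is only an illustrative stand-in with phi(0) = 0 and a
    continuous unit slope at the origin."""
    v = np.asarray(v, dtype=float)
    a = abs(u_min)
    return np.where(v >= 0.0,
                    u_max * np.tanh(v / u_max),   # positive branch
                    a * np.tanh(v / a))           # negative branch
```

Because each branch is a scaled odd squashing with slope one at zero, the two pieces join smoothly at the origin and the overall map stays strictly monotone, so the raw actor output can be inverted from any admissible control value.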

The critic network
For the conventional GDHP technique, the critic network outputs the approximations of the cost function and its derivatives simultaneously [11,10], as described in (48)-(49), where $\hat{w}_{c1} \in \mathbb{R}^{n \times l_c}$, $\hat{w}_{c2,J} \in \mathbb{R}^{l_c}$, and $\hat{w}_{c2,\lambda} \in \mathbb{R}^{l_c \times n}$ respectively denote the estimations of the ideal weights $w_{c1} \in \mathbb{R}^{n \times l_c}$, $w_{c2,J} \in \mathbb{R}^{l_c}$, and $w_{c2,\lambda} \in \mathbb{R}^{l_c \times n}$. However, due to the inevitable approximation error, $\hat{J}(x_{\tau_k})$ and $\hat{\lambda}(x_{\tau_k})$ approximated in this way cannot exactly preserve the derivative relationship, which is referred to as suffering from the inconsistency error [4].
Therefore, inspired by [4,12], a novel XGDHP technique that takes advantage of explicit analytical calculations is developed, with the critic network only approximating the cost function:
$\hat{J}(x_{\tau_k}) = \hat{w}_{c2}^{T}\, \sigma(\hat{w}_{c1}^{T} x_{\tau_k})$, (50)
where $\hat{w}_{c2} \in \mathbb{R}^{l_c}$ is the estimation of the ideal weight matrix $w_{c2} \in \mathbb{R}^{l_c}$. By explicit analytical calculation, we obtain $\hat{\lambda}(x_{\tau_k})$ as
$\hat{\lambda}(x_{\tau_k}) = \partial \hat{J}(x_{\tau_k}) / \partial x_{\tau_k} = \hat{w}_{c1}\, (\sigma'(\hat{w}_{c1}^{T} x_{\tau_k}) \odot \hat{w}_{c2})$, (51)
where $\odot$ is the Hadamard product. XGDHP makes use of both the cost function and its derivative information, so, recalling (43), the critic network is expected to minimize the performance measure (52), where $\beta$ is a scalar within the range $[0, 1]$. If $\beta = 1$, the scheme reduces to pure HDP, whereas if $\beta = 0$, the weight matrix is tuned merely based on the computed derivatives $\hat{\lambda}(x_{\tau_k})$, and it is consequently equivalent to DHP [12].
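The analytic costate computation is easy to verify numerically: for a single-hidden-layer critic, the explicit derivative of the scalar cost output can be checked against finite differences. The sizes below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigma(s):                              # bipolar sigmoid (assumed form)
    return (1.0 - np.exp(-s)) / (1.0 + np.exp(-s))

# Illustrative critic sizes (assumptions for this sketch).
n, l_c = 2, 6
w_c1 = rng.uniform(-1, 1, (n, l_c))        # \hat w_{c1}
w_c2 = rng.uniform(-1, 1, l_c)             # \hat w_{c2}

def J_hat(x):
    """Critic output, cf. (50): scalar cost approximation."""
    return float(w_c2 @ sigma(w_c1.T @ x))

def lam_hat(x):
    """Explicit analytic costate, cf. (51): dJ_hat/dx via the chain rule,
    with sigma'(z) = 0.5 * (1 - sigma(z)^2) and the Hadamard product."""
    z = w_c1.T @ x
    return w_c1 @ (w_c2 * 0.5 * (1.0 - sigma(z) ** 2))
```

Because the costate is computed from the same weights as the cost itself, the two outputs are derivative-consistent by construction, which is the point of the XGDHP design.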
Different from [2,11], we also take the partial derivative of $\mu(x_{\tau_k})$ with respect to $x_{\tau_k}$ into consideration in the critic network updating procedure for more precise calculations. According to the chain rule, (52) can be further derived as (53), where $\partial \mu(x_{\tau_k}) / \partial x_{\tau_k}$ is computed with the facilitation of the actor network, while $\partial x_{\tau_k + 1} / \partial x_{\tau_k}$ and $\partial x_{\tau_k + 1} / \partial \mu(x_{\tau_k})$ are computed through the model network.
Given a learning rate $\eta_c > 0$, the weight updating is conducted by (54)-(56), where $\partial \hat{\lambda}(x_{\tau_k}) / \partial \hat{w}_{c2}$ and $\partial \hat{\lambda}(x_{\tau_k}) / \partial \hat{w}_{c1}$ are the second-order mixed gradients of the cost function $\hat{J}(x_{\tau_k})$. To compute these terms, the Kronecker product, and thus tensor operations, are involved in [4,12,14], which results in the need for matrix dimensionality transformation. In this paper, we develop a simpler computation method, given by (57). Through mathematical derivation, it can be shown that (57) is equivalent to the method proposed in [4].
The closed-loop stability and the convergence of the ANN weights can be found in [28]. Note that the weights between the input layer and the hidden layer of these networks are also updated, the same as in [2,18,10,11,31] but different from [29,28], where they are fixed after initialization. Nevertheless, the update behaviours in these methods obey the same gradient-descent logic and similar rules. The update is shown to be successful through the simulation studies in this paper and others [2,18,10,11,31]. In practice, one can manually set a constraint on the weights to guarantee boundedness and safety.
Overall, the structural diagram of the present XGDHP implementation is depicted in Fig. 3 to clarify the design procedure, where DER is given by:

Simulation studies
In this section, two simulation studies are carried out to illustrate the feasibility of the developed approach and compare the performance of the event-triggered XGDHP with the time-based approach.

Example 1
Consider the nonlinear affine mass-spring system from [17], where $x_t = [x_{1,t}, x_{2,t}]^{T} \in \mathbb{R}^{2}$ and $u_t \in \Omega_u = \{u_t \mid u_t \in \mathbb{R},\ -0.5 < u_t < 0.2\}$. The parameters in the utility function are selected as $Q = I_2$ and $R = 1$, and the discount factor is chosen as $\gamma = 0.995$.
In what follows, we implement the proposed event-triggered XGDHP algorithm with the facilitation of ANNs. All ANNs are constructed with 8 hidden neurons, i.e., $l_m = l_c = l_a = 8$. Their weight matrices between the input layer and the hidden layer are initialized within $[-1, 1]$, and the weights between the hidden layer and the output layer are randomly initialized with a uniform distribution within $[-0.01, 0.01]$. The initial weights of the actor and critic networks used to present the results are provided in Appendix A. The learning rates are set experimentally. First of all, we employ 500 data samples to train the model network for 500 iterations, and then utilize another 500 data samples for testing. The identification errors on the testing samples are illustrated in Fig. 4, from which we can see that the mean sum of squares of the identification errors is below $1.4 \times 10^{-3}$. Therefore, a model network with high accuracy has been obtained. After training, the weights of the model network are kept unchanged for the controller design.
Next, we start the controller design procedure. It is noted that the simulation of the control algorithm is conducted in an online manner, which means that the control policy improves as it is applied to the real system. By setting $C = 0.12$, we accordingly obtain the triggering threshold $e_{Thr}$ from (12). If the condition $\|e_t\|^{2} > e_{Thr}^{2}$ holds, the controller is updated and the triggering state $x_{\tau_k}$ is reset to the current state. Before the next triggering event occurs, the control input $u_t$ is held by the ZOH as $u_{\tau_k}$. Different from other works that apply ETC to DT systems using ADP algorithms [29,18,2,28,30,11], in this paper the actor and critic networks are not updated until an event is triggered, so as to further reduce the computational burden. For the XGDHP technique, we set $\beta = 0.5$ to combine the information of the cost function and its derivatives. To ensure sufficient learning, the prespecified accuracy $\Delta\hat{J}$ is set to $10^{-4}$, and at each triggered time step, at most 1000 internal cycles for training the critic and actor networks are performed to achieve satisfying performance.
With the initial state chosen as $x_0 = [1, -1]^{T}$, we conduct the proposed event-triggered XGDHP algorithm in comparison with the time-based XGDHP algorithm. Both control algorithms share the same settings and parameters except for the triggering mechanism. The simulation results for the system state and the control input are depicted in Figs. 5 and 6, respectively. Due to the event-triggered mechanism, the event-triggered XGDHP algorithm presents a stair-stepping control input signal. With the piece-wise integrand function (7) and the bounding layer in the actor network, it can be observed that the control input is bounded within the range $(-0.5, 0.2)$. Therefore, we can say that the asymmetric control input constraints have been addressed. The evolution of the ANN weights is depicted in Fig. 7, where solid lines denote the weights between the input layer and the hidden layer, while dashed lines represent the weights between the hidden layer and the output layer. It can be observed that all weights eventually converge to constant values during the online learning process. The evolution of the one-step cost and the accumulative cost in the learning process is demonstrated in Fig. 8. Utilizing fewer data samples, the event-triggered XGDHP requires 3 more steps to drive the system to the stage where the one-step cost stays below 0.05, and 7 more steps for 0.01. Because of the delayed control, the event-triggered XGDHP shows a greater overshoot, which results in a larger accumulative cost. Nevertheless, in many practical scenarios where saving computational resources is preferred, this loss of control performance is acceptable.
In addition, the evolution curve of the triggering threshold is depicted in Fig. 9; it converges to around zero along with the event error. The inter-execution time, i.e., the time interval between two triggered instants, is illustrated in Fig. 10. It is worth mentioning that the time-based controller requires an update at every sample of this 100-step task, whereas the proposed event-triggered approach only utilizes 60 samples. Since the critic and actor networks are trained for up to 1000 cycles at each triggered instant, the event-triggered approach reduces the computational burden by up to 40%.
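The quoted saving follows from simple counting, since a full actor/critic training session is skipped at every non-triggered instant:

```python
# Computational saving in Example 1, derived from the triggering counts
# quoted above: each non-triggered instant skips one training session.
steps, triggers = 100, 60
saving = (steps - triggers) / steps
print(f"Training sessions saved: {saving:.0%}")   # prints "Training sessions saved: 40%"
```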
Remark 5. The simulation results show that the system is gradually stabilized. Nevertheless, it is noted that due to the asymptotic stability property, the system states may not exactly converge to zero, which makes the ETC scheme keep working during the whole presented time range, as depicted in Figs. 9 and 10. This phenomenon is because the triggering condition described by (12) is dependent on the system states. As the stabilization continues, the triggering threshold will accordingly adapt to a stricter value to guarantee the precision. In practice, a threshold can be set for the controller, such that, when the system states reach a certain range, the controller can be deactivated to further save resources.

Example 2
The second numerical example considered is a nonlinear multiple-input-multiple-output nonaffine system [2] whose state updates include

x_{2,t+1} = -0.17 sin(x_{1,t}) + 0.98 x_{2,t} + 0.1 u_{1,t},
x_{3,t+1} = 0.1 x_{1,t} + 0.2 x_{2,t} + x_{3,t} cos(u_{2,t}),

where x_t = [x_{1,t}, x_{2,t}, x_{3,t}]^T is the system state and u_t = [u_{1,t}, u_{2,t}]^T is the control input. The settings for the second system are similar to those in Example 1. The parameters in the utility function are chosen as Q = I_3 and R = 0.01 I_2, and the forgetting factor is set as γ = 0.95.

Fig. 5. Evolution of the system state in the online learning process for Example 1. Controllers aim at stabilizing the system states from x_0 = [1, -1]^T to 0.
Fig. 7. Evolution of the weights of ANNs in the online learning process for Example 1. Subscripts c and a denote the critic and actor networks, respectively. Subscripts 1 and 2 denote the weights between the input and hidden layers and between the hidden and output layers, respectively.

According to the dimensions of x_t and u_t, the model network is established with the structure 5-10-3, while both the critic and actor networks are built as 3-10-2, i.e., l_m = l_c = l_a = 10. The weights of all three networks are initialized within [-0.1, 0.1]. The initial weights of the actor and critic networks used to present the results are provided in Appendix A. Letting η_m = 0.01, we train the model network for 1000 iterations using 1000 data samples and examine its performance on a testing set of another 500 samples. From Fig. 11, it is evident that the mean sum of squares of the identification errors has decreased to less than 6 × 10^-4, which indicates the high accuracy of the identification.
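The model-network identification step can be sketched as follows. Beyond the 5-10-3 structure and η_m = 0.01, the text does not fix the architecture details, so a tanh hidden layer, plain gradient descent on the squared identification error, and the training data below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 5-10-3 model network: input [x_t; u_t] (5-dim) -> predicted
# x_{t+1} (3-dim). Weights initialized within [-0.1, 0.1] as stated in the
# text; the tanh hidden layer and update rule are assumptions.
W1 = rng.uniform(-0.1, 0.1, (10, 5))   # input -> hidden
W2 = rng.uniform(-0.1, 0.1, (3, 10))   # hidden -> output

def forward(z):
    h = np.tanh(W1 @ z)
    return W2 @ h, h

def train_step(z, x_next, eta=0.01):
    """One gradient-descent step on the squared identification error."""
    global W1, W2
    y, h = forward(z)
    err = y - x_next                     # identification error
    dh = (W2.T @ err) * (1.0 - h ** 2)   # back-propagate through tanh
    W2 -= eta * np.outer(err, h)         # output-layer update
    W1 -= eta * np.outer(dh, z)          # hidden-layer update
    return float(err @ err)              # squared identification error
```

Repeated calls to `train_step` on identification data drive the squared error down, mirroring the decrease reported in Fig. 11.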
For the implementation of the XGDHP technique, we set b = 0.5 and b_ΔJ = 5 × 10^-6, and train the critic and actor networks for at most 1000 steps with the learning rates η_c = η_a = 0.001 at each triggered instant. The triggering threshold is obtained by setting C = 0.15. Initializing the system state as x_0 = [0.5, 0.5, 0.5]^T, we carry out the online control simulation to verify the performance of the proposed event-triggered XGDHP algorithm. The state trajectories of the event-triggered and time-based approaches are displayed in Fig. 12. Comparing the two, we observe that, although the event-triggered XGDHP algorithm involves fewer calculations, the state still converges to the equilibrium point without obviously deteriorating the convergence rate. The control inputs are bounded within [-4, 2]; their curves are depicted in Fig. 13. As shown in Fig. 14, the weights of both the critic and actor networks are initialized randomly and updated as the controller works, and all weights eventually converge to constant values.
As to the optimal control performance, the event-triggered XGDHP takes 10 more steps to keep the one-step cost below 0.1 and 7 more steps to keep it below 0.01. Different from Example 1, although the time-based approach converges faster, the accumulative cost of the event-triggered XGDHP is smaller than that of the time-based approach. This is because the second example requires more oscillations before being stabilized: since the time-based approach exerts control at every time step, its near-optimal property yields more aggressive strategies and leads to a larger accumulative cost, as illustrated in Fig. 15. Remarkably, the control input is only updated 64 times in a total of 200 simulation steps with the event-triggered approach, saving up to 68% of the computational load and thereby improving resource utilization.

Fig. 9. Evolution of the triggering condition in the online learning process for Example 1.
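The reported savings follow directly from the update counts: control updates occur only at triggered instants, so the fraction of skipped updates is the saving. A small check of the arithmetic (the function name is illustrative):

```python
# Fraction of control updates skipped by the event-triggered scheme.
def saving(total_steps, triggered_updates):
    return 1 - triggered_updates / total_steps

print(f"{saving(100, 60):.0%}")  # Example 1: 60 of 100 updates -> 40% saved
print(f"{saving(200, 64):.0%}")  # Example 2: 64 of 200 updates -> 68% saved
```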
The evolution curves of the triggering threshold and the inter-execution time are illustrated in Figs. 16 and 17, respectively. All the simulation results uniformly verify the effectiveness of the event-triggered XGDHP control algorithm proposed in this paper.

Conclusion
In this paper, we develop an event-triggered optimal control algorithm that can deal with asymmetric input constraints for unknown nonlinear discrete-time systems. The stability of the event-triggered control system is analyzed based on the triggering condition, with fewer assumptions than in the existing literature. Besides, the asymmetric input constraints are handled by the combination of a piece-wise integral function and a bounding layer of the actor network. In addition, with the facilitation of artificial neural networks, the explainable global dual heuristic programming (XGDHP) algorithm is developed to solve the nonlinear optimal control problem online, and the calculations for the derivatives of the cost function are simplified without matrix dimensionality transformations.
Two numerical studies are included to illustrate the feasibility and effectiveness of the proposed method. The experimental results show that the nonlinear systems can be successfully stabilized while the asymmetric input constraints are handled. Furthermore, compared to the conventional time-based approach, the developed event-triggered approach stabilizes these nonlinear systems with a delay of at most 10 time steps, while significantly reducing the computational burden by up to 40% and 68% in the two examples, respectively. The communication load between the controller and the plant is also reduced. The results collectively demonstrate the applicability of the proposed approach.
This paper utilizes a triggering condition derived under the state-feedback scheme. However, in many practical systems, full-state feedback is infeasible; therefore, further investigation into output-feedback control approaches is highly recommended.