Event-triggered intelligent critic control with input constraints applied to a nonlinear aeroelastic system

Article history: Received 4 January 2021 Received in revised form 6 September 2021 Accepted 7 December 2021 Available online 11 December 2021 Communicated by Z. Weiwei


Introduction
Aeroelastic systems exhibit a variety of unstable phenomena, such as flutter and limit-cycle oscillations (LCOs), which can significantly degrade the flight performance of an aircraft [1][2][3]. For this reason, stable controller design for aeroelastic systems has received considerable attention in aerospace engineering research groups for decades [2,[4][5][6]. Most current controllers are designed based on feedback linearization approaches [4]. However, with the rapid development of aviation technologies, these traditional methods show their limitations in dealing with stronger nonlinearities. Input nonlinearities such as saturation constraints commonly exist in real systems [7,8], but they have rarely been investigated for aeroelastic systems in the existing literature. Furthermore, complex systems usually involve multiple control loops closed through communication media, which brings a growing interest in enhancing resource utilization [9]. Motivated by the demand for tackling these challenges, this paper aims at developing a constrained-input optimal control approach with reduced computational and communication cost for nonlinear aeroelastic systems.
Optimal control problems pursue optimal control policies for dynamical systems by maximizing or minimizing a pre-defined performance function that captures the desired objectives [10]. When dealing with optimal control problems, it is common to solve the Hamilton-Jacobi-Bellman (HJB) equation, but there are few effective approaches to obtain its analytical solution for nonlinear systems [10,11]. Adaptive dynamic programming (ADP) provides a promising method to acquire numerical solutions of general HJB equations. By incorporating artificial neural networks (ANNs), ADP acquires a more powerful generalization capability and has been successfully applied to a variety of aerospace systems [12][13][14][15][16][17]. As a branch of reinforcement learning (RL), the principle of ADP lies in the effective iteration between policy improvement and policy evaluation [10,18], which are sometimes approximated by an actor network and a critic network, respectively [12,13,15]. However, for continuous-time (CT) systems, by solving the HJB equation, the single critic network (SCN) architecture is able to perform ADP with lower computational cost while eliminating the approximation error introduced by the actor network [8,19]. Different from the actor-critic architecture, where the input saturation constraints are addressed by the bounded output neurons of the actor network [13], the SCN structure ordinarily utilizes a non-quadratic cost function, such that the control inputs derived from the solution of the HJB equation can be bounded by a hyperbolic tangent function [8,20,21].
Although time-based ADP approaches provide a mature and normative solution to nonlinear optimal control problems, the need for reducing transmitted data is not fully satisfied. Arising from networked control systems [22,23], event-triggered control (ETC) has attracted considerable attention in recent years because of its ability to reduce computational and communication load [10]. A cross-fertilization of ETC and ADP produces event-triggered ADP, which has been successfully implemented for optimal stabilization of both discrete-time systems [24,25] and CT systems [26,27].
The key attribute of the event-triggered mechanism lies in that the control signals are updated only when a certain condition is triggered [24]. Therefore, designing a sound triggering condition is the principal task of ETC. For CT systems adopting ETC methods, the inter-execution time can shrink to zero, resulting in an accumulation of triggering instants. This is the infamous Zeno phenomenon, which must be avoided in the controller design. The related analysis is conducted in [28,29] without incorporating ANNs, and in [9] without taking the input constraints into account. Building on these studies, a closed-loop analysis is carried out to ensure that the Zeno phenomenon does not occur with the proposed method.
Developing from time-based ADP, event-triggered ADP methods inherit most of the properties and techniques of time-based ADP, including the technique for handling input constraints with a non-quadratic cost function [29,30]. However, in most existing literature, the triggering condition is derived by involving a Lipschitz constant of the inverse hyperbolic tangent function without effectively narrowing its domain. Although satisfactory experimental results can be obtained in certain circumstances, this derivation is not mathematically rigorous. Following [31], in which the actor-critic structure is adopted, this paper removes this Lipschitz constant through meticulous mathematical transformations within the SCN architecture.
In addition, an initial admissible control is a requirement for both time-based and event-triggered ADP methods, which limits their applicability, especially for closed-loop online learning control. Inspired by [9,29,32], an improved weight updating rule is designed by adding a stabilizing term based on Lyapunov stability theory, such that the requirement of an initial admissible control is eliminated.
The contributions of this paper are summarized as follows:
1. It is the first time that an ADP-based controller is developed for a nonlinear aeroelastic system. This paper develops a general control method that can be applied directly without making coordinate transformations.
2. A novel triggering condition incorporating input constraints is derived without requiring the Lipschitz assumption on the inverse hyperbolic tangent function.
3. The demand for an initial admissible control is relaxed by an improved critic weight updating criterion.
4. The Zeno phenomenon is analysed and avoided for the closed-loop system with the event-triggered control strategy.
The remainder of this paper is organized as follows: Section 2 states the constrained-input optimal control problem for a CT nonlinear aeroelastic system under the event-triggered framework. Section 3 provides the implementation of the event-triggered controller using an ANN, analyses the closed-loop stability, and shows that the Zeno phenomenon is avoided. The simulation verification is presented in Section 4, and Section 5 summarizes this paper and states further research.
The main notations used in what follows are listed. N is the set of all natural numbers. R denotes the set of all real numbers. R^n indicates the Euclidean space of all n-dimensional real vectors. R^{n×m} is the space of all n × m real matrices. | · | is the scalar absolute value and || · || is the norm of the corresponding vector or matrix. (·)^− denotes left continuity and (·)^T represents the transpose operation. I_n denotes the n × n identity matrix and 1 is a column vector with all elements equal to one. λ̄(·) and λ_min(·) respectively represent the maximal and minimal eigenvalues of a matrix. Denote Ω as a compact subset of R^n, Ω_u as a compact subset of R^m, and A(Ω) as the set of admissible controllers on Ω. The symbol ∇(·) ≜ ∂(·)/∂x stands for the gradient operator.

Problem description
A typical aeroelastic wing section plant with two degrees of freedom is modeled in this section. Then, we describe the constrained-input optimal control problem of general nonlinear systems, and present the event-triggered control mechanism.

Aeroelastic wing section model
With the wide usage of composite materials, high-aspect-ratio aircraft wings can suffer from aeroelastic instability phenomena, including LCOs [3,33]. If not suppressed by active control, LCOs can lead to structural failure and even flight accidents [2]. The schematic of an aeroelastic wing section controlled by a single trailing-edge flap is illustrated in Fig. 1 [4], where c.m. is the abbreviation of center of mass. It has two degrees of freedom: the plunge displacement h and the pitch angle θ. In this problem, it is assumed that, in the undisturbed case, the freestream is along the airfoil chord, and thus the pitch angle θ equals the angle of attack α.
Consequently, the governing equations of motion are presented as [4,6]:

[ m_T        m_w x_α b ] [ ḧ ]   [ c_h   0  ] [ ḣ ]   [ k_h(h)   0      ] [ h ]   [ −L ]
[ m_w x_α b  I_α       ] [ α̈ ] + [ 0    c_α ] [ α̇ ] + [ 0       k_α(α) ] [ α ] = [  M ]    (1)

where m_T is the total mass, m_w is the wing mass, x_α is the nondimensional distance between the elastic axis and the center of mass, b is the semichord, I_α is the moment of inertia, and c_h and c_α are the plunge and pitch damping coefficients; k_h(h) and k_α(α) respectively represent the plunge and pitch stiffness, which can be formulated by nonlinear polynomials as [2]:

k_h(h) = Σ_i k_{h_i} h^i,   k_α(α) = Σ_i k_{α_i} α^i    (2)

and the remaining constant parameters are listed in Table 1; L and M respectively denote the aerodynamic force and moment, which are formulated in a quasi-steady form as [4]:

L = ρU²b c_{lα}(α + ḣ/U + ā b α̇/U) + ρU²b c_{lβ} β
M = ρU²b² c_{mα}(α + ḣ/U + ā b α̇/U) + ρU²b² c_{mβ} β    (3)

where ā = 0.5 − a, a is the nondimensional distance from the midchord to the elastic axis, U is the freestream velocity, ρ is the air density, c_{lα}, c_{mα}, c_{lβ}, and c_{mβ} are aerodynamic coefficients, and β is the control surface deflection. As presented by (1)-(3), the motion dynamics of the aeroelastic system are nonlinear. To describe the system more thoroughly, the properties of a simplified linear system are provided. By neglecting the nonlinear terms in (2), the flutter speed of the resulting linear aeroelastic system is 12.41 m/s. The natural frequencies of the corresponding linear undamped aeroelastic system are 9.11 rad/s and 13.28 rad/s. The complete flight control system often involves the actuator, which can be described as a first-order component [13]:

τ β̇ + β = β_c    (4)

where τ is the actuator time constant and β_c is the deflection command directly generated by the controller. In this case, the complete state vector is x = [x_1, x_2, x_3, x_4, x_5]^T = [h, α, ḣ, α̇, β]^T, and the control input is u = β_c. Besides, due to mechanical limitations, the control surface deflection always has constraints, which should be taken into consideration in the controller design process.
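For intuition, the section dynamics above can be integrated numerically once cast into the first-order state form used later. The Python sketch below does this under the standard quasi-steady model; every parameter value, the stiffness polynomials, and the actuator time constant are illustrative placeholders, not the entries of Table 1.

```python
import numpy as np

# First-order form of the two-DOF aeroelastic section with a
# first-order actuator. State x = [h, alpha, h_dot, alpha_dot, beta].
# All numerical values below are illustrative placeholders.
m_T, m_w, x_a, b, I_a = 12.0, 2.0, 0.25, 0.135, 0.065
c_h, c_a = 27.43, 0.036
rho, U, a = 1.225, 15.0, -0.6
c_la, c_lb, c_ma, c_mb = 6.28, 3.36, -0.63, -0.85
tau = 1.0 / 60.0  # assumed actuator time constant

def k_h(h):       # plunge stiffness polynomial (illustrative, constant)
    return 2844.4

def k_a(alpha):   # pitch stiffness polynomial (illustrative)
    return 2.82 * (1.0 - 22.1 * alpha + 1315.5 * alpha**2)

def dynamics(x, beta_c):
    h, al, hd, ald, beta = x
    # quasi-steady effective angle of attack
    eff = al + hd / U + (0.5 - a) * b * ald / U
    L = rho * U**2 * b * c_la * eff + rho * U**2 * b * c_lb * beta
    M = rho * U**2 * b**2 * c_ma * eff + rho * U**2 * b**2 * c_mb * beta
    # solve the coupled mass matrix for the accelerations
    Mm = np.array([[m_T, m_w * x_a * b], [m_w * x_a * b, I_a]])
    rhs = np.array([-L - c_h * hd - k_h(h) * h,
                    M - c_a * ald - k_a(al) * al])
    acc = np.linalg.solve(Mm, rhs)
    return np.array([hd, ald, acc[0], acc[1], (beta_c - beta) / tau])

# one Euler step from a small initial pitch disturbance, zero command
x = np.array([0.0, 0.05, 0.0, 0.0, 0.0])
x = x + 0.001 * dynamics(x, 0.0)
```

With real parameter values from Table 1, the same structure would reproduce the LCO behaviour discussed above.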

Optimal control design with input constraints
To provide a general description, we consider a class of nonlinear CT systems formulated by:

ẋ(t) = f(x(t)) + g(x(t))u(x(t))    (5)

where x(t) ∈ Ω ⊂ R^n is the state vector, u(x(t)) ∈ Ω_u is the control signal vector, and f(·) and g(·) are differentiable functions with f(0) = 0. The initial state at t = 0 is x(0) = x_0, and x = 0 is the equilibrium point of the system. System (5) is generally assumed to be controllable. For simplicity, we denote x(t) by x hereafter.
For system (5), an infinite-horizon cost function can be defined as:

J(x(t)) = ∫_t^∞ [x^T(τ) Q x(τ) + Y(u(x(τ)))] dτ    (6)

where Q ∈ R^{n×n} is positive semi-definite and is set to be a diagonal matrix in this paper, and Y(u) is a positive semi-definite integrand function utilized to handle the control input constraints.
Specifically, Y(u) takes the non-quadratic form:

Y(u) = 2u_b ∫_0^u tanh^{−T}(υ/u_b) R dυ    (7)

where R = diag{r_1, · · · , r_m} is positive definite, u_b is the saturation bound, tanh^{−T}(·) stands for (tanh^{−1}(·))^T, and tanh^{−1}(·) is the inverse hyperbolic tangent function, which is a monotonic odd function. Admissible control is a prerequisite of optimal feedback stabilization, such that the cost function J(x) is guaranteed to be finite. Choosing an admissible control law u(x) ∈ A(Ω), the Hamiltonian is accordingly defined as:

H(x, u(x), ∇J(x)) = x^T Q x + Y(u(x)) + ∇J^T(x)(f(x) + g(x)u(x))    (8)

The optimal value of the cost function given in (6) is:

J*(x) = min_{u∈A(Ω)} ∫_t^∞ [x^T(τ) Q x(τ) + Y(u(x(τ)))] dτ    (9)

and it satisfies the HJB equation:

min_{u∈A(Ω)} H(x, u(x), ∇J*(x)) = 0    (10)

Minimizing the Hamiltonian yields the optimal control law:

u*(x) = −u_b tanh(D*)    (11)

where tanh(·) denotes the hyperbolic tangent function, and D* is given by:

D* = (1/(2u_b)) R^{−1} g^T(x) ∇J*(x)    (12)

The control input u* is bounded by u_b, and the non-quadratic cost (7) regarding u* is:

Y(u*) = u_b ∇J*^T(x) g(x) tanh(D*) + u_b² R̄^T ln(1 − tanh²(D*))    (13)

where ∇J*^T(x) denotes (∇J*(x))^T, R̄ = [r_1, · · · , r_m]^T, and ln(·) and tanh²(·) are applied elementwise. Substituting (11) and (12) into the HJB equation produces:

x^T Q x + ∇J*^T(x) f(x) + u_b² R̄^T ln(1 − tanh²(D*)) = 0    (14)

with J*(0) = 0, which leads to H(x, u*(x), ∇J*(x)) = 0.
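The constrained-input construction can be checked numerically in the scalar case: quadrature of the non-quadratic integrand (7) matches the closed form appearing in (13), and the control (11) never leaves the saturation bound, whatever the costate magnitude. A minimal sketch (u_b and r are illustrative values):

```python
import numpy as np

# Scalar (m = 1) check of the non-quadratic cost (7) and the
# saturated control (11)-(12). u_b and r are illustrative values.
u_b, r = 10.0, 1.0

def Y_quadrature(u, n=200001):
    # Y(u) = 2 u_b * int_0^u r * atanh(v / u_b) dv, trapezoidal rule
    v = np.linspace(0.0, u, n)
    f = 2.0 * u_b * r * np.arctanh(v / u_b)
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * (v[1] - v[0])))

def Y_closed(u):
    # closed-form antiderivative of the same integral, cf. (13)
    return (2.0 * u_b * r * u * np.arctanh(u / u_b)
            + u_b**2 * r * np.log(1.0 - (u / u_b)**2))

gap = abs(Y_quadrature(6.0) - Y_closed(6.0))   # ~0 up to quadrature error

# u* = -u_b tanh(D*) stays inside the bound for any costate value
D_star = np.linspace(-50.0, 50.0, 1001)
u_star = -u_b * np.tanh(D_star)
```

The same cancellation that produces the closed form is what removes the cross term ∇J*^T g u* when (11)-(12) are substituted into the HJB equation, leaving (14).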

Event-triggered scheme design
Considering the event-triggered scheme, we define a sequence of triggering instants {s_k}_{k=0}^∞, where s_k satisfies s_k < s_{k+1} with k ∈ N. The output of the sampled-data module is x(s_k) ≜ x_k for all t ∈ [s_k, s_{k+1}). Subsequently, we define the gap function using the event error:

e_k(t) = x_k − x(t), ∀t ∈ [s_k, s_{k+1})    (15)

We denote e_k(t) briefly by e_k hereafter. Every time a certain triggering condition is satisfied, the event-triggered state vector is updated and the event error e_k is reset to zero. At every triggering instant (instead of every time instant), the state feedback control law u(x(s_k)) = u(x_k) is accordingly updated. By introducing a zero-order holder (ZOH), the control sequence {u(x_k)}_{k=0}^∞ actually turns into a piecewise signal that remains constant during the time interval [s_k, s_{k+1}), ∀k ∈ N. Based on the control signal u(x_k), system (5) takes the form:

ẋ(t) = f(x(t)) + g(x(t))u(x_k)    (16)

Considering the event-triggered framework, combined with (12), the feedback control function (11) becomes:

u*(x_k) = −u_b tanh(D*_k)    (17)

where D*_k is given as:

D*_k = (1/(2u_b)) R^{−1} g^T(x_k) ∇J*(x_k)    (18)

For system (5), with the infinite-horizon cost function represented by (6), we define a triggering condition as follows:

||e_k|| > e_T    (19)

where e_T is the threshold to be determined. We say the event is triggered if (19) is satisfied, and in the following section, we present the details of how to determine the threshold.
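The triggering logic of (15)-(19) amounts to a sample-and-hold loop: the control is recomputed only when the gap exceeds the threshold, and the ZOH replays the last value otherwise. The sketch below uses a toy plant and a placeholder feedback law (not the aeroelastic model or the optimal policy) purely to illustrate the mechanism:

```python
import numpy as np

# Event-triggered sample-and-hold loop. The plant and the policy are
# toy stand-ins; only the triggering structure mirrors (15)-(19).
def policy(x):
    return -np.tanh(x.sum())                     # placeholder feedback law

def step(x, u, dt=0.01):
    return x + dt * (-0.5 * x + np.array([u, 0.0]))  # toy dynamics

e_T = 0.05                                       # fixed threshold (19)
x = np.array([1.0, -0.5])
x_k = x.copy()                                   # last sampled state
u = policy(x_k)
triggers = 0
for _ in range(500):
    if np.linalg.norm(x_k - x) > e_T:            # triggering condition
        x_k = x.copy()                           # sample: e_k reset to 0
        u = policy(x_k)                          # update control once
        triggers += 1
    x = step(x, u)                               # ZOH holds u otherwise
```

Between events the gap ||x_k − x(t)|| needs several integration steps to regrow past e_T, so the number of control updates is well below the number of time steps, which is exactly the resource saving the ETC scheme targets.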

Intelligent critic control implementation
Since (14) is a nonlinear partial differential equation that is intractable to solve analytically, in this section an ANN with an improved updating rule is used to approximate the optimal control policy. Then, the system stability is analysed and the Zeno phenomenon is excluded for the closed-loop system.

Improved neural control implementation
In light of the powerful generalization property of ANNs, the optimal cost function can be reconstructed as follows:

J*(x) = w_c^T σ_c(x) + ε_c(x)    (20)

where w_c ∈ R^{l_c} stands for the ideal weight vector, l_c is the number of neurons, σ_c(x) ∈ R^{l_c} denotes the activation function, and ε_c(x) ∈ R represents the neural approximation error. Accordingly, the gradient vector of the optimal cost is:

∇J*(x) = ∇σ_c^T(x) w_c + ∇ε_c(x)    (21)

Since the ideal weight vector is unavailable in advance, a critic network is constructed to approximate the cost function with an estimated weight vector ŵ_c ∈ R^{l_c} such that:

Ĵ(x) = ŵ_c^T σ_c(x)    (22)

Similarly, we determine:

∇Ĵ(x) = ∇σ_c^T(x) ŵ_c    (23)

Considering the ANN formulation (21), (18) can be rewritten as:

D*_k = (1/(2u_b)) R^{−1} g^T(x_k)(∇σ_c^T(x_k) w_c + ∇ε_c(x_k))    (24)

Based on the mean-value theorem [21], we can accordingly rewrite (17) as:

u*(x_k) = −u_b tanh(D_k) + ε_{u_k}    (25)

where D_k = (1/(2u_b)) R^{−1} g^T(x_k) ∇σ_c^T(x_k) w_c and ε_{u_k} is the residual term induced by ∇ε_c(x_k). Hence, according to (17) and (23), the event-triggered approximate optimal policy can be formulated as:

û(x_k) = −u_b tanh(D̂_k)    (26)

and D̂_k is modulated as:

D̂_k = (1/(2u_b)) R^{−1} g^T(x_k) ∇σ_c^T(x_k) ŵ_c    (27)

Substituting (25) into the Hamiltonian (8) with the ideal weights gives:

x^T Q x + Y(û(x_k)) + w_c^T ∇σ_c(x)(f(x) + g(x)û(x_k)) = e_{cH}    (28)

where e_{cH} is the residual error brought by the ANN. Utilizing (23), the approximate Hamiltonian is presented as:

e_c = x^T Q x + Y(û(x_k)) + ŵ_c^T ∇σ_c(x)(f(x) + g(x)û(x_k))    (29)

Defining the critic error vector as w̃_c = w_c − ŵ_c and combining (28) with (29), we obtain an equivalent expression of e_c:

e_c = −w̃_c^T φ + e_{cH}    (30)

where φ = ∇σ_c(x)(f(x) + g(x)û(x_k)). Hence, the aim of training the critic network is to obtain an appropriate weight vector ŵ_c such that the objective function E_c = (1/2)e_c^T e_c is minimized. It is worth mentioning that the actual control law utilized during the learning process is the approximated control (26). In [30], a direct gradient-descent method is applied to adjust the critic weight vector:

ŵ̇_c = −η_c (∂E_c/∂ŵ_c) = −η_c φ e_c    (31)

where η_c > 0 is the learning rate parameter. An admissible control is essential for the general ADP-based optimal control design but is intractable to obtain in advance. To overcome this challenge, inspired by [9,29,32], we bring in an extra stabilizing term to improve the direct gradient-descent method and adopt it to enhance the ANN weight updating. Similar to [9,21,32], we make the following assumption:

Assumption 1.
Consider system (5) with the cost function (6) and its closed-loop form governed by the event-triggered optimal controller (17) and (24). Let J_s(x) be a continuously differentiable Lyapunov function candidate satisfying:

J̇_s(x) = ∇J_s^T(x) ẋ < 0

Then, there exists a positive definite matrix M ∈ R^{n×n} such that the following inequality holds:

∇J_s^T(x) ẋ ≤ −x^T M x

When adopting the event-triggered approximate optimal control (26), we should exclude the following case to guarantee the system stability:

∇J_s^T(x)(f(x) + g(x)û(x_k)) > 0    (32)

Hence, the learning performance is reinforced by adjusting the time derivative of J_s(x) along the direction of the negative gradient, which is modulated as follows:

ŵ̇_c^s = −η_s ∂(∇J_s^T(x)(f(x) + g(x)û(x_k)))/∂ŵ_c    (33)

where η_s > 0 is the designed learning rate. By combining the stabilizing term (33) and the traditional rule (31), we establish the improved ANN learning criterion as follows:

ŵ̇_c = −η_c φ e_c − Π(x, û(x_k)) η_s ∂(∇J_s^T(x)(f(x) + g(x)û(x_k)))/∂ŵ_c    (34)

where Π(x, û(x_k)) is a sign function utilized to eliminate the effect of the reinforced term when the system is already stable, which is defined as:

Π(x, û(x_k)) = 0 if ∇J_s^T(x)(f(x) + g(x)û(x_k)) < 0, and Π(x, û(x_k)) = 1 otherwise.    (35)

Remark 1. The improved updating rule (34) with the reinforced term relaxes the demand for an initial admissible control, which implies that the critic weight vector can initially be set as any random vector.
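The switching structure of the improved rule can be sketched as follows. The quantities passed in (φ, e_c, the stabilizing-term gradient, and the Lyapunov derivative) are toy stand-ins; only the on/off logic of the sign function and the two-term update are taken from the scheme above, and the learning rates reuse the values chosen later in the simulation study:

```python
import numpy as np

# Sketch of the improved critic update: a gradient-descent step on
# E_c = e_c^2 / 2 plus a stabilizing term switched on only when the
# Lyapunov derivative Js_dot is non-negative. Inputs are toy stand-ins.
eta_c, eta_s = 0.05, 0.001

def critic_update(w_hat, phi, e_c, grad_Jsdot_w, Js_dot, dt=0.001):
    # Pi = 0 when the closed loop is already stable (Js_dot < 0)
    Pi = 0.0 if Js_dot < 0.0 else 1.0
    dw = -eta_c * e_c * phi - Pi * eta_s * grad_Jsdot_w
    return w_hat + dt * dw

w = np.zeros(4)                      # zero initialization is allowed here
phi = np.array([0.2, -0.1, 0.4, 0.05])
# stable case: only the gradient-descent term acts
w1 = critic_update(w, phi, e_c=0.3, grad_Jsdot_w=np.ones(4), Js_dot=-1.0)
# unstable case: the stabilizing term is switched on as well
w2 = critic_update(w, phi, e_c=0.3, grad_Jsdot_w=np.ones(4), Js_dot=+1.0)
```

The point of the switch is that the extra term only perturbs learning while the closed loop is misbehaving, which is what removes the need for an initially admissible control.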

Closed-loop stability analysis
We firstly construct the error dynamics of the critic network by defining w̃_c = w_c − ŵ_c and noting that ẇ̃_c = −ŵ̇_c, since w_c is constant. Consequently, the critic error dynamics is presented as:

ẇ̃_c = η_c φ e_c + Π(x, û(x_k)) η_s ∂(∇J_s^T(x)(f(x) + g(x)û(x_k)))/∂ŵ_c    (36)

Remark 2. The persistent excitation (PE) assumption is required. If the PE condition holds, we easily derive λ_min(φφ^T) > 0 [26], which is of great significance for the stability analysis. A common approach to achieve PE is introducing a probing noise to excite the system [29,13,32].
Subsequently, we study the closed-loop stability based on the approximate event-triggered feedback control incorporating the weight estimation dynamics. Before proceeding, the following assumptions are required, which are commonly employed in ADP literature, such as [21,23,9,29].

Assumption 2. g(x) is Lipschitz continuous such that ||g(x) − g(x_k)|| ≤ L_g ||e_k||, and is upper bounded as ||g(x)|| ≤ b_g, where L_g and b_g are positive real constants.
Proof. We construct a Lyapunov function candidate as:

L(t) = J*(x) + J*(x_k) + J_s(x) + (1/2) w̃_c^T w̃_c    (38)

The proof consists of two situations according to whether the event is triggered or not.
Computing the time derivative of the Lyapunov function, the second term satisfies L̇_{x_k} = 0 between triggering instants, since x_k remains constant there.
Considering the closed-loop system using the approximate feedback control (26), and the optimal HJB equation (14), the first term can be derived as: According to the definition of the utility function, the second term of (39) is converted into: Besides, (11) and (12) imply that: Therefore, the last term in (39) can be rewritten as: Substituting (40) and (42) into (39) yields: Letting υ = −u_b tanh(ω), the last term in (43) is written as: Therefore, (43) satisfies: where ∇Ĵ(x_k) = ∇σ_c^T(x_k) ŵ_c. According to Assumptions 2 and 3, we obtain: in which Therefore, (45) continues as: Then, we investigate the last two terms in (38). By taking the definition of Π(x, û(x_k)) into consideration, two scenarios are examined separately.
Case II: Π(x, û(x_k)) = 1. We combine L̇_{J_s} with the stabilizing term of L̇_{w̃_c} as: By taking the first-order Taylor series expansion of tanh(D̂_k), we obtain: where o[(D_k − D̂_k)²] has a bound, which is denoted by b_{oD} [21].

Analysis of Zeno phenomenon in the closed-loop system
For nonlinear CT systems with event-triggered control inputs, the inter-execution time is denoted as Δs_k = s_{k+1} − s_k, and the minimal inter-execution time Δs_min = min_{k∈N}{s_{k+1} − s_k} might be zero, which can lead to an accumulation of triggering instants, i.e., the Zeno phenomenon. Hence, the condition Δs_min > 0 should be guaranteed such that the undesired Zeno phenomenon is avoided.
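The exclusion argument can be illustrated numerically: if the gap grows at most exponentially from zero after each reset, the time needed to reach any positive threshold is bounded away from zero. The constants C, k_f, and e_T below are illustrative, not values from the paper:

```python
import numpy as np

# Numeric sketch of the Zeno-exclusion bound: assume the gap obeys
#   ||e_k(t)|| <= (C / k_f) * (exp(k_f * (t - s_k)) - 1),  e_k(s_k) = 0,
# so reaching a threshold e_T > 0 needs at least
#   dt_min = (1 / k_f) * ln(1 + k_f * e_T / C) > 0.
# C, k_f, e_T are illustrative constants.
k_f, C, e_T = 5.0, 2.0, 0.1

dt_min = np.log(1.0 + k_f * e_T / C) / k_f

# just before dt_min the exponential envelope is still below e_T
t = 0.999 * dt_min
envelope = (C / k_f) * (np.exp(k_f * t) - 1.0)
```

Because the lower bound depends only on the growth constants and the threshold, it holds uniformly over all events, which is the essence of ruling out Zeno behaviour.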

Theorem 2. Considering the closed-loop form of the nonlinear system (5) governed by the event-triggered approximate optimal control (26), the k-th inter-execution time Δs_k determined by (37) has a lower bound as:

Δs_k ≥ (1/k_f) ln(1 + (k_f ||ê_T||)/(k_f ||x_k|| + b_g u_b))    (56)

where k_f is a positive constant.
Proof. We apply the approximate optimal control (26) to formulate the closed-loop dynamics as follows:

ẋ(t) = f(x(t)) − u_b g(x(t)) tanh(D̂_k)    (57)

By noticing the fact that û(x_k) is upper bounded by u_b, and according to Assumption 2, we can derive that:

||ẋ|| ≤ k_f ||x|| + b_g u_b    (58)

Considering (15), we can further derive that:

||ė_k|| = ||ẋ|| ≤ k_f ||e_k|| + k_f ||x_k|| + b_g u_b    (59)

Since e_k(s_k) = x_k − x(s_k) = 0, by employing the comparison lemma [34,9] to solve (59), for any t ∈ [s_k, s_{k+1}), we have:

||e_k(t)|| ≤ ((k_f ||x_k|| + b_g u_b)/k_f)(e^{k_f(t−s_k)} − 1)    (60)

Therefore, the k-th inter-execution time Δs_k satisfies (56). According to (37), ||ê_T|| > 0. In summary, Δs_k > 0 for any x_k ≠ 0, i.e., Δs_min > 0, which ends the proof.
Overall, the structural diagram of the present control implementation is depicted in Fig. 2 to clarify the design procedure.

Simulation study
Finally, we verify the effectiveness of the proposed control approach through numerical simulation experiments based on the nonlinear aeroelastic system demonstrated above. Considering the cost function (6) from t = 0, we choose Q = I_5 and R = 1 as a trade-off between fast stabilization and avoiding aggressive control, and set the deflection constraint as u_b = 10 deg. For the purpose of simulation, we set the simulation frequency to 1 kHz and the sensing frequency to 100 Hz. Let the initial state vector be A critic network is constructed to approximate the optimal cost function. The number of neurons and the nonlinearity of the activation function positively correlate with the approximation precision. However, more neurons with higher nonlinearities also increase the computational load and can cause overfitting that harms the control robustness [13]. To balance control accuracy and computational complexity, we choose the activation function We choose J_s(x) = 0.5 x^T x to enhance stability and experimentally set η_c = 0.05, η_s = 0.001, η = 0.1, and C_1 = 250. As claimed in Remark 2, an exploration noise u_e is introduced to satisfy the PE condition. The probing noise is designed as a composition of decaying sinusoidal functions, whose formula is u_e = −0.05e^{−20t}(sin²(100t) cos(100t) + sin²(2t) cos(0.1t) + sin²(1.2t) cos(0.5t) + sin⁵(t)) deg. The probing noise is actually added to the control command only at triggering instants. Owing to the improved updating rule (see Remark 1), the critic weights can initially be set to zero. Recalling the triggering condition in (37), we find that ||ŵ_c|| appears in the denominator, which can cause a large inter-execution time at the beginning. Therefore, we manually set an upper bound for the inter-execution time as Δs_max = 0.1 s. This configuration is adopted in an engineering sense as a safety guarantee, and does not affect the theoretical completeness.
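The probing-noise formula above translates directly into code; its exp(−20t) envelope shows why the excitation effectively vanishes within roughly half a second:

```python
import numpy as np

# Decaying probing noise used to satisfy the PE condition, exactly as
# specified in the text (output in degrees).
def u_e(t):
    return -0.05 * np.exp(-20.0 * t) * (
        np.sin(100 * t)**2 * np.cos(100 * t)
        + np.sin(2 * t)**2 * np.cos(0.1 * t)
        + np.sin(1.2 * t)**2 * np.cos(0.5 * t)
        + np.sin(t)**5
    )

t = np.linspace(0.0, 1.0, 1001)
noise = u_e(t)
# the noise is zero at t = 0 and negligible after about 0.5 s
```

The mixed frequencies (100, 2, 1.2, 0.5, 0.1 rad/s and the sin⁵ term) give a broadband excitation early on, while the exponential decay guarantees the noise does not disturb the converged policy.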
The simulation is conducted in an online manner, which means that the control policy improves in a closed-loop way. To present the advantage of the ETC scheme, a time-based approach is adopted for comparison, whose settings are exactly the same as those of the proposed intelligent critic control approach except for the event-triggered scheme, i.e., the time-based control approach updates the control input at each time instant. We can observe from Fig. 3 that the convergence of the weight vector occurs around 1 s. Subsequently, we display the trajectory of the approximated cost function in Fig. 4, which presents the direct performance of the controller. Due to the initial zero values of the weight vector, the initial approximate cost is zero, and it subsequently grows as the learning continues. Then, owing to the convergence of the weight vector, the approximate cost function swiftly decreases to a low level. Furthermore, the triggering threshold trajectory is displayed in Fig. 5, which presents a trend to zero along with the event error. The inter-execution time is depicted in Fig. 6. It is worth mentioning that 800 samples are utilized by the time-based controller, whereas the proposed event-triggered approach only requires 366 samples. Therefore, the event-triggered method reduces the control updates in the learning process by 54.25%, and thus improves the resource utilization.
Figs. 7 and 8 present the aeroelastic system state trajectories divided into plunge and pitch motion, respectively. We compare the results between the event-triggered and time-based approaches, and observe that, although the event-triggered controller utilizes fewer data samples, the state variables eventually converge to a small vicinity of zero without deteriorating the convergence rate. Figs. 9 and 10 respectively present the control command directly generated by the controller, u (β_c), and the real deflection of the control surface, x_5 (β). The developed event-triggered approach produces curves overall comparable to those of the time-based approach. Due to the event-triggered mechanism, the control command signal is stepwise. Nevertheless, the control command signal has to pass through an actuator, and the real deflection is adequately smooth for the wing surface control. Furthermore, we observe that the control command (incorporating the exploration noise) is bounded by the pre-designed saturation constraints, i.e., |u| < u_b. Therefore, we conclude that the control input constraint problem has been overcome.
The phase portraits of plunge and pitch motions are illustrated in Figs. 11 and 12, respectively. As can be observed, the trajectories of the proposed method and the open-loop simulation almost coincide at the beginning. This phenomenon is due to the collective effect caused by LCOs and the initial unlearned policy, and disappears quickly as the weight vector updates. Then all states are stabilized to a small vicinity of the equilibrium point.
To further verify its performance, robustness tests are carried out with different freestream velocities using the proposed event-triggered intelligent optimal control strategy. In addition to the nominal conditions, cases with velocity uncertainties are examined through the plunge displacement evolution, as illustrated in Fig. 13. It can be observed that in all conditions, the controller manages to stabilize the plunge displacement within 8 s. The situations with and without uncertainties demonstrate similar performance for U = 15 m/s and U = 27 m/s, whereas for U = 12 m/s, the control performance is even better with uncertainties involved. The reason behind this phenomenon lies in that when U = 12 m/s the flutter frequency is low and it is difficult to fully excite the system. The velocity uncertainties disturb the system but meanwhile provide stronger excitation and thus speed up the learning process. In fact, this acceleration phenomenon also takes place in the other two situations, though it is more obvious with a lower freestream velocity, which illustrates the robustness of the proposed control method to uncertainties.
With a lower incoming flow speed (U = 12 m/s), the control effectiveness is also lower, and therefore a larger deflection of the control surface is required to generate sufficient control torque, leading to the aggressive but saturated control command shown in Fig. 14 (a).
When the freestream velocity is higher (U = 27 m/s), the flutter frequency is higher, which provides more excitation at the initial stage. Therefore, it can be seen from Fig. 14 (c) that the control command becomes effective earlier than for the other two conditions. However, it is remarkable that if U < 12 m/s or U > 27 m/s, the controller is incapable of stabilizing the system within 8 s with the current settings, due to the insufficient control effectiveness or the excessive flutter frequency, respectively. Nevertheless, this can be improved by adapting the hyperparameters. Fig. 15 compares the root mean square (RMS) of the critic weights for 3 different freestream velocities in the presence of uncertainties. Consistent with the above, the convergence speed is faster when the freestream velocity is higher because of the stronger excitation. These curves differ because the control policy is learned online, and therefore the controller adapts to different conditions in real time, which validates the adaptability of the proposed control approach. The simulation results collectively verify the feasibility and the effectiveness of the event-triggered intelligent optimal control approach developed in this paper.

Conclusion
In this paper, we develop an event-triggered intelligent optimal control scheme and apply it to an aeroelastic system control problem. Taking the input constraints into account, we derive a novel triggering condition without making the Lipschitz assumption on the inverse hyperbolic tangent function. The controller is constructed by adopting the adaptive dynamic programming (ADP) technique with a single critic network.
The theoretical analysis of the closed-loop system shows that, with the derived event-triggered controller, the system states are guaranteed to be asymptotically stable, while the Zeno phenomenon is also avoided during the learning phase. The simulation results demonstrate that the nonlinear aeroelastic system is successfully stabilized with the input saturation constraints handled. Besides, compared to the conventional time-based ADP method, the present event-triggered ADP method achieves comparable performance with reduced control updates, which demonstrates the advantage of the developed method in saving computational and communication load. Furthermore, the robustness tests demonstrate that the designed controller is able to adapt online to different situations and exhibits robustness to uncertainties to some extent.
At the current stage, we concentrate on the control algorithm development based on the known system dynamics and perfect measurements. Due to the fact that uncertainties generally exist in the real world, further investigation into robust control methods is recommended. Besides, due to the limitation of actuator power, the deflection rate of the control surface should also be constrained, which can be studied in the future.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.