Deterministic learning-based neural network control with adaptive phase compensation

Under the persistent excitation (PE) condition, the real dynamics of a nonlinear system can be obtained through deterministic learning-based radial basis function neural network (RBFNN) control. However, in this scheme, the learning speed and accuracy are limited by the tradeoff between the PE levels and the approximation capabilities of the neural network (NN). Inspired by the frequency domain phase compensation of linear time-invariant (LTI) systems, this paper presents an adaptive phase compensator employing a pure time delay to improve the performance of deterministic learning-based adaptive feedforward control when the reference input is known a priori. When the adaptive phase compensation is applied to the hidden layer of the RBFNN, the nonlinear approximation capability of the RBFNN is effectively improved, such that both the learning performance (learning speed and accuracy) and the control performance of the deterministic learning-based control scheme are improved. Theoretical analysis is conducted to prove the stability of the proposed learning control scheme for a class of systems which are affine in the control. Simulation studies demonstrate the effectiveness of the proposed phase compensation method.


Introduction
In recent years, adaptive neural network (NN) control has been applied successfully in a variety of nonlinear systems with dynamic uncertainties (Dai, Wang, & Wang, 2016; He, Yan, Sun, & Chen, 2017; Ni & Shi, 2021). Traditional adaptive NN control can ensure the stability and good performance of the closed-loop system by updating parameters online. However, the learning of the real dynamics during the adaptive process cannot be achieved, i.e., it cannot be ensured that the controller parameters converge to the true values of the system (Ge, Lee, & Harris, 1998). As a result, the adaptive NN controller requires re-learning even for the same control task (Kurdila, Narcowich, & Ward, 2006). In order to address this issue, the localized radial basis function neural network (RBFNN) has been investigated for learning controller design on account of its unique properties, including the localized approximation capability (only a few neurons are activated simultaneously and generate the output), the linear relationship between output and weights, and a relatively mature theory on the persistent excitation (PE) condition (Wang & Hill, 2006). In particular, a theory named deterministic learning was proposed to describe a learning mechanism in the context of adaptive NN control using RBFNNs. 1. On the premise of a stable feedback control system, the RBFNN takes the state orbits and the tracking errors of the system as the input and updates the weights online to achieve good performance of the closed-loop system. As part of the NN input, the state orbits can also be replaced by the reference orbits, which are known a priori, such that the approximation domain can be determined in advance (Pan, Liu, Xu, & Yu, 2016; Pan & Yu, 2017). Analysis in Kurdila et al. (2006) shows that as long as the controller parameters are set reasonably, the state tracking errors will converge arbitrarily close to zero for any continuous bounded orbit reference input.
Note that the convergence of the tracking errors does not require the PE condition to be met; it is determined by the structure of the closed-loop system itself. 2. Under the partial PE condition, the related weights of the RBFNN will converge to their optimal values, which represent the true dynamics of the system. Thanks to the research on the PE condition of RBFNNs, the relationship between the accuracy and speed of weight convergence, the structure of the RBFNN and the PE levels is preliminarily revealed in Kurdila et al. (2006), Sanner and Slotine (1992) and Wang, Chen, Chen, and Hill (2009). However, due to the difficulty of quantitatively evaluating the PE levels, whether the RBFNN regressor vector satisfies the PE condition is usually analyzed qualitatively. According to Wang et al. (2009), in the case of a reasonable center distribution of the RBFNN, both periodic and semi-periodic trajectories can satisfy the PE condition. 3. After the weights of the adaptive RBFNN converge to their optimal values, they can be used to design the feedforward controller of the closed-loop system. These parameter values can also be reused for the repetition of the same task in the same system, i.e., learning is achieved.

Parameters of the adaptive phase compensation

ϵ_2(x, x_d): The difference between f(x_d) and f(x)
θ: Phase constant (matrix) of the phase compensation, i.e. the length of the pure time delay
θ_ij: An element of the matrix θ which represents the pure time delay applied to the jth dimension of the input signal of the ith neuron
θ_i: A vector consisting of all θ_ij applied to each dimension of the ith neuron
θ*, θ̂: Optimal value of θ to approximate a function and its approximation
x*_d^i, x̂_d^i: Compensated input signals of the ith neuron using θ*_i and θ̂_i, respectively
It can be seen from the aforementioned three-step procedure that the speed and accuracy of the learning depend heavily on the PE condition. According to Zheng and Wang (2017), the NN structures determine the approximation potential while the PE levels determine the degree to which the approximation potential of the RBFNN is explored. Increasing the PE levels can improve the learning speed and accuracy, but the PE levels generally increase with the increase of the separation distance between neuron centers, which in turn downgrades the approximation performance. Owing to the difficulties in quantitatively evaluating the approximation capabilities and PE levels of the RBFNN, there is no systematic method to find a suitable tradeoff to guarantee the performance of deterministic learning. In other words, selecting structures of the RBFNN to improve the control performance involves a tradeoff between the PE levels and approximation capabilities, which leads to the difficulty of hyperparameter adjustment. In this perspective, the lattice distribution of the RBFNN neurons actually represents a possible tradeoff. It uses the lattice structure to guarantee the approximation capabilities of the RBFNN but under this distribution, only the partial PE condition is satisfied, i.e., only the weights of neurons satisfying the PE condition will converge to the optimal values. As proved in Zheng and Wang (2017), finite neurons with the centers reasonably distributed around the reference trajectory can achieve the same control performance as global neurons, which can be seen as lattice distribution on a large scale. Therefore, the distribution of the neuron centers requires more exploration to obtain a better performance beyond the lattice distribution.
In the general application of RBFNNs, optimization algorithms like clustering and least squares are commonly used to select the neuron centers (Gomm & Yu, 2000;Mao, 2002;Pedrycz, 1998;Sing, Basu, Nasipuri, & Kundu, 2003). Similarly, these optimization algorithms can also be used to select the appropriate deterministic learning RBFNN centers based on the reference input. In Liu, Li, Ge, and Ouyang (2021), the K-means clustering algorithm is employed to determine an optimized distribution of RBFNN centers in deterministic learning. This method significantly reduces the number of neurons, upgrades the partial PE condition in lattice distribution to the PE condition and improves the weight convergence accuracy and speed. However, the effectiveness of the K-means clustering algorithm lies in the fact that when the centers of the RBFNN are close to the reference trajectory, the PE levels tend to increase. In order to guarantee the approximation capabilities of the RBFNN, extra experiments are required to determine the appropriate number of neurons. Unlike the lattice distribution, this approach focuses on PE level enhancement rather than the approximation capabilities.
From the aforementioned analysis, it can be concluded that what limits the performance of deterministic learning in control systems is the tradeoff between the PE levels and the RBFNN approximation capabilities. It means that after the approximation error is reduced to a certain limit by adjusting the RBFNN structure, it becomes difficult to reduce it further. Therefore, we need to seek a method to improve the nonlinear approximation capabilities of RBFNNs. Phase compensation is an effective method to improve the performance of various controllers, but its potential in system identification has not been well explored due to its frequency domain based form. Inspired by the frequency domain phase compensation of linear time-invariant (LTI) systems, this paper proposes an adaptive phase compensation method using a pure time delay, which effectively improves the approximation capability of the RBFNN and the performance of deterministic learning control. Based on the adaptive phase compensation, an improved deterministic learning control scheme is proposed as follows: (i) the RBFNN is employed as an adaptive feedforward controller and combined with a PD feedback controller such that the stability and performance of the closed-loop system are guaranteed, (ii) both the RBFNN weights and the phase compensation parameters update during the tracking control process such that fast, accurate learning and good control performance are realized, and (iii) as the tracking error converges to a small neighborhood around zero, the updates of the compensation parameters and weights become very slow, and thus the slowly changing parameters can be recorded to achieve a more accurate approximation of the real dynamics of the system and applied to design the feedforward controller for reuse in a new control task. The main contributions of the paper include:
1. A new perspective is proposed to improve the performance of deterministic learning control, where the nonlinear approximation capabilities of the RBFNN are improved using adaptive phase compensation. This merit does not conflict with the method of optimizing the structures of the RBFNN to find a suitable tradeoff, so we can design the adaptive phase compensation based on an RBFNN with optimized structures to achieve better performance.
2. Considering the equivalence between the deterministic learning controller and a variable-gain integrator, an adaptive phase compensation method inspired by the phase compensation of LTI systems is proposed. Based on this phase compensation, an improved deterministic learning scheme is proposed, which not only has better control performance, but also has higher identification accuracy of the system.
3. Different from the linearly parameterized function approximation using the RBFNN, the proposed adaptive phase compensation method is a nonlinearly parameterized approximation technique, which has received little attention in the field of adaptive control and system identification. The adaptive phase compensation method provides an interesting example for the design of nonlinearly parameterized controllers and demonstrates the potential of nonlinearly parameterized methods in improving the learning and control performance.
The rest of this paper is organized as follows. Section 2 provides some preliminary knowledge and formulates the problems. In Section 3, an adaptive phase compensation method for feedforward controllers is proposed referring to the phase compensation of LTI systems. In Section 4, we consider a second-order Brunovsky nonlinear system and carry out the deterministic learning control design. Section 5 applies the proposed adaptive phase compensation to the RBFNN feedforward controller designed with deterministic learning. The simulation studies about the improved RBFNN feedforward controller are carried out in Section 6. Finally, Section 7 summarizes the main conclusions of this paper.

Problem formulation
Consider the second-order Brunovsky nonlinear system as follows:

ẋ_1 = x_2,
ẋ_2 = f(x) + u, (1)

where x = [x_1, x_2]^T ∈ R² is the state variable, and u ∈ R is the continuous system input. f(x) represents an unknown smooth nonlinear function, which will be approximated by the RBFNN to design the controller.
To generate the reference trajectory of system (1), consider a reference model as below:

ẋ_d1 = x_d2,
ẋ_d2 = f_d(x_d), (2)

where x_d = [x_d1, x_d2]^T ∈ R² is the reference state and f_d(x_d) is a known smooth nonlinear function.

Assumption 1. The reference orbit ϕ_d generated by the reference model (2) is a recurrent (e.g., periodic) trajectory, and the reference states x_d1 and x_d2 are uniformly bounded.
Since f(x) is a smooth function over R², it is Lipschitz continuous over any compact set Ω and thus satisfies the following assumption.
Assumption 2. There exists a positive constant K_0 such that the inequality

|f(x) − f(x_d)| ⩽ K_0 ∥x − x_d∥

holds for ∀x_d, x ∈ Ω.
The uncertain system (1) is selected as the system under study to facilitate the understanding of the proposed learning control scheme. Since the adaptive phase compensation proposed in the subsequent sections is a kind of modification for the classical RBFNN, the application scope of the learning control scheme with the adaptive phase compensation is the same as that of the classical adaptive RBFNN control such as in Wang (2017).

RBFNN based function approximation
Compared with multilayer feedforward networks, RBFNNs with only one hidden layer have faster convergence and capabilities of universal and localized approximation (Park & Sandberg, 1991; Wang, 2017). The linearly parameterized form of RBFNNs can be described as:

f_nn(Z) = W^T S(Z), (3)

where Z ∈ Ω_Z ⊂ R^q is the input vector, W = [w_1, w_2, . . . , w_N]^T ∈ R^N is the weight vector, N is the number of neurons, and S(Z) = [s_1(∥Z − μ_1∥), . . . , s_N(∥Z − μ_N∥)]^T is the regressor vector. For the regressor vector S(Z), s_i(·) (i = 1, . . . , N) is the radial basis function, μ_i represents a location in the state space (also termed the center), and the 2-norm ∥Z − μ_i∥ represents the distance in the state space, i.e., the independent variable of each radial basis function is the distance between the input vector and the corresponding center. The radial basis function employed in this paper is the commonly used Gaussian function:

s_i(∥Z − μ_i∥) = exp(−∥Z − μ_i∥² / η_i²), (4)

where μ_i = [μ_i1, μ_i2, . . . , μ_iq]^T and η_i are the center and width of the receptive field, respectively.
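The linearly parameterized form and the Gaussian regressor above can be sketched as follows; the lattice range, widths and test input are hypothetical values chosen only for illustration.

```python
import numpy as np

def gaussian_regressor(Z, centers, widths):
    """S(Z): vector of Gaussian activations s_i(||Z - mu_i||)."""
    d = np.linalg.norm(centers - Z, axis=1)      # distances ||Z - mu_i||
    return np.exp(-(d ** 2) / widths ** 2)       # Gaussian radial basis

def rbfnn_output(Z, W, centers, widths):
    """Linearly parameterized RBFNN output W^T S(Z)."""
    return W @ gaussian_regressor(Z, centers, widths)

# Toy lattice of N = 9 centers on [-1, 1]^2 (hypothetical values)
g = np.linspace(-1.0, 1.0, 3)
centers = np.array([[a, b] for a in g for b in g])   # lattice distribution
widths = np.full(len(centers), 0.8)                  # receptive-field widths eta_i
W = np.zeros(len(centers))                           # untrained weights

Z = np.array([0.1, -0.2])
S = gaussian_regressor(Z, centers, widths)
# Localized approximation: only neurons near Z are noticeably activated
print(S.max(), S.min())
print(rbfnn_output(Z, W, centers, widths))
```

With zero weights the output is zero regardless of the regressor, which illustrates that all expressiveness of the network sits in the linear outer layer W.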
Property 1 (Universal Approximation). As proved in Park and Sandberg (1991), the RBFNN has the capability to approximate any continuous function F(Z) : Ω_Z → R with arbitrary precision under the premise of enough neurons and suitably selected centers μ_i and widths η_i. Then F(Z) can be formulated as:

F(Z) = W*^T S(Z) + ϵ(Z), (5)

where W* ∈ R^N denotes the ideal weight vector of the RBFNN, ϵ(Z) is the approximation error (simplified as ϵ), and Ω_Z is a compact set.
The ideal weight vector W* represents the value of W that minimizes the absolute value of the approximation error |ϵ|:

W* := arg min_{W ∈ R^N} { sup_{Z ∈ Ω_Z} |F(Z) − W^T S(Z)| }. (6)

Remark 2. One of the most prominent features of the RBFNN is its localized approximation capability, i.e., when the input signal enters the NN, only some of the neurons close to the input trajectory are activated at the same time, and the output of the NN is only related to the activated neurons and their weights. The localized approximation capability suggests that the RBFNN can approximate any continuous function F(Z) mentioned above with a finite number of neurons (Shilnikov, Shilnikov, Turaev, & Chua, 2001).

PE condition and PE levels for RBFNNs
PE condition is a significant concept in system identification and adaptive control, which is used to describe whether the input signal is rich enough to stimulate all dynamics of the system. A definition of PE condition, which is suitable for the RBFNN, is proposed in Kurdila et al. (2006).
A bounded regressor vector S(Z(t)) : [0, ∞) → R^N is said to be persistently exciting if there exist positive constants δ, α_1 and α_2 such that

α_1 ∥c∥² ⩽ ∫_{t_0}^{t_0+δ} [c^T S(Z(τ))]² dτ ⩽ α_2 ∥c∥²

holds for ∀t_0 ⩾ 0 and ∀c ∈ R^N. α_1 and α_2 are referred to as the excitation level and the upper bound, respectively, and they are known collectively as the PE levels.
According to the preliminaries about RBFNNs in Section 2.2, the regressor vector takes the form S(Z) = [s_1(∥Z − μ_1∥), . . . , s_N(∥Z − μ_N∥)]^T, where s_i(·) represents the radial basis function. As proved in Kurdila et al. (2006), a sufficient condition for the regressor vector S(Z) to satisfy the PE condition is that the input vector Z(t) : [0, ∞) → R^q visits sufficiently small ϵ-balls about each center μ_i for a minimum period, independent of t_0, in each time interval [t_0, t_0 + δ]. For a lattice distribution of centers, only some of the centers will be visited by a given input Z(t), and a partial PE condition is satisfied (Wang & Hill, 2006).
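Since the PE inequality must hold for every direction c ∈ R^N, its tightest constants over a window [t_0, t_0 + δ] are the extreme eigenvalues of the Gram matrix ∫ S S^T dτ. The following sketch estimates these empirical PE levels numerically for a periodic input visiting centers placed on the unit circle; the trajectory, centers and widths are illustrative assumptions.

```python
import numpy as np

def gaussian_regressor(Z, centers, widths):
    d = np.linalg.norm(centers - Z, axis=1)
    return np.exp(-(d ** 2) / widths ** 2)

# Centers on the unit circle; the periodic input orbits through all of them
phis = np.linspace(0, 2 * np.pi, 6, endpoint=False)
centers = np.array([[np.cos(p), np.sin(p)] for p in phis])
widths = np.full(len(centers), 0.7)

dt, delta = 0.001, 2 * np.pi            # one full period as the PE window
M = np.zeros((len(centers), len(centers)))
for t in np.arange(0.0, delta, dt):
    Z = np.array([np.cos(t), np.sin(t)])            # recurrent (periodic) orbit
    S = gaussian_regressor(Z, centers, widths)
    M += np.outer(S, S) * dt                        # integral of S S^T over the window

eig = np.linalg.eigvalsh(M)                         # c^T M c in [l_min, l_max] ||c||^2
alpha1, alpha2 = eig.min(), eig.max()               # empirical PE levels
print(alpha1, alpha2)
```

A strictly positive alpha1 indicates that the regressor excites every direction of the weight space over the window, i.e., the (full) PE condition holds for this center placement.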
Lemma 1 (Partial PE Condition of RBFNN; Peng, Nakano, & Shioya, 2007; Wang et al., 2009). On the premise of a center lattice distribution, the regressor vector S(Z) satisfies the partial PE condition if the input Z(t) belongs to recurrent trajectories, which include periodic, quasi-periodic, almost-periodic and some chaotic trajectories.
Remark 3. As proved in Zheng and Wang (2017), the PE levels increase with the separation distance h := (1/2) min_{i≠j} ∥μ_i − μ_j∥, where μ_i and μ_j represent different RBF centers. The convergence rate and convergence accuracy of the RBFNN weights increase with the PE levels in deterministic learning. However, we cannot always improve the performance of deterministic learning by increasing the separation distance, because the decrease of neuron density will weaken the approximation capabilities of the RBFNN, i.e., there is a tradeoff between the approximation capabilities and the PE levels. The lattice distribution of RBF centers is actually a compromise to guarantee the function approximation capabilities, while the K-means clustering method proposed in Liu et al. (2021) is a compromise to guarantee the PE levels.

Phase compensation designed through pure time delay
In this section, we briefly introduce the common phase compensation methods in linear systems, and explain the difficulties of phase compensation in nonlinear systems from the perspective of local linearization. Inspired by the first order inertial and differential components in linear systems, we propose a phase compensation method suitable for feedforward controller design in the time domain. The proposed method is effective for control tasks where the reference trajectory is known a priori.
In the procedure of linear system controller design, we usually select a suitable controller structure according to the characteristics of the plant and the control objective, such as the frequently used PID form. In most cases, the required control performance can be obtained by adjusting the controller parameters. However, it is sometimes impossible to fully meet the requirements through parameter adjustment as a result of the fixed controller structure. It is thus necessary to add compensation to the controller such that the performance can be improved, as illustrated in Fig. 1. The controller design of linear systems is often carried out in the frequency domain, and correspondingly, the compensation components can be described using transfer functions. The transfer function of common phase compensation components can be described as:

G_c(s) = (βτs + 1)/(τs + 1), (9)

where τ and β are positive design parameters that determine the corner frequencies in the frequency domain analysis. The compensation utilizing G_c(s) belongs to phase-lead compensation when β > 1 and phase-lag compensation when β < 1.
Considering the frequency characteristics, the following first-order inertial and differential components can also be used as phase compensators:

G_ci(s) = 1/(τs + 1),  G_cd(s) = τs + 1, (10)

where G_ci(s) and G_cd(s) represent the transfer functions of the first-order inertial component and the differential component, respectively. It is important to note that the ideal differential in G_cd(s) cannot be physically realized. However, when the input of the compensators is the reference trajectory x_d(t) mentioned in Assumption 1, the differential component can be implemented with high precision because its input signal is known a priori. From the perspective of local linearization of nonlinear systems (Chen & Narendra, 2004; Gan & Harris, 1999; Jung, Leu, Do, Kim, & Choi, 2015), fixed-parameter controllers suitable for LTI system control can achieve the required control performance in part of the state space of nonlinear systems. In order to improve the nonlinear system control performance of these fixed-parameter controllers, we can design reasonable adaptive laws for the parameters (Fang & Ren, 2012; Farrell, 1998).
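To make the frequency domain picture concrete, the snippet below evaluates the phase of a lead-lag element and of the first-order inertial and differential components at a test frequency. The standard forms with time constant τ and ratio β are assumed here, matching the "lead for β > 1" convention stated above; the numeric values are illustrative.

```python
import numpy as np

def phase_deg(G, w):
    """Phase of transfer function G evaluated at s = j*w, in degrees."""
    return np.degrees(np.angle(G(1j * w)))

tau, beta = 0.1, 5.0
Gc  = lambda s: (beta * tau * s + 1) / (tau * s + 1)   # lead-lag: lead for beta > 1
Gci = lambda s: 1.0 / (tau * s + 1)                    # first-order inertial (phase lag)
Gcd = lambda s: tau * s + 1                            # first-order differential (phase lead)

# Geometric-mean corner frequency, where the lead element adds its maximum phase
w = 1.0 / (tau * np.sqrt(beta))
print(phase_deg(Gc, w), phase_deg(Gci, w), phase_deg(Gcd, w))
```

At this frequency the lead element contributes arcsin((β − 1)/(β + 1)) of positive phase (about 41.8 degrees for β = 5), while the inertial and differential components contribute equal and opposite phase shifts.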
Inspired by the improvement from fixed-parameter controllers to adaptive controllers, we propose that adaptive laws can be designed for common compensation components, such as (9) and (10), to enrich the structural complexity and nonlinear characterization capability of adaptive controllers, so as to achieve better control performance. Considering that the design of adaptive laws is generally carried out in the time domain, a corresponding time domain design method for the phase compensation is required. The following theorem proposes a phase compensation design method in the time domain which is suitable for feedforward control of nonlinear systems.

Theorem 1.
For a single-input single-output (SISO) feedforward controller A(c_d(t)), the parameters α and θ can be designed reasonably to obtain an improved controller A(αc_d(t + θ)), which shares a similar rationale with the phase-lead and phase-lag compensation in LTI systems. Here c_d(t) : [0, ∞) → R is the controller input signal, which is known a priori; α and θ are named the amplitude constant and the phase constant, respectively, and can be obtained with reasonably designed adaptive laws.
Proof. It can be proved by illustrating the equivalence between this adaptive compensation and the normal phase compensation in LTI systems. Denote the Laplace transform of c_d(t) as c_d(s) and consider the following transform:

L{αc_d(t + θ)} = α e^{θs} c_d(s). (11)

Consider the following transfer function and its Taylor series expansion:

D(s) = e^{θs} = 1 + θs + (θs)²/2! + · · · , (12)

where θ represents the phase error which needs to be compensated for. In practical systems with reasonably designed controllers, |θ| is small enough that the following approximation holds:

D(s) ≈ 1 + θs. (13)

Given that |θ| is small enough, it can be found by comparing (10) and (13) that D(s) approximates a phase-lead compensation if θ > 0 and a phase-lag compensation if θ < 0. Since the input c_d(t) is known a priori, the value of c_d(t + θ) can be obtained even if θ > 0, which represents a prospective value. □

In (13), the phase constant θ modifies the phase-frequency characteristic of the feedforward controller while the amplitude constant α modifies the amplitude-frequency characteristic. In particular, (α, θ) = (1, 0) means that there is no compensation for the original controller and can be selected as the initial value of the adaptive laws. The principle of this phase compensation method acting on the feedforward controller is shown in Fig. 2.
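The key step of the proof, e^{θs} ≈ 1 + θs for small |θ|, can be checked numerically: for a reference signal known a priori, shifting it by θ is almost identical to adding θ times its derivative. The sinusoidal c_d and the values of ω and θ below are illustrative.

```python
import numpy as np

# Reference input known a priori, so c_d(t + theta) is available even for theta > 0
omega, theta, alpha = 2.0, 0.05, 1.0
t = np.linspace(0.0, 10.0, 5001)

cd = lambda tt: np.sin(omega * tt)
shifted = alpha * cd(t + theta)                            # pure time shift (prospective)
first_order = cd(t) + theta * omega * np.cos(omega * t)    # (1 + theta*s) approximation

# e^{theta*s} = 1 + theta*s + O(theta^2): residual should be O((omega*theta)^2)
err = np.max(np.abs(shifted - first_order))
print(err)
```

For omega*theta = 0.1 the worst-case residual is about (omega*theta)^2 / 2 = 0.005, confirming that the pure time shift behaves like the first-order lead element up to second-order terms.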
Remark 4. For simplicity, Theorem 1 designs a phase compensation using the time domain method for SISO feedforward controllers. It should be noted that the proposed pure time delay phase compensation method can also be applied to multiple-input multiple-output (MIMO) controllers by designing appropriate compensation parameters for each dimension of the input.

RBFNN feedforward controller design
In this section, an adaptive RBFNN controller is proposed to complete the trajectory tracking control of a second-order Brunovsky system. Assume that the distribution scope of neurons of the RBFNN covers the reference trajectory ϕ d . Under Assumption 1, the partial PE condition is satisfied and the weights of the RBFNN will converge to certain values which represent the real dynamics of the system (Wang & Hill, 2006). An RBFNN feedforward controller is proposed utilizing the weights to realize high precision tracking control of the system.

Direct adaptive RBFNN control
Considering the Brunovsky system (1) and the reference model (2), the tracking error is defined as:

z_1 = x_d1 − x_1. (14)

Define the composite tracking error referring to the control law of the PD feedback controller:

z_2 = ż_1 + K_1 z_1, (15)

where K_1 > 0 is a gain design parameter.
Consider the following control law with the proportional-derivative term:

u = K_2 z_2 + ẋ_d2 + K_1 ż_1 − Ŵ^T S(x_d), (16)

where K_2 > 0 is the derivative gain. The RBFNN with lattice-distributed neurons is employed to approximate f(x) in (1) in the form of a feedforward controller whose input is the reference input of the system, i.e. x_d, and Ŵ is the estimate of the optimal weight vector W*.
The unknown nonlinear function f(x) along the state orbit ϕ_d of (2) is approximated as follows:

f(x_d) = W*^T S(x_d) + ϵ_1(x_d), (17)

where |ϵ_1(x_d)| < ϵ*_1 is a bounded approximation error, and the optimal value W* is defined as follows:

W* := arg min_{W ∈ R^N} { sup_{x_d ∈ Ω} |f(x_d) − W^T S(x_d)| }. (18)

Therefore, the unknown function f(x) can be approximated as:

f(x) = W*^T S(x_d) + ϵ_1(x_d) − ϵ_2(x, x_d), (19)

where ϵ_2(x, x_d) = f(x_d) − f(x), and ϵ(x, x_d) = ϵ_1(x_d) − ϵ_2(x, x_d) is the composite approximation error.
Considering the system dynamics (1) and the error definitions (14), (15), we obtain the following coordinate transformation:

x − x_d = [−z_1, K_1 z_1 − z_2]^T. (20)

Thus, according to Assumption 2, there exists a positive constant K_f such that the following inequality holds:

|ϵ_2(x, x_d)| ⩽ K_0 ∥x − x_d∥ ⩽ K_f ∥z∥, (21)

where z = [z_1, z_2]^T. Define the weight estimation error W̃ = W* − Ŵ, and the weight update law is designed as:

Ŵ̇ = −Γ (S(x_d) z_2 + σ Ŵ), (22)

where Γ = Γ^T > 0 is the diagonal adaptation gain matrix, and σ > 0 is the σ-modification parameter. Note that the control law (16) with the weight updating law (22) is not strictly feedforward control because the controller output is related to the feedback tracking error signal.
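The adaptive scheme of this subsection can be sketched in simulation: a PD feedback term K_2 z_2 plus a reference feedforward, with the RBFNN output subtracted and the weights driven by a σ-modified gradient update. The plant nonlinearity f, the sinusoidal reference, the lattice of centers and all gains below are hypothetical choices for illustration only.

```python
import numpy as np

# Hypothetical plant nonlinearity and reference orbit (for illustration only)
f = lambda x: -np.sin(x[0]) - 0.5 * x[1]          # unknown dynamics f(x)
xd = lambda t: np.array([np.sin(t), np.cos(t)])   # reference states [x_d1, x_d2]
dxd2 = lambda t: -np.sin(t)                       # derivative of x_d2

def regressor(Z, centers, eta):
    d = np.linalg.norm(centers - Z, axis=1)
    return np.exp(-(d ** 2) / eta ** 2)

g = np.linspace(-1.5, 1.5, 5)
centers = np.array([[a, b] for a in g for b in g])   # lattice-distributed centers
eta = 0.8
K1, K2, sigma, Gamma = 3.0, 10.0, 1e-4, 20.0         # toy design parameters

dt, T = 1e-3, 60.0
x = np.array([0.5, 0.0])
W = np.zeros(len(centers))
errs = []
for k in range(int(T / dt)):
    t = k * dt
    r = xd(t)
    z1 = r[0] - x[0]                     # tracking error
    dz1 = r[1] - x[1]
    z2 = dz1 + K1 * z1                   # composite tracking error
    S = regressor(r, centers, eta)       # NN input is the reference, known a priori
    # PD feedback + reference feedforward, RBFNN output subtracted
    u = K2 * z2 + dxd2(t) + K1 * dz1 - W @ S
    x = x + dt * np.array([x[1], f(x) + u])          # Euler step of the plant
    W = W - dt * Gamma * (S * z2 + sigma * W)        # sigma-modified update law
    errs.append(abs(z1))

e_early = np.mean(errs[:5000])           # mean |z1| over the first 5 s
e_late = np.mean(errs[-5000:])           # mean |z1| over the last 5 s
print(e_early, e_late)
```

As the weights absorb the unknown dynamics along the orbit, the late-window tracking error falls below the early-window error, which is the qualitative behavior the UUB analysis predicts.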
Theorem 2. Consider the Brunovsky system (1), the reference model (2), the control law (16), and the weight updating law (22). With the bounded NN input x_d, the weight estimation error W̃ and the composite tracking error z are uniformly ultimately bounded (UUB), with small ultimate bounds around zero determined by the design parameters K_1, K_2 and Γ.
Proof. According to the definition of the tracking errors (14), (15), the system (1), the reference model (2), the control law (16), and the approximation method (17), (19), the derivatives of z_1 and z_2 can be described as:

ż_1 = z_2 − K_1 z_1,
ż_2 = −K_2 z_2 − W̃^T S(x_d) − ϵ(x, x_d). (23)

Consider the following Lyapunov function:

V = (1/2) z_1² + (1/2) z_2² + (1/2) W̃^T Γ^{−1} W̃. (24)

The derivative of V is:

V̇ = −K_1 z_1² + z_1 z_2 − K_2 z_2² − z_2 W̃^T S(x_d) − z_2 ϵ(x, x_d) + W̃^T Γ^{−1} W̃̇, (25)

and from the weight updating law (22) we have:

W̃^T Γ^{−1} W̃̇ = z_2 W̃^T S(x_d) + σ W̃^T (W* − W̃). (26)

Substitute (26) into (25), and we have:

V̇ = −K_1 z_1² + z_1 z_2 − K_2 z_2² − z_2 ϵ(x, x_d) + σ W̃^T (W* − W̃). (27)

Applying Young's inequality to the cross terms, together with |ϵ_1(x_d)| < ϵ*_1 and |ϵ_2(x, x_d)| ⩽ K_f ∥z∥ from (21), we obtain the following inequality:

V̇ ⩽ −(K_1 − 1/2 − K_f) z_1² − (K_2 − 1 − K_f) z_2² − (σ/2) ∥W̃∥² + (σ/2) ∥W*∥² + (1/2) ϵ*_1², (28)

where K_1, K_2 ∈ R+ are design parameters representing the control gains such that K_1 > 1/2 + K_f and K_2 > 1 + K_f. Hence V̇ ⩽ −a_1 V + b_1, with a_1 = min{2K_1 − 1 − 2K_f, 2K_2 − 2 − 2K_f, σ λ_min(Γ)} and b_1 = (σ/2) ∥W*∥² + (1/2) ϵ*_1². By integrating both sides of (28), we have:

V(t) ⩽ e^{−a_1 t} V(0) + b_1/a_1. (29)

Therefore, the ultimate boundaries of the tracking errors ∥z_1∥, ∥z_2∥ and the weight estimation error ∥W̃∥ satisfy:

∥z_1∥, ∥z_2∥ ⩽ √(2b_1/a_1),  ∥W̃∥ ⩽ √(2λ_max(Γ) b_1/a_1). (30)

The boundaries are determined by the values of the design parameters K_1, K_2 and Γ, which means that we can choose the parameters reasonably to guarantee the stability of the system. □

Learning from adaptive RBFNN control
Under the premise of the PE condition, the learning of the real dynamics will take place. An RBFNN feedforward controller, which only takes the reference trajectory x d as the input, is proposed using the learned knowledge. The feedforward controller is combined with the PD feedback controller to complete the trajectory tracking control.
To facilitate the convergence analysis of system (23) under the partial PE condition, the following classical result about the linearly parameterized system is considered.

Lemma 2 (Exponential Stability of a Class of Linearly Parameterized Systems; Sastry & Bodson, 1989). Let S(x_d(t)) ∈ R^m be a bounded regressor vector, and consider the following linear time-varying (LTV) system:

ż = A z + b W̃^T S(x_d),
W̃̇ = −Γ S(x_d) b^T P z, (31)

where A is a Hurwitz matrix, Γ = Γ^T > 0, and P = P^T > 0 satisfies A^T P + P A = −Q for some Q = Q^T > 0. If S(x_d) satisfies the PE condition, then the origin (z, W̃) = (0, 0) of system (31) is exponentially stable.

Proof. The proof of this lemma has been given in Theorem 2.6.5 of Sastry and Bodson (1989). □

Theorem 3.
For the system (1), the reference model (2), the control law (16), and the RBFNN weight updating law (22), under Assumption 1, the weight estimation error W̃_ς and the tracking errors z_1, z_2 will exponentially converge to small neighborhoods around zero with properly designed parameters K_1, K_2, σ and Γ, where the subscript ς represents the neurons close to the state trajectory ϕ_d.
Proof. z_1, z_2 and W̃ have been proved to be UUB in Theorem 2. Considering the weight adaptation law (22) and the error dynamics (23), the closed-loop system can be reformulated as:

ż_1 = z_2 − K_1 z_1,
ż_2 = −K_2 z_2 − W̃^T S(x_d) − ϵ(x, x_d),
W̃̇ = Γ (S(x_d) z_2 + σ Ŵ). (32)

Let z = [z_1, z_2]^T, and (32) can be simplified as:

ż = A z + b (W̃^T S(x_d) + ϵ(x, x_d)),
W̃̇ = Γ (S(x_d) z_2 + σ Ŵ), (33)

where b = [0, −1]^T, and A is a Hurwitz matrix formulated as:

A = [ −K_1  1 ;  0  −K_2 ]. (34)

According to Lemma 2, when the reference input of the RBFNN x_d(t) satisfies the PE condition, the nominal part of system (33) is exponentially stable, i.e., for the following system:

ż = A z + b W̃^T S(x_d),
W̃̇ = Γ S(x_d) z_2, (35)

the tracking error z and the weight estimation error W̃ will exponentially converge to zero. Since the reference input of the RBFNN x_d(t) is periodic, the regressor vector S(x_d) is also periodic. However, the PE condition of the whole regressor vector S(x_d) is hard to guarantee in practice, and thus the partial PE condition is considered. According to Assumption 1 and Lemma 1, some of the neurons close to the trajectory are activated and satisfy the partial PE condition as a result of the lattice distribution of the neuron centers. To prove the exponential stability of system (33), we can follow the steps of Wang and Hill (2006). According to the localized approximation capability of the RBFNN, f(x) along the state trajectory ϕ_d can be approximated using the neurons close to the trajectory as follows:

f(x_d) = W*_ς^T S_ς(x_d) + ϵ_ς(x_d), (36)

where the subscript ς represents the neurons which are close to the trajectory and activated by the input signal. Let the subscript ς̄ stand for the neurons far away from ϕ_d, and the control law (16) can be reformulated as:

u = K_2 z_2 + ẋ_d2 + K_1 ż_1 − Ŵ_ς^T S_ς(x_d) − Ŵ_ς̄^T S_ς̄(x_d). (37)

Considering (36) and (37), the error system (23) can be reformulated as:

ż = A z + b (W̃_ς^T S_ς(x_d) + ϵ'_ς), (38)

where ϵ'_ς = ϵ_ς(x_d) − Ŵ_ς̄^T S_ς̄(x_d) − ϵ_2(x, x_d) is the composite approximation error along ϕ_d. Therefore, the closed-loop system (33) can be reformulated as:

ż = A z + b (W̃_ς^T S_ς(x_d) + ϵ'_ς),
W̃̇_ς = Γ_ς (S_ς(x_d) z_2 + σ Ŵ_ς). (39)

According to Lemma 2, the nominal part of (39) is exponentially stable under Assumption 1.
The weights of the neurons far away from the reference trajectory are updated using the following update law:

Ŵ̇_ς̄ = −Γ_ς̄ (S_ς̄(x_d) z_2 + σ Ŵ_ς̄), (40)

where the values of the regressor subvector S_ς̄(x_d) are close to zero as a result of the localized approximation capability of the RBFNN. Therefore, Ŵ_ς̄ will be updated only slightly, which means that Ŵ_ς̄^T S_ς̄(x_d) will remain very small and contribute little to the approximation of f(x_d).
Since |Ŵ_ς̄^T S_ς̄(x_d)| is very small, the composite approximation error ϵ'_ς can be divided into a small nonvanishing term ϵ_ς(x_d) − Ŵ_ς̄^T S_ς̄(x_d) and a vanishing term ϵ_2(x, x_d) which will converge to zero as the tracking error z converges to zero. According to Lemma 9.1 and Lemma 9.2 in Khalil (2002), with properly designed parameters K_1, K_2, σ and Γ, the vanishing and nonvanishing perturbation terms will not influence the exponential stability of the nominal system, i.e., the tracking error z and the corresponding weight estimation error W̃_ς in system (39) will exponentially converge to small neighborhoods around zero. □

Since the values of Ŵ_ς and Ŵ_ς̄ tend to constant values after the transient process, we can use the average of these weights to represent the learned knowledge about the real dynamics:

W̄ = mean_{t ∈ [t_1, t_2]} Ŵ(t) = (1/(t_2 − t_1)) ∫_{t_1}^{t_2} Ŵ(τ) dτ, (41)

where [t_1, t_2] is a period after the transient process. Along the trajectory ϕ_d, the unknown smooth nonlinear function f(x_d) is approximated as below:

f(x_d) = W̄^T S(x_d) + ε̄(x_d), (42)

where ε̄(x_d) is the corresponding approximation error. Therefore, a controller based on the learned knowledge is designed as follows:

u = K_2 z_2 + ẋ_d2 + K_1 ż_1 − W̄^T S(x_d). (43)

For controller (43), K_2 z_2 and W̄^T S(x_d) represent the outputs of the PD feedback controller and the RBFNN feedforward controller, respectively. Fig. 3 shows the closed-loop system with the designed controller (43).
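Recording the slowly varying weights and averaging them over a window after the transient, as described above, can be sketched as follows. The recorded weight history here is synthetic (a constant vector plus a decaying ripple), purely to illustrate the averaging step; in practice the history would come from the adaptive controller.

```python
import numpy as np

dt = 1e-3
t = np.arange(0.0, 10.0, dt)

# Synthetic weight history: estimates converging to W_true with a decaying ripple
W_true = np.array([1.0, -2.0, 0.5, 0.0])
What = W_true + 0.3 * np.exp(-t)[:, None] * np.sin(5 * t)[:, None]

# Averaging window [t1, t2] chosen after the transient process
t1, t2 = 8.0, 10.0
mask = (t >= t1) & (t < t2)
W_bar = What[mask].mean(axis=0)     # W_bar = (1/(t2-t1)) * integral of What over [t1, t2]
print(W_bar)
```

Because the ripple has decayed by t1, the time average recovers the constant learned weights, which can then be frozen into the feedforward controller for reuse.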

Remark 5.
To improve the performance of RBFNN learning control, Liu et al. (2021) distribute the neuron centers on the input trajectory through K-means clustering. In this method, all neurons are activated, and the partial PE condition is upgraded to the PE condition while the computational burden is reduced. Note that the above proof still applies to this method, and the steps after (35) can be omitted because the PE condition of the whole regressor vector S(x_d) is guaranteed.

Adaptive phase compensation for RBFNN learning control
In order to find a method to improve the controller performance, we first show that the adaptive RBFNN controller is equivalent to a variable-gain integrator, which means that suitable phase compensation can be applied to obtain better control performance. From this perspective, we propose the adaptive phase compensation for the RBFNN based learning control according to the gradient descent method.

The equivalence between adaptive RBFNN controller and variable-gain integrator
For the RBFNN weight updating law (22), σ is the design parameter of the σ-modification, which is one of the common robustification techniques. Without the σ-modification, accurate learning of the real dynamics can still be achieved (Wang et al., 2009). For simplicity of analysis, the following weight updating law is adopted in this section:

Ŵ̇ = −Γ S(x_d) z_2, (44)

which reformulates the control law (16) into the following form:

u = u_PD(t) + ẋ_d2 + K_1 ż_1 − u_0(t) − u_I(t), (45)

where Γ = diag(γ_1, γ_2, . . . , γ_N) ∈ R^{N×N} is the adaptation gain matrix, N is the number of neurons, u_PD(t) = K_2 z_2(t) is the output of the PD controller, u_I(t) = S^T(x_d) ∫_0^t Ŵ̇(τ) dτ is the increment of the RBFNN controller output, and u_0(t) = S^T(x_d) Ŵ_0 is the initial value of the RBFNN controller output. Then the output of the RBFNN controller designed in Section 4 can be described as:

u_NN(t) = S^T(x_d) Ŵ(t) = u_0(t) + u_I(t). (46)

Liu et al. (2021) prove that when the width of the RBF η_i → ∞, or the input x_d is constant, the adaptive RBFNN controller is equivalent to an integrator. In fact, no matter how the hyperparameters are set, the RBFNN controller (46) is equivalent to a variable-gain integrator with the initial value S^T(x_d) Ŵ_0. This variable-gain integrator can be approximated by different fixed-gain integrators at different moments, and a brief proof is given below.
From the perspective of RBFNN based learning control, the coefficients of the regressor vector, i.e., the weight vector Ŵ(t), can be regarded as the current knowledge learned by the controller. Therefore, the derivative Ẇ(t) = −Γ S(x_d)z_2(t) can be regarded as the learning speed at time t, where the minus sign indicates that the output of the RBFNN controller is subtracted from the output of the PD controller as the control input of the plant. Consider a general integrator with fixed gain K_I:

I_{t_0}(t) = I(t_0) + K_I ∫_{t_0}^t z_2(τ)dτ,    (47)

where t_0 ⩾ 0 represents a moment during the integrating process, and the integrating speed can be expressed as:

İ_{t_0}(t) = K_I z_2(t).    (48)

Comparing (44) with (48), we obtain the following equation:

u̇_NN(t) = S^T(x_d)Ẇ(t) = −S^T(x_d)Γ S(x_d) z_2(t).    (49)

Consider the learning process of the knowledge Ŵ during a small time interval close to t_0:

u_NN(t) ≈ u_NN(t_0) − S^T(x_d)Γ S(x_d) ∫_{t_0}^t z_2(τ)dτ, t ∈ [t_0, t_0 + ∆t],    (50)

which indicates that in the learning process during [t_0, t_0 + ∆t], where ∆t → 0^+ is a small time interval, the variable-gain integrator u_NN(t) can be replaced by a fixed-gain integrator I_{t_0}(t) with I(t_0) = u_NN(t_0) and gain −S^T(x_d)Γ S(x_d), whose output changes by the accumulated −S^T(x_d)Γ S(x_d)∫_{t_0}^t z_2(τ)dτ as a result of the error accumulation. In LTI systems, phase compensation with fixed parameters like (9) can be applied to integrators to obtain better tracking performance. Inspired by this, we propose that suitable adaptive phase compensation can be applied to the adaptive RBFNN controller (46), which shares the rationale of the variable-gain integrator.
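The equivalence can be checked numerically. The sketch below (hypothetical regressor, gains and error signal) Euler-integrates the weight law Ẇ = −ΓS(x_d)z_2 with S(x_d) frozen over a short interval and confirms that the increment of u_NN = S^T(x_d)Ŵ matches a fixed-gain integrator of gain K_I = S^T(x_d)ΓS(x_d):

```python
import numpy as np

# Numeric sanity check of the variable-gain-integrator equivalence.
rng = np.random.default_rng(1)
N = 10
S = rng.random(N)                  # regressor S(x_d), frozen over the interval
Gamma = np.diag(rng.random(N) + 0.1)
W = rng.random(N)                  # weights W_hat(t_0)

dt, steps = 1e-3, 500
u0 = S @ W                         # u_NN(t_0)
z2_int = 0.0
for k in range(steps):
    z2 = np.sin(0.01 * k)          # arbitrary tracking-error signal z_2(t)
    W = W - dt * (Gamma @ S) * z2  # Euler step of the weight update law
    z2_int += dt * z2              # accumulated integral of z_2

K_I = S @ Gamma @ S                # equivalent fixed integrator gain
# increment of u_NN equals the fixed-gain integrator output -K_I * int(z2)
drift = abs((S @ W - u0) - (-K_I * z2_int))
```

While S(x_d) varies slowly, K_I varies with it, which is exactly the "variable gain" behavior described above.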

Adaptive phase compensation design with pure time delay
Different from the SISO case of Theorem 1, the RBFNN can be regarded as a multi-input single-output system, and we propose applying the pure time delay based phase compensation to each dimension of the input of each neuron of the RBFNN such that the nonlinear approximation capability of the RBFNN is improved. Fig. 4 illustrates the pure time delay based phase compensation of the adaptive RBFNN controller proposed in Section 4.
The compensated RBFNN (CRBFNN) can be regarded as the original RBFNN with a pure time delay and a hidden-layer weight applied to each dimension of the input of each neuron, and thus the output of the CRBFNN can be formulated as:

u_NN(t) = S̄^T(x̄_d)Ŵ,    (52)

where the compensated input of the ith neuron is x̄_{d,i}(t) = [α_{i1} x_{d1}(t − θ_{i1}), α_{i2} x_{d2}(t − θ_{i2})]^T, the amplitude constants form A = [α_{ij}] ∈ R^{N×2}, the phase constants form Θ = [θ_{ij}] ∈ R^{N×2}, and the output of the ith neuron can be formulated as:

s_i(x̄_{d,i}) = exp(−∥x̄_{d,i} − c_i∥²/η_i²), i = 1, 2, . . . , N.    (53)

Remark 6. Interestingly, according to Fig. 4 and (52), the amplitude constants α_{ij}, i = 1, 2, . . . , N, j = 1, 2, are equivalent to the weights between the input layer and the hidden layer of the RBFNN, which have been studied before this paper (Abdollahi, Talebi, & Patel, 2006). Therefore, in order to highlight the effectiveness of the newly proposed phase constants θ_{ij}, the amplitude constants α_{ij} are set to 1 in the subsequent parts. Thus, the input of the CRBFNN (52) is transformed into:

x̄_{d,i}(t) = [x_{d1}(t − θ_{i1}), x_{d2}(t − θ_{i2})]^T, i = 1, 2, . . . , N.

Apply the proposed CRBFNN to the same hybrid feedforward feedback control scheme shown in Fig. 3 and consider the same learning control problem mentioned in Section 4.
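As a sketch of the compensated regressor, the snippet below delays each input dimension of each neuron by that neuron's own phase constant (amplitude constants set to 1, per Remark 6); the Gaussian RBF form and all names here are illustrative assumptions:

```python
import numpy as np

def crbfnn_regressor(xd_fun, t, centers, theta, eta):
    """Compensated regressor S_bar(t). Neuron i receives the j-th input
    dimension delayed by its own phase constant theta[i, j].
    xd_fun: time -> R^2 reference input; centers, theta: (N, 2);
    eta: RBF width (a common width is assumed for simplicity)."""
    N = centers.shape[0]
    s = np.empty(N)
    for i in range(N):
        # delayed input seen by neuron i: x_bar_ij = x_d,j(t - theta_ij)
        x_bar = np.array([xd_fun(t - theta[i, j])[j] for j in range(2)])
        s[i] = np.exp(-np.sum((x_bar - centers[i]) ** 2) / eta ** 2)
    return s

xd = lambda tk: np.array([np.sin(tk), np.cos(tk)])
centers = np.array([xd(tk) for tk in np.linspace(0, 2 * np.pi, 20, endpoint=False)])
s0 = crbfnn_regressor(xd, 0.0, centers, np.zeros((20, 2)), eta=1.2)
```

With all phase constants zero the CRBFNN reduces to the original RBFNN; a uniform delay across all neurons and dimensions simply shifts the regressor in time.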
Let S̄(x_d, Θ) denote the compensated regressor vector with phase constant matrix Θ, and let the optimal value of the parameter matrix Θ be defined according to (18) as:

Θ* = arg min_Θ { sup_{x_d} |f(x_d) − S̄^T(x_d, Θ)W*| }.

Therefore, the unknown function f(x_d) is approximated as:

f(x_d) = S̄^T(x_d, Θ*)W* + ϵ_3(x_d),    (56)

where |ϵ_3(x_d)| < ϵ*_3 is a small bounded approximation error. The control law is still in the form of the hybrid feedforward feedback control (57), with the RBFNN regressor replaced by the compensated regressor. Considering the system (1), the reference model (2), the function approximation (56) and the control (57), the error dynamics of the closed-loop system are formulated as (58), in which ϵ_c(x, x_d) is the composite approximation error, including the nonvanishing term ϵ_3(x_d) and the vanishing term ϵ_2(x, x_d). The weight update law (59) takes the same σ-modified form as (22) with the compensated regressor, where σ_1 > 0 is the σ-modification parameter.
To guarantee the stability of the closed-loop system with the adaptive phase compensation applied to the RBFNN, a suitably designed update law for θ̂ is required. As one of the most frequently used update methods, the gradient descent method is effective in many static updating scenarios of the RBFNN. However, when it is applied to update the parameters of the RBFNN in a closed-loop system, the gradient of the objective function is hard to obtain because of the complicated coupling between the states, and thus a static approximation of the gradient is adopted to address this problem.
Consider the following objective function of the gradient descent method:

J(θ̂) = (1/2) z_2^2(t).

We expect the parameter θ̂ to update such that the objective function J(θ̂) is minimized. Therefore, the gradient descent based update law is proposed:

θ̂̇ = −ρ z_2 (∂z_2/∂θ̂) − σ_2 θ̂ |z_2|,

where ρ > 0 is the updating rate and σ_2 > 0 is the design parameter of the modification term σ_2 θ̂ |z_2|. However, the partial derivative ∂z_2/∂θ̂ is hard to obtain in a dynamical system where there are complicated coupling connections between the state variables. To approximate the value of ∂z_2/∂θ̂, the following steps are adopted: 1. Similar to the stability analysis in the proof of Theorem 2, with properly designed parameters K_1, K_2, the perturbation term ϵ_c(x, x_d) in (58) does not influence the stability of the system (58). Therefore, to guarantee the stability of (58), we only need to design the update law of θ̂ to stabilize its nominal part. 2. Following Abdollahi et al. (2006), a static approximation of ∂z_2/∂θ̂ is obtained. In addition, the partial derivative between any two distinct elements of the parameter matrix Θ is assumed to be zero during the update process, i.e., ∂θ_{i_0 j_0}/∂θ_{ij} = 0 for all i, i_0 = 1, 2, . . . , N, j, j_0 = 1, 2 with (i, j) ≠ (i_0, j_0); in the resulting expressions, ŵ_i denotes the ith element of the weight vector Ŵ and θ_{ij} the (i, j)th element of θ̂.
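The update law can be sketched as a single Euler step that accepts any static approximation of ∂z_2/∂θ̂ as an input. The toy check below drives the objective J = z_2²/2 toward zero under a hypothetical static linear model z_2 = ⟨a, θ̂⟩ (with exact gradient a), which is not the paper's closed-loop dynamics:

```python
import numpy as np

def phase_update_step(theta, z2, grad_z2_theta, rho=0.05, sigma2=1e-4, dt=1e-3):
    """One Euler step of the phase update law:
    theta' = -rho * z2 * (dz2/dtheta) - sigma2 * theta * |z2|.
    grad_z2_theta is a static approximation of dz2/dtheta, shape (N, 2);
    the paper obtains it following Abdollahi et al. (2006), while here it
    is simply an input so any approximation can be plugged in."""
    dtheta = -rho * z2 * grad_z2_theta - sigma2 * theta * abs(z2)
    return theta + dt * dtheta

# Toy check: under z2 = <a, theta>, repeated updates shrink |z2|.
rng = np.random.default_rng(0)
a = rng.standard_normal((5, 2))      # hypothetical static gradient
theta = rng.standard_normal((5, 2))  # initial phase constants
for _ in range(2000):
    z2 = float(np.sum(a * theta))
    theta = phase_update_step(theta, z2, a, rho=0.5, dt=1e-2)
final_z2 = float(np.sum(a * theta))
```

Note the modification term vanishes together with z_2, so it does not bias the converged phase constants, mirroring the role of the σ_2 θ̂|z_2| term in the law above.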

Stability analysis of CRBFNN based adaptive control
In this section, the stability of the system (58) under the adaptive laws (59) and (73) is analyzed, and the following theorem summarizes the results.
Theorem 4. Considering the system (58) and the adaptive laws (59), (73), with properly designed parameters, we have: (i) the composite tracking error z and the approximation errors W̃, θ̃ remain UUB in the closed-loop system; (ii) the composite tracking error z converges exponentially to a bounded neighborhood around zero.
(ii) To prove the exponential convergence of z to a small neighborhood around zero, the following Lyapunov function candidate is considered:

V_z = (1/2) z^T z = (1/2)(z_1^2 + z_2^2).

Its derivative along the trajectories of (58) is evaluated in (88), using the completion-of-squares inequalities (84) and (89), where d_1, d_2 > 0 are design parameters for the completion of squares.
Substituting (84) and (89) into (88), we have:

V̇_z ⩽ −C_1 V_z + C_2,

where C_2 collects the bounded terms (including (1/2)ϵ*_3²). Designing the parameters K_1, K_2 such that C_1 > 0, we obtain exponential convergence of V_z. From the analysis of (i) in Theorem 4, C_2 is bounded, and thus the composite tracking error z exponentially converges to a bounded neighborhood around zero which is determined by the gain parameters K_1 and K_2. □

Remark 7 (Reuse Phase of the Adaptive Phase Compensation). According to the conclusions of Theorem 4, the composite tracking error z = [z_1, z_2]^T exponentially converges to a small neighborhood around zero with properly designed gains K_1, K_2. Consider the weight and phase compensation update laws (59), (73): since the coefficients of z_2 are all bounded, the update rates of Ŵ and θ̂ become very small as z_2 converges. Therefore, after the transient process, we can record the values of the slowly changing Ŵ and θ̂ to approximate the uncertainties and design an effective feedforward controller for the system. The learned knowledge can be extracted using the same method as in (41):

W̄ = mean_{t∈[t_1,t_2]} Ŵ(t), θ̄ = mean_{t∈[t_1,t_2]} θ̂(t),

where [t_1, t_2] is a period after the transient process. In this case, f(x_d) = S̄^T(x_d)W̄ + ε̄(x_d), and the feedforward controller is designed as S̄^T(x_d)W̄, where ε̄(x_d) is the corresponding approximation error and S̄(x_d) represents the compensated regressor vector with the fixed phase compensation θ̄. Interestingly, although there has been no theoretical analysis of the convergence of Ŵ, θ̂, the simulation results in the subsequent sections indicate the possibility of parameter convergence, and further efforts will be made to investigate this issue.
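The knowledge-extraction step of Remark 7 amounts to time-averaging the recorded parameter histories over a post-transient window; a sketch with synthetic (purely illustrative) histories:

```python
import numpy as np

def extract_knowledge(t, W_hist, theta_hist, t1, t2):
    """Time-average the slowly varying W_hat and theta_hat over [t1, t2].
    t: (T,) time stamps; W_hist: (T, N); theta_hist: (T, N, 2)."""
    mask = (t >= t1) & (t <= t2)
    return W_hist[mask].mean(axis=0), theta_hist[mask].mean(axis=0)

# Synthetic histories: a transient decaying onto constant learned values.
t = np.linspace(0.0, 100.0, 10001)
W_true = np.array([1.0, -2.0, 0.5])
W_hist = W_true + np.exp(-0.1 * t)[:, None]   # decays to W_true
theta_hist = np.full((len(t), 3, 2), 0.3)     # already converged phases
W_bar, theta_bar = extract_knowledge(t, W_hist, theta_hist, 95.0, 100.0)
```

Averaging over a window (rather than taking the final sample) suppresses any residual small oscillation in Ŵ and θ̂ after the transient.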

Simulation studies
The robot manipulator system is one of the common Brunovsky systems. In order to verify the proposed RBFNN feedforward controller and phase compensation method, the following 2-DOF robot manipulator model is adopted (Craig, 1986):

M(q)q̈ + C(q, q̇)q̇ + G(q) = τ,

where M(q) ∈ R^{2×2} is the inertia matrix, C(q, q̇) ∈ R^{2×2} is the Coriolis and centripetal matrix, G(q) ∈ R^2 is the gravity vector, q = [q_1, q_2]^T, q̇ = [q̇_1, q̇_2]^T, q̈ = [q̈_1, q̈_2]^T ∈ R^2 represent the joint position, joint velocity and joint acceleration vectors, respectively, and τ ∈ R^2 is the input torque vector. M(q), C(q, q̇) and G(q) are given by:

M(q) = [ p_1 + p_2 + 2p_3 cos q_2    p_2 + p_3 cos q_2
         p_2 + p_3 cos q_2           p_2 ],

C(q, q̇) = [ −p_3 q̇_2 sin q_2    −p_3 (q̇_1 + q̇_2) sin q_2
             p_3 q̇_1 sin q_2     0 ],

G(q) = [ p_4 g cos q_1 + p_5 g cos(q_1 + q_2)
         p_5 g cos(q_1 + q_2) ],

where the plant parameters are p = [p_1, p_2, p_3, p_4, p_5]^T = [1.66, 0.42, 0.63, 3.75, 1.25]^T, the gravitational acceleration is g = 9.8 and the initial states are set to q_1 = q_2 = 0, q̇_1 = q̇_2 = 0. It should be noted that both the input and the output of the manipulator system have two dimensions, and thus an RBFNN controller is designed for each joint.

For the PD part of the designed controller (45), the gain matrices are K_1 = [5, 0; 0, 5] and K_2 = [20, 0; 0, 20], which are equivalent to a proportional gain K_P = [100, 0; 0, 100] and a derivative gain K_D = [20, 0; 0, 20] in PD control. For the adaptive RBFNN part, the input of the RBFNNs is set to x_d = [q_d^T, q̇_d^T, q̈_d^T]^T ∈ R^6 and the following RBFNN structures are adopted and compared (the same setting is adopted for the two joints):

1. RBFNN-O: To improve the convergence speed and accuracy of Ŵ in the learning process, instead of the lattice distribution, we can use methods such as K-means clustering to obtain an optimized center distribution on the reference trajectories (Liu et al., 2021). In this case, only 20 neurons with η_i = 1.2 are distributed evenly along the reference trajectory x_d, and the weight update law (22) is adopted with the parameters Γ = diag(0.2, 0.2, . . . , 0.2) ∈ R^{20×20} and σ = 1 × 10^{−4}.

2. CRBFNN-O: To demonstrate the effectiveness of the compensated RBFNN based learning control, 20 neurons along x_d with the adaptive phase compensation are adopted. In this case, the receptive field width of each neuron is η_i = 1.2, and the weight update law (59) with Γ = diag(0.2, 0.2, . . . , 0.2) ∈ R^{20×20}, σ_1 = 1 × 10^{−4} and the phase constant update law (73) with ρ = 0.05 and σ_2 = 1 × 10^{−4} are adopted. It should be noted that the hyperparameters of the CRBFNN-O are the same as those of the RBFNN-O above, so as to isolate the effect of the adaptive phase compensation.
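The manipulator dynamics defined earlier in this section translate directly into code; the sketch below implements M, C and G exactly as given and derives the forward dynamics q̈ = M(q)^{-1}(τ − C(q, q̇)q̇ − G(q)) (the function names are ours):

```python
import numpy as np

# Dynamics terms of the 2-DOF manipulator (Craig, 1986) with the
# simulation parameters p and g used in this study.
p1, p2, p3, p4, p5 = 1.66, 0.42, 0.63, 3.75, 1.25
g = 9.8

def M(q):
    """Inertia matrix."""
    c2 = np.cos(q[1])
    return np.array([[p1 + p2 + 2 * p3 * c2, p2 + p3 * c2],
                     [p2 + p3 * c2,          p2]])

def C(q, dq):
    """Coriolis and centripetal matrix."""
    s2 = np.sin(q[1])
    return np.array([[-p3 * dq[1] * s2, -p3 * (dq[0] + dq[1]) * s2],
                     [ p3 * dq[0] * s2, 0.0]])

def G(q):
    """Gravity vector."""
    return np.array([p4 * g * np.cos(q[0]) + p5 * g * np.cos(q[0] + q[1]),
                     p5 * g * np.cos(q[0] + q[1])])

def forward_dynamics(q, dq, tau):
    """Solve M(q) ddq + C(q, dq) dq + G(q) = tau for ddq."""
    return np.linalg.solve(M(q), tau - C(q, dq) @ dq - G(q))
```

At the initial rest state q = q̇ = 0, applying τ = G(q) yields q̈ = 0, i.e., gravity compensation holds the arm still, a quick consistency check on the signs of the model.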
Since Joint 2 shows similar simulation results to Joint 1, we only illustrate the results of Joint 1 in the following simulation studies for brevity.

Simulation for RBFNN learning speed
To investigate the effect of the adaptive phase compensation on the learning speed of the RBFNN based learning control, the reference trajectory is set to q_d1 = sin t, q_d2 = cos t as an example. The first three dimensions of the neuron centers are also shown for comparison in this part.
The convergence of the tracking error e_1 = q_d1 − q_1 during the first 100 s of the learning phase is presented in Fig. 6, which provides an explicit comparison among the PID, RBFNN-O and CRBFNN-O controllers. It can be seen from the simulation results that the PID controller can only maintain the stability of the system and cannot effectively reduce the tracking error during the simulation. By contrast, the RBFNN based controllers gradually improve their tracking accuracy, and their tracking errors soon become much smaller than those of the PID controller. Fig. 7 shows that the convergence of Ŵ in the CRBFNN is much faster than in the original RBFNN. According to the results in Figs. 6 and 7, compared with the RBFNN based adaptive feedforward control, the CRBFNN based control achieves faster convergence of both the tracking error and the weight vector, which indicates that the adaptive phase compensation improves the learning speed of the original RBFNN based learning control scheme. Fig. 8 shows the evolution of a subvector of θ̂ corresponding to Joint 1. Interestingly, although the convergence of the phase compensation vector θ̂ has not been proved theoretically, Fig. 8 shows that θ̂ converges quickly in this tracking control task. The convergence of Ŵ and θ̂ means that the real dynamics of the system has been learned by the CRBFNN.
The aforementioned results demonstrate that the proposed adaptive phase compensation effectively improves both the learning speed and the tracking accuracy of the RBFNN based learning control scheme.

Simulation for the accuracy of the learned knowledge
Consider the same tracking control task described in Section 6.1. After the transient process of the parameter convergence, we take the average values of Ŵ and θ̂ over the time interval [95 s, 100 s] as the learned knowledge, which is used to design the RBFNN feedforward controllers. Let τ_d(x_d) = M(q)q̈ + C(q, q̇)q̇ + G(q) denote the unknown function that needs to be approximated by the RBFNNs in this case; then τ_d(x_d) ∈ R^2 is approximated by S^T(x_d)W̄ for the RBFNN-O and by S̄^T(x_d)W̄ for the CRBFNN-O. According to Figs. 9 and 10, the locally accurate dynamics of the robot manipulator along x_d is learned by both RBFNN-O and CRBFNN-O, and it is obvious that the phase compensation improves the learning accuracy of the original RBFNN learning control scheme.

Simulation on irregular reference trajectories
In this section, simulation studies on another two tracking control tasks with more complex reference trajectories are carried out to further demonstrate the convergence ofθ in repetitive tracking tasks. The hyperparameter settings of the RBFNNs remain the same as described at the beginning of Section 6.
(i) Consider the reference trajectory ϕ_a. The first three dimensions of the centers, determined using the K-means clustering method, are illustrated in Fig. 11. Fig. 12 shows the tracking performance of the RBFNN-O and CRBFNN-O based feedforward feedback control schemes, and Fig. 13 shows the convergence of the weight vector in these two control schemes. From Figs. 12 and 13, it can be seen that the CRBFNN-O outperforms the RBFNN-O in both tracking accuracy and convergence speed. The tracking performance using the learned knowledge along ϕ_a is shown in Fig. 15.

(ii) Consider the following reference trajectory ϕ_b:

q_d1 = sin(sin(0.9t)), q_d2 = sin(cos(0.45t)).    (99)

The first three dimensions of the centers along ϕ_b are illustrated in Fig. 17. It should be noted that Fig. 17 only shows the three-dimensional projection of the six-dimensional centers, so each point in the figure actually corresponds to two neuron centers whose projections coincide. Figs. 18-20 show the learning phase results along the reference trajectory ϕ_b. From these results, it can be seen that the original RBFNN-O has poor learning and control performance in this complex tracking task even though it has an optimized neuron distribution along the reference trajectory. By contrast, the CRBFNN-O, which has the same hyperparameter settings as the RBFNN-O, still performs well in this case, indicating that the adaptive phase compensation brings out the approximation capability of the original RBFNN and thus improves the learning and control performance. The control and approximation performance of the learned knowledge is shown in Figs. 21 and 22, which indicates that the introduction of the phase compensation also improves the learning accuracy.
The simulation results in (i) and (ii) further demonstrate the convergence of θ̂ in different tasks. Moreover, with the adaptive phase compensation, the learning speed and accuracy of the RBFNN are greatly improved.

Conclusions
In this paper, we have proposed an RBFNN learning control scheme based on the deterministic learning mechanism. Referring to the phase-lead and phase-lag compensation of LTI systems, we proposed an adaptive phase compensation method using the pure time delay to improve the performance of the designed RBFNN learning controller. Based on the gradient descent method and its static gradient approximation, a proper adaptive law for the phase compensation is proposed. With the adaptive phase compensation, both the learning speed and the learning accuracy of the RBFNN are improved, and thus a feedforward controller with better performance is obtained using the learned knowledge.
The deterministic learning based control scheme improved by adaptive phase compensation has higher control and approximation accuracy and is thus suitable for high-precision trajectory tracking control, especially for computerized numerical control (CNC) systems and industrial robots whose main task is repetitive machining. Moreover, the adaptive phase compensation provides a good example of nonlinearly parameterized approximation, which has received little attention before. Its effectiveness also indicates the great potential of nonlinearly parameterized approximation techniques.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
Data will be made available on request.