Implementation of an Intelligent Adaptive Controller for an Electrohydraulic Servo System Based on a Brain Mechanism of Emotional Learning

This paper presents an experimental study of system identification and online intelligent adaptive position tracking control for an electrohydraulic servo (EHS) system, using a controller based on an emotional learning model of the human brain (BELBIC). A mathematical model of the system is derived and its parameters are identified. The BELBIC is designed from this dynamic model and used to control a real laboratory EHS system. The experimental results are compared with those obtained from an optimal PID controller to show that classic linear controllers fail to achieve good tracking of the desired output, especially when the hydraulic actuator operates at various frequencies and pressures. The results demonstrate a marked improvement in control action for the proposed approach, without any increase in control effort. Finally, the experimental results indicate that the BELBIC responds quickly to disturbances and to variations in the system parameters, showing a high degree of adaptability and robustness owing to its online learning ability.


Introduction
Recently, electrohydraulic servo (EHS) systems have become increasingly popular in many types of industrial equipment and processes. EHS systems are used in industry for a wide range of applications due to their small size-to-power ratio, their ability to apply very large forces, and their control accuracy. Applications of these systems may be found in the control of industrial robots, the machine tool industry, milling machines, automobiles, punching presses, material testing equipment, aerospace industries, and the like. However, the dynamics of hydraulic systems are highly nonlinear: the system may be subject to non-smooth nonlinearities due to control input saturation, the directional change of valve opening, friction and valve overlap, while laminar and turbulent flows, channel geometry and friction further complicate the system equations. The parameters of such hydraulic systems vary heavily, depending on the relation between flow velocity, pressure and oil viscosity. Aside from the nonlinear nature of hydraulic dynamics, electrohydraulic servo systems also have many model uncertainties [1], such as external disturbances and leakages that cannot be modelled exactly and for which the describing nonlinear functions may not be known. While accurate modelling involving all the nonlinearities is helpful in describing such complex dynamic behaviour, it complicates the hydraulic system model used for identification and control.
Much research has been conducted on the control of servo hydraulic systems. In early studies, linear models around working points and linear control strategies were applied [2]. Although linear control methods work well for some systems, they may not ensure acceptable control performance for highly nonlinear systems with time-varying dynamics, such as EHS systems. To improve control system behaviour for such nonlinear time-variant systems, robust and adaptive controllers have been applied [3], [4]. However, these controllers are based on a linear model of the plant, which imposes certain limitations on their efficiency and robustness.
Nonlinear control strategies such as sliding mode [5], [6], feedback linearization [7] or backstepping [8] approaches yield more satisfactory results, but they require a more accurate model of the system. In addition, the design and experimental implementation of these controllers are somewhat complicated.
Recently, intelligent control has been widely considered due to its high flexibility in relation to feedbacks and the selection of control parameters and its low dependency on the accuracy of a dynamic model. In recent years, model-based approaches to decision-making have increasingly been replaced by data-driven and rule-based approaches. One of the most popular of these approaches is fuzzy set theory. Fuzzy control [9], [10], which uses linguistic information, possesses several advantages: it is model-free, robust and rule-based, and it has the universal approximation property. However, the huge number of fuzzy rules for high-order systems makes the analysis complex. New approaches, in which intelligence is not given to the system from outside but is acquired by the system through learning, have proven much more successful [11]. Because of their self-learning, self-organizing and self-adapting capability, neural networks have become a powerful tool for many complex applications, such as EHS systems [12], [13]. However, neural network controllers require a predefined structure, which may lead to additional time-consuming computations during the control process. Another drawback of neural network controllers is the dependency of their output on the selection of the initial values of the neural network weights.
The successful adoption and implementation [14]-[19] of the neuro-computing model for the intelligent adaptive control of dynamic systems investigated by Lucas et al. [20], which is based on a simple but effective computational model of emotional learning in the amygdala introduced by Moren and Balkenius [21], motivated us to study this scheme further.
The model introduced by Moren and Balkenius aims to partly reproduce the characteristics of the biological system by presenting a neurologically inspired computational model of the amygdala and the orbitofrontal cortex (OFC). In the pioneering research carried out by Lucas et al., a simplified version of a previously developed emotional learning model of the amygdala [21] was employed in order to present an adaptive control strategy, designated as a Brain Emotional Learning Based Intelligent Controller (BELBIC). A BELBIC possesses several important features such as online adaptation, learning ability, a straightforward control system architecture and low online computational load.
The recently modified version of the BELBIC introduced in this paper closely follows the model of emotional learning in the human brain and mimics the amygdala, the OFC, the thalamus and the sensory cortex. This BELBIC is known to be suitable for the online direct adaptive control of nonlinear dynamic systems [22], and it demonstrates its abilities in this research by being implemented for the position control of a nonlinear electrohydraulic servo system.
This paper is organized as follows. First, the structure of the novel emotional controller is presented in Section 2. The details of the design process of the BELBIC are given in the Appendix at the end of the paper. Next, Section 3 gives a complete description of the mathematical model used. Section 4 deals with the identification algorithm for the system parameters. Section 5 contains a brief description of the electrohydraulic workbench used to implement the real-time work. In Section 6, the presentation of the identification results is followed by a comparison of the experimental results obtained from applying the BELBIC to a real plant with those of a real-time PID controller. Finally, some conclusions and remarks bring this work to a close in Section 7.

Computational model of the Brain Emotional Learning (BEL) mechanism
The most important aspect of any intelligent system is its capability to learn, e.g., through supervised learning (where the algorithm generates a function that maps inputs to desired outputs), unsupervised learning (which models a set of inputs when labelled examples are not available) and reinforcement learning (where the algorithm learns a policy of how to act given an observation of the world).
For a human, as a biological intelligent system, there are evidently many different areas in the brain where learning occurs. Figure 1 shows the areas representing the three kinds of learning mechanisms in the human brain. In this section, we introduce the computational model of an emotional learning algorithm in the human brain as a novel reinforcement learning strategy. Reinforcement learning theory dates back to the early days of cybernetics and to work in statistics, psychology, neuroscience and the computer sciences [23]. The main part of the mammalian brain responsible for emotional processes is called the limbic system. Computational models of the amygdala and the orbitofrontal cortex, which are the main parts of the limbic system, were first introduced in [21]. As depicted in Fig. 2, the system consists of four main parts.
Sensory input signals first enter the thalamus. Since the thalamus must provide a fast response to stimuli, in this model the maximum over all stimuli S is sent directly to the amygdala as another input [21]:

$$A_{th} = \max_i (S_i) \qquad (1)$$

The amygdala receives inputs from the thalamus and sensory cortex, while the OFC part receives inputs from the sensory cortex and the amygdala. The system also receives a reinforcing signal (REW). For each A node in the amygdala, there is a plastic connection weight $V_i$. Any input is multiplied by this weight to provide the output of the node. The O nodes show similar behaviour, with a connection weight $W_i$ applied to the input signal to create an output. The nodes' values are calculated as follows [21]:

$$A_i = S_i V_i \qquad (2)$$

$$O_i = S_i W_i \qquad (3)$$

There is one output node in common for all outputs of the model, called E (see Fig. 2). The E node simply sums the outputs from the A nodes and then subtracts the inhibitory outputs from the O nodes. The result is the output from the model:

$$E = \sum_i A_i - \sum_i O_i \quad (\text{including } A_{th}) \qquad (4)$$

The E′ node sums the outputs from the A nodes except for $A_{th}$ and then subtracts the inhibitory outputs from the O nodes:

$$E' = \sum_i A_i - \sum_i O_i \quad (\text{excluding } A_{th}) \qquad (5)$$
A graphical depiction of the brain emotional learning process [21].
Emotional learning occurs mainly in the amygdala. The learning rule of the amygdala is given as follows:

$$\Delta V_i = \alpha_a \, S_i \max\Big(0,\; \mathrm{REW} - \sum_j A_j\Big) \qquad (6)$$

where $\alpha_a$ is the constant amygdala learning rate and REW is the reinforcing signal. The max term makes the learning changes monotonic, implying that the amygdala's gain can never decrease. This rule models the inability to unlearn an emotion signal (and, consequently, an emotional action) previously learned in the amygdala. Similarly, the learning rule in the orbitofrontal cortex is:

$$\Delta W_i = \alpha_o \, S_i \, (E' - \mathrm{REW}) \qquad (7)$$

where $\alpha_o$ is the constant learning rate in the OFC. The orbitofrontal learning rule is very similar to the amygdala rule. The only difference is that the OFC connection weight can either increase or decrease as needed to track the required inhibition, which is reflected in the discrepancy between the reinforcing signal (REW) and the E′ node.
The system operates on two levels. The amygdala learns to predict and react to a given reinforcement signal; this subsystem cannot unlearn a connection. Any mismatch between its predictions and the actual reinforcement signals therefore causes inappropriate responses from the amygdala. The OFC learns to inhibit the system output when such mismatches occur. Learning in the amygdala and the OFC is performed by updating the plastic connection weights, based on the received reinforcing and stimulus signals.
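As an illustration of the two-level operation described above, the following is a minimal Python sketch of the amygdala/OFC computation. It assumes the update rules commonly given for the Moren-Balkenius model [21]; the class name `BELModel`, the default learning rates and the way the thalamic node is stored as the last amygdala weight are assumptions made for illustration and are not taken from the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BELModel:
    """Illustrative amygdala/OFC model (hypothetical helper, not the authors' code)."""
    n_inputs: int
    alpha_a: float = 0.1  # amygdala learning rate (placeholder value)
    alpha_o: float = 0.1  # OFC learning rate (placeholder value)
    v: List[float] = field(init=False, default_factory=list)  # amygdala weights
    w: List[float] = field(init=False, default_factory=list)  # OFC weights

    def __post_init__(self) -> None:
        # One amygdala weight per stimulus plus one for the thalamic input.
        self.v = [0.0] * (self.n_inputs + 1)
        self.w = [0.0] * self.n_inputs

    def step(self, s: List[float], rew: float) -> float:
        """Compute the output E for stimuli s, then update the weights."""
        s_th = max(s)                                # thalamic shortcut: max stimulus
        a = [si * vi for si, vi in zip(s, self.v)]   # cortical A nodes
        a_th = s_th * self.v[-1]                     # thalamic A node
        o = [si * wi for si, wi in zip(s, self.w)]   # inhibitory O nodes

        e = sum(a) + a_th - sum(o)                   # model output E
        e_prime = sum(a) - sum(o)                    # E' excludes the thalamic node

        # Amygdala: the max(0, .) term means weights never decrease
        # for non-negative stimuli.
        surprise = max(0.0, rew - (sum(a) + a_th))
        for i, si in enumerate(s):
            self.v[i] += self.alpha_a * si * surprise
        self.v[-1] += self.alpha_a * s_th * surprise

        # OFC: weights may move in either direction to track the inhibition.
        for i, si in enumerate(s):
            self.w[i] += self.alpha_o * si * (e_prime - rew)
        return e
```

Calling `step` once per sample with the current stimuli and reinforcing signal returns the model output and adapts both weight sets online. Presenting a zero reward after some learning leaves the amygdala weights unchanged, which is the "cannot unlearn" property described in the text.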

Brain Emotional Learning Based Intelligent Controller (BELBIC) structure
In a biological system, emotional reactions are utilized for fast decision-making in complex environments or emergency situations. It is thought that the amygdala and the orbitofrontal cortex are the most important parts of the brain involved in emotional reactions and learning [24]. The amygdala is a small structure in the medial temporal lobe of the brain that is thought to be responsible for the emotional evaluation of stimuli [25]. This evaluation is in turn used as a basis for emotional states and reactions, for attention signals and for laying down long-term memories [25]. The amygdala and the orbitofrontal cortex compute their outputs based on the emotional signal (the reinforcing signal) received from the environment. The final output (the emotional reaction) is calculated by subtracting the orbitofrontal cortex's output from the amygdala's output (see Fig. 3).
To use our version of the Moren-Balkenius model as a controller, it should be observed that it essentially converts two sets of inputs (sensory inputs and emotional cues or reinforcing signals) into the decision signal (the emotional reaction) as its output. Closed-loop configurations using this block (BELBIC) in the feed-forward loop of the total system have been implemented in such a way that the input signals have the proper interpretations. The block implicitly implements the critic, the learning algorithm and the action selection mechanism used in the functional implementations of emotionally-based (or, generally, reinforcement learning-based) controllers, all at the same time. In the implementation of the BELBIC, it should be pointed out that since this model was originally proposed for descriptive purposes with no control engineering motivation, the model is essentially open loop. To be used as a controller, the designer has to choose the sensory input fed back from the system response as well as the reward function, in accordance with the control engineering requirements of the problem at hand and not merely from neurocognitive insights. The design of the BELBIC is, therefore, no different from the design of any other nonlinear or adaptive control scheme [26]. The reinforcing signal (REW) is a function of the other signals and can be regarded as the evaluation of a cost function. Specifically, a reward (or punishment) is applied based on a previously defined cost function:

$$\mathrm{REW} = J(y_r, y_p, u, e) \qquad (8)$$

The choice of external reinforcement signal provides a degree of freedom for multi-objective learning procedures. Similarly, the sensory inputs can be a function of plant and controller outputs, as follows:

$$S_i = f_i(y_r, y_p, u, e) \qquad (9)$$

As illustrated in Eqs. 8 and 9, the sensory input and the reward signal can be arbitrary functions of the reference output ($y_r$), the plant's output ($y_p$), the control effort ($u$) and the error signal ($e$).
In general, it is up to the designer to find a proper setup for the reward signal and sensory input functions of the BELBIC. The stability of the BELBIC has been discussed in [27] using a cell-to-cell mapping method; in order to ensure the stability of the system, a general idea for choosing the control parameters is also described in [27]. The design procedure for the BELBIC is elaborated in the Appendix.

System modelling
All hydraulic systems require a supply of pressurized fluid, which is a form of mineral oil. As shown in Fig. 4, the oil is drawn from a tank into a rotary pump, driven at a constant speed by an electric motor. The oil is driven at a constant flow rate into an adjustable pressure relief valve, which regulates the system pressure by allowing the excess oil to return to the reservoir once a pre-defined pressure threshold has been reached. The pressurized hydraulic oil is carried to a servo valve through a system of rigid or flexible piping. The oil is returned from the servo valve to the tank through a low-pressure return pipe. A servo valve is a complex device which exhibits a high-order nonlinear response (see Fig. 5); moreover, knowledge of a large number of internal valve parameters is required to formulate an accurate mathematical model. Indeed, many parameters, such as nozzle and orifice sizes, spring rates and spool geometry, are adjusted by the manufacturer to tune the valve response and are not normally available to the user.
Practically all physical systems exhibit some nonlinearity: in the simplest case, this may be a physical limit of movement or it may arise from the effects of friction, hysteresis, mechanical wear or backlash. The two-stage nozzle-flapper servo valve consists of three main parts: an electrical torque motor, a hydraulic amplifier and a valve spool assembly. When modelling complex servo valves, it is sometimes possible to ignore inherent nonlinearities and employ a linear model which approximates the physical system. Such models are often based on classical first- or second-order differential equations. The relation between the servo valve opening area Av and the input voltage u can be written as [1]: where Kv is the servo valve constant and τv is its time constant.
Defining the supply pressure Ps as Ps = Pc1 + Pc2, the load pressure PL as PL = Pc1 - Pc2 and the load flow QL as QL = (Qc1 + Qc2)/2, the relationship between the load pressure PL and the load flow QL for an ideal critical-centre servo valve with matched and symmetric orifices, assuming a small leakage, can be expressed as follows [1]: where Cd is the flow discharge coefficient and ρ is the fluid mass density. The sign function in Eq. 11 accounts for the change in the direction of fluid flow through the servo valve.
When the continuity equation is applied to the fluid flow in the actuator, taking the oil leakage into account, the following expression can be derived: where CL is the total leakage coefficient of the motor, β is the fluid bulk modulus, θ is the output angular position, Vt is the total volume of the servo valve and the actuator, and Dm is the actuator volumetric displacement.
Neglecting Coulomb frictional torque, the motion equation of the hydraulic actuator is given by: where Bv is the viscous damping coefficient and Jm is the total inertia of the motor and load referred to the motor shaft.
Finally, if the state variables are denoted by: x1-hydro motor angular position (rad), x2-hydro motor angular velocity (rad/sec), and x3-load pressure differential (Pa), the system can be easily described with a third-order nonlinear state-space model: where the valve area opening (Av) can be determined from Eq. 10 and w1, w2, p1, p2 and p3 are appropriate constants given by: In this first step, the mathematical model of the electrohydraulic system was fully designed and established. The next step involves setting up an identification algorithm for the system parameters.
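To make the structure of the third-order model more concrete, the following Python sketch evaluates one plausible set of right-hand sides and advances them with a forward-Euler step. The equations themselves are not reproduced in this text, so the signs and the assignment of w1, w2, p1, p2 and p3 to individual terms are assumptions based on the usual form of such models, and all numerical defaults in `EHSParams` are placeholders rather than identified values.

```python
import math
from dataclasses import dataclass
from typing import Tuple

State = Tuple[float, float, float]


@dataclass
class EHSParams:
    # Placeholder values only; the identified parameters are not given here.
    w1: float = 1.0     # assumed viscous damping term, Bv / Jm
    w2: float = 1.0     # assumed pressure-to-torque term, Dm / Jm
    p1: float = 1.0     # assumed velocity coupling in the pressure equation
    p2: float = 1.0     # assumed leakage term in the pressure equation
    p3: float = 1.0     # assumed valve flow gain
    ps: float = 100.0   # supply pressure
    kv: float = 1.0     # servo valve constant
    tau_v: float = 0.01  # servo valve time constant


def sign(value: float) -> float:
    """Return -1.0, 0.0 or 1.0 according to the sign of value."""
    return float((value > 0) - (value < 0))


def ehs_derivatives(x: State, a_v: float, u: float, p: EHSParams):
    """Assumed model: x = (position, velocity, load pressure)."""
    _, x2, x3 = x
    dx1 = x2
    dx2 = -p.w1 * x2 + p.w2 * x3
    # Square-root orifice flow; the clamp avoids a negative argument.
    flow = a_v * math.sqrt(max(0.0, p.ps - sign(a_v) * x3))
    dx3 = -p.p1 * x2 - p.p2 * x3 + p.p3 * flow
    # First-order valve lag between input voltage u and opening area a_v.
    da_v = (p.kv * u - a_v) / p.tau_v
    return (dx1, dx2, dx3), da_v


def euler_step(x: State, a_v: float, u: float, p: EHSParams, dt: float = 1e-3):
    """Advance the state and valve opening by one forward-Euler step."""
    dx, da_v = ehs_derivatives(x, a_v, u, p)
    x_next = tuple(xi + dt * dxi for xi, dxi in zip(x, dx))
    return x_next, a_v + dt * da_v
```

Forward Euler is used only to keep the sketch short; a stiff hydraulic model would normally call for a smaller step or a more robust integrator.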

System identification algorithm
In the system model (Eq. 14), it can be seen that the parameters to be identified are those in the last two equations. Therefore, only parameters w1, w2, p1, p2 and p3 need to be identified in the following equations: These can be rewritten in matrix form as: It can be seen that the system (Eq. 17) is linear in its parameters, which is a sufficient condition for those parameters to be identified through the continuous recursive least squares method with a constant trace [28]. Now let: where y is the observed vector of variables, φ is the matrix of regression variables and θ is the vector of unknown parameters. The parameter vector θ should be determined so that the criterion: is minimized. The parameter α ≥ 0 corresponds to the forgetting factor. A straightforward calculation shows that the criterion is minimized if: which is the normal equation. The estimate is unique if the matrix: Finally, assuming that matrix $P(t) = R(t)^{-1}$, the parameter estimate that minimizes the least squares error must satisfy: Through Eq. 23, we can now identify the system parameters in Eq. 15 and, consequently, begin the design of the BELBIC.
In order to analyse the performance of the identification and the controller, experimental tests were carried out. The experimental test rig is shown in Fig. 6. All of the programs for the system were developed using MATLAB-Simulink and C S-Function blocks. The command signal from the computer was sent to a servo valve driver through a 12-bit interface card. The sampling time for the experimental implementation of the proposed EHS system with the emotional controller was set to 1 ms. The angular position of the hydro motor shaft is measured via an encoder mounted at the motor shaft end. The maximum working pressure during all of the experiments was set at 110 bar, which is the pressure threshold for the relief valve to open.
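The identification idea described in this section can be illustrated with a short Python sketch. Note that it swaps in the common discrete-time recursive least squares update with an exponential forgetting factor, not the continuous constant-trace variant used in the paper; the function names, the forgetting factor `lam = 0.99` and the synthetic regressors in `identify_demo` are all assumptions chosen for illustration.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def rls_update(theta: List[float], P: Matrix, phi: List[float],
               y: float, lam: float = 0.99) -> Tuple[List[float], Matrix]:
    """One discrete RLS step for the linear-in-parameters model y = phi . theta."""
    n = len(theta)
    # P is symmetric, so P @ phi also gives the row vector phi^T @ P.
    p_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(phi[i] * p_phi[i] for i in range(n))
    gain = [p_phi[i] / denom for i in range(n)]
    # Prediction error with the current estimate.
    err = y - sum(phi[i] * theta[i] for i in range(n))
    theta_new = [theta[i] + gain[i] * err for i in range(n)]
    # Covariance update, inflated by 1/lam so that old data is forgotten.
    P_new = [[(P[i][j] - gain[i] * p_phi[j]) / lam for j in range(n)]
             for i in range(n)]
    return theta_new, P_new


def identify_demo(steps: int = 200) -> List[float]:
    """Recover two known parameters from noise-free synthetic data."""
    true_theta = [2.0, -1.0]                  # hypothetical parameters
    theta = [0.0, 0.0]
    P = [[1000.0, 0.0], [0.0, 1000.0]]        # large initial covariance
    for k in range(steps):
        phi = [math.sin(0.1 * k), math.cos(0.3 * k)]
        y = sum(p * t for p, t in zip(phi, true_theta))
        theta, P = rls_update(theta, P, phi, y)
    return theta
```

For the EHS model, the regressor `phi` would be built from the measured velocity, load pressure and valve flow term, and `y` from the corresponding state derivative; how those signals are differentiated and filtered is left open here.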

Experimental results and discussion
This section presents and analyses the experimental results obtained from the BELBIC controller versus those obtained from a real-time PID controller, since the PID is the most popular and widespread controller used in industry. We therefore held it as a reference for the sake of comparison. The goal is to evaluate the system transient response, the steady state behaviour, the magnitude and the uniformity of the control signal.

Identification
The online identification of the system parameters is based on the recursive least squares algorithm with a constant trace. The identification procedure is carried out in open loop. The input signal sent to the servo valve is a sine wave with a frequency of 8 rad/sec and an amplitude of 0.1 V, plus low-power white noise with a variance of 0.002, which is added so that all frequencies are excited. During identification, the sampling time was set to 1 ms. The intensity of the input noise is set so that it has a measurable effect on the states of the system.
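A hedged Python sketch of how such an excitation signal might be generated is given below. It assumes that the stated variance of 0.002 is the per-sample variance of zero-mean Gaussian noise, which the text does not specify; the function name `make_excitation` and the fixed random seed are illustrative choices.

```python
import math
import random
from typing import List, Tuple


def make_excitation(n_samples: int, dt: float = 1e-3, amplitude: float = 0.1,
                    omega: float = 8.0, noise_var: float = 0.002,
                    seed: int = 0) -> Tuple[List[float], List[float]]:
    """Return (signal, noise): a sine wave plus Gaussian white noise.

    amplitude is in volts, omega in rad/sec and dt in seconds.
    The noise is returned separately so that it can be inspected.
    """
    rng = random.Random(seed)          # seeded for repeatable experiments
    sigma = math.sqrt(noise_var)       # standard deviation from the variance
    noise = [rng.gauss(0.0, sigma) for _ in range(n_samples)]
    signal = [amplitude * math.sin(omega * k * dt) + noise[k]
              for k in range(n_samples)]
    return signal, noise
```

With `dt = 1e-3`, calling `make_excitation(6000)` covers the 6 seconds of identification mentioned in the text.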
Since the data acquired from the sensors, especially the pressure sensors, is always noisy, appropriate filters should be used to reduce the noise. The type, order and other details of the filters are determined by trial and error. An important consideration in filter selection is that the designed filter must be able to attenuate environmental noise without totally eliminating the effect of the input noise. A first-order low-pass Butterworth filter with a pass band frequency of 50 rad/sec is applied. After 6 seconds of identification, the system parameters became fairly constant and were set at the values given in Table 1.
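The filtering step can be sketched in Python as follows. A first-order Butterworth low-pass filter has the transfer function ωc/(s + ωc); the sketch discretizes it with the bilinear transform at the 1 ms sampling time. Treating the stated pass band frequency as the -3 dB cutoff, omitting frequency pre-warping and starting from zero initial conditions are assumptions, as are the function names.

```python
from typing import List, Tuple


def lowpass_coeffs(omega_c: float, dt: float) -> Tuple[float, float]:
    """Bilinear-transform coefficients (b, a1) for H(s) = omega_c / (s + omega_c).

    The difference equation is y[n] = b * (x[n] + x[n-1]) - a1 * y[n-1].
    """
    k = omega_c * dt
    b = k / (2.0 + k)
    a1 = (k - 2.0) / (2.0 + k)
    return b, a1


def lowpass(signal: List[float], omega_c: float = 50.0,
            dt: float = 1e-3) -> List[float]:
    """Filter a sampled signal, starting from zero initial conditions."""
    b, a1 = lowpass_coeffs(omega_c, dt)
    x_prev = 0.0
    y_prev = 0.0
    out = []
    for x in signal:
        y = b * (x + x_prev) - a1 * y_prev
        out.append(y)
        x_prev, y_prev = x, y
    return out
```

The filter has unit gain at zero frequency and a zero at the Nyquist frequency, so slowly varying pressure signals pass through while sample-to-sample noise is strongly attenuated.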

Real-time control and comparisons
To demonstrate the effectiveness of the proposed controller, a series of experimental tests were performed on the rotary actuator position for different operating conditions. The results were compared with those obtained from a real-time PID controller. To make a fair judgment, the PID controller is tuned at rated conditions to give a quick and smooth response in terms of settling time and undershoot/overshoot. All the tests were performed under the same conditions. The policies for the PID-based controller and the BELBIC are the same, since both need the same number of states for feedback. The structure of the control circuit implemented in this study using the direct-adaptive-control strategy is illustrated in Fig. 7. Fig. 8 shows the position tracking error and control signal for the real-time BELBIC and PID controllers. The desired position to be tracked is a sine wave with a frequency of 2 rad/sec and an amplitude of 5 rad.
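For reference, the baseline used in this comparison can be sketched as a textbook discrete-time PID controller driven by the sinusoidal position command. The gains used in the experiments are not reported in this text, so the class `PID`, its constructor arguments and the rectangular integration are illustrative assumptions only.

```python
import math
from typing import Optional


def reference(t: float, amplitude: float = 5.0, omega: float = 2.0) -> float:
    """Desired position in rad: a sine wave with frequency omega in rad/sec."""
    return amplitude * math.sin(omega * t)


class PID:
    """Textbook discrete PID controller (hypothetical gains, not the paper's)."""

    def __init__(self, kp: float, ki: float, kd: float, dt: float) -> None:
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error: Optional[float] = None

    def update(self, error: float) -> float:
        """Return the control signal for the current tracking error."""
        self.integral += error * self.dt          # rectangular integration
        if self.prev_error is None:
            derivative = 0.0                      # avoid a kick on the first sample
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

In a loop sampled at 1 ms, the error passed to `update` would be `reference(t)` minus the measured shaft position.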
As can be seen from Figs. 8 (a) and 8 (b), the PID controller exhibits a constant steady-state position error, whereas the steady-state position error of the BELBIC decreases over time. Unlike the PID controller, the BELBIC learns the dynamics during real-time operation, which causes it to track the reference signal inaccurately at the beginning of the experiment (shown in Fig. 8 (a)). Despite the fact that the initial weights are all set to zero, the BELBIC rapidly learns the dynamics of the plant without any off-line training. This fact is shown in Fig. 8. During transient states, a slight overshoot is observed in the control signal of the BELBIC since the servo valve draws more current; however, no such change is observed in the PID-based controller (Fig. 8 (c)). As the BELBIC passes on to a steady state, the control signal becomes uniform and smooth, which is an important advantage in practical use, especially in high-power systems such as EHS systems.
Another experiment was carried out with a sudden increase in the frequency of the reference command position (shown in Figs. 9 and 10). The frequency is changed suddenly from 2 to 4 rad/sec just after 150 seconds.
The BELBIC adapts to the alteration in the reference signal, as shown in Fig. 9(a). In time, it learns how to respond to the modified reference command, maintaining the stability of the EHS system (during the relatively long experiment period). The BELBIC tracks the reference signal with very low error in comparison with the PID controller (as shown in Fig. 9).
It can be seen in Fig. 10 that the energy consumption of the BELBIC is about the same as that of the PID controller, whilst the BELBIC has less tracking error.
In common industrial applications, the robustness of the controller with respect to changes in the operating conditions is a serious concern. To evaluate the capability of the new emotional controller in terms of handling these changes, the supply pressure of the system is altered after 150 seconds. This pressure is gradually decreased from 100 to 50 bar, which causes a change in the control signal generated by the BELBIC and the PID. As shown in Fig. 12, the control signal of the BELBIC is very similar to that of the PID. As can be concluded from Fig. 11, the BELBIC displays good robustness to a change in the dynamics of the system, an acceptable overshoot and a good tracking ability (compared to the PID).

Conclusion
In this paper, a comprehensive investigation was carried out on the modelling, identification and control of a real laboratory EHS system. System identification was achieved by simply rewriting the system model in a linear-in-parameters (LP) form, allowing the use of the recursive least squares algorithm for this purpose. Then, an innovative brain emotional controller was employed for position tracking. Before experimental application, the values of the parameters of this controller were determined based on the extracted mathematical model. The performance of the emotional controller has been compared with that of the conventional PID controller, especially with respect to its adaptability to frequency and pressure variations.
In the evaluation of the proposed control method, the following points have to be taken into consideration. As time passes, the BELBIC learns the dynamics of the plant, causing less tracking error than the PID controller, whilst the energy consumption of the BELBIC is similar to that of the PID controller. Another advantage of the BELBIC compared with other controllers in practical applications is in its production of a uniform and smooth control signal. Besides, the computational burden is very light, as seen in Eqs. 1 to 9. It has been reported that the use of other advanced control techniques, while also yielding better performance than PID controllers, has had some disadvantages; unlike the BELBIC they suffer from an increase in control effort, a non-smooth control signal, more difficult implementation and a high computational load.
A main advantage in the performance of the controlled EHS system is the high degree of adaptability of the control system and the robustness of its performance with respect to initial errors in modelling and identification (even with a total lack of knowledge about the system model). The main shortcoming of model-free learning controllers that have no prior knowledge of the system's dynamics, such as reinforcement learning-based controllers and the BELBIC, is that they may perform poorly in the early phases of the learning process because they may produce a wrong control signal. Future research can focus on hybrid control to solve this problem and accelerate the learning phase.
Undoubtedly, the proposed approach is not an optimal solution; however, with its simple implementation, reasonable computational effort, fast response and insensitivity to disturbances, this method proves to be both efficient and acceptable.

Appendix
In this section, the controller design procedure is presented in order to control the angular position of the rotary electrohydraulic servo system according to the intelligent control algorithm introduced in Section 2. Consider the direct-adaptive-control strategy as specified in Fig. 7. It was mentioned in Section 2.2 that the BELBIC must be provided with a set of sensory input signals in addition to a reward signal. The sensory input vector selected to solve this problem consists of three signals (Eq. 24), and is accompanied by a reward signal (Eq. 25). The three components of the sensory input vector chosen for the electrohydraulic servo system control are the weighted feedback error signal, the weighted reference command signal and the weighted time integral of the commanded voltage signal. The coefficients are chosen via trial and error. The reward function (Eq. 25) is similar to a proportional-integral control scheme, seeking a suitable trade-off between the quick adjustment of error and the long-term elimination of steady-state error. The term ∫u aims to help the BELBIC to minimize its energy consumption (control effort). The learning rates in the amygdala and the OFC are set as $\alpha_a = 42 \times 10^{-5}$ and $\alpha_o = 22 \times 10^{-4}$, respectively. The art of the designer lies in appropriately choosing the system's emotional condition and tuning the learning rates of the system in order to obtain the desired goal.
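The signal construction described in this Appendix can be sketched in Python as follows. Eqs. 24 and 25 are not reproduced in this text, so the exact form below is an assumption: the sensory vector is taken as weighted error, weighted reference and weighted integral of the control voltage, and the reward as a proportional-integral function of the error. The class `SignalBuilder`, its weights and the rectangular integration are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SignalBuilder:
    """Builds assumed BELBIC sensory inputs and reward from loop signals."""
    k_s: Tuple[float, float, float] = (1.0, 1.0, 1.0)  # sensory weights (placeholders)
    k_p: float = 1.0      # proportional reward weight (placeholder)
    k_i: float = 1.0      # integral reward weight (placeholder)
    dt: float = 1e-3      # sampling time in seconds
    int_e: float = 0.0    # running integral of the tracking error
    int_u: float = 0.0    # running integral of the control voltage

    def update(self, y_ref: float, y_plant: float,
               u_prev: float) -> Tuple[List[float], float]:
        """Return (sensory_inputs, reward) for the current sample."""
        e = y_ref - y_plant
        self.int_e += e * self.dt          # rectangular integration
        self.int_u += u_prev * self.dt
        sensory = [self.k_s[0] * e,
                   self.k_s[1] * y_ref,
                   self.k_s[2] * self.int_u]
        reward = self.k_p * e + self.k_i * self.int_e
        return sensory, reward
```

In a control loop, the returned pair would be passed at every sample to the emotional learning block, whose output becomes the next commanded voltage.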