Data-driven Learning Algorithm of Neural Fuzzy Based Hammerstein-Wiener System

A novel data-driven learning approach for nonlinear systems represented by a neural fuzzy Hammerstein-Wiener model is presented. The Hammerstein-Wiener system consists of two static nonlinear blocks, each represented by an independent neural fuzzy model, surrounding a dynamic linear block described by a finite impulse response model. Multisignal data are designed so that the parameter learning problems of the individual blocks of the Hammerstein-Wiener system can be separated. First, the output nonlinearity parameters are learned using separable signals with different amplitudes. Second, a correlation analysis method is used to estimate the linear block parameters from the inputs and outputs of a separable signal, which effectively handles the interference of process noise. Finally, multi-innovation learning is introduced to improve learning accuracy: a multi-innovation extended stochastic gradient algorithm, combining the multi-innovation technique with gradient search, is derived to estimate the input nonlinearity and the noise model. The simulation results show that the presented data-driven learning approach is effective for learning the Hammerstein-Wiener system.


Introduction
Most real industrial processes are nonlinear to some extent, so linear approximations are often unacceptable and models that can represent the nonlinearity should be used. For this purpose, block-oriented nonlinear models composed of a linear dynamic block and static nonlinear functions, for instance the Hammerstein model and the Wiener model, have been widely used on account of their simple structures. These two models can approximate the nonlinear dynamics of many practical industrial processes [1][2][3][4][5][6][7].
In the past few years, much theoretical and engineering research has been devoted to extensions of the Hammerstein and Wiener models, for instance the Hammerstein-Wiener system, to improve the approximation of nonlinear systems. In the existing literature, many optimization techniques have been developed to study the Hammerstein-Wiener system [8][9][10][11][12][13][14][15]. For the Hammerstein-Wiener system, a least squares algorithm and a blind identification method were put forward by Bai in [8,9]. In [10], a recursive parameter learning method is developed for a special Hammerstein-Wiener nonlinear system that includes a dead-zone input nonlinear function. Vörös [11] applied a least squares-based iterative technique to estimate the Hammerstein-Wiener model parameters from measured input-output data. Xu et al. [12] used two extreme learning machine networks to approximate the nonlinear functions of the Hammerstein-Wiener system and developed a parameter estimation method for the extreme learning machine-based Hammerstein-Wiener system aimed at large-scale complex nonlinear dynamic systems. The major drawback of the works discussed above is that the unmodeled dynamics or stochastic disturbances of the Hammerstein-Wiener process are not taken into account, although they are an important factor in designing effective parameter learning algorithms [16,17].
Stochastic gradient estimation methods have attracted much attention due to their low computational load in parameter learning. In recent years, stochastic gradient-based algorithms have also been used to estimate Hammerstein-Wiener models corrupted by stochastic disturbances. For the Hammerstein-Wiener ARMAX system, Wang and Ding [18] investigated extended stochastic gradient estimation. Mansouri et al. [19] developed a parameter estimation method based on extended Kalman filter theory. In [20], a data filtering-based generalized extended stochastic gradient algorithm is derived for estimating the Hammerstein-Wiener system parameters with improved computational efficiency. Two main problems arise in these methods. On the one hand, they assume that the unknown nonlinearities in the system are modeled by polynomial functions; if the nonlinearities are not polynomial or are nonsmooth, the methods do not converge [21]. On the other hand, the learned system contains cross-products of the parameters of the estimated system, so the parameters of each block must be separated from the estimated cross-product terms, which increases the computational load of the learning algorithms [22].
Although many contributions in the existing literature address learning nonlinear systems represented by the Hammerstein-Wiener model, the problem of stochastic disturbances has not been fully considered. This paper focuses on a three-stage parameter learning approach for Hammerstein-Wiener nonlinear systems with stochastic disturbances using multisignal data. In the first stage, the output nonlinearity is estimated using separable signals with different amplitudes. In the second stage, a correlation analysis method is used to estimate the linear dynamic block parameters from the inputs and outputs of one of the separable signals. In the third stage, to achieve a fast convergence rate of the stochastic gradient algorithm, a multi-innovation-based extended stochastic gradient scheme, which expands the scalar innovation to an innovation vector, is used to learn the parameters of the input nonlinearity and the noise model. The contributions of the developed learning approach are as follows:
(1) Multisignal data are designed to separate the parameter learning problems of the Hammerstein-Wiener system, thereby avoiding redundant parameters.
(2) The problem of unmeasurable intermediate variables in the Hammerstein-Wiener system is resolved by using the correlation analysis method.
(3) A multi-innovation-based extended stochastic gradient scheme, which expands the scalar innovation to an innovation vector, is used to achieve a fast convergence rate.
The paper is organized as follows. Section 2 introduces the problem statement of the neural fuzzy Hammerstein-Wiener system. Section 3 analyzes parameter learning based on multisignal data for Hammerstein-Wiener systems with stochastic disturbances. Section 4 presents simulation cases for the presented learning method. Finally, concluding remarks are given.

Preliminaries and Problem Statements
As shown in Figure 1, the nonlinear system represented by the Hammerstein-Wiener model with disturbance is modeled by two neural fuzzy networks and a finite impulse response model, which is formulated by

where $u(k)$ and $y(k)$ denote the input and output, $v(k)$ and $x(k)$ represent the outputs of the input nonlinearity and the linear block, $e(k)$ is a white noise sequence, $f(\cdot)$ is the input nonlinearity, $g(\cdot)$ is the output nonlinearity, $B(z) = b_1 z^{-1} + \cdots + b_{n_b} z^{-n_b}$ is the finite impulse response model, and $D(z) = 1 + d_1 z^{-1} + \cdots + d_{n_d} z^{-n_d}$ is the noise model. For a given parameter $\varepsilon$, establishing the Hammerstein-Wiener nonlinear system amounts to seeking parameters that satisfy the following conditions:

where "$\wedge$" denotes an estimate and $N$ is the length of the measured data. For ease of analysis, the output nonlinearity is expressed by $\hat{z}(k) = \hat{g}^{-1}(y(k))$. In this research, the input nonlinear function and the output nonlinear function are modeled using two independent neural fuzzy networks [23]. Figure 1 exhibits the neural fuzzy network, and its output is expressed as

where $\mu_l = \exp(-((u(k) - c_l)^2/\sigma_l^2))$, $w_l$ represents the weights of the neural fuzzy network, $c_l$ and $\sigma_l$ are the center and width, and $L$ is the number of fuzzy rules.
Moreover, the expressions of the input and output nonlinear blocks are provided

where "input" refers to the input nonlinearity and "output" refers to the output nonlinearity.
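As a minimal sketch of the neural fuzzy network described above, the following Python function evaluates the Gaussian memberships $\mu_l = \exp(-((u(k) - c_l)^2/\sigma_l^2))$ and combines them with the weights $w_l$. The normalization of the memberships by their sum is an assumption, since the output expression itself is not reproduced in this text, and the function name `neural_fuzzy_output` is hypothetical.

```python
import numpy as np


def neural_fuzzy_output(u, centers, widths, weights):
    """Evaluate a neural fuzzy network at the inputs u.

    Assumption: the memberships are normalized to sum to one over the
    L fuzzy rules before being weighted (not confirmed by the text).
    """
    u = np.atleast_1d(np.asarray(u, dtype=float))
    c = np.asarray(centers, dtype=float)
    s = np.asarray(widths, dtype=float)
    w = np.asarray(weights, dtype=float)
    # mu[k, l] = exp(-(u(k) - c_l)^2 / sigma_l^2), as defined in the text
    mu = np.exp(-((u[:, None] - c[None, :]) ** 2) / s[None, :] ** 2)
    # phi[k, l]: normalized membership of rule l (assumed form)
    phi = mu / mu.sum(axis=1, keepdims=True)
    # The network output is linear in the weights w_l
    return phi @ w
```

Under this assumed form the output is linear in the weights once the centers and widths are fixed, which is what allows the weights to be estimated by least squares or gradient methods. Inputs very far from every center can make all memberships underflow to zero, so the centers should cover the input range.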

Learning Approach of Neural Fuzzy Hammerstein-Wiener System with Moving Average Noise
The task of the parameter learning method is to estimate the Hammerstein-Wiener system parameters, that is, those of the two nonlinear blocks, the linear block, and the noise model. Previous research [24] showed that separable signals can be used to identify the nonlinear block and the linear block of the Hammerstein model separately. Inspired by this work, the use of separable signals is extended here to the Hammerstein-Wiener system with unknown disturbance.

Theorem 1.
Consider a Hammerstein-Wiener system. When separable signals are used as the input signal, the following expression holds.
The proof can be done by referring to previous method in [23], hence omitted here.
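The property used from Theorem 1, namely that for a separable input the cross-correlation $R_{vu}(\tau)$ is proportional to the autocorrelation $R_u(\tau)$, can be checked numerically. The sketch below is an illustration under stated assumptions rather than the proof: it uses a colored Gaussian input generated by a first-order autoregressive filter and the constant $b_0 = E(v(k)u(k))/E(u(k)u(k))$; the function names, the filter coefficient, and the test nonlinearity are all hypothetical choices.

```python
import numpy as np


def xcorr(a, b, lag):
    """Sample estimate of R_ab(lag) = E[a(k) b(k - lag)] for lag >= 0."""
    n = len(a)
    return float(np.mean(a[lag:] * b[: n - lag]))


def separable_ratio(f, n=400_000, a=0.8, lags=(1, 2, 3), seed=0):
    """Return b0 and the ratios R_vu(tau) / R_u(tau) for v = f(u)."""
    rng = np.random.default_rng(seed)
    # Colored, unit-variance Gaussian input from an AR(1) filter
    # (illustrative choice of a separable signal)
    w = rng.standard_normal(n) * np.sqrt(1.0 - a**2)
    u = np.empty(n)
    u[0] = rng.standard_normal()
    for k in range(1, n):
        u[k] = a * u[k - 1] + w[k]
    v = f(u)  # output of the static nonlinearity
    b0 = xcorr(v, u, 0) / xcorr(u, u, 0)
    ratios = [xcorr(v, u, t) / xcorr(u, u, t) for t in lags]
    return b0, ratios
```

For example, `separable_ratio(np.tanh)` returns a constant `b0` and a list of ratios that should all be close to `b0`, up to sampling error.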
According to Theorem 1, the cross-correlation function $R_{vu}(\tau)$ can be replaced by the autocorrelation function $R_u(\tau)$ when a separable signal is used. Therefore, the problem of the unmeasurable variable $v(k)$ in the Hammerstein-Wiener system is resolved.
The centers $c_l^{\text{output}}$ and widths $\sigma_l^{\text{output}}$ are learned using the cluster method [24]. The remaining problem is to learn the weights $w_l^{\text{output}}$. Given two groups of separable signals related by a constant multiple, we obtain the following output nonlinearities:
From Equation (2) to Equation (4), we derive
Multiplying Equation (14) and Equation (15) by $u_1(k - \tau)$ and $u_2(k - \tau)$, respectively, gives the following relations between the correlation functions.
Using Equation (18) and Equation (19) gives
where $\beta = (\lambda^2 b_{01})/b_{02}$ and $\lambda = u_2/u_1$. Equation (20) is given by
According to Equation (12), Equation (13) and Equation Equation (23)
Assuming $\tau = 1, 2, \cdots, P$ $(P \geq L^{\text{output}} - 1)$, and defining the following cost function according to Equation (20),
Based on the least squares method, the parameter $\theta$ is estimated
where $\hat{\theta} = [w, \cdots, w^{\text{output}}_{L^{\text{output}}}]^T$ is the estimate, and
The correlation functions $R_{\phi^{\text{output}}_{l,1} u_1}(\tau)$ and $R_{\phi^{\text{output}}_{l,2} u_1}(\tau)$ are given by
Taking the derivative of Equation (26) obtains

3.2. Learning Parameters of the Linear Block
The measured input-output data of a separable signal are used to estimate the linear block by means of the correlation analysis method.
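The correlation analysis step for the linear block can be sketched as follows. The sketch assumes that the intermediate signal $z(k)$ has already been recovered as $\hat{g}^{-1}(y(k))$ and that, for a separable input, $R_{zu}(\tau) = \sum_{j} \tilde{b}_j R_u(\tau - j)$ with $\tilde{b}_j = b_{01} b_j$; this relation is derived here from the theorem's proportionality property and is not quoted from the paper's equations. The function names are hypothetical.

```python
import numpy as np


def corr(a, b, lag):
    """Sample estimate of R_ab(lag) = E[a(k) b(k - lag)] for lag >= 0."""
    n = len(a)
    return float(np.mean(a[lag:] * b[: n - lag]))


def estimate_fir_by_correlation(u, z, n_b, n_lags=None):
    """Least-squares fit of R_zu(tau) = sum_j b~_j R_u(tau - j), j = 1..n_b.

    Returns the scaled coefficients b~_j = b_01 * b_j (assumed relation).
    """
    n_lags = n_b if n_lags is None else n_lags
    taus = range(1, n_lags + 1)
    r_zu = np.array([corr(z, u, t) for t in taus])
    # R_u is an even function, so R_u(tau - j) = R_u(|tau - j|)
    R = np.array(
        [[corr(u, u, abs(t - j)) for j in range(1, n_b + 1)] for t in taus]
    )
    b_tilde, *_ = np.linalg.lstsq(R, r_zu, rcond=None)
    return b_tilde
```

Because the noise is uncorrelated with the input, it drops out of $R_{zu}(\tau)$ as the data length grows, which is why this step is insensitive to process noise. The estimate recovers the linear block only up to the common factor $b_{01}$.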

Journal of Sensors
Using Equation (4) gets
According to Equation (14)-Equation (18), we have
where $\tilde{b}_j = b_{01} b_j$ and $b_{01} = E(v_1(k)u_1(k))/E(u_1(k)u_1(k))$. Using Equation (32) gets
where
Defining the following criterion function:
Taking the derivative of Equation (43) obtains
Letting $\partial E(\theta_1)/\partial \theta_1 = 0$, we get
The correlation functions $R_{z_1 u_1}(\tau)$ and $R_{u_1}(\tau)$ are presented by
For convenience, the above equation is described as below:
where
The quadratic cost function is defined as
Based on negative gradient search, minimizing
It is worth emphasizing that the algorithm in Equation (44) and Equation (45) cannot be carried out directly because the information vector $\varphi(k)$ contains the unknown noise terms $e(k)$. A feasible solution is to use noise estimation, that is, to replace the unmeasurable noise terms $e(k)$ with their estimates $\hat{e}(k)$.
The estimate $\hat{e}(k)$ is expressed as
where

As a consequence, the following algorithm is obtained:
It is well known that the stochastic gradient algorithm converges slowly. To improve the convergence rate, an effective method is to use multi-innovation learning theory, which expands the scalar innovation to an innovation vector [25] and thus uses not only the current data but also past data at each recursion.
Set a data window of length $p$ from $t = k - p + 1$ to $t = k$ and define the cost function as below.
Using the stochastic gradient and minimizing $J(\theta_2)$ gives
where $p$ is the innovation length. As in the extended stochastic gradient method, the unknown variables $\varphi(k - t)$ in Equation (50) are replaced by their estimates $\hat{\varphi}(k - t)$, and the following approach combining multi-innovation theory with the stochastic gradient technique is obtained:
From the above analysis, the flowchart of the developed data-driven learning method is shown in Figure 2.
Figure 2: Flowchart of the developed data-driven learning method. Box labels: use the inputs and outputs of the separable signals $u_1(k)$ and $y_1(k)$; calculate $R_{z_1 u_1}(\tau)$ and $R_{u_1}(\tau)$; use the random signals $u_2(k)$ and $y_2(k)$; set the newest $p$ data, $j = k - p + 1$ to $j = k$; compute $r(k)$; compute $\hat{e}(k - j)$ and form $\hat{\varphi}(k - j)$; update the parameter estimate.
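The multi-innovation extended stochastic gradient recursion can be sketched in Python as follows. This is a generic sketch in the standard form of such algorithms, not a transcription of the paper's equations: it assumes a model $z(k) = \psi^T(k) w + e(k) + \sum_i d_i e(k - i)$ that is linear in the parameters, a step size $1/r(k)$ with $r(k) = r(k-1) + \|\hat{\varphi}(k)\|^2$, and stacking of the stored information vectors over the newest $p$ samples. The name `mi_esg` and the regressor matrix `psi` are hypothetical.

```python
import numpy as np


def mi_esg(psi, z, n_d, p=1):
    """Multi-innovation extended stochastic gradient (generic sketch).

    psi: (N, n_w) measurable regressors, z: (N,) outputs,
    n_d: order of the moving-average noise model, p: innovation length.
    Returns the estimate of theta = [w; d]. With p = 1 this reduces to
    the ordinary extended stochastic gradient recursion.
    """
    N, n_w = psi.shape
    theta = np.zeros(n_w + n_d)
    e_hat = np.zeros(N)              # estimates replacing the unknown e(k)
    Phi = np.zeros((N, n_w + n_d))   # stored information vectors
    r = 1.0
    for k in range(N):
        # phi_hat(k) = [psi(k); e_hat(k-1), ..., e_hat(k-n_d)]
        past_e = [e_hat[k - i] if k - i >= 0 else 0.0
                  for i in range(1, n_d + 1)]
        Phi[k] = np.concatenate([psi[k], past_e])
        r += Phi[k] @ Phi[k]
        lo = max(0, k - p + 1)
        # Innovation vector over the newest p data points
        E = z[lo:k + 1] - Phi[lo:k + 1] @ theta
        theta = theta + Phi[lo:k + 1].T @ E / r
        # Noise estimate computed with the updated parameters
        e_hat[k] = z[k] - Phi[k] @ theta
    return theta
```

Calling `mi_esg(psi, z, n_d, p=1)` and `mi_esg(psi, z, n_d, p=5)` on the same data gives a simple way to compare the effect of the innovation length; each update with $p > 1$ reuses the previous $p - 1$ samples, at the cost of $p$ times more work per step.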
Remark 2. The proposed three-stage parameter learning approach estimates the parameters of each block of the identified Hammerstein-Wiener system independently using the designed multisignals, which avoids redundant parameters. In contrast, other algorithms such as the blind parameter identification method [9], the extended stochastic gradient identification algorithm [18], and the modified bias-eliminating least squares algorithm [26] estimate the parameters in product form and need additional steps, such as singular value decomposition or the averaging method, to separate the mixed parameters. Therefore, the computational complexity of these approaches is higher.

Numerical Examples
For the developed learning approach, two kinds of multisignals are designed, and numerical cases of a nonlinear system represented by the Hammerstein-Wiener model with disturbance are used to verify its effectiveness.
The designed multisignal data consist of two sets of Gaussian signals and random signals, including Gaussian signals with mean 0 and variance 1, the mean
To begin with, the parameters of the output nonlinear block are learned from the collected input-output data of the two sets of Gaussian signals using the least squares method. The parameters are set as $S_0 = 0.99$, $\rho = 1$, and $\lambda = 0.01$. The estimate of the output nonlinearity is depicted in Figure 3. Figure 3 shows that the neural fuzzy network approximates the output nonlinearity well with the developed parameter learning approach.
Moreover, based on the input-output data of the Gaussian signal with variance 1, the CA (correlation analysis) algorithm and the RELS (recursive extended least squares) algorithm [20] are used to estimate the linear block. Figure 4 compares the errors of the CA and RELS methods under different noise-to-signal ratios. The CA algorithm uses the cross-covariance function between the input and output variables and the autocovariance function of the input variables to learn the model parameters, which effectively handles noise interference and improves learning accuracy. Figure 4 shows that, as the noise-to-signal ratio increases, the CA algorithm achieves higher precision than the RELS method.
Finally, on the basis of the measured input-output data of the random signals, the parameters of the input nonlinearity and the noise model are learned with $S_0 = 0.9$, $\lambda = 0.01$, and $\rho = 1$. Figure 5 displays the approximation of the input nonlinearity for different innovation lengths. Figure 6 shows the estimate of the noise model. According to Figure 5, the presented learning method models the input nonlinearity effectively, and the approximation error decreases as $p$ increases. According to Figure 6, the noise model estimate moves closer to the real value as $p$ increases. Consequently, introducing the innovation length into the developed algorithm yields a faster convergence rate. This demonstrates that the presented three-stage method can accurately learn the Hammerstein-Wiener system.

Numerical Example 2.
Consider a class of Hammerstein-Wiener systems with disturbance whose input nonlinearity is a discontinuous function:
where $e(k)$ is the noise sequence.

The designed multisignal data consist of two sets of binary signals and random signals: the amplitudes of the two binary signals are 2 and 4, respectively, and the random signal takes values in the interval from 0 to 5.
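The multisignal input for this example could be generated along the following lines. This is a sketch under assumptions: the binary signals are taken to switch between $\pm 2$ and $\pm 4$ with a shared random sign sequence, so that the second is a constant multiple of the first, and the random signal is taken to be uniformly distributed on $[0, 5]$. Neither the sign convention nor the distribution is stated in the text, and the function name `make_multisignal` is hypothetical.

```python
import numpy as np


def make_multisignal(n, amp1=2.0, amp2=4.0, low=0.0, high=5.0, seed=0):
    """Generate two binary signals and one random signal of length n."""
    rng = np.random.default_rng(seed)
    # Shared random sign sequence, so u_b(k) = (amp2 / amp1) * u_a(k)
    # (assumed construction of the two related separable signals)
    signs = rng.choice([-1.0, 1.0], size=n)
    u_a = amp1 * signs
    u_b = amp2 * signs
    # Random signal, assumed uniform on [low, high]
    u_rand = rng.uniform(low, high, size=n)
    return u_a, u_b, u_rand
```

The two binary signals would drive the first two learning stages, while the random signal would be used for the input nonlinearity and noise model.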
The parameters of the output nonlinearity are calculated from the collected input-output data of the two sets of binary signals using the least squares method. The parameters are set as $S_0 = 0.99$, $\rho = 1$, and $\lambda = 0$. The estimate of the output nonlinearity is shown in Figure 7. Figure 7 shows that the neural fuzzy network approximates the output nonlinearity well with the developed parameter learning approach.
In addition, the CA algorithm and the RELS algorithm are applied to the data of the binary signal with amplitude 4. Figure 8 compares the errors of the two methods under different noise-to-signal ratios. The CA method deals effectively with the process noise disturbance, so it achieves good parameter learning results. As Figure 8 shows, the CA method estimates the linear block parameters more accurately and is more robust than the RELS method.
Lastly, on the basis of the measured input-output data of the random signals, the parameters of the input nonlinearity and the noise model are learned with $S_0 = 0.92$, $\lambda = 0.01$, and $\rho = 1$. Figure 9 displays the approximation of the input nonlinearity for different innovation lengths. Figure 10 shows the estimate of the moving average noise model for different innovation lengths.
Multi-innovation learning theory is combined with the stochastic gradient technique to improve the convergence rate by expanding the scalar innovation to an innovation vector. According to Figure 9, the presented learning method models the input nonlinearity effectively, and the approximation error decreases as $p$ increases. According to Figure 10, the noise model estimate is closer to the real value for larger innovation lengths.
Remark 3. For the more complex Hammerstein-Wiener system with unknown disturbance in Example 2, the input nonlinearity is a discontinuous function, and the learning accuracy of the proposed parameter learning method is reduced. In addition, it is well known that the stochastic gradient algorithm converges slowly; the parameter estimates fluctuate greatly at first because little data information has been used. As the data length increases, more data information is used in parameter learning, and the fluctuation gradually decreases.