Fuzzy Weighted Least Squares Support Vector Regression with Data Reduction for Nonlinear System Modeling

This paper proposes a fuzzy weighted least squares support vector regression (FW-LSSVR) method with data reduction for nonlinear system modeling based only on measured data. The proposed method combines the advantages of data reduction with a fuzzy weighting mechanism. It not only captures the local characteristics of the modeled plant but also handles the boundary effects that arise in the local LSSVR method when the modeled data lie at the boundary of a data subset. Furthermore, compared with SVR, the proposed method uses fewer hyperparameters to construct the model, and the overlap factor λ can be chosen relatively smaller than in the SVR-based approach, which further reduces computational time. First, the original input space is divided into several regions with a fuzzy partition by applying the Gustafson-Kessel clustering algorithm (GKCA), which provides the foundation for data reduction, and the overlap factor is introduced to reduce the size of the subsets. Then, the subset regression models (SRMs), which can be solved simultaneously by LSSVR, are integrated into an overall output of the estimated nonlinear system by fuzzy weighting. Finally, the proposed method is demonstrated by experimental analysis and compared with the local LSSVR, weighted SVR, and global LSSVR methods in terms of computational time and root-mean-square error (RMSE).


Introduction
It is well known that, in many applications such as advanced control, process simulation, and fault detection, a significant problem is to construct a mathematical model of the estimated system based only on its measured data. Major methods for identifying nonlinear systems have been developed independently in various research fields, including fuzzy systems [1], neural networks [2], and other approaches [3]. The LSSVR method [4], like SVR [5], adopts structural risk minimization and offers a good balance between sparsity and modeling accuracy. Furthermore, by substituting a set of equality constraints for the inequality constraints of SVR, LSSVR turns a complex quadratic programming problem into the solution of a set of linear equations, which greatly relieves the computational load. According to [6], the generalization ability of LSSVR is no worse than that of SVR. Therefore, LSSVR has attracted extensive attention and has been applied successfully to time series prediction [7,8], subspace identification [9,10], signal processing [11,12], and other areas [13,14] during the past few years. The standard LSSVR approach, referred to here as the global LSSVR (G-LSSVR) approach, has become an effective tool in various applications; its modeling accuracy depends on obtaining an appropriate mathematical model [15] and a proper hyperparameter set, which usually consists of two variables: the kernel width and the penalty factor. However, G-LSSVR generally does not capture local behavior well, and [15,16] report that it has shortcomings in describing local behavior.
Recently, local modeling approaches have become attractive as an efficient alternative because of their ability to identify different regions of the estimated nonlinear system. In [17], a local fusion modeling method based on LSSVR and neuro-fuzzy techniques is proposed; it employs LSSVR together with a learning method that layers two kinds of problems to construct each local model. In [18], the local model is identified from the training points of interest rather than from all points, and a vector-norm distance [19] is applied to search for the $M$ nearest points. To seek a set of optimal data points, a Euclidean distance measure [20,21] is proposed, and the local model is built from these neighboring points. From the viewpoint of capturing local behavior, the localized support vector regression (LSVR) method [22] has been proposed to establish a connection between several models. Considering the long computational time of G-LSSVR, a local grey SVR [23] is developed to speed up the computation. Further, by introducing regularization, a general local and global learning framework [24] formulates multiple classifiers in the neighbourhood of each data point.
Although local modeling approaches such as local-LSSVR are better at identifying local characteristics than global approaches such as global SVR or LSSVR, they are still unsatisfactory in global modeling capability. First, because of the different criteria used to select the $M$ nearest-neighbor training data in the subsets, local-LSSVR does not perform well when those training data are in the boundary area. Second, because the number of local models to be constructed equals the size of the testing set, the local LSSVR approach generally leads to a heavy computational load [16]. Third, boundary effects arise in the local LSSVR method when the modeled data lie at the boundary of a data subset.
Based on the above considerations, we present an FW-LSSVR method for nonlinear system modeling based only on the measured data. The method integrates the strengths of GKCA, a weighted-average mechanism, and some ideas from LSSVR. First, the original input space is divided into several regions with a fuzzy partition by applying GKCA, which provides the foundation for data reduction. Then, the subset regression models (SRMs), which can be solved simultaneously by LSSVR, are integrated into an overall output of the estimated nonlinear system by fuzzy weighting. The proposed method not only captures the local characteristics of the estimated models but also handles the boundary effects that result from the local LSSVR method. Furthermore, compared with support vector regression (SVR), the proposed method uses fewer hyperparameters to construct the model, and the overlap factor λ is chosen relatively small, which further reduces computational time. Finally, experimental analysis demonstrates that our approach not only overcomes the disadvantages of the local LSSVR, weighted SVR, and global LSSVR methods in modeling nonlinear systems but also achieves a better root-mean-square error (RMSE) and needs less computational time.
The paper is organised as follows. Section 2 gives brief descriptions of LSSVR and GKCA. Section 3 introduces the proposed method in detail. Section 4 presents several examples that demonstrate our approach, and Section 5 summarizes the whole paper.

Preliminaries
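2.1. Least Squares Support Vector Regression. A meticulous empirical study [6] has shown that the generalization performance of LSSVR, presented in [25], is comparable to that of SVR. We briefly introduce LSSVR for the following training points, where $x_i \in \mathbb{R}^n$ is the input pattern and $y_i \in \mathbb{R}$ is the corresponding target:

$$(x_1, y_1), \ldots, (x_N, y_N). \tag{1}$$

For a test input $x$, the LSSVR model can be represented as

$$f(x) = \sum_{i=1}^{N} \alpha_i K(x, x_i) + b. \tag{2}$$

With a Gaussian kernel of width $\sigma$, the kernel function $K(x, x_i)$ in (2) allows the model to be rewritten as

$$f(x) = \sum_{i=1}^{N} \alpha_i \exp\left(-\frac{\|x - x_i\|^2}{2\sigma^2}\right) + b, \tag{3}$$

where the support value vector $\alpha = [\alpha_1, \ldots, \alpha_N]^T$ and the bias $b$ are obtained from the optimization problem

$$\min_{w, b, e} \; J(w, e) = \frac{1}{2} w^T w + \frac{\gamma}{2} \sum_{i=1}^{N} e_i^2 \quad \text{s.t.} \quad y_i = w^T \Phi(x_i) + b + e_i, \quad i = 1, \ldots, N, \tag{4}$$

where $\Phi(\cdot)$ represents a feature mapping that transforms the nonlinear input space into a high-dimensional linear space and the parameter $\gamma \in \mathbb{R}^+$ is the regularization constant, which governs the relative importance of data fitting and the smoothness of the solution. Applying the Lagrange multiplier method to (4) gives rise to an unconstrained optimization problem:

$$L(w, b, e, \alpha) = J(w, e) - \sum_{i=1}^{N} \alpha_i \left( w^T \Phi(x_i) + b + e_i - y_i \right). \tag{5}$$

From the KKT conditions, one derives

$$w = \sum_{i=1}^{N} \alpha_i \Phi(x_i), \qquad \sum_{i=1}^{N} \alpha_i = 0, \qquad \alpha_i = \gamma e_i, \qquad w^T \Phi(x_i) + b + e_i - y_i = 0. \tag{6}$$

Consequently, the learning process of LSSVR corresponding to (5) is implemented by solving

$$\begin{bmatrix} 0 & \mathbf{1}^T \\ \mathbf{1} & \Omega + \gamma^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix}, \tag{7}$$

where $y = [y_1, \ldots, y_N]^T$, $\mathbf{1} = [1, \ldots, 1]^T$, and $\Omega_{ij} = K(x_i, x_j)$.

2.2. Gustafson-Kessel Clustering Algorithm. Clustering analysis plays an important role in classification and regression problems. To study the important characteristics of a complex system, it is crucial to decompose the original data set into several subsets that reflect the system's behavior well. In particular, GKCA [26] can extract clusters of different shapes and orientations from a large data set [27] and is superior to the conventional FCM. GKCA is carried out by minimizing the following objective function:

$$J(X; U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} (u_{ik})^m d_{ik}^2, \tag{8}$$

where $u_{ik}$ is a component of the fuzzy partition matrix $U$, $d_{ik}$ is defined by (12), $m > 1$ is the fuzzy weighting exponent, and $c$ is the number of cluster centers collected in $V$, which needs to be predefined. In a nutshell, GKCA can be boiled down to the following steps:

(1) calculating the cluster centers

$$v_i^{(l)} = \frac{\sum_{k=1}^{N} \left(u_{ik}^{(l-1)}\right)^m x_k}{\sum_{k=1}^{N} \left(u_{ik}^{(l-1)}\right)^m}, \quad i = 1, \ldots, c, \tag{9}$$

where $l$ denotes the iteration number and $N$ is the number of all data points;

(2) computing $F_i$ according to the definition of covariance

$$F_i = \frac{\sum_{k=1}^{N} \left(u_{ik}^{(l-1)}\right)^m \left(x_k - v_i^{(l)}\right)\left(x_k - v_i^{(l)}\right)^T}{\sum_{k=1}^{N} \left(u_{ik}^{(l-1)}\right)^m}; \tag{10}$$

(3) computing the distance

$$A_i = \left[\rho_i \det(F_i)\right]^{1/n} F_i^{-1}, \tag{11}$$

$$d_{ik}^2 = \left(x_k - v_i^{(l)}\right)^T A_i \left(x_k - v_i^{(l)}\right), \tag{12}$$

where $\rho_i$ is the cluster volume (commonly fixed at 1) and $n$ is the input dimension;

(4) revising the components $u_{ik}$ of the fuzzy matrix

$$u_{ik}^{(l)} = \frac{1}{\sum_{s=1}^{c} \left(d_{ik} / d_{sk}\right)^{2/(m-1)}}. \tag{13}$$

The iteration stops when the difference between the fuzzy partition matrices $U^{(l)}$ and $U^{(l-1)}$ of successive iterations is lower than $\epsilon$.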
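The four GKCA steps can be sketched in a few lines of code. The following is a minimal illustration only, restricted to two-dimensional data so that the covariance inverse and determinant can be written out by hand; the function name `gk_step`, the default exponent `m=2.0`, and the cluster volume `rho=1.0` are assumptions made for this sketch and are not taken from the paper.

```python
def gk_step(points, U, m=2.0, rho=1.0):
    """One Gustafson-Kessel iteration for 2-D points (illustrative sketch).

    points: list of (x, y) tuples; U: c x N membership matrix whose
    columns sum to 1. Returns (centers, new_U).
    """
    c, n_pts, n = len(U), len(points), 2
    centers, dist2 = [], []
    for i in range(c):
        w = [U[i][k] ** m for k in range(n_pts)]
        total = sum(w)
        # Step (1): membership-weighted cluster center
        vx = sum(wk * p[0] for wk, p in zip(w, points)) / total
        vy = sum(wk * p[1] for wk, p in zip(w, points)) / total
        centers.append((vx, vy))
        # Step (2): fuzzy covariance matrix [[fxx, fxy], [fxy, fyy]]
        fxx = sum(wk * (p[0] - vx) ** 2 for wk, p in zip(w, points)) / total
        fyy = sum(wk * (p[1] - vy) ** 2 for wk, p in zip(w, points)) / total
        fxy = sum(wk * (p[0] - vx) * (p[1] - vy) for wk, p in zip(w, points)) / total
        det = fxx * fyy - fxy * fxy
        # Step (3): norm-inducing matrix (rho * det F)^(1/n) * F^{-1},
        # using the closed-form inverse of a 2 x 2 matrix
        scale = (rho * det) ** (1.0 / n) / det
        axx, axy, ayy = scale * fyy, -scale * fxy, scale * fxx
        row = []
        for p in points:
            dx, dy = p[0] - vx, p[1] - vy
            row.append(axx * dx * dx + 2.0 * axy * dx * dy + ayy * dy * dy)
        dist2.append(row)
    # Step (4): membership update; squared distances give exponent 1/(m-1)
    new_U = [[0.0] * n_pts for _ in range(c)]
    for k in range(n_pts):
        for i in range(c):
            d_ik = max(dist2[i][k], 1e-12)
            new_U[i][k] = 1.0 / sum(
                (d_ik / max(dist2[q][k], 1e-12)) ** (1.0 / (m - 1.0))
                for q in range(c)
            )
    return centers, new_U
```

In practice the step would be repeated until the change in the membership matrix falls below the chosen tolerance; the small floor `1e-12` only guards against division by zero when a point coincides with a center.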

Fuzzy Weighted Least Squares Support Vector Regression
The paper develops a new method that combines the respective advantages of global and local learning methods in an overall framework. The procedure of the proposed FW-LSSVR approach is depicted in Figure 1.
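3.1. Constructing Fuzzy Weights with Triangle Membership Functions. Applications of fuzzy concepts were first developed by Zadeh [28]. A triangular fuzzy number Ã can be parametrized by a triplet $(a_L, a_M, a_R)$, where $a_L$ and $a_R$ denote the left and right bounds, respectively, and $a_M$ represents the mode of Ã. The membership function of the triangular fuzzy number Ã is defined by

$$\mu_{\tilde{A}}(x) = \begin{cases} \dfrac{x - a_L}{a_M - a_L}, & a_L \le x \le a_M, \\ \dfrac{a_R - x}{a_R - a_M}, & a_M < x \le a_R, \\ 0, & \text{otherwise}. \end{cases} \tag{14}$$

The $\alpha$-cut $A_\alpha$ of the fuzzy set Ã in the universe of discourse $X$ is defined by

$$A_\alpha = \{ x \in X \mid \mu_{\tilde{A}}(x) \ge \alpha \}, \tag{15}$$

where $\alpha \in [0, 1]$.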
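As a small illustration of the triangular membership function and its α-cut, the following sketch evaluates both for a single fuzzy number. The function and argument names (`triangular_membership`, `alpha_cut_interval`, `left`, `mode`, `right`) are chosen here for readability and are not the paper's notation; the sketch assumes `left <= mode <= right`.

```python
def triangular_membership(x, left, mode, right):
    """Membership degree of x in the triangular fuzzy number (left, mode, right)."""
    if x < left or x > right:
        return 0.0
    if x == mode:
        return 1.0
    if x < mode:
        return (x - left) / (mode - left)   # rising edge
    return (right - x) / (right - mode)     # falling edge


def alpha_cut_interval(left, mode, right, alpha):
    """Closed interval of all x whose membership is at least alpha (0 < alpha <= 1)."""
    lower = left + alpha * (mode - left)
    upper = right - alpha * (right - mode)
    return (lower, upper)
```

For the fuzzy number (1, 2, 4), the points 1.5 and 3 both have membership 0.5, so the 0.5-cut is the interval from 1.5 to 3. A larger α yields a narrower interval around the mode.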
In general, a fuzzy partition is implemented by a clustering method, and GKCA [26] is commonly used to decompose the original data set. GKCA has been found to be superior to FCM (fuzzy c-means) and subtractive clustering. GKCA extends the standard FCM algorithm by adopting a flexible distance measure that is calculated using covariance matrices, as exhibited in (11). In this way, clusters of various shapes and orientations in the original data set are detected by GKCA.
In this paper, GKCA is used, which is based on the minimization of (8). As stated in Section 2.2, the iteration stops when the termination criterion is satisfied, namely, $\|U^{(l)} - U^{(l-1)}\| < \epsilon$, and an appropriate fuzzy membership matrix is finally obtained. Following that, the cluster centers $v_i$ and the spread widths $\sigma_i$ are calculated, respectively, by (16) and (17) as in [29], where $N$ is the number of training data, $c$ is the number of clusters, $u_{ik}$ is the degree of membership of $x_k$ in cluster $i$, $x_k$ is the $k$th training datum, $n$ is the feature dimensionality, and $\|\cdot\|$ measures the distance between two vectors.
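To make the role of the cluster centers and spread widths concrete, the sketch below computes one plausible version of them: a membership-weighted mean and a membership-weighted standard deviation per input dimension. This specific form is an assumption made for illustration; the exact expressions used in [29] may differ (for example, in how the norm and the dimensionality enter). The function name `centers_and_spreads` is hypothetical.

```python
import math


def centers_and_spreads(X, U):
    """Assumed form: weighted mean and weighted std. dev. per cluster.

    X: list of N samples, each a list of n features.
    U: c x N membership matrix from the clustering step.
    Returns (centers, spreads), each a list of c lists of n values.
    """
    n = len(X[0])
    centers, spreads = [], []
    for u in U:
        total = sum(u)
        # Membership-weighted mean in every input dimension
        v = [sum(uk * x[j] for uk, x in zip(u, X)) / total for j in range(n)]
        # Membership-weighted standard deviation around that mean
        s = [
            math.sqrt(sum(uk * (x[j] - v[j]) ** 2 for uk, x in zip(u, X)) / total)
            for j in range(n)
        ]
        centers.append(v)
        spreads.append(s)
    return centers, spreads
```

With three one-dimensional samples 0, 2, 4 and memberships (1, 0.5, 0) for the first cluster, the center is 2/3 and the spread is the square root of 8/9.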
From (16) and (17), the weighted values can be calculated by applying triangle membership functions. To derive the weighted values, a triangle membership function is constructed for each cluster according to (14). Instead of the $\alpha$-cut $A_\alpha$ of the fuzzy set Ã in (15), the overlap factor λ is introduced into the triangle membership functions to control the size of the original data subsets more readily, and the degree of fulfilment is calculated from these membership functions. By the normalized firing level of the $i$th fuzzy set, the weighted values $w_i(x_k)$ are finally calculated for every cluster and every training sample. In some applications [30], a Gaussian membership function is adopted instead, in which the scale parameter equals 4 divided by the squared width; the width (≥ 0) of the Gaussian fuzzy function is usually chosen in the interval [0.3, 0.5].
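The computation of the weighted values can be sketched as follows. The sketch assumes a particular construction that is not confirmed by the text: each cluster gets a symmetric triangle per input dimension, with its mode at the cluster center and half-width equal to the overlap factor times the spread width; the degree of fulfilment is the product over dimensions; and the weights are the degrees normalized to sum to one. The name `fuzzy_weights` and the argument `lam` (standing for λ) are hypothetical.

```python
def fuzzy_weights(x, centers, spreads, lam):
    """Normalized fuzzy weights of input x for every cluster (assumed form).

    x: list of n features; centers, spreads: c lists of n values;
    lam: overlap factor scaling the half-width of each triangle.
    """
    degrees = []
    for v, s in zip(centers, spreads):
        beta = 1.0
        for xj, vj, sj in zip(x, v, s):
            # Symmetric triangle: 1 at the center, 0 beyond lam * spread
            beta *= max(0.0, 1.0 - abs(xj - vj) / (lam * sj))
        degrees.append(beta)
    total = sum(degrees)
    if total == 0.0:
        raise ValueError("x is not covered by any fuzzy set; increase lam")
    # Normalized firing levels
    return [b / total for b in degrees]
```

For two one-dimensional clusters centered at 0 and 2 with unit spread and `lam = 2.0`, the input 0.5 receives weights 0.75 and 0.25. A smaller `lam` narrows every triangle, so fewer clusters contribute to a given input, which is why the data must still be covered by at least one fuzzy set.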
3.2. FW-LSSVR with Data Reduction. Takagi-Sugeno fuzzy models [31] have recently become a powerful practical instrument for identifying complex systems. Based on the fuzzy partition, the nonlinear description of the estimated system can be expanded into several simple linear descriptions by applying if-then rules of the form

$$R_i: \ \text{if } x_1 \text{ is } A_{i1} \text{ and } \ldots \text{ and } x_n \text{ is } A_{in}, \ \text{then } y_i = a_i^T x + b_i. \tag{23}$$

Here $A_{i1}, A_{i2}, \ldots, A_{in}$ are the fuzzy sets (membership functions) assigned to the corresponding input variables, $y_i$ represents the output of the $i$th rule, and $a_i$ and $b_i$ are the parameters of the consequent function.
For the input $x_k$, the fuzzy weighted output $\hat{f}$ can be summarized as

$$\hat{f}(x_k) = \sum_{i=1}^{c} w_i(x_k)\, y_i, \tag{24}$$

where $w_i(x_k)$ represents the normalized firing strength of the $i$th rule for the $k$th sample and is computed by (20) and (21).
Next, instead of linearizing around a point, the proposed method makes use of subset regression models (SRMs), which are solved simultaneously by LSSVR in each fuzzy partition area. First, the original input data set is divided into several subsets with a fuzzy partition. In each region, an SRM is trained independently by LSSVR. Based on the centers $v_i$ and the spread widths $\sigma_i$ obtained from (16) and (17), the old data set is decomposed once again into new subsets $\triangle_i$ by introducing the overlap factor λ to reduce the size of the original subsets. The partition is performed by the pseudo code (25), in which the overlap factor λ reduces the size of the subsets and yields a new training set with data reduction.
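The data-reduction step can be illustrated with a short sketch. It assumes, as one plausible reading of the text, that a training sample belongs to the reduced subset of a cluster when it lies within the overlap factor times the spread width of the cluster center in every input dimension; the actual membership rule of the paper's pseudo code may differ. The function name `reduce_subsets` and the argument `lam` are hypothetical.

```python
def reduce_subsets(X, centers, spreads, lam):
    """Assign training samples to reduced subsets, one per cluster.

    X: list of N samples (lists of n features);
    centers, spreads: c lists of n values; lam: overlap factor.
    Returns a list of c index lists; subsets may overlap.
    """
    subsets = []
    for v, s in zip(centers, spreads):
        members = [
            k
            for k, x in enumerate(X)
            # Keep the sample only if it is inside the box
            # |x_j - v_j| <= lam * s_j in every dimension
            if all(abs(xj - vj) <= lam * sj for xj, vj, sj in zip(x, v, s))
        ]
        subsets.append(members)
    return subsets
```

With samples 0, 1, 2, 3, 4 and clusters centered at 1 and 3 (unit spread), `lam = 1.5` gives the overlapping subsets {0, 1, 2} and {2, 3, 4}, whereas `lam = 0.5` leaves only the samples at the centers. This shows the trade-off governed by the overlap factor: smaller values shrink the subsets and the training cost, but values that are too small leave samples uncovered.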
Then, the new training subsets are used to construct each $\mathrm{SRM}_i$ by (3) as follows:

$$\mathrm{SRM}_i(x) = \sum_{k=1}^{N_i} \alpha_k^{(i)} K\left(x, x_k^{(i)}\right) + b_i, \tag{26}$$

where $\mathrm{SRM}_i$ is termed the $i$th subset regression model, the parameters $\alpha^{(i)}$ and $b_i$ are derived by the LSSVR approach, and $N_i$ describes the size of the new subset $\triangle_i$. Following that, the weighted values $w_i(x)$ computed by (21) are combined with the $\mathrm{SRM}_i$ to form the global predicted output as follows:

$$\hat{y}(x) = \sum_{i=1}^{c} w_i(x)\, \mathrm{SRM}_i(x). \tag{27}$$

It is clear from (27) that each $\mathrm{SRM}_i$ is solved by LSSVR independently of the others, so all SRMs can be computed simultaneously. As a result, the computational efficiency of the proposed method is largely improved. In brief, the proposed approach can be summarized as follows.
Step 1. Define the overlap factor λ and select the number of clusters $c$, where  is generally selected to 2.
Step 2. Compute the cluster centers $v_i$ and the spread widths $\sigma_i$ using (16) and (17) from the obtained matrix $U$ and the training data.
Step 3. Determine the new training subsets $\triangle_i$ by (25) based on the overlap factor λ, the cluster centers, and the spread widths.
Step 4. Set the two hyperparameters of the LSSVR (the kernel width and the penalty factor).
Step 5. Construct each subset regression model $\mathrm{SRM}_i(x)$ by the LSSVR approach, which yields (26).
Step 6. Finally, compute the global predicted output according to (26) and (27).
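The last two steps, training each SRM by LSSVR and combining the SRM outputs with fuzzy weights, can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the Gaussian kernel is written with a `2 * sigma**2` denominator (one common convention), the names `sigma` and `gamma` stand for the kernel width and the penalty factor, the linear system is solved with a plain Gaussian elimination to keep the example dependency-free, and all function names are hypothetical.

```python
import math


def gaussian_kernel(a, b, sigma):
    """Gaussian kernel between two feature lists (assumed convention)."""
    return math.exp(-sum((p - q) ** 2 for p, q in zip(a, b)) / (2.0 * sigma ** 2))


def solve_linear(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for cc in range(col, n + 1):
                aug[r][cc] -= f * aug[col][cc]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][cc] * z[cc] for cc in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z


def lssvr_fit(X, y, sigma, gamma):
    """Solve the bordered LSSVR linear system for the bias and support values."""
    N = len(X)
    A = [[0.0] + [1.0] * N]                      # first row: sum(alpha) = 0
    for i in range(N):
        row = [1.0] + [gaussian_kernel(X[i], X[j], sigma) for j in range(N)]
        row[i + 1] += 1.0 / gamma                # ridge term on the diagonal
        A.append(row)
    z = solve_linear(A, [0.0] + list(y))
    return z[0], z[1:]                           # (b, alpha)


def lssvr_predict(x, X, b, alpha, sigma):
    """Evaluate one trained model at input x."""
    return sum(a * gaussian_kernel(x, xi, sigma) for a, xi in zip(alpha, X)) + b


def fw_lssvr_predict(x, models, weights, sigma):
    """Fuzzy weighted sum of SRM outputs; models is a list of (X_i, b_i, alpha_i)."""
    return sum(
        w * lssvr_predict(x, Xi, b, alpha, sigma)
        for w, (Xi, b, alpha) in zip(weights, models)
    )
```

Because each call to `lssvr_fit` uses only its own subset, the calls are independent and could be dispatched in parallel; the cost of each solve grows with the subset size, which is where the data reduction pays off.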

Experimental Studies
To illustrate our approach, both the RMSE (root-mean-square error) and the computational time are examined in four experiments on simulated data. All numerical experiments are carried out on a personal computer. We evaluate the performance of the proposed approach on four benchmark data sets [15]. The index adopted for measuring modeling accuracy is

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{k=1}^{N} \left(y_k - \hat{y}_k\right)^2}, \tag{28}$$

where $y_k$ is the desired output, $\hat{y}_k$ is the predicted output, and $N$ is the number of data points. Another index is the computational time: the total time for constructing the proposed method and the local running time for constructing the SRMs. The two indexes of the proposed method are compared with those of G-LSSVR, local-LSSVR, and the method of [15]. In addition, the effect of different overlap factors λ is compared. To obtain a fair comparison, the hyperparameters are set to the same values.

The local-LSSVR is briefly introduced in the following. Let the training data $\{(x_k, y_k) \mid k = 1, 2, \ldots, N\}$ be obtained from an experiment or a real system, and let a test input be taken from the testing data set, for which the output is to be predicted. In the closest region of this test input, $M$ training inputs are selected by applying the norm-distance approach. As a result, the local-LSSVR models corresponding to all testing outputs are derived from the $M$ training inputs in those regions.
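The accuracy index and the neighbor selection used by the local-LSSVR baseline can be sketched as follows. The function names are hypothetical, and the neighbor search assumes the Euclidean norm as the distance; other norms would only change the key function.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between desired and predicted outputs."""
    squared = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def nearest_training_points(x_query, X, M):
    """Indices of the M training inputs closest to x_query (Euclidean norm)."""
    def distance(k):
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(x_query, X[k])))

    return sorted(range(len(X)), key=distance)[:M]
```

Since the local approach repeats this search and trains a new model for every test input, its cost scales with the size of the testing set, which is consistent with the heavy computational load reported for local-LSSVR.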
Example 1. The approximated function is given by (29). For this function, 501 training points and 1001 testing points are obtained from (29). With the proposed approach, only two hyperparameters (the kernel width and the penalty factor) need to be chosen, whereas the SVR approach needs three. For comparison, Figure 2 shows the results of the FW-LSSVR and G-LSSVR methods. The two indexes, RMSE and computational time, for local-LSSVR, G-LSSVR, the method of [15], and our approach are summarized in Tables 1 and 2, respectively. Additionally, the effect of different overlap factors λ is compared in Table 3.
Taking these tables into account, our approach obtains a better nonlinear function approximation, in terms of the RMSE in Table 1, than G-LSSVR, local-LSSVR, and the method of [15]. In addition, the running time of the proposed method in Table 2 is at least approximately 10 times shorter than that of the local-LSSVR method. The main reason that local-LSSVR needs more computational time is that the number of required local models is large: it equals the size of the testing set. From Table 3 we also see that the RMSE and computational time obtained with the overlap factor λ = 1.5 are better than those obtained with λ = 2.5. That is to say, provided that the training data are covered, a larger λ does not necessarily lead to better performance. These results confirm the superiority of our proposed method over the other methods.

Example 2. The approximated function with two variables is given by (30). For this function, $x_1$ and $x_2$, equally sampled on the interval [−5, 5], are used as training inputs. The number of training data is 1681 (i.e., 41 × 41). The number of test data is 6561 (i.e., 81 × 81). For comparison, the same indexes as in Example 1 are used; the results for our approach, local-LSSVR, G-LSSVR, and the method of [15] are summarized in Tables 4 and 5. Additionally, the effect of different overlap factors λ is compared in Table 6. These results show that the proposed method (FW-LSSVR) outperforms G-LSSVR, local-LSSVR, and the method of [15].
Furthermore, the predicted outputs of the local modeling approach based on LSSVR exhibit boundary effects for the different numbers of nearest points $M$ = 49, 81, 121, and 169, as shown in Figure 3. Figure 4 gives the estimated values of the proposed FW-LSSVR approach with 4, 6, 8, and 10 SRMs. From Tables 4 and 5 we can also see that our approach obtains only slightly worse results (training RMSE) than the local-LSSVR method with $M$ = 169 training data points, but the running time of local-LSSVR is significantly longer, no less than 107.3438 seconds in the experiment. From Table 6, the RMSE and computational time obtained with the overlap factor λ = 1.5 are better than those obtained with λ = 3.0. That is to say, provided that the training data are covered, a larger λ does not necessarily lead to better performance; on the contrary, a larger number of training data points and more computational time are required to construct all SRMs.

Mathematical Problems in Engineering

Example 3. 501 data points are obtained from (31), where $v$ is Gaussian noise with variance $\mathrm{Var}(v) = 0.25$, as shown in Figure 5. The number of test data is 1001. To compare the performance of the proposed method with the other approaches, the results and the curves are given in Table 7 and Figure 5, respectively. These results show that the proposed method outperforms G-LSSVR, local-LSSVR, and the method of [15], and the RMSE in Table 7 indicates that the proposed method has the best generalization performance. In addition, the running time of the proposed method in Table 8 is at least approximately 10 times shorter than that of the local-LSSVR method. Additionally, the effect of different overlap factors λ is compared in Table 9.
Example 4. In this example, two hundred and ninety-six simulated data generated from a real Box-Jenkins [32] system are applied to the proposed method. These data points consist of the gas flow rate signal $u(t)$ and the concentration of CO$_2$, which is taken as the output $y(t)$. Figure 6 shows the training data, which include the input signal $u(t)$ and the output signal $y(t)$. To identify the model, we choose a four-dimensional regressor built from values of $u$ and $y$ at times $t$, $t-1$, $t-1$, and $t-2$ as the input variables and $y(t)$ as the output variable. In this example, 5-fold cross-validation is employed to evaluate the performance. The (training/testing) RMSE and the computational time of the global LSSVR (G-LSSVR) approach, the local-LSSVR (L-LSSVR) approach with $M$ = 41, 61, 81, and the proposed approach with 4, 8, 12 SRMs are summarized in Tables 10 and 11. From Tables 10 and 11, the training RMSE of our technique is slightly larger than that of the other approaches, whereas its testing RMSE is smaller; its run time is shorter than that of the other local modeling approaches but slightly longer than that of the global modeling techniques based on LSSVR. Figure 7 compares the actual output with the predicted output of our technique. The effect of different overlap factors λ is compared in Table 12. As shown in Table 12, although the training RMSE for the overlap factor λ of 3.5 is less than that of 2.8, the testing RMSE for λ of 2.8 is less than that of 3.4. That is to say, a relatively small value of λ generalizes better than a large value. Additionally, compared with [15], the proposed method uses fewer hyperparameters to construct the model, and the overlap factor λ is chosen relatively small, which further reduces computational time.
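The 5-fold cross-validation protocol can be sketched as an index split. The sketch assumes contiguous, unshuffled folds, which is one common choice for sequential data but is not stated in the text; the function name `k_fold_indices` is hypothetical, and the sample count passed in would be the number of regressor and target pairs actually available.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous (train, test) index pairs.

    The first n_samples % k folds receive one extra test sample so that
    every sample is used for testing exactly once.
    """
    folds = []
    start = 0
    for f in range(k):
        size = n_samples // k + (1 if f < n_samples % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds
```

Each model is trained on the `train` indices and scored on the `test` indices of a fold, and the training and testing RMSE are then averaged over the five folds.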

Conclusion
In this paper, a fuzzy weighted least squares support vector regression (FW-LSSVR) method for nonlinear system modeling has been proposed and illustrated; it builds on the advantages of the fuzzy weighting mechanism and some ideas from LSSVR. Because the training subsets are mutually independent, all SRMs can be constructed simultaneously, and our method can largely reduce computational time.
As shown in our experimental results, our approach compares favorably in calculation time and modeling accuracy with local and global modeling methods. It is noted that the run time of our technique is shorter than that of the other local modeling approaches but slightly longer than that of the global modeling techniques based on LSSVR. Nevertheless, the modeling accuracy of our approach is considerably better than that of the other techniques. Furthermore, in comparison with SVR, the proposed method uses fewer hyperparameters to construct the model, and the overlap factor λ can be chosen relatively small, which further reduces computational time.

Figure 2: Comparison of the proposed model outputs, Global-LSSVR outputs, and testing output data for Example 1.

Figure 6: The simulated data set obtained from Box and Jenkins is used to validate the proposed method.

Table 2: Computational time of the proposed method, G-LSSVR, and local-LSSVR with the hyperparameter set {1.5, 1000} and the overlap factor λ = 1.5 for Example 1, where T-T represents the total computational time for the overall process of the proposed method and L-T represents the computational time for constructing all SRMs.

Table 3: Comparison results for different overlap factors λ of our approach for Example 1, where $c$ represents the number of SRMs, L-T the computational time for constructing all SRMs, and $\triangle_i$ the number of data points in each training subset.

Table 6: Comparison results for different overlap factors λ of our approach for Example 2, where $c$ represents the number of SRMs, L-T the computational time for constructing all SRMs, and $\triangle_i$ the number of data points in each training subset.

Table 9: Comparison results for different overlap factors λ of our approach for Example 3, where $c$ represents the number of SRMs, L-T the computational time for constructing all SRMs, and $\triangle_i$ the number of data points in each training subset.

Table 10: Comparison results of the proposed method, the method of [15], G-LSSVR, and local-LSSVR with the hyperparameter set {25, 1000} and the overlap factor λ = 2.8 for Example 4.

Table 11: Computational time of the proposed method, G-LSSVR, and local-LSSVR with the hyperparameter set {25, 1000} and the overlap factor λ = 2.8 for Example 4, where T-T represents the total computational time for the overall process of the proposed method and L-T represents the computational time for constructing all SRMs.

Table 12: Comparison results for different overlap factors λ of our approach for Example 4, where $c$ represents the number of SRMs, L-T the computational time for constructing all SRMs, and $\triangle_i$ the number of data points in each training subset.