New Approaches to Identification of PWARX Systems

We consider clustering-based procedures for the identification of discrete-time hybrid systems in piecewise affine (PWA) form. These methods exploit three main techniques: clustering, linear identification, and pattern recognition. This paper treats the clustering method based on the k-means algorithm. It estimates both the parameter vector of each submodel and the coefficients of each partition, given the model orders $n_a$ and $n_b$ and the number of submodels $s$. The performance of this approach can be degraded by outliers and poor initializations. To overcome these problems, we propose new data classification techniques. They exploit Chiu's clustering technique and the self-adapting artificial Kohonen neural network to improve the performance of both the clustering and the final linear regression procedure. Simulation results are presented to illustrate the performance of the proposed methods.


Introduction
Hybrid systems have received great attention in recent years because the behavior of a broad class of physical systems involves interacting continuous and discrete-event phenomena. A hybrid system is governed by continuous differential equations and discrete variables. The continuous behavior results from the natural evolution of the physical process, whereas the discrete behavior can be due to switches, operating phases, transitions, computer program code, and so forth. Several classes of models have been proposed in the literature to represent hybrid systems, such as jump linear models (JL models) [1], Markov jump linear models (MJL models) [2], Mixed Logical Dynamical models (MLD models) [3,4], Max-Min-Plus-Scaling systems (MMPS models) [5], Linear Complementarity models (LC models) [6], Extended Linear Complementarity models (ELC models) [7], and Piecewise Affine models (PWA models) [8,9]. Only PWA models are considered in this paper. These models are obtained by decomposing the state-input domain into a finite number of nonoverlapping convex polyhedral regions and by associating a simple linear or affine model with each region. This class of hybrid systems offers several interesting advantages. First, it can approximate any nonlinear system with arbitrary accuracy [10]. Moreover, the equivalence between PWARX models and other classes of hybrid systems allows the results obtained for PWA models to be transferred to these classes [11]. PWA models can therefore represent complex nonlinear continuous systems through a "divide and conquer" strategy, which decomposes the operating range of the nonlinear system into a set of operating regions and associates a linear or affine model with each region. The complex nonlinear system is thus modeled as a hybrid system switching between linear submodels.
The analysis and control of PWA systems, like those of any other dynamic system, require a mathematical model of their behavior. Such a model can be derived from a detailed analysis of the phenomena involved, using the various laws that govern the system's operation. This approach can lead to very complicated models that are difficult to exploit and implement. In engineering, however, a mathematical model must offer a compromise between accuracy and simplicity. A solution to this problem is identification, which builds a mathematical model from observed input-output data. For PWARX systems, identification is known to be challenging because it involves estimating both the parameters of the affine submodels and the hyperplanes defining the partition of the state-input regression space. Several approaches have been proposed in the literature for the identification of PWARX systems. They can be grouped into several categories: algebraic solutions [12], clustering-based solutions [8], Bayesian solutions [13], bounded-error solutions [14], and sparse optimization solutions [15,16]. The clustering-based solution has been the most popular because of its ability to model complex systems and its simplicity of implementation. It uses the following steps to identify the parameters and the hyperplanes: (i) constructing small local data sets from the initial data set, (ii) estimating a parameter vector for each local data set, (iii) classifying the parameter vectors into clusters, and (iv) classifying the initial data set and estimating the submodels with their partitions.
Data classification is therefore the main step toward PWARX system identification, because a successful identification of the parameters depends on a correct classification of the data. The early approaches use classical clustering algorithms for data classification [8,17,18]. These approaches are simple to compute and implement, but they can converge to local minima when poorly initialized. Furthermore, their performance degrades when the data to be classified are contaminated by outliers. More powerful clustering algorithms can clearly enhance the performance of these methods. We therefore suggest improving this approach by using other data classification algorithms, namely, Chiu's algorithm [19] and the self-adapting artificial Kohonen neural network algorithm [20]. These algorithms reduce the effect of outliers and do not need any initialization. This paper is organized as follows. Section 2 presents the model and its main assumptions. Section 3 recalls the main steps of clustering-based identification of PWARX systems. Section 4 presents the motivation for the two proposed methods. Sections 5 and 6 describe two data clustering algorithms that resolve the main problems of the existing methods. The performance of the proposed approach is evaluated and compared through simulation results in Section 7. Section 8 concludes the paper.

Model and Assumptions
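In the following, we address the identification of a PWARX model described by
$$y(k) = f(\varphi(k)) + e(k),$$
where
(i) $y(k) \in \mathbb{R}$ is the system output,
(ii) $u(k) \in \mathbb{R}^{n_u}$ is the system input,
(iii) $\varphi(k) = [y(k-1), \dots, y(k-n_a), u(k)^T, \dots, u(k-n_b)^T]^T \in \mathbb{R}^n$ is the regressor vector, where $n_a$ and $n_b$ are the system orders and $n = n_a + n_u(n_b + 1)$,
(iv) $e(k) \in \mathbb{R}$ is the noise,
(v) $f$ is a piecewise affine function defined by
$$f(\varphi) = \begin{cases} \theta_1^T \bar{\varphi} & \text{if } \varphi \in H_1, \\ \quad\vdots \\ \theta_s^T \bar{\varphi} & \text{if } \varphi \in H_s, \end{cases}$$
where $\bar{\varphi} = [\varphi^T\ 1]^T$, $s$ is the number of submodels, $\{H_i\}_{i=1}^{s}$ are polyhedral partitions of the bounded domain $H \subset \mathbb{R}^n$, and $\theta_i \in \mathbb{R}^{n+1}$ is the parameter vector of submodel $i$.
The following assumptions are made.
(A1) The orders $n_a$ and $n_b$ and the number of submodels $s$ are known.
(A2) The noise $e(k)$ is an independent and identically distributed Gaussian process with zero mean and finite variance $\sigma^2$.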
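To make the PWARX model concrete, the following NumPy sketch simulates a single-input single-output system ($n_u = 1$). It is an illustration under stated assumptions, not a system from the simulations: the function name `simulate_pwarx`, the region rule passed as `region_of`, and the two parameter vectors are all hypothetical, and the noise is drawn i.i.d. Gaussian as in (A2).

```python
import numpy as np

def simulate_pwarx(u, thetas, region_of, na=1, nb=1, sigma=0.0, seed=0):
    """Simulate a SISO PWARX system y(k) = theta_i^T [phi(k); 1] + e(k).

    phi(k) = [y(k-1), ..., y(k-na), u(k), ..., u(k-nb)], so n = na + nb + 1.
    region_of(phi) is a user-supplied (hypothetical) rule returning the index i
    of the region H_i that contains phi; e(k) is i.i.d. N(0, sigma^2).
    """
    rng = np.random.default_rng(seed)
    start = max(na, nb)                      # first k with a full regressor
    y = np.zeros(len(u))
    phis, modes = [], []
    for k in range(start, len(u)):
        past_y = [y[k - j] for j in range(1, na + 1)]
        past_u = [u[k - j] for j in range(0, nb + 1)]
        phi = np.array(past_y + past_u)
        i = region_of(phi)                   # active submodel at time k
        y[k] = thetas[i] @ np.append(phi, 1.0) + sigma * rng.standard_normal()
        phis.append(phi)
        modes.append(i)
    return y, np.array(phis), np.array(modes)


# Hypothetical two-mode example (not a system from the paper): mode 0 when u(k) >= 0.
thetas = [np.array([0.5, 1.0, 0.0, 0.2]), np.array([-0.3, 0.0, 1.0, -0.1])]
u = np.random.default_rng(1).uniform(-1.0, 1.0, 200)
y, phis, modes = simulate_pwarx(u, thetas, lambda phi: 0 if phi[1] >= 0 else 1, sigma=0.1)
```

The returned regressors and mode labels can serve as ground truth when checking a classification step.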

Identification of PWARX Models Based on Clustering Approach
This section recalls the main steps of the clustering-based approach for the identification of the PWARX models [8,17].
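For each data pair $(y(k), \varphi(k))$, a local set $C_k$ is built by collecting it together with its $(n_\rho - 1)$ nearest neighbors. Among the obtained local sets $C_k$, some contain only data generated by the same submodel and are called pure local sets, while others collect data from several submodels and are called mixed local sets.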

Mathematical Problems in Engineering 3
The parameter $n_\rho$ is chosen by the user such that $n_\rho > n + 1$. It has a decisive influence on the performance of the algorithm. Its optimal value is always a compromise between two phenomena: the larger $n_\rho$, the better the parameter estimation and the rejection of noise; however, a large value of $n_\rho$ increases the number of mixed local sets.
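For each local set $C_k$, we can identify an affine model. To accomplish this task, we adopt the least squares method to determine the local parameters $\theta_{LS,k}$:
$$\theta_{LS,k} = \left(\Phi_k^T \Phi_k\right)^{-1} \Phi_k^T Y_k, \tag{6}$$
where $\Phi_k$ and $Y_k$ stack, respectively, the extended regressors $[\varphi(j)^T\ 1]$ and the outputs $y(j)$ of the data pairs in $C_k$. Our objective is then to classify the vectors $\theta_{LS,k}$ into $s$ separate classes using a suitable classification technique.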
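The construction of the local sets and the local least-squares step can be sketched as follows. This minimal NumPy sketch assumes Euclidean nearest neighbors in regressor space; the function name `local_parameters` and the argument `n_rho` (the cardinality of each local set) are illustrative names, and `np.linalg.lstsq` is used instead of forming the normal equations explicitly, which gives the same solution when $\Phi_k$ has full column rank.

```python
import numpy as np

def local_parameters(phis, y, n_rho):
    """Local least-squares estimates theta_LS,k, one per data point.

    Each local set C_k gathers (y(k), phi(k)) and its (n_rho - 1) nearest
    neighbors in regressor space (Euclidean distance).
    """
    N, n = phis.shape
    if n_rho <= n + 1:
        raise ValueError("n_rho must exceed n + 1")
    phibar = np.hstack([phis, np.ones((N, 1))])          # extended regressors [phi; 1]
    dists = np.linalg.norm(phis[:, None, :] - phis[None, :, :], axis=2)
    theta_ls = np.empty((N, n + 1))
    for k in range(N):
        idx = np.argsort(dists[k])[:n_rho]                # phi(k) has distance 0 to itself
        theta_ls[k], *_ = np.linalg.lstsq(phibar[idx], y[idx], rcond=None)
    return theta_ls
```

On noise-free data from a single affine model every local estimate coincides with the true parameter vector; with several submodels, the estimates from mixed sets land between the true vectors.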
In this paper, three classification techniques are treated: the k-means algorithm, where classification is done by minimizing a suitable criterion [8,21], and Chiu's clustering technique and the self-adapting artificial Kohonen neural network, which are detailed in Sections 5 and 6.

Parameter Estimation.
Since the data are now classified, the ARX submodels can be determined: we estimate the parameter vector $\theta_i$ of each submodel, $i = 1, \dots, s$, from the data classified into it, using the least squares method.
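Once every data pair carries a submodel label, the per-submodel least-squares step is a direct loop over the clusters. The sketch below assumes integer labels $0, \dots, s-1$ and that each cluster holds enough points for a full-rank regression; the function name is hypothetical.

```python
import numpy as np

def submodel_parameters(phis, y, labels, s):
    """Least-squares estimate of theta_i for each submodel from its classified data."""
    phibar = np.hstack([phis, np.ones((len(y), 1))])     # extended regressors [phi; 1]
    thetas = []
    for i in range(s):
        mask = labels == i                                # data pairs attributed to submodel i
        theta_i, *_ = np.linalg.lstsq(phibar[mask], y[mask], rcond=None)
        thetas.append(theta_i)
    return np.array(thetas)
```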

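Region Estimation.
The final step is to determine the regions $H_i$. Statistical learning methods such as support vector machines (SVM) offer an interesting solution to this task [22,23]. Support vector machines are a popular machine learning method for classification, regression, and other learning tasks. The SVM approach was originally developed for binary classification and was later extended to multiclass classification, which is still an active research topic [24,25]. In our case, the task is to find, for every $i \neq j$, the hyperplane that separates the regressors of $\mathcal{F}_i = \{\varphi(k) : (y(k), \varphi(k)) \in D_i\}$ from those of $\mathcal{F}_j$, where $D_i$ denotes the set of data pairs classified into submodel $i$. Given two sets $\mathcal{F}_i$ and $\mathcal{F}_j$, $i \neq j$, the linear separation problem is to find $w \in \mathbb{R}^n$ and $b \in \mathbb{R}$ such that
$$w^T \varphi(k) + b < 0 \quad \forall \varphi(k) \in \mathcal{F}_i, \qquad w^T \varphi(k) + b > 0 \quad \forall \varphi(k) \in \mathcal{F}_j.$$
This problem can easily be rewritten as a feasibility problem with linear inequality constraints. The estimated hyperplane separating $\mathcal{F}_i$ from $\mathcal{F}_j$ is denoted by $M_{i,j}\varphi = m_{i,j}$, where $M_{i,j}$ and $m_{i,j}$ are matrices of suitable dimensions. Moreover, we assume that the points in $\mathcal{F}_i$ belong to the halfspace $M_{i,j}\varphi \le m_{i,j}$.
The regions are obtained by solving these linear inequalities. It is then enough to consider the bounded polyhedron [21]
$$\hat{H}_i = \left\{\varphi \in \mathbb{R}^n : M_{i,j}\varphi \le m_{i,j},\ j = 1, \dots, s,\ j \neq i,\ \text{and}\ \Lambda\varphi \le \lambda\right\},$$
where $\Lambda\varphi \le \lambda$ are the linear inequalities describing $H$.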
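The pairwise separation and the resulting region membership test can be sketched as below. To stay dependency-free beyond NumPy, this sketch swaps the SVM for a plain perceptron, which finds a separating hyperplane only when the two regressor sets are linearly separable and gives no maximum-margin guarantee. The helpers `separate` and `in_region` are hypothetical; the domain inequalities $\Lambda\varphi \le \lambda$ can be appended to the list of cuts in the same `(M, m)` form.

```python
import numpy as np

def separate(F_i, F_j, max_epochs=1000):
    """Find (w, b) with w^T phi + b < 0 on F_i and > 0 on F_j.

    Perceptron used as a simple stand-in for the SVM / feasibility problem;
    it terminates only if the two sets are linearly separable.
    """
    X = np.vstack([F_i, F_j])
    t = np.hstack([-np.ones(len(F_i)), np.ones(len(F_j))])   # target signs
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, ti in zip(X, t):
            if ti * (w @ x + b) <= 0:          # misclassified or on the hyperplane
                w, b = w + ti * x, b + ti
                errors += 1
        if errors == 0:
            return w, b                        # cut for H_i: M_ij = w^T, m_ij = -b
    raise RuntimeError("no separating hyperplane found")

def in_region(phi, cuts):
    """phi lies in the estimated region iff M @ phi <= m for every cut (M, m)."""
    return all(M @ phi <= m for M, m in cuts)
```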

Motivation of Adopting Chiu's Clustering Technique and the Self-Adapting Artificial Kohonen Neural Network
Classification is an important step toward PWARX model identification, because successful identification of both the submodels and the partitions depends on the performance of the clustering technique used. Few results have been devoted to this problem in the past, because most existing methods for the identification of PWARX models rely on classical clustering algorithms such as k-means. However, classical clustering methods, even the modified k-means algorithms, only reduce the influence of outliers and poor initializations. Consequently, they still suffer from several drawbacks, which can be summarized as follows.
(i) They depend on the input signal, which must be persistently exciting so that all submodels receive a balanced excitation [26].
(ii) The parameter $n_\rho$ must take a small value to limit the computational complexity. However, the best results are generally obtained with a large value of $n_\rho$.
(iii) The k-means algorithm does not guarantee convergence toward an optimal clustering and can therefore converge to a local minimum, mainly because of its random initialization step.
We are therefore interested in other classification techniques that can identify and eliminate the misclassified points and avoid random initializations. Since we adopt the same regression scheme as the k-means procedure, we focus on a way to separate the local parameters $\theta_{LS,k}$, $k = 1, \dots, N$.
Consider, for example, the dispersion of local parameters shown in Figure 1, obtained for a system with known true parameter vectors. Figure 1 shows that the local parameters are scattered so that they mostly gather around the true parameters. Natural groupings of data points, due to the PWA structure, are therefore clearly observed, and determining the centers of these groupings is an interesting solution to our identification problem. For this purpose, [27,28] provide a simple and effective data clustering algorithm proposed by Chiu. Moreover, the self-adapting artificial Kohonen neural network offers an interesting and effective way to classify data [20].

Chiu's Classification Method for the Identification of PWARX Systems
Clustering of data forms the basis of many modeling and pattern classification algorithms. The purpose of clustering is to find natural groupings of data in a large data set, thus revealing patterns in the data that can provide a concise representation of the data behavior. Chiu proposed in [27,28] a simple and effective algorithm for data points clustering.

Principle.
Chiu's classification method considers each data point as a potential cluster center and computes for it a potential value based on its distances to all the other data points. The point with the highest potential value is chosen as the first cluster center. The key idea of this method is that, once a cluster center is chosen, the potential of all other points is reduced according to their distance from that center, so all points near the first cluster center have a greatly reduced potential. The next cluster center is then the point with the highest remaining potential. This procedure of acquiring a new cluster center and reducing the potential of the surrounding points is repeated until the potential of all points falls below a threshold or until the required number of clusters is reached.
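The principle above is Chiu's subtractive clustering. A minimal NumPy sketch follows, assuming the usual exponential potential with $\alpha = 4/r_a^2$ and reduction constant $\beta = 4/r_b^2$, and stopping after a fixed number of centers rather than at a potential threshold. The function name and the default $r_b = 1.5\,r_a$ are assumptions of this sketch.

```python
import numpy as np

def subtractive_clustering(X, n_clusters, r_a, r_b=None):
    """Subtractive clustering returning n_clusters centers (rows of X).

    Potential: P_k = sum_j exp(-alpha ||x_k - x_j||^2), alpha = 4 / r_a^2.
    After choosing a center x*, P_k <- P_k - P* exp(-beta ||x_k - x*||^2),
    beta = 4 / r_b^2 with r_b > r_a (default here: r_b = 1.5 r_a, an assumption).
    """
    r_b = 1.5 * r_a if r_b is None else r_b
    alpha, beta = 4.0 / r_a**2, 4.0 / r_b**2
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)   # squared distances
    P = np.exp(-alpha * sq).sum(axis=1)                          # initial potentials
    centers = []
    for _ in range(n_clusters):
        c = int(np.argmax(P))                       # highest remaining potential
        centers.append(X[c])
        P = P - P[c] * np.exp(-beta * sq[:, c])     # reduce potential near the new center
    return np.array(centers)
```

Because the chosen center's own potential drops to zero and its neighbors' potentials drop sharply, the next maximum is found in a different grouping.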

PWARX System Identification Based on Chiu's Classification Method.
We now present the use of Chiu's classification method for the identification of PWARX systems. Consider the local parameter vectors obtained by applying the least squares method to the local sets built by associating each data point with its $(n_\rho - 1)$ nearest neighbors, as described in the k-means procedure. These local parameter vectors $\theta_{LS,k}$, $k = 1, \dots, N$, obtained by applying (6), are the objects classified by the proposed technique. We compute a potential value for each parameter vector using the following expression:
$$P_k = \sum_{j=1}^{N} \exp\left(-\alpha\, \|\theta_{LS,k} - \theta_{LS,j}\|^2\right), \qquad \alpha = \frac{4}{r_a^2}, \tag{10}$$
where $r_a$ is a positive constant. The potential of each parameter vector is thus a function of its distances to all the other parameter vectors, so a parameter vector with many neighboring vectors has a high potential value. The constant $r_a$ is the radius defining the neighborhood; it is determined from a coefficient chosen between 0 and 1.
Since some of the parameter vectors $\theta_{LS,k}$, $k = 1, \dots, N$, are obtained from mixed local sets, it is clearly desirable to eliminate them. Equation (10) can be exploited to eliminate these misclassified parameter vectors: since it assigns a low potential to the outliers, we can fix a threshold, set through a coefficient chosen between 0 and 1, below which the parameter vectors are rejected and removed from the data set.
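The outlier-rejection step can be sketched as follows. The exact threshold expression is not reproduced in this text, so the sketch assumes a hypothetical relative rule: a local parameter vector is kept only if its potential is at least `eps` times the largest potential, with $0 < \varepsilon < 1$. The function name is also hypothetical.

```python
import numpy as np

def filter_low_potential(theta_ls, r_a, eps):
    """Drop local parameter vectors with a low potential.

    Potential as in Chiu's method: P_k = sum_j exp(-(4 / r_a^2) ||theta_k - theta_j||^2).
    Hypothetical threshold: keep theta_k only if P_k >= eps * max_j P_j, 0 < eps < 1.
    """
    alpha = 4.0 / r_a**2
    sq = np.sum((theta_ls[:, None, :] - theta_ls[None, :, :]) ** 2, axis=2)
    P = np.exp(-alpha * sq).sum(axis=1)
    keep = P >= eps * P.max()
    return theta_ls[keep], keep
```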
After this treatment, the set of parameter vectors is filtered and reduced to $\theta_{LS,k}$, $k = 1, \dots, N'$ ($N' < N$). From this new data set, we select the parameter vector with the highest potential value as the first cluster center. Let $\theta_1^*$ be this first center and $P_1^*$ its potential. The potential of each parameter vector is then updated by
$$P_k \leftarrow P_k - P_1^* \exp\left(-\beta\, \|\theta_{LS,k} - \theta_1^*\|^2\right).$$
The parameter vectors near the first cluster center thus have a reduced potential, so they are unlikely to be selected as the next center. The radius $r_b$ of this reduction neighborhood is a positive constant that must be chosen larger than $r_a$ to avoid obtaining closely spaced cluster centers, and the constant $\beta$ is computed as $\beta = 4/r_b^2$. In general, after obtaining the $h$th cluster center, the potential of every parameter vector is updated by
$$P_k \leftarrow P_k - P_h^* \exp\left(-\beta\, \|\theta_{LS,k} - \theta_h^*\|^2\right),$$
where $P_h^*$ and $\theta_h^*$ are, respectively, the potential and the location of the $h$th cluster center. This procedure is repeated until $s$ centers and their potentials are obtained. The obtained centers $\theta_1^*, \dots, \theta_s^*$ are the sought parameter vectors. Once the $s$ centers are available, we determine the elements belonging to each cluster: we compute the error between the real output and the output estimated with each center and classify $(y(k), \varphi(k))$ into the cluster with the smallest error:
$$\hat{i}(k) = \arg\min_{i = 1, \dots, s} \left(y(k) - [\varphi(k)^T\ 1]\, \theta_i^*\right)^2.$$
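The assignment rule, which attributes each data pair to the center that predicts $y(k)$ best, reduces to a single vectorized `argmin` in NumPy. The sketch assumes the centers are stacked row-wise in an $s \times (n+1)$ array and uses the squared output error; the function name is illustrative.

```python
import numpy as np

def assign_by_output_error(phis, y, centers):
    """Attribute each (y(k), phi(k)) to the submodel with the smallest squared
    prediction error (y(k) - [phi(k)^T 1] theta_i)^2, i = 1..s."""
    phibar = np.hstack([phis, np.ones((len(y), 1))])
    residuals = (y[:, None] - phibar @ centers.T) ** 2   # shape (N, s)
    return np.argmin(residuals, axis=1)
```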

Properties.
The new clustering technique has several interesting properties which can be summarized as follows.
(i) This method does not require any initialization of the centers. Therefore, the problem of convergence toward local minima is avoided.
(ii) This method removes the misclassified parameter vectors $\theta_{LS,k}$ from the data set and repeats the overall identification procedure on the reduced set of data points. The outliers can be removed thanks to (10), which associates low potentials with these parameter vectors.
(iii) The choice of the parameter $n_\rho$ is more flexible: the performance can be improved with a large value of $n_\rho$.

PWARX Identification Using a Self-Adapting Artificial Kohonen Neural Network
6.1. Principle. The Kohonen neural network is an interesting and effective tool for data classification [20]. The self-organizing Kohonen map is a feedforward artificial neural network consisting of two layers. In the input layer, the neurons correspond to the variables describing the observations. The output layer is generally organized as a two-dimensional grid (map) of neurons, each neuron representing a group of similar observations. The Kohonen network is a technique for automatic classification (clustering, unsupervised learning): the objective is to produce groups such that members of the same cluster are similar and members of different clusters are different.
Figure 2: Architecture of the Kohonen network (input layer Z = z_1, z_2, …, z_i, …, z_n).
The neural network used in the proposed method consists of one input layer and one output layer of neurons; its architecture is given in Figure 2 [29]. Each neuron of the Kohonen map receives the signals coming from the input layer. The weight $w_{ij}$ is associated with the connection between input neuron $i$ and output neuron $j$, so the weight vector $w_j$ of output neuron $j$ has as many elements as there are input neurons.
A Kohonen map computes the Euclidean distance between an input vector $z$ and each weight vector $w_j$.
Kohonen learning uses a neighborhood function $h$, whose value $h(c, j)$ represents the strength of the coupling between neuron $c$ and neuron $j$ during the training process.

6.2. Algorithm. The learning algorithm for Kohonen networks is shown in Algorithm 2.
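A generic one-dimensional Kohonen training loop with $s$ output neurons is sketched below; it is not necessarily identical to Algorithm 2. The Gaussian neighborhood function on the neuron index grid, the linear decay of the learning rate and neighborhood width, the initialization from randomly chosen data points, and all default values are assumptions of this sketch, and the function name is hypothetical.

```python
import numpy as np

def kohonen_train(X, s, epochs=50, eta0=0.5, sigma0=1.0, seed=0):
    """Train a 1-D Kohonen map with s output neurons; returns s weight vectors.

    Neighborhood h(c, j) = exp(-(c - j)^2 / (2 sigma^2)) on the neuron index grid;
    learning rate and neighborhood width decay linearly (assumed schedule).
    """
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), size=s, replace=False)].astype(float)  # init from data
    grid = np.arange(s)
    total = epochs * len(X)
    t = 0
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            frac = 1.0 - t / total
            eta, sigma = eta0 * frac, max(sigma0 * frac, 1e-3)
            c = np.argmin(np.linalg.norm(W - x, axis=1))       # winning neuron
            h = np.exp(-((grid - c) ** 2) / (2 * sigma**2))     # coupling h(c, j)
            W += eta * h[:, None] * (x - W)                      # pull weights toward x
            t += 1
    return W
```

Applied to the filtered local parameter vectors with $s$ output neurons, the trained weight vectors play the role of cluster centers.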

PWARX System Identification Based on the Kohonen Neural Network Method.
Our purpose is to exploit the Kohonen self-organizing map to identify PWARX systems. Consider the collection of local parameter vectors $\theta_{LS,k}$, $k = 1, \dots, N$, obtained by (6) of the k-means-based procedure. To eliminate the outliers, we propose to apply (10) of Chiu's clustering technique, that is, to discard the parameter vectors with a low potential. After obtaining the filtered set $\theta_{LS,k}$, $k = 1, \dots, N'$ ($N' < N$), data classification is performed by the Kohonen neural network algorithm with the input vector $z = \theta_{LS,k}$.
The output layer is then formed by $s$ neurons ($s$ being the number of submodels), whose weight vectors $\theta_1^*, \dots, \theta_s^*$ are the cluster centers. After obtaining the $s$ cluster centers, we determine the elements belonging to each cluster partitioning the regressor space. To do so, we compute the error between the real output and the output estimated with each center and classify $(y(k), \varphi(k))$ into the corresponding cluster according to
$$\hat{i}(k) = \arg\min_{i = 1, \dots, s} \left(y(k) - [\varphi(k)^T\ 1]\, \theta_i^*\right)^2.$$

Simulation Results
We now present two simulation examples to illustrate the performance of the proposed approaches.

Quality Measures.
The objective of the simulations is to compare the performance of the proposed methods with that of the modified k-means approach. The following quality measures are used to evaluate each method [30].
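(i) The maximum relative error of the parameter vectors is defined by
$$\Delta_\theta = \max_{i = 1, \dots, s} \frac{\|\theta_i - \hat{\theta}_i\|_2}{\|\theta_i\|_2}, \tag{18}$$
where $\theta_i$ and $\hat{\theta}_i$ are, respectively, the true and the estimated parameter vectors of submodel $i$. The identified model is deemed acceptable if $\Delta_\theta$ is small or close to zero.
(ii) The averaged sum of the squared residual errors is defined by
$$\hat{\sigma}_e^2 = \frac{1}{s} \sum_{i=1}^{s} \frac{\mathrm{SSR}_i}{|D_i|}, \tag{19}$$
where $\mathrm{SSR}_i = \sum_{(y(k), \varphi(k)) \in D_i} \left(y(k) - [\varphi(k)^T\ 1]\, \hat{\theta}_i\right)^2$ and $|D_i|$ is the cardinality of the cluster $D_i$. The identified model is considered acceptable if $\hat{\sigma}_e^2$ is small and/or close to the expected noise variance of the true system.
(iii) The percentage of the output variation explained by the model is defined by
$$\mathrm{FIT} = 100 \left(1 - \frac{\|\hat{y} - y\|_2}{\|y - \bar{y}\|_2}\right), \tag{20}$$
where $\hat{y}$ and $y$ are, respectively, the estimated and the true output vectors and $\bar{y}$ is the mean value of $y$. The identified model is considered acceptable if FIT is close to 100.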
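The three quality measures translate directly into NumPy. The sketch assumes that the estimated submodels have already been matched to the true ones (row $i$ corresponds to submodel $i$), that cluster labels are integers $0, \dots, s-1$, and that every cluster is non-empty; the function names are hypothetical.

```python
import numpy as np

def relative_error(thetas_true, thetas_hat):
    """Delta_theta = max_i ||theta_i - theta_hat_i|| / ||theta_i|| (rows matched by index)."""
    num = np.linalg.norm(thetas_true - thetas_hat, axis=1)
    return float(np.max(num / np.linalg.norm(thetas_true, axis=1)))

def averaged_ssr(phis, y, labels, thetas_hat):
    """sigma_e^2 hat = (1/s) sum_i SSR_i / |D_i|; assumes every cluster is non-empty."""
    phibar = np.hstack([phis, np.ones((len(y), 1))])
    terms = []
    for i, theta in enumerate(thetas_hat):
        mask = labels == i                                   # data pairs in cluster D_i
        ssr = np.sum((y[mask] - phibar[mask] @ theta) ** 2)
        terms.append(ssr / mask.sum())
    return float(np.mean(terms))

def fit_percent(y_hat, y):
    """FIT = 100 (1 - ||y_hat - y|| / ||y - mean(y)||)."""
    return float(100.0 * (1.0 - np.linalg.norm(y_hat - y) / np.linalg.norm(y - y.mean())))
```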

Example 1.
Consider the PWARX system taken from [17]. We evaluate the performance of the proposed algorithms (the Chiu-based and Kohonen-based algorithms) and of the k-means algorithm using the same identification data. Figure 3 presents the input and the real output of the system.
The parameter $n_\rho$ defining the cardinality of the local data sets is chosen as follows: k-means algorithm ($n_\rho = 6$), Chiu-based algorithm ($n_\rho = 15$), and Kohonen-based algorithm ($n_\rho = 20$). With $n_\rho$ chosen in this way, each algorithm generates a sequence of local parameters. These local parameters are then classified into three sets, and the center of each set is determined, as shown in Figure 4, where the centers are depicted by star symbols.
Figure 4 shows that the outliers are removed by the proposed methods, whereas the modified k-means method retains them.
After obtaining the estimated parameter vectors, we apply the SVM algorithm in order to estimate the regions. We can then attribute each parameter vector to the corresponding region where it is defined.
The estimated outputs obtained with the three algorithms are presented in Figure 5, and the estimated parameter vectors are given in Table 1. Table 2 presents the quality measures (18), (19), and (20) for the two proposed methods and the k-means approach.
Tables 1 and 2 and Figure 5 show that the proposed methods perform better than the k-means method, because they reduce the influence of outliers and do not require any arbitrary initialization.

Example 2.
Consider the PWARX model of [31]. The input signal $u(k)$ and the noise signal $e(k)$ are random sequences drawn from normal distributions with variances 0.5 and 0.05, respectively. In the k-means algorithm, the parameters of the affine submodels are estimated by minimizing a criterion function, so the optimization can get trapped in a local minimum and yield poor results. In addition, all submodels must receive a balanced excitation, which is not always guaranteed. For these reasons, we cannot apply the Monte Carlo simulation to the k-means algorithm, and only the algorithms based on Chiu's clustering technique and on the Kohonen neural network are considered in this example.
We carry out a Monte Carlo simulation of size 100 on this model, with different noise realizations and different input excitations. Each simulation generates $N = 250$ data points, and we follow the same procedures as described above. The estimated parameter vectors are given in Table 3.
The quality measures are averaged over the 100 simulations and presented in Table 4.

Conclusion
In this paper, we have considered only clustering-based procedures for the identification of PWARX systems, focusing on the most challenging step, namely, the classification of data points. Clustering-based procedures require the model orders $n_a$ and $n_b$ and the number of submodels $s$ to be fixed a priori. The parameter $n_\rho$, defining the cardinality of the local data sets, is the main tuning knob.
The clustering method based on the k-means algorithm treated in this paper performs poorly when the number of mixed local data sets is high. This number depends on the chosen parameter $n_\rho$, and its increase leads to the presence of outliers. In addition, poor initializations lead the algorithm to converge to local minima.
To overcome these problems, we have proposed two classification techniques: the first is Chiu's clustering technique and the second is based on the Kohonen neural network. The proposed methods can guarantee optimal classification.
The main remaining problem is the classification of data points that are consistent with more than one submodel, namely, data points lying close to the intersection of two or more submodels. Wrong attribution of these data points may lead to misclassifications when estimating the polyhedral regions.
Finally, the choice of persistently exciting input signals for identification (i.e., allowing for the correct identification of all the affine dynamics) is another important topic to be addressed. Moreover, when dealing with discontinuous PWARX models, the choice of the input signal should be such that not only all the affine dynamics are sufficiently excited but also accurate shaping of the boundaries of the regions is possible.