LS-SVM APPROXIMATE SOLUTION FOR AFFINE NONLINEAR SYSTEMS WITH PARTIALLY UNKNOWN FUNCTIONS

By using Least Squares Support Vector Machines (LS-SVMs), we develop a numerical approach to finding approximate solutions of affine nonlinear systems with partially unknown functions. The approach yields continuous and differentiable approximate solutions of the nonlinear differential equations, and it can also identify the unknown nonlinear part from a set of measured data points. Technically, we first map the known part of the affine nonlinear system into a high-dimensional feature space and derive the form of the approximate solution. The original problem is then formulated as an approximation problem via the kernel trick with LS-SVMs. Furthermore, the approximation of the known part can be expressed through a set of linear equations whose coefficient matrices are coupled square matrices, and the unknown part can be identified from its relationship to the known part and to the approximate solution of the affine nonlinear system. Finally, several examples for different systems are presented to illustrate the validity of the proposed approach.


1. Introduction. Affine nonlinear systems are an important class of nonlinear systems with numerous applications in mechanics, physics, biology, aerospace and various engineering fields. In recent decades, the investigation of affine nonlinear systems has made substantial progress [5,11]. Typically, an affine nonlinear system is described by

ẋ(t) = f(x) + g(x)u(t),  (1)

where f(x) and g(x) are nonlinear functions and u(t) is the control input. System (1) is significant for analyzing and designing nonlinear systems, and thus finding its solutions is very important. If f(x) and g(x)u(t) are known, the affine nonlinear system can be regarded as a general nonlinear ordinary differential equation (ODE), and its numerical solution for low-index problems can be obtained through typical numerical methods such as the Euler method, the Runge-Kutta method [4] and the Adomian decomposition method [18]. A direct computational method based on Chebyshev series [1] has been developed to obtain solutions for high-index problems.
Some other techniques have been developed to produce continuous approximate solutions. A linearization method [12] has been developed to yield piecewise analytic solutions. Many researchers have also proposed methods based on neural network models; see [7-9, 13, 16]. In particular, [8] introduced Hopfield neural network models to solve first-order differential equations, and [7] presented a method based on function approximation with feed-forward neural networks. Although neural networks have nice properties such as universal approximation, they are limited for broad application by some inherent drawbacks, such as multiple local optima, the difficult choice of the number of hidden units, and the danger of overfitting. For this reason, a new method based on Least Squares Support Vector Machines (LS-SVMs) has been proposed to solve ODEs in [10]. SVMs were proposed in [17] as a technique based on statistical learning theory. They adopt the structural risk minimization principle, which avoids local minima and effectively addresses the overfitting problem, guaranteeing good generalization capability as well as better prediction accuracy. LS-SVMs are an extension of SVMs and perform well in function identification, estimation and robust regression [3,14,15]. An LS-SVM achieves the global solution by solving a set of linear equations instead of the quadratic programming problem of a general SVM, which makes LS-SVMs faster than SVMs and reduces the computational cost significantly. Further, LS-SVM regression requires far fewer model parameters than methods based on neural networks. It should be noted that all the methods mentioned above solve only initial value and boundary value problems for differential equations; they are not directly applicable to nonlinear systems with an unknown part.
From the point of view of control theory, it is necessary to consider approximate solutions of nonlinear systems with an unknown control part by utilizing some known sampled observations. The motivation for assuming that the control part is unknown in this paper is based on the following observations about practical systems: first, control sensors/actuators usually have noise that can be modeled by an unknown part; second, the affine model is a popular model in many control systems.
In this paper, exploiting the good generalization capability of LS-SVMs, we propose a new LS-SVM-based method for finding approximate solutions in closed form (i.e., explicit, continuous and differentiable) to the affine nonlinear system (1) with known function f(x) and unknown function g(x)u(t), given a set of measured data points of the states. Moreover, we derive the approximate solution for the unknown part from its relationship to the known part and to the approximate solution of the affine nonlinear system.
Technically, we first divide the nonlinear part into a known part and an unknown part and then map the known part into a high-dimensional feature space. The known part can then be expressed in a linear form whose coefficient matrices are coupled square matrices. We then utilize these feature maps to transform x(t) and f(x) into the same feature space, express g(x)u(t) through its relationship to the approximation of f(x) and the derivative of x(t), and finally solve the result as a linear regression problem.
This paper is organized as follows. Section 2 presents a brief review of the least squares support vector regression problem. Some notations and preliminaries are given in Section 3. In Section 4, we present our main results and discuss the selection of the tuning parameters. In Section 5, some examples are given to illustrate the validity of the proposed method. Section 6 gives the conclusions.
2. LS-SVM function estimation. As presented in [14], the LS-SVM model used for function estimation is

y(x) = ω^T φ(x) + b,  (2)

where ω ∈ R^{n_h}, b ∈ R, and φ(·): R^n → R^{n_h} is a nonlinear function that maps the input space into a high-dimensional feature space of dimension n_h. For a given training set {x_k, y_k} (k = 1, ..., N) with input data x_k ∈ R^n and output data y_k ∈ R, we aim to find the parameters ω and b that best approximate the training data with the model. The optimization problem of the LS-SVM can be formulated as

min_{ω,b,e}  (1/2) ω^T ω + (γ/2) Σ_{k=1}^N e_k²,   subject to  y_k = ω^T φ(x_k) + b + e_k,  k = 1, ..., N,  (3)

where γ ∈ R^+ is a regularization parameter. In order to solve the optimization problem (3), we define the following Lagrangian function:

L(ω, b, e; α) = (1/2) ω^T ω + (γ/2) Σ_{k=1}^N e_k² − Σ_{k=1}^N α_k (ω^T φ(x_k) + b + e_k − y_k),  (4)

where the α_k are the Lagrange multipliers (support values). By the Karush-Kuhn-Tucker conditions [17], setting the partial derivatives of the Lagrangian to zero gives ω = Σ_k α_k φ(x_k), Σ_k α_k = 0, α_k = γ e_k, and ω^T φ(x_k) + b + e_k = y_k. After eliminating ω and e_k, the solution is given by the linear system

[ 0      1_N^T      ] [ b ]   [ 0 ]
[ 1_N   Ω + I_N/γ ] [ α ] = [ y ],  (5)

where 1_N = [1, ..., 1]^T ∈ R^N, α = [α_1, α_2, ..., α_N]^T and y = [y_1, y_2, ..., y_N]^T. From the Mercer condition [17], one obtains Ω_{kl} = K(x_k, x_l) = φ(x_k)^T φ(x_l). Finally, the identified model can be represented as

ŷ(x) = Σ_{k=1}^N α_k K(x, x_k) + b,  (6)

where α_k and b are the solution of equation (5) and K(x, x_k) is a kernel function selected in advance, as explained in the next section.
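The dual system (5) and model (6) are easy to sketch numerically. The following minimal illustration is our own (the target function, grid, and the parameter values σ = 0.2, γ = 10⁴ are illustrative choices, not taken from the paper):

```python
import numpy as np

def lssvm_fit(x, y, sigma=0.2, gamma=1e4):
    """Solve the LS-SVM dual linear system (5) for (b, alpha),
    using the Gaussian kernel K(x, z) = exp(-(x - z)^2 / (2 sigma^2))."""
    N = len(x)
    Omega = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * sigma ** 2))
    # Bordered system: [[0, 1^T], [1, Omega + I/gamma]] [b; alpha] = [0; y]
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = Omega + np.eye(N) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[1:], sol[0]  # alpha, b

def lssvm_predict(x_new, x, alpha, b, sigma=0.2):
    """Evaluate the identified model (6): y_hat(x) = sum_k alpha_k K(x, x_k) + b."""
    K = np.exp(-(x_new[:, None] - x[None, :]) ** 2 / (2 * sigma ** 2))
    return K @ alpha + b

# Fit a smooth function from its samples
x = np.linspace(0.0, 1.0, 20)
y = np.sin(2 * np.pi * x)
alpha, b = lssvm_fit(x, y)
y_hat = lssvm_predict(x, x, alpha, b)
print(np.max(np.abs(y_hat - y)))  # small training error
```

Note that only one linear solve is required, which is the computational advantage over the quadratic program of a standard SVM mentioned above.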
3. Notations and preliminaries. The following notations and definitions will be used throughout this paper. By utilizing Mercer's condition [17], we adopt the differential operators defined in [10]. For example, if we choose the Gaussian kernel function K(x, y) = exp(−(x−y)²/(2σ²)), then the following relations hold:

∂K(x, y)/∂x = −((x−y)/σ²) K(x, y),   ∂K(x, y)/∂y = ((x−y)/σ²) K(x, y),
∂²K(x, y)/∂x∂y = (1/σ² − (x−y)²/σ⁴) K(x, y).

Remark 1. In order to keep the notation as simple as possible, we utilize the same feature map φ(t) for all the states, i.e., x̂_l(t) = ω_l^T φ(t) + b_l, and use the same feature map ϕ(x_i), i = 1, ..., m, to transform f(x) into a linear combination form; then g(x)u(t), which we write as gu(x, t), can be expressed as in (9) in the next section.
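The Gaussian-kernel derivative relations above can be checked directly against finite differences. This snippet is our own illustration (the bandwidth and test point are arbitrary):

```python
import numpy as np

sigma = 0.3

def K(x, y):
    """Gaussian kernel K(x, y) = exp(-(x - y)^2 / (2 sigma^2))."""
    return np.exp(-(x - y) ** 2 / (2 * sigma ** 2))

def K10(x, y):
    """dK/dx = -((x - y)/sigma^2) K(x, y)."""
    return -((x - y) / sigma ** 2) * K(x, y)

def K11(x, y):
    """d^2 K/(dx dy) = (1/sigma^2 - (x - y)^2/sigma^4) K(x, y)."""
    return (1.0 / sigma ** 2 - (x - y) ** 2 / sigma ** 4) * K(x, y)

# Compare against central finite differences at a sample point
x0, y0, h = 0.7, 0.2, 1e-5
fd10 = (K(x0 + h, y0) - K(x0 - h, y0)) / (2 * h)       # approximates dK/dx
fd11 = (K10(x0, y0 + h) - K10(x0, y0 - h)) / (2 * h)   # approximates d^2K/dxdy
print(abs(fd10 - K10(x0, y0)), abs(fd11 - K11(x0, y0)))
```

Both differences agree with the analytic expressions to roughly the order h² of the central-difference scheme.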
Nevertheless, it is possible to use different mapping functions, as long as the corresponding kernel functions satisfy Mercer's condition [17]; in that case one may have more model parameters.

4. Problem formulation and main results. Now we consider the following affine nonlinear system:

ẋ(t) = f(x) + g(x)u(t),  (8)

where the state vector x = [x_1, ..., x_m]^T ∈ R^m is assumed to be completely measurable at the given data points, f(x) is assumed to be known, and the control part g(x)u(t) is assumed to be unknown. We also assume that some observed samples of the state x(t) are available on some interval, as stated below. The aim of this paper is to find an approximate solution x̂(t), obtain an approximation of the known part f(x), and identify the unknown part g(x)u(t) for system (8). To achieve these goals, we assume that a general approximate solution of (8) has the form x̂_l(t) = ω_l^T φ(t) + b_l. Further, we also map f(x) into a high-dimensional feature space and express it in linear combination form; that is, f(x) and g(x)u(t) can be expressed as below.
where [ω_lk]_{m×m} are the coupled matrices and ω_l, ω_lk, b_l, b̃_l are model parameters to be determined. In order to obtain optimal values of these parameters, the collocation method [6] is used, in which the approximate solution is required to satisfy the system at a set of collocation points {t_i}, i = 1, ..., N. Therefore, the adjustable parameters ω_l, ω_lk, b_l, b̃_l, l, k = 1, ..., m, can be found by solving the following optimization problem, where N is the number of collocation points (equal to the number of training points) used in the learning process. With LS-SVMs, we utilize a nonlinear feature map to carry the training data into a feature space and transform the nonlinear problem into a linear one. Therefore, in order to solve problem (10), we use two nonlinear feature maps φ(t) and ϕ(x) to map x(t) and f(x) into their respective feature spaces and transform the original problem into a linear problem that can be solved in the LS-SVM framework. For notational convenience, K_1 and K_2 denote two different types of kernel functions, defined as follows. Remark 2. In this paper the LS-SVM is applied to dynamic data, whereas in machine learning it is usually applied to static data [2]. Note that the objective function (10) includes the derivative of the state, which differs from the static-data case; as a consequence, the derivative of the kernel function appears in the solution.
Now consider optimization problem (10). Let x̂_l(t) = ω_l^T φ(t) + b_l, and introduce additional unknowns gu_l(x_i, t_i), l = 1, ..., m; i = 1, ..., N, into the optimization goal, so that gu(x_i, t_i) optimally approximates the exact value at the discrete points. Thus, the gu_l(x_i, t_i), together with the parameters of the LS-SVM framework, constitute the following optimization problem.
In order to write the final solution in an elegant form, let ∇_ij = Ω^{1,0}(t_i, t_j); we then have the following, where Ω is the kernel matrix. Furthermore, g(x)u(t) can also be identified, and the approximation of f(x) can be obtained as follows.
It should be noted that the approximate solution in (13) can be obtained by solving the linear equation (12); this is the advantage of the LS-SVM. Once the approximate solution is obtained, g(x)u(t) can be identified easily with (14). The efficiency of the LS-SVM model depends on the choice of the tuning parameters. All examples in this paper use Gaussian kernel functions, so a model is determined by the regularization parameter γ and the kernel bandwidth σ. We obtain these parameters by fast leave-one-out cross-validation [2].
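The identification step amounts to subtracting the known part from the analytically differentiated kernel model of the state, in the spirit of (14). The following is a self-contained sketch on synthetic data; the trajectory x(t) = sin t, the known part f(x) = 2x, and all parameter values are our own illustrative choices, not the paper's:

```python
import numpy as np

sigma, gamma = 0.3, 1e6
ts = np.linspace(0.0, 2.0, 40)

# Synthetic "measured" states for xdot = f(x) + gu(t), with f known
xs = np.sin(ts)                       # measured trajectory x(t)
f_known = lambda x: 2.0 * x           # known part f(x)
gu_true = np.cos(ts) - f_known(xs)    # unknown part along the trajectory

# Fit x_hat(t) = sum_k alpha_k K(t, t_k) + b via the LS-SVM dual system
N = len(ts)
D = ts[:, None] - ts[None, :]
Omega = np.exp(-D ** 2 / (2 * sigma ** 2))
A = np.zeros((N + 1, N + 1))
A[0, 1:] = A[1:, 0] = 1.0
A[1:, 1:] = Omega + np.eye(N) / gamma
sol = np.linalg.solve(A, np.concatenate(([0.0], xs)))
b, alpha = sol[0], sol[1:]

# Differentiating the kernel expansion analytically gives xdot_hat;
# the unknown part is then identified as gu_hat = xdot_hat - f(x)
xdot_hat = (-(D / sigma ** 2) * Omega) @ alpha
gu_hat = xdot_hat - f_known(xs)
err = np.max(np.abs(gu_hat - gu_true)[4:-4])  # interior points only
print(err)
```

As in the examples below, the identification is most accurate in the interior of the training interval; the derivative of the kernel model degrades near the endpoints.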
Remark 3. In order to keep the tuning as simple as possible, we use one common mapping function for the three groups of parameters. The results show that this simple choice does not affect the performance significantly.

5. Simulation results. In this section, four examples are used to illustrate the validity and viability of the proposed method for solving affine nonlinear systems. In these experiments, the observed sampling data points are obtained from the analytic solution or from the MATLAB built-in solver ode45, whose accuracy is quite reasonable. The accuracy of an approximate solution is measured by the mean square error (MSE),

MSE = (1/M) Σ_{i=1}^M (x(t_i) − x̂(t_i))²,

where M is the number of test points. In all the examples, M is set to 100 points in the given domain. In order to assess the proposed method more precisely, we also compute the pointwise error E(t) = x(t) − x̂(t) whenever possible. Further, we also consider points outside the training interval in order to estimate the extrapolation performance of the obtained solution. As for the known and unknown nonlinear parts f(x) and g(x)u(t), we identify and approximate them and compare the results with the exact nonlinear functions.
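The MSE criterion above is straightforward to compute; the helper below is a minimal sketch of our own (the function name and test values are illustrative):

```python
import numpy as np

def mse(x_exact, x_approx):
    """Mean square error over the M test points, as defined above."""
    x_exact, x_approx = np.asarray(x_exact), np.asarray(x_approx)
    return np.mean((x_exact - x_approx) ** 2)

t = np.linspace(0.0, 1.0, 100)            # M = 100 test points
print(mse(np.sin(t), np.sin(t) + 1e-3))   # a uniform 1e-3 offset gives MSE 1e-6
```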
Remark 4. Note that f(x) and g(x)u(t) are both functions of the variable t, because x is a function of t. Hence, in the simulations, f(x) and g(x)u(t) are regarded as functions of t, i.e., f(t) and gu(t).
Example 1. Its exact solution is available in closed form. We assume that f(x) = 2x is known and g(x)u(t) = −x² + 1 is unknown. The approximate solution obtained by the proposed method is compared with the exact solution (see Figure 1), and the identification of g(x)u(t) and the approximation of f(x) are compared with the exact nonlinear parts of the exact solution (see Figure 2). The results with different numbers of training samples are recorded in Table 1, from which it is apparent that the approximate solution converges to the exact solution as the number of training samples N increases.

Example 2. For the system with f(y) = y² and g(y)u(t) = t², we obtain the training data from the MATLAB built-in solver ode45 with y(0) = 1. This problem is solved for t ∈ [0, 0.5], and the approximate solution obtained by the proposed method is compared with the solution obtained by ode45 in Figure 3. The identification of g(x)u(t) and the approximation of f(x) are compared with the exact nonlinear parts at points obtained by ode45 (see Figure 4). The results with different numbers of training points are tabulated in Table 2. The proposed method shows slightly better performance than the method described in [10] when the number of training points is twenty. Note that the symbolic solver dsolve of MATLAB 7.0 failed to find an analytic solution for this equation.

Example 3. Consider a system of index 3 defined as follows. We obtain the training data by using the MATLAB built-in solver ode45 with x(0) = [1 3 2]^T. This problem is solved for t ∈ [0, 2], and the approximate solution obtained by the proposed method is compared with the solution obtained by ode45 in Figure 5. The identification of g(x)u(t) and the approximation of f(x) are compared with the exact nonlinear parts of the ode45 solution (see Figure 6). The results with different numbers of training points are tabulated in Table 3.
Note that the symbolic solver dsolve of MATLAB 7.0 failed to find an analytical solution for the above equation.

Example 4. The approximate solution obtained by the proposed method is compared with the exact solution (see Figure 7). The identification of g(x)u(t) and the approximation of f(x) are compared with the exact nonlinear parts of the exact solution (see Figure 8), and the results are recorded in Table 4, from which it is apparent that the solution converges to the true solution as N increases. From all these examples we can see that the approximate solutions match the original solutions quite well on the given interval, and the unknown function part can be identified properly. On the other hand, the performance deteriorates outside the training interval when no training data are available there, which shows that the choice of training data points is very important.

6. Conclusions. In this paper, a new method has been presented for solving affine nonlinear systems with partially unknown functions by using LS-SVMs and a set of measured data points. It provides accurate and differentiable solutions in closed analytic form. The simulation results show that the method achieves the desired accuracy with only a small number of collocation points, and the continuous approximate solution can be easily evaluated at any point within the interval. The method can be effectively applied to partially unknown affine nonlinear systems of any index. Since the affine nonlinear differential equations are partially unknown, this approach offers new ideas for nonlinear system identification and can also be used to design controllers when the desired characteristics for the observed sampling data are given.
Although the idea in this paper can be used to obtain approximate solutions of general nonlinear systems, it will differ when used to identify unknown parts of such systems. In the future, we will investigate control and identification problems for different types of nonlinear systems with applications.