A Novel Support Vector Machine with Globality-Locality Preserving

Support vector machine (SVM) is regarded as a powerful method for pattern classification. However, the solution of the primal optimal model of SVM is susceptible to the class distribution and may be nonrobust. To overcome this shortcoming, an improved model, support vector machine with globality-locality preserving (GLPSVM), is proposed. It introduces globality-locality preserving into the standard SVM, which preserves the manifold structure of the data space. We conduct extensive experiments on UCI machine learning data sets. The results validate the effectiveness of the proposed model, especially on the Wine and Iris databases, where the recognition rate is above 97% and outperforms all the compared SVM-based algorithms.


Introduction
In the past decades, the support vector machine (SVM) [1] has been regarded as a powerful tool for classification tasks. It separates different classes with hyperplanes that are determined by optimal directions and support vectors, where the optimal directions are obtained by maximizing the margin between each pair of classes. SVM and its variants [2][3][4][5] have been successfully applied to many research areas such as face detection and recognition [6], speech recognition [7], text classification [8], and image retrieval [9].
It is well known that training an SVM is an optimization problem whose optimal solution can be found by solving a quadratic programming problem. Because the objective function is convex, a global minimum is guaranteed. However, the traditional SVM solution is susceptible to the class distribution, which means that it is not robust to the data samples. To overcome this shortcoming, Zafeiriou et al. [10] proposed the minimum class variance support vector machine (MCVSVM), inspired by the optimization of Fisher's discriminant ratio. Taking the manifold structure of the data space into consideration, Wang et al. [11] introduced within-class locality preserving into SVM and proposed the minimum class locality preserving variance support vector machine (MCLPV SVM). Moreover, since SVM relies on a subset of the data points (the support vectors) rather than on the entire data set, its solution is, to some extent, based on "local" characteristics of the data; therefore, Xiong and Cherkassky [12] incorporated global discriminant information into SVM and proposed SVM+LDA. Analogously, Khan et al. [13] presented the SVM+NDA (nonparametric discriminant analysis) model for classification, which fuses partially global and local information. In particular, both SVM+LDA and SVM+NDA can cope with the small sample problem owing to the way the models are constructed.
The Scientific World Journal

According to the above analysis, none of the mentioned methods except MCLPV SVM takes the manifold structure of the data space into consideration, and MCLPV SVM loses some discriminant information. The locality of the training data should be considered in basic learning algorithms; in recent years, many publications have shown the importance of locality, such as [14][15][16]. Locality is also important for finding clusters in high-dimensional data sets [17,18]. Recently, discriminant locality preserving projections (DLPP) [19][20][21] was proposed; it seeks the subspace that best discriminates different classes by maximizing the locality preserving between-class distances while minimizing the locality preserving within-class distances. Thus, DLPP not only preserves local structure but also incorporates discriminant information. Inspired by DLPP, and considering that the mean sample reflects the characteristics of the data structure and is center-invariant within each class [22], this paper proposes a novel learning algorithm, called support vector machine with globality-locality preserving (GLPSVM), which introduces both globality and locality preserving ability into SVM. The proposed method preserves the intrinsic manifold structure of the data space, takes the class distribution into consideration, and obtains a robust solution.
The rest of this paper is organized as follows. Section 2 gives a brief review of SVM. Section 3 presents the proposed method, including its derivation, solution, and analysis. The experimental results are given in Section 4. Finally, conclusions are drawn in Section 5.

A Brief Review of SVM
Given a set of pairwise samples $S=\{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$, where $\mathbf{x}_i\in\mathbb{R}^{d}$ is a sample point in $d$-dimensional space and $y_i\in\{+1,-1\}$ is the corresponding label, the most direct way to separate these samples into two classes is to find a separating hyperplane.
For the linearly separable case, the SVM model is

$$\min_{\mathbf{w},b}\ \frac{1}{2}\mathbf{w}^{T}\mathbf{w}\quad\text{s.t.}\quad y_i\left(\mathbf{w}^{T}\mathbf{x}_i+b\right)\ge 1,\quad i=1,\dots,n. \tag{1}$$

By transforming this optimization problem into its dual problem, the optimal discriminant vector can be found through

$$\mathbf{w}=\sum_{i=1}^{n}\alpha_i y_i\mathbf{x}_i, \tag{2}$$

where $\alpha_i$ and $\mathbf{x}_i$ are the dual variable and the data sample, respectively; the samples with $\alpha_i>0$ are called support vectors. The support vectors are crucial for classification, since removing them may change the SVM solution. In SVM, the separating directions are determined by the optimal discriminant vectors obtained through (2). Hence, if we project the data onto the feature space spanned by the optimal discriminant vectors, the data will be separable in that feature space.
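As a small worked check of (2), the sketch below uses a hand-built two-point toy problem (not taken from the paper) whose dual solution can be derived by hand: both points are support vectors with $\alpha_1=\alpha_2=0.25$. It recovers $\mathbf{w}$ from the dual variables, computes the bias from the support-vector condition, and confirms that both points lie exactly on the margin.

```python
import numpy as np

# Toy linearly separable problem: one positive and one negative sample (rows are samples).
X = np.array([[1.0, 1.0],
              [-1.0, -1.0]])
y = np.array([1.0, -1.0])

# Dual variables derived by hand for this toy problem; they satisfy sum_i alpha_i y_i = 0.
alpha = np.array([0.25, 0.25])

# Equation (2): w = sum_i alpha_i y_i x_i
w = (alpha * y) @ X

# Bias from y_i (w^T x_i + b) = 1, averaged over the support vectors (alpha_i > 0).
sv = alpha > 1e-12
b = np.mean(y[sv] - X[sv] @ w)

margins = y * (X @ w + b)          # functional margins; support vectors give exactly 1
width = 2.0 / np.linalg.norm(w)    # geometric margin width 2 / ||w||
print(w, b, margins, width)
```

Here the margin width $2/\|\mathbf{w}\|$ equals the distance between the two points, as expected when each class has a single sample.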
In real-world applications we usually need to deal with multiclass cases, such as face recognition [6] and text categorization [8], so SVM must be extended to a multiclass SVM. The general approach is to code the classes according to a certain strategy, such as one-against-all (OAA) or one-against-one (OAO) [23]. The OAA approach compares the samples of a single class with all samples in the other classes to generate a decision boundary; in this case, $c$ decision boundaries are built for $c$ classes. The OAO strategy generates a decision boundary for every possible pair of classes, which yields $c(c-1)/2$ decision boundaries. Comparatively, OAO obtains more discriminant vectors than OAA but costs more computational time.
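The OAO scheme can be sketched as follows: enumerate the $c(c-1)/2$ class pairs and decode a prediction by majority voting over the pairwise classifiers. Majority voting is one common OAO decoding rule and is an assumption here, since the text does not specify the decoding; the pairwise decision functions in the usage example are hypothetical nearest-center rules standing in for trained binary SVMs.

```python
from collections import Counter
from itertools import combinations


def ovo_pairs(classes):
    """All unordered class pairs; OAO trains one binary classifier per pair."""
    return list(combinations(classes, 2))


def ovo_predict(x, pairwise):
    """Majority vote over pairwise classifiers.

    `pairwise` maps a pair (a, b) to a decision function that returns a positive
    value for class a and a non-positive value for class b.
    """
    votes = Counter()
    for (a, b), f in pairwise.items():
        votes[a if f(x) > 0 else b] += 1
    # On a tie, the class that first received a vote wins.
    return votes.most_common(1)[0][0]


# Hypothetical 1-D example: three classes centered at 0, 5 and 10.
centers = {0: 0.0, 1: 5.0, 2: 10.0}
pairwise = {
    (a, b): (lambda x, a=a, b=b: abs(x - centers[b]) - abs(x - centers[a]))
    for a, b in ovo_pairs(list(centers))
}

print(len(ovo_pairs(range(6))))   # number of OAO classifiers for 6 classes
print(ovo_predict(4.0, pairwise))
```

For six classes (as in Glass), OAO builds 15 binary classifiers, whereas OAA would build 6.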

The Proposed GLPSVM
In this section, we propose a novel support vector classifier that takes the class distribution into consideration, with the aim of obtaining a robust solution. We first introduce the definition of globality-locality preserving.

Globality-Locality Preserving (GLP).
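Discriminant locality preserving projections (DLPP) [19][20][21] is a powerful method for extracting the manifold structure of data samples. Given a set of samples $X=[\mathbf{x}_1,\dots,\mathbf{x}_n]\in\mathbb{R}^{d\times n}$ with corresponding labels $y_i\in\{+1,-1\}$, let $C=\{C_+,C_-\}$, where all samples labeled $y_i=+1$ belong to $C_+$ and the others belong to $C_-$. Thus $n=n_1+n_2$, where $n_1$ is the number of samples in $C_+$ and $n_2$ is the number of samples in $C_-$. Suppose that $T=\{\mathbf{t}_i\}_{i=1}^{n}$ are the low-dimensional feature projections of $X$. DLPP maximizes the objective function

$$J=\frac{\sum_{i,j=1}^{2}(\mathbf{m}_i-\mathbf{m}_j)^{T}(\mathbf{m}_i-\mathbf{m}_j)B_{ij}}{\sum_{c=1}^{2}\sum_{i,j=1}^{n_c}(\mathbf{t}_i^{c}-\mathbf{t}_j^{c})^{T}(\mathbf{t}_i^{c}-\mathbf{t}_j^{c})W_{ij}^{c}}, \tag{3}$$

where $\mathbf{m}_i$ and $\mathbf{m}_j$ are the mean vectors of the projected samples in the $i$th and $j$th classes, $\mathbf{t}_i^{c}$ is the $i$th projected sample of class $c$, and $W_{ij}^{c}$ and $B_{ij}$ are the elements of the within-class weight matrix $W$ and the between-class weight matrix $B$, defined as

$$W_{ij}^{c}=\begin{cases}\exp\left(-\|\mathbf{x}_i^{c}-\mathbf{x}_j^{c}\|^{2}/t_1\right), & \mathbf{x}_j^{c}\in N_{k_1}(\mathbf{x}_i^{c})\ \text{or}\ \mathbf{x}_i^{c}\in N_{k_1}(\mathbf{x}_j^{c}),\\ 0, & \text{otherwise},\end{cases}$$

$$B_{ij}=\begin{cases}\exp\left(-\|\mathbf{u}_i-\mathbf{u}_j\|^{2}/t_2\right), & \mathbf{u}_j\in N_{k_2}(\mathbf{u}_i)\ \text{or}\ \mathbf{u}_i\in N_{k_2}(\mathbf{u}_j),\\ 0, & \text{otherwise}.\end{cases} \tag{4}$$

Here $N_{k_1}(\mathbf{x})$ and $N_{k_2}(\mathbf{u})$ denote the local neighbors of $\mathbf{x}$ and $\mathbf{u}$, respectively, $k_1$ is the sample neighborhood size, and $k_2$ is the mean-sample neighborhood size. The heat kernel parameters $t_1$ and $t_2$ are determined empirically, and $\mathbf{u}_i$ is the mean vector of the samples in the $i$th class. Suppose $\mathbf{w}: X\to T$ is a mapping from the high-dimensional data space to the low-dimensional feature space; that is, $T=\mathbf{w}^{T}X$. Then the objective function (3) can be rewritten as

$$J(\mathbf{w})=\frac{\mathbf{w}^{T}UHU^{T}\mathbf{w}}{\mathbf{w}^{T}XLX^{T}\mathbf{w}}, \tag{5}$$

where $U=[\mathbf{u}_1,\mathbf{u}_2]$, and $H$ and $L$ are Laplacian matrices [24,25]: $H=D-B$, where $D$ is a diagonal matrix whose elements are the column (or row) sums of $B$; and $L=D_w-W$, where $W=\operatorname{diag}(W^{1},W^{2})$ is block diagonal and $D_w$ is also block diagonal, each of its blocks being a diagonal matrix whose elements are the column (or row) sums of the corresponding block of $W$.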
Formula (5) is also called the locality preserving discriminant ratio (criterion); DLPP seeks the feature subspace that maximizes this ratio, which means simultaneously maximizing the locality preserving between-class distance and minimizing the locality preserving within-class distance.
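To make (3) to (5) concrete, the sketch below builds the weight matrices of (4) with a symmetric $k$-nearest-neighbor rule and a heat kernel, forms the Laplacians $L=D_w-W$ and $H=D-B$, and checks numerically that the pairwise ratio (3) equals the matrix ratio (5) for a single projection direction $\mathbf{w}$. The helper names (`heat_knn_weights`, `glp_laplacians`), the synthetic data, and the parameter values are hypothetical choices for illustration; $W$ is stored as a full $n\times n$ matrix that is zero across classes, which is equivalent to the block-diagonal form.

```python
import numpy as np


def heat_knn_weights(P, k, t):
    """Heat-kernel weights between the columns of P (d x m), kept only for kNN pairs.

    W_ij = exp(-||p_i - p_j||^2 / t) if p_j is among the k nearest neighbours of p_i
    or vice versa, and 0 otherwise (the rule in (4)).
    """
    m = P.shape[1]
    sq = np.sum((P[:, :, None] - P[:, None, :]) ** 2, axis=0)  # m x m squared distances
    nn = np.argsort(sq, axis=1)[:, 1:k + 1]                    # k nearest neighbours, skipping self
    mask = np.zeros((m, m), dtype=bool)
    mask[np.repeat(np.arange(m), nn.shape[1]), nn.ravel()] = True
    mask |= mask.T                                             # symmetric "or" rule
    return np.where(mask, np.exp(-sq / t), 0.0)


def glp_laplacians(X, y, k1, t1, k2, t2):
    """Within-class W, between-class B (on class means), and Laplacians L, H."""
    n = X.shape[1]
    classes = np.unique(y)
    W = np.zeros((n, n))
    for c in classes:                                          # W is nonzero only inside a class
        idx = np.flatnonzero(y == c)
        W[np.ix_(idx, idx)] = heat_knn_weights(X[:, idx], k1, t1)
    U = np.column_stack([X[:, y == c].mean(axis=1) for c in classes])  # class means u_i
    B = heat_knn_weights(U, k2, t2)
    L = np.diag(W.sum(axis=1)) - W                             # L = D_w - W
    H = np.diag(B.sum(axis=1)) - B                             # H = D - B
    return W, B, L, H, U


rng = np.random.default_rng(0)
X = np.hstack([rng.normal(0.0, 1.0, (3, 6)), rng.normal(3.0, 1.0, (3, 6))])
y = np.array([1] * 6 + [-1] * 6)
W, B, L, H, U = glp_laplacians(X, y, k1=2, t1=5.0, k2=1, t2=10.0)

w = rng.normal(size=3)                  # an arbitrary projection direction
a, m = w @ X, w @ U                     # projected samples t_i and projected class means m_i
num = np.sum(B * (m[:, None] - m[None, :]) ** 2)
den = np.sum(W * (a[:, None] - a[None, :]) ** 2)
J_direct = num / den                    # ratio (3), summed pairwise
J_matrix = (w @ U @ H @ U.T @ w) / (w @ X @ L @ X.T @ w)   # ratio (5)
print(J_direct, J_matrix)
```

The two ratios agree because, for a symmetric weight matrix $W$ with Laplacian $L$, $\sum_{i,j}W_{ij}(a_i-a_j)^2=2\,\mathbf{a}^{T}L\mathbf{a}$, and the factor 2 cancels between numerator and denominator.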
On the other hand, Huang et al. [22] offered another view of the weight matrix of DLPP: they argue that the mean sample is center-invariant within a class and reflects the characteristics of the data structure, so that to some extent it determines the accuracy in classification tasks. In this paper, we preserve the structural information of the mean samples, which largely compensates for the global information lost when only local structure is preserved. Accordingly, we define the locality preserving matrix and the globality preserving matrix as follows: (i) locality preserving matrix: $Z_L=XLX^{T}$; (ii) globality preserving matrix: $Z_G=UHU^{T}$.
where $I$ denotes the regularization matrix, which is added to cope with the small sample problem. This model not only maximizes the margin of the separating hyperplane but also minimizes the scatter of the data along the discriminant directions, since it takes into account both the local manifold structure of the data space and the global manifold structure of the mean-sample space. Here, the trade-off parameter is a key parameter that is determined empirically. Consequently, the optimal discriminant directions are no longer the same as those of the classical SVM, because GLP is introduced into the SVM optimization model. The classification performance of the proposed method is reported in Section 4.
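The role of the regularization matrix can be illustrated numerically. In the small sample case ($n<d$), $Z_L=XLX^{T}$ has rank at most $n-1$ and is therefore singular, while adding a multiple of $I$ makes it positive definite. The sketch below uses a hand-specified toy weight matrix and a hypothetical regularization value `mu`; it shows only $Z_L+\mu I$ and does not reproduce the paper's full combination of $Z_L$, $Z_G$, the trade-off parameter, and $I$.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 5, 4                             # fewer samples than dimensions (small sample case)
X = rng.normal(size=(d, n))             # columns are samples
y = np.array([1, 1, -1, -1])

# Hand-specified within-class weights (block diagonal) and its Laplacian L = D_w - W.
W = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W

# Class means U = [u_1, u_2] and the between-class Laplacian H = D - B for two classes.
U = np.column_stack([X[:, y == 1].mean(axis=1), X[:, y == -1].mean(axis=1)])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])
H = np.diag(B.sum(axis=1)) - B

Z_L = X @ L @ X.T                       # locality preserving matrix
Z_G = U @ H @ U.T                       # globality preserving matrix

mu = 0.1                                # hypothetical regularization value
Z_reg = Z_L + mu * np.eye(d)

print(np.linalg.matrix_rank(Z_L), np.linalg.eigvalsh(Z_reg).min())
```

Both $Z_L$ and $Z_G$ are positive semidefinite because they are built from graph Laplacians with nonnegative weights, so adding $\mu I$ with $\mu>0$ lifts every eigenvalue by at least $\mu$.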
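Taking the derivatives of the Lagrangian with respect to $\mathbf{w}$ and $b$ and setting them to zero, we obtain

$$\Delta\mathbf{w}=\sum_{i=1}^{n}\alpha_i y_i\mathbf{x}_i,\qquad \sum_{i=1}^{n}\alpha_i y_i=0,$$

where $\Delta$ denotes the symmetric positive definite matrix in the quadratic term $\frac{1}{2}\mathbf{w}^{T}\Delta\mathbf{w}$ of the GLPSVM objective and $\alpha_i\ge 0$ are the Lagrange multipliers. Hence, we have the following dual problem:

$$\max_{\boldsymbol{\alpha}}\ \sum_{i=1}^{n}\alpha_i-\frac{1}{2}\sum_{i,j=1}^{n}\alpha_i\alpha_j y_i y_j\mathbf{x}_i^{T}\Delta^{-1}\mathbf{x}_j\quad\text{s.t.}\quad \alpha_i\ge 0,\ \sum_{i=1}^{n}\alpha_i y_i=0.$$

Suppose $\boldsymbol{\alpha}^{*}$ is the optimal solution of this dual problem. Then the optimal discriminant vector is

$$\mathbf{w}^{*}=\Delta^{-1}\sum_{i=1}^{n}\alpha_i^{*}y_i\mathbf{x}_i,$$

so the corresponding decision surface is

$$f(\mathbf{x})=\operatorname{sgn}\left(\mathbf{w}^{*T}\mathbf{x}+b^{*}\right)=\operatorname{sgn}\left(\sum_{i=1}^{n}\alpha_i^{*}y_i\mathbf{x}_i^{T}\Delta^{-1}\mathbf{x}+b^{*}\right).$$

Finally, the optimal bias $b^{*}$ can be calculated as

$$b^{*}=\frac{1}{N_{sv}}\sum_{i\in SV}\left(y_i-\mathbf{w}^{*T}\mathbf{x}_i\right),$$

where $N_{sv}$ is the number of support vectors. In the linearly separable case, GLPSVM thus requires a decision hyperplane that classifies every training sample correctly. In real-world applications, however, the decision hyperplane need not be completely accurate, so we extend GLPSVM to the soft margin setting.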
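With one sample per class, the hard margin dual has a closed form, which makes the effect of $\Delta$ easy to see. Writing $\mathbf{d}=\mathbf{x}_1-\mathbf{x}_2$, both samples are support vectors with $\alpha_1=\alpha_2=2/(\mathbf{d}^{T}\Delta^{-1}\mathbf{d})$. The sketch below assumes a hypothetical symmetric positive definite $\Delta$ (the paper's construction of $\Delta$ is not reproduced) and reuses the two toy points for which the plain SVM gives $\mathbf{w}=(0.5,0.5)$; the GLPSVM vector $\mathbf{w}^{*}=\Delta^{-1}\sum_i\alpha_i^{*}y_i\mathbf{x}_i$ points in a different direction while still placing both samples on the margin.

```python
import numpy as np

x1, x2 = np.array([1.0, 1.0]), np.array([-1.0, -1.0])   # positive / negative sample
X, y = np.stack([x1, x2]), np.array([1.0, -1.0])

# Hypothetical symmetric positive definite matrix for the quadratic term 1/2 w^T Delta w.
Delta = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
Delta_inv = np.linalg.inv(Delta)

# Closed-form dual solution for one sample per class: alpha_i = 2 / (d^T Delta^{-1} d).
d = x1 - x2
alpha = np.full(2, 2.0 / (d @ Delta_inv @ d))

# Optimal discriminant vector w* = Delta^{-1} sum_i alpha_i y_i x_i
w = Delta_inv @ ((alpha * y) @ X)
# Bias averaged over the support vectors: b* = mean(y_i - w*^T x_i)
b = np.mean(y - X @ w)

print(w, b, y * (X @ w + b))
```

Setting `Delta = np.eye(2)` in this sketch recovers the plain SVM solution $(0.5, 0.5)$.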

Soft Margin GLPSVM.
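Reference [13] proposed the soft margin method for SVM to cope with cases in which a completely accurate decision hyperplane is not required; that is, a limited amount of error is tolerated. With $\Delta$ the matrix in the quadratic term of the GLPSVM objective, the soft margin GLPSVM can be described as

$$\min_{\mathbf{w},b,\boldsymbol{\xi}}\ \frac{1}{2}\mathbf{w}^{T}\Delta\mathbf{w}+C\sum_{i=1}^{n}\xi_i\quad\text{s.t.}\quad y_i\left(\mathbf{w}^{T}\mathbf{x}_i+b\right)\ge 1-\xi_i,\ \ \xi_i\ge 0,\ \ i=1,\dots,n,$$

where $C$ is a predefined positive real number, and larger values of $C$ assign a higher penalty to errors. $\boldsymbol{\xi}=[\xi_1,\xi_2,\dots,\xi_n]^{T}$ is the vector of slack variables, which reflect the degree of misclassification. The soft margin GLPSVM is also a quadratic optimization problem and can be solved in the same way as the standard GLPSVM; the only change in the dual problem is the box constraint $0\le\alpha_i\le C$.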
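The soft margin dual can be solved with any standard quadratic programming or SVM solver. As an illustration only, the sketch below swaps in a simplified, deterministic variant of sequential minimal optimization (SMO) operating on the Gram matrix $G_{ij}=\mathbf{x}_i^{T}\Delta^{-1}\mathbf{x}_j$, then recovers $\mathbf{w}^{*}=\Delta^{-1}\sum_i\alpha_i^{*}y_i\mathbf{x}_i$. The solver, the helper names (`smo_dual`, `soft_glpsvm`), the synthetic data, and the matrix `Delta` are all hypothetical; this is not the authors' implementation.

```python
import numpy as np


def smo_dual(G, y, C, tol=1e-5, max_passes=5, max_iter=500):
    """Simplified SMO for the soft margin dual with Gram matrix G:
    max sum(a) - 1/2 sum_ij a_i a_j y_i y_j G_ij  s.t.  0 <= a_i <= C, sum(a * y) = 0.
    """
    n = len(y)
    a, b = np.zeros(n), 0.0
    passes = it = 0
    while passes < max_passes and it < max_iter:
        it += 1
        changed = 0
        for i in range(n):
            E = (a * y) @ G + b - y                     # E_k = f(x_k) - y_k
            Ei = E[i]
            if not ((y[i] * Ei < -tol and a[i] < C) or (y[i] * Ei > tol and a[i] > 0)):
                continue                                 # alpha_i satisfies KKT within tol
            for j in np.argsort(-np.abs(E - Ei)):       # try largest |E_i - E_j| first
                if j == i:
                    continue
                ai, aj = a[i], a[j]
                if y[i] != y[j]:
                    lo, hi = max(0.0, aj - ai), min(C, C + aj - ai)
                else:
                    lo, hi = max(0.0, ai + aj - C), min(C, ai + aj)
                eta = 2.0 * G[i, j] - G[i, i] - G[j, j]
                if lo >= hi or eta >= 0:
                    continue
                aj_new = float(np.clip(aj - y[j] * (Ei - E[j]) / eta, lo, hi))
                if abs(aj_new - aj) < 1e-10:
                    continue
                a[j] = aj_new
                a[i] = ai + y[i] * y[j] * (aj - aj_new)  # keeps sum(a * y) unchanged
                b1 = b - Ei - y[i] * (a[i] - ai) * G[i, i] - y[j] * (a[j] - aj) * G[i, j]
                b2 = b - E[j] - y[i] * (a[i] - ai) * G[i, j] - y[j] * (a[j] - aj) * G[j, j]
                b = b1 if 0 < a[i] < C else (b2 if 0 < a[j] < C else (b1 + b2) / 2)
                changed += 1
                break
        passes = passes + 1 if changed == 0 else 0
    return a, b


def soft_glpsvm(X, y, Delta, C):
    """Soft margin GLPSVM through its dual; X is d x n with samples as columns."""
    Delta_inv = np.linalg.inv(Delta)
    G = X.T @ Delta_inv @ X                 # G_ij = x_i^T Delta^{-1} x_j
    a, b = smo_dual(G, y, C)
    w = Delta_inv @ (X @ (a * y))           # w* = Delta^{-1} sum_i alpha_i y_i x_i
    return w, b, a


rng = np.random.default_rng(0)
X = np.hstack([rng.normal(3.0, 1.0, (2, 10)), rng.normal(-3.0, 1.0, (2, 10))])
y = np.r_[np.ones(10), -np.ones(10)]
Delta = np.array([[1.5, 0.3],
                  [0.3, 1.0]])              # hypothetical SPD matrix for 1/2 w^T Delta w
w, b, a = soft_glpsvm(X, y, Delta, C=10.0)
accuracy = np.mean(np.sign(w @ X + b) == y)
print(w, b, accuracy)
```

Each SMO step updates two multipliers at once so that the equality constraint $\sum_i\alpha_iy_i=0$ is preserved exactly, while clipping to $[\text{lo},\text{hi}]$ enforces the box constraint $0\le\alpha_i\le C$.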

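Table 3: Details of the selected UCI databases.
Database            Samples  Features  Classes
Breast              699      9         2
Heart               270      13        2
Pima                768      8         2
New Thyroid (NT)    215      5         3
Wine                178      13        3
Iris                150      4         3
Glass               214      9         6

Effective Algorithm for GLPSVM.
Note that the objective function of the classical SVM is $\frac{1}{2}\mathbf{w}^{T}\mathbf{w}$, whereas in this paper it is replaced by $\frac{1}{2}\mathbf{w}^{T}\Delta\mathbf{w}$. If we define

$$\mathbf{v}=\Delta^{1/2}\mathbf{w},\qquad \mathbf{z}_i=\Delta^{-1/2}\mathbf{x}_i,$$

then $\mathbf{w}^{T}\Delta\mathbf{w}=\mathbf{v}^{T}\mathbf{v}$ and $\mathbf{w}^{T}\mathbf{x}_i=\mathbf{v}^{T}\mathbf{z}_i$, so the GLPSVM model is equivalent to

$$\min_{\mathbf{v},b}\ \frac{1}{2}\mathbf{v}^{T}\mathbf{v}\quad\text{s.t.}\quad y_i\left(\mathbf{v}^{T}\mathbf{z}_i+b\right)\ge 1,\quad i=1,\dots,n,$$

and the soft margin version follows in the same way by adding the slack variables $\xi_i$ and the penalty term $C\sum_i\xi_i$. Therefore, GLPSVM can be solved with a standard SVM software package applied to the transformed samples $\mathbf{z}_i$, although the optimal discriminant vectors differ: they are recovered as $\mathbf{w}^{*}=\Delta^{-1/2}\mathbf{v}^{*}$. Since globality-locality preserving is introduced into SVM, the optimal discriminant directions of GLPSVM preserve the intrinsic manifold structure of the data in the low-dimensional feature space. The matrices $\Delta^{1/2}$ and $\Delta^{-1/2}$ can be computed from the eigenvalue decomposition of $\Delta$; interested readers can refer to [13] for more details.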
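The transformation above can be checked numerically. The sketch below computes $\Delta^{1/2}$ and $\Delta^{-1/2}$ from the eigendecomposition $\Delta=Q\Lambda Q^{T}$ (via `numpy.linalg.eigh`), then verifies that $\mathbf{v}^{T}\mathbf{v}=\mathbf{w}^{T}\Delta\mathbf{w}$, that $\mathbf{v}^{T}\mathbf{z}_i=\mathbf{w}^{T}\mathbf{x}_i$, and that the Gram matrix of the transformed samples equals the GLPSVM dual Gram matrix $\mathbf{x}_i^{T}\Delta^{-1}\mathbf{x}_j$. The matrix `Delta` and the data are random hypothetical values, not the paper's construction.

```python
import numpy as np


def sqrt_and_inv_sqrt(Delta):
    """Return Delta^{1/2} and Delta^{-1/2} for a symmetric positive definite Delta.

    With Delta = Q diag(lam) Q^T (eigh), Delta^{+-1/2} = Q diag(lam^{+-1/2}) Q^T.
    """
    lam, Q = np.linalg.eigh(Delta)
    return (Q * np.sqrt(lam)) @ Q.T, (Q / np.sqrt(lam)) @ Q.T


rng = np.random.default_rng(2)
A = rng.normal(size=(3, 3))
Delta = A @ A.T + 0.5 * np.eye(3)       # hypothetical SPD matrix
D_half, D_inv_half = sqrt_and_inv_sqrt(Delta)

X = rng.normal(size=(3, 6))             # columns are samples x_i
w = rng.normal(size=3)                  # any candidate discriminant vector

v = D_half @ w                          # v = Delta^{1/2} w
Z = D_inv_half @ X                      # z_i = Delta^{-1/2} x_i

print(np.isclose(v @ v, w @ Delta @ w), np.allclose(v @ Z, w @ X))
```

Because the transformed problem is an ordinary linear SVM in $(\mathbf{v}, b)$ on the samples $\mathbf{z}_i$, any standard solver can be used, and $\mathbf{w}^{*}$ is recovered afterwards with a single matrix-vector product.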

The Parameters' Influence on the Performance.
In the proposed model there are six parameters in total: the neighborhood parameters $k_1$ and $k_2$ and the heat kernel parameters $t_1$ and $t_2$ of the locality preserving matrix and the globality preserving matrix, respectively, together with the trade-off parameter and the regularization parameter. In this subsection, to show the parameters' influence on the performance, we conduct experiments on the binary Ionosphere database. We select 30% of the samples of each class for training and use the rest for testing; all samples are normalized before the experiment.
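The evaluation protocol (a per-class 30% training split with normalized samples) can be sketched as follows. The text does not state which normalization is used, so the z-score normalization over all samples below is an assumption, as are the helper names `stratified_split` and `zscore` and the toy labels.

```python
import numpy as np


def stratified_split(y, frac=0.3, rng=None):
    """Per-class random split: `frac` of each class for training, the rest for testing."""
    rng = np.random.default_rng(rng)
    train = []
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        n_tr = max(1, int(round(frac * idx.size)))     # at least one training sample per class
        train.extend(idx[:n_tr])
    train = np.sort(np.array(train))
    test = np.setdiff1d(np.arange(y.size), train)
    return train, test


def zscore(X):
    """Normalize each feature (row of a d x n matrix) to zero mean and unit variance."""
    mu = X.mean(axis=1, keepdims=True)
    sd = X.std(axis=1, keepdims=True)
    return (X - mu) / np.where(sd > 0, sd, 1.0)        # leave constant features unscaled


rng = np.random.default_rng(3)
y = np.array([0] * 10 + [1] * 20)
X = zscore(rng.normal(5.0, 2.0, size=(4, y.size)))
train, test = stratified_split(y, frac=0.3, rng=0)
print(len(train), len(test), np.bincount(y[train]))
```

Splitting within each class keeps the class proportions of the training set equal to those of the full data set, which matters for small or imbalanced classes.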
First, we set the trade-off parameter to 0.2 and use the same heat kernel parameter for both matrices ($t_1=t_2=t$) to examine the effect of the regularization parameter, $t$, and $k_1$. Table 1 shows the classification accuracy under different settings of these three parameters. The regularization parameter plays an important role in the classification performance, and an appropriate choice of parameters yields better classification results.
Next, we explore the effect of the trade-off parameter. Table 2 presents the classification accuracy of GLPSVM under different values of the trade-off parameter and the regularization parameter; for each setting, the highest accuracy is reported together with the corresponding sample neighborhood parameter $k_1$. The trade-off parameter also plays an important role in the classification results, and small values may be more appropriate than large ones.

Comparative Analysis.
In this subsection, comparative experiments are conducted to evaluate the proposed GLPSVM. We compare it with SVM, SVM+LDA, MCVSVM, and MCLPV SVM on seven databases selected from the well-known UCI repository: three binary databases and four multiclass databases. For the multiclass tasks, the one-against-one coding strategy is employed. The detailed information of the selected databases is shown in Table 3. For each database, 30% of the samples in each class are randomly selected for training, the remaining samples are used for testing, and all data samples are normalized before the experiment. Table 4 gives the classification accuracy of SVM, SVM+LDA, MCVSVM, MCLPV SVM, and GLPSVM; for each method, the highest average classification accuracy over the tested regularization, neighborhood, and heat kernel parameter settings is reported, where each average is obtained over 20 repetitions. As Table 4 shows, the proposed GLPSVM achieves the highest accuracy on every database, especially on the Wine and Iris databases, where its accuracy exceeds 97% and is clearly higher than that of the other algorithms.

Conclusions
In this paper, a new extension of SVM, called support vector machine with globality-locality preserving (GLPSVM), was proposed. It takes the intrinsic manifold structure of the data space into consideration. A soft margin GLPSVM was also presented. The effective algorithm of GLPSVM shows that the model can be solved by transforming it into the standard SVM model and using a standard SVM software package, which greatly improves implementation efficiency. Finally, experimental results on real-world databases show that the proposed method can outperform SVM, SVM+LDA, MCVSVM, and MCLPV SVM.