The Generalization Error Bound for the Multiclass Analytical Center Classifier

This paper presents a multiclass classifier based on the analytical center of the feasible space (MACM). The classifier is formulated as a linear optimization with quadratic constraints and does not require repeatedly constructing classifiers to separate a single class from all the others. Its generalization error upper bound is proved theoretically, and experiments on benchmark datasets validate the generalization performance of MACM.


Introduction
Multiclass classification is an important and ongoing research subject in machine learning, with wide-ranging applications such as machine vision [1,2], text and speech categorization [3,4], natural language processing [5], and disease diagnosis [6,7]. Two kinds of approaches have been proposed to solve the multiclass classification problem [8]. The first extends a binary classifier to handle the multiclass case directly; this includes neural networks, decision trees, support vector machines, naive Bayes, and k-nearest neighbors. The second decomposes the multiclass classification problem into several binary classification tasks, using methods such as one-versus-all [9], all-versus-all [10], and error-correcting output coding [11].
The one-versus-all approach reduces the problem of classifying among k classes to k binary problems, where each problem discriminates a given class from the other k−1 classes.
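As an illustration, the one-versus-all scheme can be sketched as follows. The least-squares linear scorer below is a hypothetical stand-in for whatever binary classifier is actually used; it is not the method proposed in this paper.

```python
import numpy as np

def fit_binary(X, y01):
    # Hypothetical base learner: a least-squares linear scorer.
    # Targets are mapped from {0, 1} to {-1, +1}.
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w, *_ = np.linalg.lstsq(Xb, 2.0 * y01 - 1.0, rcond=None)
    return w

def train_one_vs_all(X, y, k):
    # Classifier i separates class i from the other k-1 classes.
    return [fit_binary(X, (y == i).astype(float)) for i in range(k)]

def predict(models, x):
    # Assign the class whose binary scorer responds most strongly.
    xb = np.append(x, 1.0)
    return int(np.argmax([w @ xb for w in models]))
```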
In the all-versus-all method, a binary classifier is built to discriminate between each pair of classes, discarding the rest of the classes. This requires building k(k−1)/2 binary classifiers for a k-class problem. When testing a new example, voting is performed among the classifiers and the class with the maximum number of votes wins.
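The pairwise training and voting scheme can be sketched as follows; as before, the least-squares linear scorer is only an illustrative stand-in for the binary classifier.

```python
import numpy as np
from itertools import combinations

def fit_binary(X, y01):
    # Hypothetical base learner: a least-squares linear scorer.
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(Xb, 2.0 * y01 - 1.0, rcond=None)
    return w

def train_all_vs_all(X, y, k):
    # One classifier per pair (i, j); all other classes are discarded.
    models = {}
    for i, j in combinations(range(k), 2):
        mask = (y == i) | (y == j)
        models[(i, j)] = fit_binary(X[mask], (y[mask] == i).astype(float))
    return models

def predict_vote(models, x, k):
    # Each of the k(k-1)/2 classifiers casts one vote; the most-voted class wins.
    xb = np.append(x, 1.0)
    votes = np.zeros(k, dtype=int)
    for (i, j), w in models.items():
        votes[i if w @ xb > 0 else j] += 1
    return int(np.argmax(votes))
```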
Error-correcting output coding works by training binary classifiers to distinguish among the k classes. Each class is assigned a codeword according to a binary code matrix, each row of which corresponds to a certain class. The above multiclass classification algorithms must repeatedly construct binary classifiers to separate a single class from all the others for a k-class problem, which leads to daunting computation and low classification efficiency. Reference [12] proposes the multiclass support vector machine (MSVM), which corresponds to a simple quadratic optimization and does not require repeatedly constructing binary classifiers. However, the support vector machine corresponds to the center of the largest inscribed hypersphere of the feasible space. When the feasible space, that is, the space of hypotheses consistent with the training data, is elongated or asymmetric, the support vector machine is not effective [13]. To address these problems, a multiclass classifier based on the analytical center of the feasible space (MACM) is proposed. In order to validate its generalization performance theoretically, its generalization error upper bound is formulated and proved, and experiments on benchmark datasets validate the generalization performance of MACM.
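The decoding step of error-correcting output coding described above can be illustrated with a small, hypothetical code matrix (four classes, codeword length six, minimum pairwise Hamming distance four, so any single flipped classifier output is still decoded correctly):

```python
import numpy as np

# Hypothetical code matrix: one row (codeword) per class.
M = np.array([
    [0, 0, 0, 1, 1, 1],
    [0, 1, 1, 0, 0, 1],
    [1, 0, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0],
])

def ecoc_decode(bits, M):
    # Predict the class whose codeword is nearest in Hamming distance
    # to the vector of binary-classifier outputs.
    return int(np.argmin(np.abs(M - bits).sum(axis=1)))
```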

Multiclass Analytical Center Classifier
To facilitate the discussion of the multiclass analytical center classifier, the following definition is introduced. (1) Given a point x ∈ R^{n+1}, a piecewise linear classifier is a function f : R^{n+1} → {1, . . . , k} of the form f(x) = arg max_{i ∈ {1,...,k}} (w_i · x), where arg max returns the class label corresponding to the maximum value.
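A minimal sketch of this decision rule, assuming the classifier is parameterized by a matrix W whose rows are the per-class weight vectors w_1, …, w_k (this parameterization, and the 0-based labels, are illustrative assumptions):

```python
import numpy as np

def piecewise_linear_classify(W, x):
    # W: (k, n+1) array of per-class weight vectors; x: a point in R^{n+1}.
    # Returns the label i (0-based here) maximizing the linear score w_i . x.
    return int(np.argmax(W @ x))
```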

Generalization Error Bound of Multiclass Analytical Center Classifier
In order to analyze the generalization error bound theoretically, we introduce the definitions of the classification margin and the data radius, and then deduce the margin-based generalization error bound of MACM.
Proof. Because the samples in the derived set are not independent, the generalization error bound cannot be obtained directly from Theorem 9. Since Theorem 9 is independent of the sample distribution, we construct a new distribution D′. According to the new distribution and the dataset, we generate an independent sample set with the same number of samples; that is, for every (x, y) in the original sample, we define a point sampled uniformly at random from the derived set according to D′, so that the derived set is the union of these points. From Theorem 8, the data radius of the generated set satisfies the stated bound. Let A denote the event that a sample in the generated set is wrongly classified and B the event that a misclassification occurs in the derived set. From the above analysis, the misclassification of any sample in the generated set causes the misclassification of the corresponding point in the derived set, so the probabilities of events A and B satisfy P(A) ≤ P(B). Because the cardinality of the derived set equals 2(k−1), the probability of sample misclassification in the derived set can be written out accordingly, and applying the union bound theorem yields the claimed inequality. This ends the proof of Theorem 11.

Computational Experiments
In this section, we present computational results comparing the multiclass analytical center classifier (MACM) and the multiclass support vector machine (MSVM) [12]. A description of each dataset follows. The kernel function for the piecewise nonlinear MACM and MSVM methods is the polynomial kernel K(x, x′) = (x · x′/c + 1)^d, where c is a scaling constant and d is the desired polynomial degree.
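For reference, a polynomial kernel of this form can be computed as follows; c (the scale constant) and d (the degree) are left as free parameters here, since the exact values used in the experiments are not restated in this section.

```python
import numpy as np

def poly_kernel(X1, X2, c=1.0, d=2):
    # K(x, x') = (x . x' / c + 1)^d, evaluated for all row pairs of X1 and X2.
    return (X1 @ X2.T / c + 1.0) ** d
```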
Wine Recognition Data. The wine dataset uses the chemical analysis of wine to determine the cultivar. There are 178 points with 13 features. This is a three-class dataset distributed as follows: 59 points in class 1, 71 points in class 2, and 48 points in class 3.
Glass Identification Database. The Glass dataset is used to identify the origin of a sample of glass through chemical analysis. This dataset comprises six classes of 214 points with 9 features. The distribution of points by class is as follows: 70 float-processed building windows, 17 float-processed vehicle windows, 76 non-float-processed building windows, 13 containers, 9 tableware, and 29 headlamps. Table 1 contains the results for MACM and MSVM on the wine and glass datasets. As anticipated, MACM produces better testing generalization than MSVM.

Summary
In this paper, a multiclass classifier based on the analytical center of the feasible space, which corresponds to a simple linear optimization with quadratic constraints, is proposed. In order to validate its generalization performance theoretically, its generalization error upper bound is formulated and proved. Experiments on the wine recognition and glass identification datasets show that the multiclass analytical center classifier outperforms the multiclass support vector machine in generalization error.