Abstract
In classification tasks, ensembles of classifiers have attracted growing attention in the pattern recognition community. Generally, ensemble methods have the potential to significantly improve upon the predictions of the base classifiers included in the team. In this paper, we propose an algorithm that modifies the confidence values obtained as the outputs of the base classifiers. Experimental results on thirteen data sets show that the proposed method is promising for the development of multiple classifier systems. We compared the proposed method with other known ensembles of classifiers and with all the base classifiers.
1 Introduction
For several years, in the field of supervised learning, a number of base classifiers have been used to solve a single classification task. The use of multiple base classifiers for one decision problem is known as an ensemble of classifiers (EoC) or a multiple classifier system (MCS) [12, 15]. The building of MCSs consists of three main phases: generation, selection and integration (fusion) [4]. When randomness is injected into the learning algorithm or the training objects are manipulated [3, 11], we speak of homogeneous classifiers. In the second approach the ensemble is composed of heterogeneous classifiers, which means that several different learning algorithms are applied to the same data set.
The selection phase concerns the choice of a set of classifiers from the whole available pool of base classifiers [14]. Formally, choosing a single classifier is called classifier selection, while choosing a subset of base classifiers from the pool is called ensemble selection or ensemble pruning. Generally, there are two approaches to ensemble selection: static ensemble selection and dynamic ensemble selection [2, 4].
A number of articles [7, 16, 21] present a large variety of fusion methods. In the third phase, for example, the simple majority voting scheme [19] is the most popular. Generally, the final decision made in the third phase uses the predictions of the base classifiers; fusion is popular for its ability to combine multiple classification outputs into a more accurate decision. According to their functioning, fusion methods can be divided into selection-based, fusion-based and hybrid ones [6]. Fusion strategies can also be divided into fixed and trainable ones [7]. Another division distinguishes class-conscious and class-indifferent integration methods [16].
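As a minimal illustration of the fixed fusion strategies mentioned above, the majority voting scheme can be sketched as follows (a generic sketch for crisp label outputs, not code from the paper):

```python
import numpy as np

def majority_vote(predictions):
    """Combine the crisp class predictions of K base classifiers by
    simple majority voting: the most frequent label wins."""
    values, counts = np.unique(predictions, return_counts=True)
    return int(values[np.argmax(counts)])

# Three of four hypothetical classifiers predict class 1
print(majority_vote(np.array([1, 0, 1, 1])))  # -> 1
```

Ties are resolved here in favour of the smallest label, which is one common convention; trainable strategies replace the equal votes with learned weights.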
In this work we consider a method that modifies confidence values. The proposed method is based on information from decision profiles. The decision scheme used in the training phase is created only from those confidence values that relate to correct classifications.
The remainder of this paper is organized as follows. Section 2 presents the concept of a base classifier and an ensemble of classifiers. Section 3 describes the proposed method for modifying confidence values. The experimental evaluation and a discussion of the results are presented in Sect. 4. The paper ends with concluding remarks.
2 Supervised Classification
2.1 Base Classifiers
The aim of supervised classification is to assign an object to a specific class label. The object is represented by a set of d features, or attributes, viewed as a d-dimensional feature vector x. The recognition algorithm maps the feature space X to the set of class labels \(\varOmega \) according to the general formula:
$$\varPsi : X \rightarrow \varOmega .$$
The recognition algorithm defines the classifier, which in the complex classification task with multiple classifiers is called a base classifier.
The output of a base classifier can be divided into three types [16].
-
The abstract level: the classifier \(\psi \) assigns a unique label j to a given input x.
-
The rank level: in this case, for each input (object) x, each classifier produces an array of integer ranks. Each element of this array corresponds to one of the defined class labels. The array is usually sorted, with the label at the top being the first choice.
-
The measurement level: the output of a classifier is represented by a confidence value (CV) that expresses the degree of support for assigning a class label to the given input x. An example of such an output representation is the a posteriori probability returned by a Bayes classifier. Generally, this level provides richer information than the abstract and rank levels.
In this work we consider the situation in which each base classifier returns CVs. Additionally, before the final combination of the base classifiers' outputs, the CV modification process is carried out.
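The three output levels can be illustrated with a small sketch; the confidence values below are invented for illustration only:

```python
import numpy as np

# Hypothetical measurement-level output for 3 classes:
# posterior-like confidence values summing to 1
cv = np.array([0.2, 0.7, 0.1])

# Abstract level: only the winning class label
abstract = int(np.argmax(cv))

# Rank level: class labels ordered from most to least supported
rank = [int(i) for i in np.argsort(cv)[::-1]]

print(abstract, rank)  # -> 1 [1, 0, 2]
```

The abstract and rank outputs can always be derived from the measurement level, but not the other way around, which is why the measurement level carries the richest information.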
2.2 Ensemble of Classifiers
Let us assume that \(K\) different classifiers \(\varPsi _1,\varPsi _2,\ldots ,\varPsi _K\), indexed by \(k\in \{1,2,\ldots ,K\}\), are available to solve the classification task. The output information from all K component classifiers is used to make the final decision of the MCS. This decision is based on the predictions of all the base classifiers.
One of the possible methods of integrating the outputs of the base classifiers is the sum rule. In this method the score of the MCS is based on the application of the following sums:
$$s_{\omega }(x)=\sum _{k=1}^{K} p_k(\omega |x), \quad \omega \in \varOmega ,$$
where \(p_k(\omega |x)\) is the CV for class label \(\omega \) returned by classifier k.
The final decision of the MCS is made following the maximum rule:
$$\varPsi _{S}(x)=\arg \max _{\omega \in \varOmega } s_{\omega }(x).$$
In the presented method (3) the CVs obtained from the individual classifiers take an equal part in building the MCS. This is the simplest situation, in which we do not need any additional information about the base classifiers except for their trained models. One of the possible methods in which weights of the base classifiers are used is presented in [5].
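The sum and maximum rules described above can be sketched in a few lines; the decision profile values are invented for illustration:

```python
import numpy as np

def sum_rule(decision_profile):
    """Sum rule: row k of decision_profile holds p_k(omega|x) for each
    class; the class with the largest column sum of CVs is chosen."""
    return int(np.argmax(decision_profile.sum(axis=0)))

# K=3 hypothetical classifiers, 2 classes
dp = np.array([[0.60, 0.40],
               [0.30, 0.70],
               [0.55, 0.45]])
print(sum_rule(dp))  # column sums 1.45 vs 1.55 -> class 1
```

Note that the first classifier alone would have chosen class 0; the combined supports overrule it, which is exactly the intended effect of fusion at the measurement level.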
The decision template (DT) is another approach to building an MCS. DTs were proposed in [17]. In this MCS model, DTs are calculated on the basis of the training set, one DT per class label. In the operation phase the similarity between each DT and the base classifiers' outputs for object x is computed. The class label with the closest DT is assigned to object x. In this paper the algorithm with DTs is labelled \(\varPsi _{DT}\) and is used as one of the reference classifiers.
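A minimal sketch of the DT approach, assuming decision profiles are stored as a (K, number of classes) array per object; the helper names and the toy data are ours, not the paper's:

```python
import numpy as np

def fit_decision_templates(profiles, labels, n_classes):
    """One DT per class label: the mean decision profile over the
    training objects belonging to that class."""
    return {c: profiles[labels == c].mean(axis=0) for c in range(n_classes)}

def dt_classify(dp, templates):
    """Assign the class whose template is closest (Euclidean) to DP(x)."""
    return min(templates, key=lambda c: np.linalg.norm(dp - templates[c]))

# Toy data: N=3 training objects, K=2 classifiers, 2 classes
profiles = np.array([[[0.9, 0.1], [0.8, 0.2]],
                     [[0.8, 0.2], [0.7, 0.3]],
                     [[0.2, 0.8], [0.1, 0.9]]])
labels = np.array([0, 0, 1])
dts = fit_decision_templates(profiles, labels, 2)
print(dt_classify(np.array([[0.3, 0.7], [0.2, 0.8]]), dts))  # -> 1
```

Other similarity measures than the Euclidean distance can be used; the original DT work [17] compares several.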
3 Modification of Confidence Values Algorithm
3.1 Training Phase
The proposed algorithm for the modification of CVs uses DPs. A DP is a matrix containing the CVs of each base classifier, i.e.:
$$DP(x)=\begin{bmatrix} p_1(\omega _1|x) &{} \cdots &{} p_1(\omega _M|x)\\ \vdots &{} &{} \vdots \\ p_K(\omega _1|x) &{} \cdots &{} p_K(\omega _M|x) \end{bmatrix},$$
where M denotes the number of class labels.
In the first step of the algorithm we remove the CVs that relate to misclassifications on the training set. This set contains N labelled examples \(\{(x_1,\overline{\omega }_1),\ldots ,(x_N,\overline{\omega }_N)\}\), where \(\overline{\omega }_i\) is the true class label of the object described by feature vector \(x_i\). The CVs are removed according to the formula:
where \(I(\varPsi (x),\overline{\omega })\) is an indicator function having the value 1 in the case of the correct classification of the object described by feature vector x, i.e. when \(\varPsi (x)=\overline{\omega }\).
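A sketch of this filtering step, assuming the training DPs are stacked into an (N, K, number of classes) array and removed CVs are marked with NaN (the storage convention is ours, not the paper's):

```python
import numpy as np

def retained_cvs(profiles, preds, true_labels):
    """For each training object i and classifier k, keep the CV row only
    when classifier k classified object i correctly, i.e. when the
    indicator in (5) equals 1; removed entries are marked with NaN."""
    keep = (preds == true_labels[:, None])             # shape (N, K)
    return np.where(keep[:, :, None], profiles, np.nan)

# 2 objects, 1 classifier, 2 classes; the classifier errs on object 2
profiles = np.full((2, 1, 2), 0.5)
preds = np.array([[0], [1]])        # predictions of the classifier
true_labels = np.array([0, 0])
masked = retained_cvs(profiles, preds, true_labels)
print(masked[1, 0])  # -> [nan nan]
```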
In the next step of our algorithm, the decision scheme (DS) is calculated according to the formula:
where
and
The parameter \(\beta \) in our algorithm determines how the DS elements are computed. For example, if \(\beta =0\), then \(ds_{k\omega }\) is the average of the appropriate DP entries retained after applying condition (5).
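Since the general form of (6)-(8) is not reproduced here, the sketch below covers only the \(\beta =0\) case stated above, in which each DS element is the average of the retained CVs:

```python
import numpy as np

def decision_scheme_beta0(masked_profiles):
    """DS for beta = 0: each ds_{k,omega} is the mean of the CVs retained
    after condition (5); NaNs mark the removed entries."""
    return np.nanmean(masked_profiles, axis=0)   # shape (K, n_classes)

# 3 training objects, 1 classifier, 2 classes; object 2 was removed
masked = np.array([[[0.8, 0.2]],
                   [[np.nan, np.nan]],
                   [[0.6, 0.4]]])
print(decision_scheme_beta0(masked))  # -> approx [[0.7 0.3]]
```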
3.2 Operation Phase
During the operation phase the modification of the CVs is carried out using the \(DS(\beta )\) calculated in the training phase. For a new object x being recognized, the outputs of the base classifiers form DP(x) as in (4). The CVs from DP(x) are modified with the use of the \(DS(\beta )\) calculated in (6), according to the formula:
where \(\overline{m}\), m and \(\underline{m}\) define how the individual CVs are modified. The modification process (9) causes the ensemble method to use the modified CVs. The algorithm using the proposed method is denoted as \(\varPsi _{MCV}\). In the experiments the modified CVs are combined according to the sum method (3).
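Formula (9) is not reproduced here, so the following is only one plausible, hypothetical reading of the modification step: CVs above their DS entry are scaled by \(\overline{m}\), those below by \(\underline{m}\), and those equal to it by m. The paper's exact rule may differ:

```python
import numpy as np

def modify_cvs(dp, ds, m_up=1.25, m_eq=1.0, m_down=0.75):
    """Hypothetical reading of formula (9): a CV above its DS entry is
    scaled by m_up, below it by m_down, equal to it by m_eq. This only
    illustrates the mechanism; the paper's exact rule is not shown."""
    factors = np.where(dp > ds, m_up, np.where(dp < ds, m_down, m_eq))
    return dp * factors

dp = np.array([[0.8, 0.2]])   # one classifier, two classes
ds = np.array([[0.7, 0.3]])
print(modify_cvs(dp, ds))     # -> [[1.   0.15]]
```

Under this reading, CVs that agree with the typical correct-classification profile are amplified before the sum rule is applied, while atypical ones are damped.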
4 Experimental Studies
In the experiments we use 13 data sets. Nine of them come from the Keel Project [1] and four come from the UCI Repository [9] (Blood, Breast Cancer Wisconsin, Indian Liver Patient and Mammographic Mass). The details of the data sets are given in Table 1. In the experiments, 16 base classifiers from four different classification models were used. The first group of four base classifiers works according to the \(k\)-NN rule, the second group uses Support Vector Machine models, the next group uses Neural Network models and the last group uses decision tree algorithms. The base classifiers are labelled \(\varPsi _1,\ldots ,\varPsi _{16}\).
The main aim of the experiments was to compare the classification quality of the proposed CV-modification algorithm \(\varPsi _{MCV}\) with the base classifiers \(\varPsi _1,\ldots ,\varPsi _{16}\) and with their ensembles without selection, \(\varPsi _{DT}\) and \(\varPsi _S\). The parameters of the \(\varPsi _{MCV}\) algorithm were set to \(\overline{m}=1.25\), \(m=1\) and \(\underline{m}=0.75\). No feature selection process [13, 18] was performed, and the standard 10-fold cross-validation method was used.
The classification errors, together with the mean ranks obtained by the Friedman test, for the classification algorithms used in the experiments are presented in Table 2. Considering only the mean ranks obtained by the Friedman test, the best result was achieved by the algorithm proposed in this work, labelled \(\varPsi _{MCV}\). The obtained results were also compared by a post-hoc test [20]. The critical difference (CD) for this test at \(p=0.05\) equals \(CD=7.76\) for the 19 classification methods and 13 data sets used. We can conclude that the post-hoc Nemenyi test detects significant differences between the proposed algorithm \(\varPsi _{MCV}\) and the four base classifiers \(\varPsi _4\), \(\varPsi _6\), \(\varPsi _8\) and \(\varPsi _{14}\). The method labelled \(\varPsi _S\) is statistically better than only three base classifiers: \(\varPsi _6\), \(\varPsi _8\) and \(\varPsi _{14}\). This suggests that the proposed algorithm improves on the ensemble of classifiers using the sum method. It should be noted, however, that the difference in mean ranks is not large enough to indicate a significant difference between the \(\varPsi _{MCV}\) and \(\varPsi _{S}\) algorithms.
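The reported critical difference can be checked against the standard Nemenyi formula \(CD = q_\alpha \sqrt{K(K+1)/(6N)}\); the value \(q_{0.05}\approx 3.52\) for 19 groups used below is an assumed tabulated value:

```python
import math

def nemenyi_cd(q_alpha, k, n):
    """Critical difference for the Nemenyi post-hoc test over k methods
    and n data sets: CD = q_alpha * sqrt(k*(k+1) / (6*n))."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * n))

# 19 methods, 13 data sets, assumed q_0.05 ~ 3.52 for 19 groups
print(round(nemenyi_cd(3.52, 19, 13), 2))  # close to the paper's 7.76
```

Two methods are declared significantly different when their mean Friedman ranks differ by more than this CD.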
5 Conclusion
In this paper we have proposed a method that uses information from decision profiles to modify the CVs received from the base classifiers. The aim of the experiments was to compare the proposed algorithm with all the base classifiers and with ensemble classifiers based on the sum and decision template methods. The experiments were carried out on 13 benchmark data sets. We can conclude that the post-hoc Nemenyi test detects significant differences between the proposed algorithm \(\varPsi _{MCV}\) and four base classifiers, while the classification results of the algorithm \(\varPsi _S\) are statistically different from only three base classifiers. The obtained results show an improvement in the classification quality of the proposed algorithm \(\varPsi _{MCV}\) with respect to all the base classifiers and the reference ensemble methods. Future work might involve the application of the proposed method to various practical tasks [8, 10, 22] in which base classifiers are used. Additionally, the advantages of the proposed algorithm can be investigated further, as well as its ability to work in parallel and distributed environments.
References
Alcalá, J., Fernández, A., Luengo, J., Derrac, J., García, S., Sánchez, L., Herrera, F.: Keel data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J. Mult.-Valued Logic Soft Comput. 17(2–3), 255–287 (2010)
Baczyńska, P., Burduk, R.: Ensemble selection based on discriminant functions in binary classification task. In: Jackowski, K., et al. (eds.) IDEAL 2015. LNCS, vol. 9375, pp. 61–68. Springer, Heidelberg (2015). doi:10.1007/978-3-319-24834-9_8
Breiman, L.: Randomizing outputs to increase prediction accuracy. Mach. Learn. 40(3), 229–242 (2000)
Britto, A.S., Sabourin, R., Oliveira, L.E.: Dynamic selection of classifiers – a comprehensive review. Pattern Recogn. 47(11), 3665–3680 (2014)
Burduk, R.: Classifier fusion with interval-valued weights. Pattern Recogn. Lett. 34(14), 1623–1629 (2013)
Canuto, A.M., Abreu, M.C., de Melo Oliveira, L., Xavier, J.C., Santos, A.D.M.: Investigating the influence of the choice of the ensemble members in accuracy and diversity of selection-based and fusion-based methods for ensembles. Pattern Recogn. Lett. 28(4), 472–486 (2007)
Duin, R.P.: The combining classifier: to train or not to train? In: Proceedings of the 16th International Conference on Pattern Recognition, 2002, vol. 2, pp. 765–770. IEEE (2002)
Forczmański, P., Łabędź, P.: Recognition of occluded faces based on multi-subspace classification. In: Saeed, K., Chaki, R., Cortesi, A., Wierzchoń, S. (eds.) CISIM 2013. LNCS, vol. 8104, pp. 148–157. Springer, Heidelberg (2013)
Frank, A., Asuncion, A.: UCI Machine Learning Repository (2010)
Frejlichowski, D.: An algorithm for the automatic analysis of characters located on car license plates. In: Kamel, M., Campilho, A. (eds.) ICIAR 2013. LNCS, vol. 7950, pp. 774–781. Springer, Heidelberg (2013)
Freund, Y., Schapire, R.E., et al.: Experiments with a new boosting algorithm. In: ICML, vol. 96, pp. 148–156 (1996)
Giacinto, G., Roli, F.: An approach to the automatic design of multiple classifier systems. Pattern Recogn. Lett. 22, 25–33 (2001)
Inbarani, H.H., Azar, A.T., Jothi, G.: Supervised hybrid feature selection based on PSO and rough sets for medical diagnosis. Comput. Methods Programs Biomed. 113(1), 175–185 (2014)
Jackowski, K., Krawczyk, B., Woźniak, M.: Improved adaptive splitting and selection: the hybrid training method of a classifier based on a feature space partitioning. Int. J. Neural Syst. 24(03), 1430007 (2014)
Korytkowski, M., Rutkowski, L., Scherer, R.: From ensemble of fuzzy classifiers to single fuzzy rule base classifier. In: Rutkowski, L., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds.) ICAISC 2008. LNCS (LNAI), vol. 5097, pp. 265–272. Springer, Heidelberg (2008)
Kuncheva, L.I.: Combining Pattern Classifiers: Methods and Algorithms. Wiley, Hoboken (2004)
Kuncheva, L.I., Bezdek, J.C., Duin, R.P.: Decision templates for multiple classifier fusion: an experimental comparison. Pattern Recogn. 34(2), 299–314 (2001)
Rejer, I.: Genetic algorithm with aggressive mutation for feature selection in BCI feature space. Pattern Anal. Appl. 18(3), 485–492 (2015)
Ruta, D., Gabrys, B.: Classifier selection for majority voting. Inf. Fusion 6(1), 63–81 (2005)
Trawiński, B., Smętek, M., Telec, Z., Lasota, T.: Nonparametric statistical analysis for multiple comparison of machine learning regression algorithms. Int. J. Appl. Math. Comput. Sci. 22(4), 867–881 (2012)
Xu, L., Krzyżak, A., Suen, C.Y.: Methods of combining multiple classifiers and their applications to handwriting recognition. IEEE Trans. Syst. Man Cybern. 22(3), 418–435 (1992)
Zdunek, R., Nowak, M., Pliński, E.: Statistical classification of soft solder alloys by laser-induced breakdown spectroscopy: review of methods. J. Eur. Optical Soc.-Rapid Publ. 11(16006), 1–20 (2016)
Acknowledgments
This work was supported by the Polish National Science Center under the grant no. DEC-2013/09/B/ST6/02264 and by the statutory funds of the Department of Systems and Computer Networks, Wroclaw University of Technology.
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License (http://creativecommons.org/licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2016 IFIP International Federation for Information Processing
Cite this paper
Burduk, R., Baczyńska, P. (2016). Ensemble of Classifiers with Modification of Confidence Values. In: Saeed, K., Homenda, W. (eds) Computer Information Systems and Industrial Management. CISIM 2016. Lecture Notes in Computer Science, vol 9842. Springer, Cham. https://doi.org/10.1007/978-3-319-45378-1_42
DOI: https://doi.org/10.1007/978-3-319-45378-1_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45377-4
Online ISBN: 978-3-319-45378-1