1 Introduction

For several years, supervised learning has made use of a number of base classifiers to solve a single classification task. Using multiple base classifiers for one decision problem is known as an ensemble of classifiers (EoC) or a multiple classifier system (MCS) [12, 15]. Building an MCS consists of three main phases: generation, selection and integration (fusion) [4]. In the first approach to generation, homogeneous classifiers are obtained by injecting randomness into the learning algorithm or by manipulating the training objects [3, 11]. In the second approach the ensemble is composed of heterogeneous classifiers, i.e. different learning algorithms are applied to the same data set.

The selection phase concerns the choice of a set of classifiers from the whole available pool of base classifiers [14]. Formally, choosing a single classifier is called classifier selection, while choosing a subset of base classifiers from the pool is called ensemble selection or ensemble pruning. In general, there are two approaches to ensemble selection: static ensemble selection and dynamic ensemble selection [2, 4].

A number of articles [7, 16, 21] describe a large variety of fusion methods; the simple majority voting scheme [19], for example, is the most popular one. In general, the final decision made in the third phase uses the predictions of the base classifiers, and fusion is valued for its ability to combine multiple classification outputs into a more accurate decision. According to their functioning, fusion methods can be divided into selection-based, fusion-based and hybrid ones [6]. Fusion strategies can also be divided into fixed and trainable ones [7]. Another division distinguishes class-conscious and class-indifferent integration methods [16].

In this work we consider a method for modifying confidence values. The proposed method is based on information from decision profiles. The decision scheme used in the training phase is created only from those confidence values that correspond to correct classifications.

The remainder of this paper is organized as follows. Section 2 presents the concept of the base classifier and the ensemble of classifiers. Section 3 describes the proposed method for modifying confidence values. The experimental evaluation and a discussion of the results are presented in Sect. 4. The paper is concluded by a final discussion in Sect. 5.

2 Supervised Classification

2.1 Base Classifiers

The aim of supervised classification is to assign an object to a specific class label. The object is represented by a set of d features (attributes), viewed as a d-dimensional feature vector x. The recognition algorithm maps the feature space X to the set of class labels \(\varOmega \) according to the general formula:

$$\begin{aligned} \varPsi :X\rightarrow \varOmega . \end{aligned}$$
(1)

The recognition algorithm defines the classifier, which, in a complex classification task involving multiple classifiers, is called a base classifier.

The output of a base classifier can be divided into three types [16].

  • The abstract level – the classifier \(\varPsi \) assigns a unique label j to a given input x.

  • The rank level – in this case, for each input (object) x, each classifier produces an integer rank array. Each element of this array corresponds to one of the defined class labels. The array is usually sorted, with the label at the top being the first choice.

  • The measurement level – the output of a classifier is represented by a confidence value (CV) that expresses the degree of support for assigning a class label to the given input x. An example of such an output is the a posteriori probability returned by a Bayes classifier. In general, this level provides richer information than the abstract and rank levels.

In this work we consider the situation in which each base classifier returns CVs. Additionally, before the final combination of the base classifiers' outputs, a CV modification process is carried out.

2.2 Ensemble of Classifiers

Let us assume that K different classifiers \(\varPsi _1,\varPsi _2,\ldots ,\varPsi _K\), indexed by \(k\in \{1,2,...,K\}\), are available to solve the classification task. The output information from all K component classifiers is used to make the final decision of the MCS, i.e. the decision is based on the predictions of all the base classifiers.

One of the possible methods for integrating the outputs of the base classifiers is the sum rule. In this method the score of the MCS is based on the following sums:

$$\begin{aligned} s_{\omega }(x)=\sum _{k=1}^{K}p_k(\omega |x), \qquad \omega \in \varOmega , \end{aligned}$$
(2)

where \(p_k(\omega |x)\) is the CV for class label \(\omega \) returned by classifier k.

The final decision of the MCS is made according to the maximum rule:

$$\begin{aligned} \varPsi _{S}(x)= \arg \max _{\omega } s_{\omega }(x). \end{aligned}$$
(3)

In the method defined by (3) the CVs obtained from the individual classifiers contribute equally to the MCS decision. This is the simplest situation, in which no additional information about the base classifiers is needed apart from their trained models. One of the possible methods in which weights of the base classifiers are used is presented in [5].
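To make the fusion rule concrete, the following minimal Python/NumPy sketch (not part of the paper; array shapes and names are our own) implements the sum rule (2) followed by the maximum rule (3) for a single object x.

```python
import numpy as np

def sum_rule_fusion(cv):
    """Combine the CVs of K base classifiers with the sum rule (2)
    and the maximum rule (3).

    cv : array of shape (K, n_classes), cv[k, w] = p_k(w | x).
    Returns the index of the winning class label.
    """
    scores = cv.sum(axis=0)        # Eq. (2): s_w(x) = sum_k p_k(w | x)
    return int(np.argmax(scores))  # Eq. (3): argmax over class labels


# Example with K = 3 classifiers and 2 class labels.
cv = np.array([[0.7, 0.3],
               [0.4, 0.6],
               [0.8, 0.2]])
print(sum_rule_fusion(cv))  # prints 0
```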

The decision template (DT) is another approach to building an MCS; it was proposed in [17]. In this MCS model one DT per class label is calculated from the training set. In the operation phase the similarity between each DT and the base classifiers' outputs for an object x is computed, and the class label of the closest DT is assigned to x. In this paper the algorithm based on DTs is labelled \(\varPsi _{DT}\) and is used as one of the reference classifiers.
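For reference, a minimal sketch of the decision-template approach under common assumptions: one DT per class computed as the mean decision profile of that class, and Euclidean distance as the similarity measure (the excerpt does not specify which measure \(\varPsi _{DT}\) uses).

```python
import numpy as np

def fit_decision_templates(dp_train, y_train, n_classes):
    """One DT per class label: the mean decision profile over the
    training objects of that class. dp_train has shape (N, K, n_classes)."""
    return np.stack([dp_train[y_train == c].mean(axis=0)
                     for c in range(n_classes)])

def predict_dt(dp_x, templates):
    """Assign the class whose DT is closest to DP(x), using the
    Euclidean distance between the K x n_classes matrices."""
    dists = np.linalg.norm(templates - dp_x, axis=(1, 2))
    return int(np.argmin(dists))
```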

3 Modification of Confidence Values Algorithm

3.1 Training Phase

The proposed CV modification algorithm uses DPs. A DP is a matrix containing the CVs returned by each base classifier, i.e.:

$$\begin{aligned} DP(x)= \left[ \begin{array}{ccc} p_1(0|x) & \dots & p_1(\varOmega |x) \\ \vdots & \ddots & \vdots \\ p_K(0|x) & \dots & p_K(\varOmega |x) \end{array} \right] . \end{aligned}$$
(4)

In the first step of the algorithm we remove the CVs that correspond to misclassifications on the training set. This set contains N labelled examples \(\{(x_1,\overline{\omega }_1),...,(x_N,\overline{\omega }_N)\}\), where \(\overline{\omega }_i\) is the true class label of the object described by the feature vector \(x_i\). The CVs are removed according to the formula:

$$\begin{aligned} p'_k(\omega |x)= \left\{ \begin{array}{lll} p_k(\omega |x), & \quad \text {if} & \quad I(\varPsi (x),\overline{\omega })=1, \\ 0, & \quad \text {if} & \quad I(\varPsi (x),\overline{\omega })=0, \end{array} \right. \end{aligned}$$
(5)

where \(I(\varPsi (x),\overline{\omega })\) is an indicator function having the value 1 in the case of the correct classification of the object described by feature vector x, i.e. when \(\varPsi (x)=\overline{\omega }\).

In the next step of our algorithm the decision scheme (DS) is calculated according to the formula:

$$\begin{aligned} DS(\beta )= \left[ \begin{array}{ccc} ds(\beta )_{10} & \dots & ds(\beta )_{1\varOmega } \\ \vdots & \ddots & \vdots \\ ds(\beta )_{K0} & \dots & ds(\beta )_{K\varOmega } \end{array} \right] , \end{aligned}$$
(6)

where

$$\begin{aligned} ds(\beta )_{k\omega } =\overline{ds}_{k\omega } + \beta \sqrt{\frac{\sum _{n=1}^{N}(p'_k(\omega _n|x_n)\, - \overline{ds}_{k\omega })^2}{N-1}} \end{aligned}$$
(7)

and

$$\begin{aligned} \overline{ds}_{k\omega } =\frac{\sum _{n=1}^{N} p'_k(\omega _n|x_n)}{N}. \end{aligned}$$
(8)

The parameter \(\beta \) in our algorithm determines how the DS elements are computed. For example, if \(\beta =0\), then \(ds(\beta )_{k\omega }\) is simply the mean of the corresponding CVs remaining after applying condition (5).
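A minimal sketch of the training phase, Eqs. (5)-(8), assuming the decision profiles of the training set are stored as an (N, K, C) array and that the indicator in (5) is evaluated from each base classifier's own prediction (our reading of the notation):

```python
import numpy as np

def decision_scheme(dp_train, y_train, beta):
    """Compute DS(beta) of Eq. (6) from the training decision profiles.

    dp_train : array (N, K, C) with dp_train[n, k, w] = p_k(w | x_n)
    y_train  : array (N,) of true class labels 0..C-1
    beta     : weight of the standard-deviation term in Eq. (7)
    """
    # Eq. (5): zero the CVs of training objects that a classifier
    # misclassifies (read here per base classifier -- an assumption).
    preds = dp_train.argmax(axis=2)               # shape (N, K)
    correct = preds == y_train[:, None]           # shape (N, K)
    dp_kept = dp_train * correct[:, :, None]      # p'_k(w | x_n)
    # Eq. (8): mean over the N training objects.
    ds_mean = dp_kept.mean(axis=0)                # shape (K, C)
    # Eq. (7): mean plus beta times the sample standard deviation (N - 1).
    ds_std = dp_kept.std(axis=0, ddof=1)          # shape (K, C)
    return ds_mean + beta * ds_std
```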

3.2 Operation Phase

During the operation phase the modification of the CVs is carried out using the \(DS(\beta )\) calculated in the training phase. For a new object x being recognized, the outputs of the base classifiers form DP(x) as in (4). The CVs from DP(x) are modified with the use of the \(DS(\beta )\) from (6) according to the formula:

$$\begin{aligned} {p'}_k(\omega |x)= \left\{ \begin{array}{lll} \overline{m}\cdot {p}_k(\omega |x) & \quad \text {if } & {p}_k(\omega |x) \ge ds(1)_{k\omega },\\ m\cdot {p}_k(\omega |x) & \quad \text {if } & ds(0)_{k\omega }< {p}_k(\omega |x) < ds(1)_{k\omega },\\ \underline{m}\cdot {p}_k(\omega |x) & \quad \text {if } & {p}_k(\omega |x) \le ds(0)_{k\omega }, \end{array} \right. \end{aligned}$$
(9)

where \(\overline{m}\), m and \(\underline{m}\) define how the original CVs are modified. After the modification in Eq. (9) the ensemble method operates on the modified CVs. The algorithm using the proposed method is denoted as \(\varPsi _{MCV}\). In the experiments the modified CVs are combined according to the sum method (3).
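The operation phase can be sketched as follows, assuming DS(0) and DS(1) have been computed, e.g. with the decision_scheme sketch from Sect. 3.1, and using as defaults the multiplier values reported in the experiments (an assumption made only for illustration):

```python
import numpy as np

def modify_cv(dp_x, ds0, ds1, m_hi=1.25, m_mid=1.0, m_lo=0.75):
    """Modify the CVs of one object according to Eq. (9).

    dp_x     : array (K, C) -- DP(x) for the object being recognized
    ds0, ds1 : arrays (K, C) -- DS(0) and DS(1) from the training phase
    m_hi, m_mid, m_lo : the multipliers m-bar, m and m-underline
    """
    return np.where(dp_x >= ds1, m_hi * dp_x,
                    np.where(dp_x <= ds0, m_lo * dp_x, m_mid * dp_x))

def psi_mcv(dp_x, ds0, ds1):
    """Final decision of Psi_MCV: the sum rule (3) on the modified CVs."""
    return int(np.argmax(modify_cv(dp_x, ds0, ds1).sum(axis=0)))
```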

4 Experimental Studies

In the experiments we use 13 data sets. Nine of them come from the Keel Project [1] and four come from the UCI Repository [9] (Blood, Breast Cancer Wisconsin, Indian Liver Patient and Mammographic Mass). The details of the data sets are given in Table 1. In the experiments 16 base classifiers from four different classification models were used. The first group of four base classifiers works according to the \(k\)-NN rule, the second group uses Support Vector Machine models, the next group uses Neural Network models and the last group uses decision tree algorithms. The base classifiers are labelled \(\varPsi _1,...,\varPsi _{16}\).
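A hedged sketch of how such a heterogeneous pool could be assembled with scikit-learn; the excerpt does not give the hyperparameters of the 16 base classifiers, so every setting below is purely illustrative.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

def build_pool():
    """Illustrative pool of 16 base classifiers drawn from four model
    families (k-NN, SVM, neural network, decision tree). All parameter
    values are guesses, not the settings used in the paper."""
    pool = []
    pool += [KNeighborsClassifier(n_neighbors=k) for k in (3, 5, 7, 9)]
    pool += [SVC(kernel=k, probability=True)   # probability=True so CVs are available
             for k in ('linear', 'poly', 'rbf', 'sigmoid')]
    pool += [MLPClassifier(hidden_layer_sizes=(h,), max_iter=500)
             for h in (5, 10, 20, 40)]
    pool += [DecisionTreeClassifier(max_depth=d) for d in (3, 5, 10, None)]
    return pool
```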

The main aim of the experiments was to compare the classification quality of the proposed CV modification algorithm \(\varPsi _{MCV}\) with the base classifiers \(\varPsi _1,...,\varPsi _{16}\) and with their ensembles without selection, \(\varPsi _{DT}\) and \(\varPsi _S\). The parameters of the \(\varPsi _{MCV}\) algorithm were set to \(\overline{m}=1.25\), \(m=1\) and \(\underline{m}=0.75\). No feature selection process [13, 18] was performed in the experiments, and the standard 10-fold cross-validation method was used.

Table 1. Description of data sets selected for the experiments
Table 2. Classification error and mean rank positions for the base classifiers (\(\varPsi _1,...,\varPsi _{16}\)), algorithms \(\varPsi _{S}\), \(\varPsi _{DT}\) and the proposed algorithm \(\varPsi _{MCV}\) produced by the Friedman test

The classification errors and the mean ranks obtained by the Friedman test for the classification algorithms used in the experiments are presented in Table 2. Considering only the mean ranks obtained by the Friedman test, the best result was achieved by the proposed algorithm, labelled \(\varPsi _{MCV}\). The results were also compared using the post-hoc Nemenyi test [20]. The critical difference (CD) for this test at \(p=0.05\) is \(CD=7.76\) for 19 classification methods and 13 data sets. The post-hoc Nemenyi test detects significant differences between the proposed algorithm \(\varPsi _{MCV}\) and four base classifiers: \(\varPsi _4\), \(\varPsi _6\), \(\varPsi _8\) and \(\varPsi _{14}\). The method labelled \(\varPsi _S\) is statistically better than three base classifiers: \(\varPsi _6\), \(\varPsi _8\) and \(\varPsi _{14}\). This suggests that the proposed algorithm improves on the ensemble of classifiers using the sum method. It should be noted, however, that the difference in average ranks is not large enough to indicate a statistically significant difference between the \(\varPsi _{MCV}\) and \(\varPsi _{S}\) algorithms.

5 Conclusion

In this paper we have proposed a method that uses information from the decision profiles and modifies the CVs received from the base classifiers. The aim of the experiments was to compare the proposed algorithm with all the base classifiers and with the ensemble classifiers based on the sum and decision template methods. The experiments were carried out on 13 benchmark data sets. The post-hoc Nemenyi test detects significant differences between the proposed algorithm \(\varPsi _{MCV}\) and four base classifiers, while the classification results of the algorithm \(\varPsi _S\) are statistically different from those of three base classifiers. The obtained results show an improvement in the classification quality of the proposed algorithm \(\varPsi _{MCV}\) with respect to all the base classifiers and the reference ensemble methods. Future work may involve applying the proposed method to various practical tasks [8, 10, 22] in which base classifiers are used. Additionally, the advantages of the proposed algorithm can be investigated further, including its ability to work in parallel and distributed environments.