1 Introduction

For several years, supervised learning has made use of a number of base classifiers to solve a single classification task. Using multiple base classifiers for one decision problem is known as an ensemble of classifiers (EoC) or a multiple classifier system (MCS) [12, 15]. Building an MCS consists of three main phases: generation, selection and integration (fusion) [4]. In the first approach to generation, homogeneous classifiers are obtained by injecting randomness into the learning algorithm or by manipulating the training objects [3, 11]. In the second approach the ensemble is composed of heterogeneous classifiers, i.e. different learning algorithms are applied to the same data set.

The selection phase concerns the choice of a set of classifiers from the whole available pool of base classifiers [14]. Formally, choosing a single classifier is called classifier selection, while choosing a subset of base classifiers from the pool is called ensemble selection or ensemble pruning. In general, there are two approaches to ensemble selection: static ensemble selection and dynamic ensemble selection [2, 4].

A number of articles [7, 16, 21] describe a large variety of fusion methods; the simple majority voting scheme [19], for example, is the most popular one. In general, the final decision made in the third phase uses the predictions of the base classifiers, and fusion is valued for its ability to combine multiple classification outputs into a more accurate decision. According to their functioning, fusion methods can be divided into selection-based, fusion-based and hybrid ones [6]. Fusion strategies can also be divided into fixed and trainable ones [7]. Another division distinguishes class-conscious and class-indifferent integration methods [16].

In this work we consider a method for modifying confidence values. The proposed method is based on information from decision profiles. The decision scheme used in the training phase is created only from those confidence values that correspond to correct classifications.

The remainder of this paper is organized as follows. Section 2 presents the concept of the base classifier and the ensemble of classifiers. Section 3 describes the proposed method for modifying confidence values. The experimental evaluation and a discussion of the results are presented in Sect. 4. The paper is concluded by a final discussion in Sect. 5.

2 Supervised Classification

2.1 Base Classifiers

The aim of supervised classification is to assign an object to a specific class label. The object is represented by a set of d features (attributes), viewed as a d-dimensional feature vector x. The recognition algorithm maps the feature space X to the set of class labels \(\varOmega \) according to the general formula:

$$\begin{aligned} \varPsi :X\rightarrow \varOmega . \end{aligned}$$
(1)

The recognition algorithm defines the classifier, which, in a complex classification task involving multiple classifiers, is called a base classifier.

The output of a base classifier can be divided into three types [16].

  • The abstract level – the classifier \(\varPsi \) assigns a unique label j to a given input x.

  • The rank level – in this case, for each input (object) x, each classifier produces an integer rank array. Each element of this array corresponds to one of the defined class labels. The array is usually sorted, with the label at the top being the first choice.

  • The measurement level – the output of a classifier is represented by a confidence value (CV) that expresses the degree of support for assigning a class label to the given input x. An example of such an output is the a posteriori probability returned by a Bayes classifier. In general, this level provides richer information than the abstract and rank levels.

In this work we consider the situation in which each base classifier returns CVs. Additionally, before the final combination of the base classifiers' outputs, a CV modification process is carried out.

2.2 Ensemble of Classifiers

Let us assume that K different classifiers \(\varPsi _1,\varPsi _2,\ldots ,\varPsi _K\), indexed by \(k\in \{1,2,...,K\}\), are available to solve the classification task. The output information from all K component classifiers is used to make the final decision of the MCS, i.e. the decision is based on the predictions of all the base classifiers.

One of the possible methods for integrating the outputs of the base classifiers is the sum rule. In this method the score of the MCS is based on the following sums:

$$\begin{aligned} s_{\omega }(x)=\sum _{k=1}^{K}p_k(\omega |x), \qquad \omega \in \varOmega , \end{aligned}$$
(2)

where \(p_k(\omega |x)\) is the CV for class label \(\omega \) returned by classifier k.

The final decision of the MCS is made according to the maximum rule:

$$\begin{aligned} \varPsi _{S}(x)= \arg \max _{\omega } s_{\omega }(x). \end{aligned}$$
(3)

In the method defined by (3) the CVs obtained from the individual classifiers contribute equally to the MCS decision. This is the simplest situation, in which no additional information about the base classifiers is needed apart from their trained models. One of the possible methods in which weights of the base classifiers are used is presented in [5].
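To make the fusion rule concrete, the following minimal Python/NumPy sketch (not part of the paper; array shapes and names are our own) implements the sum rule (2) followed by the maximum rule (3) for a single object x.

```python
import numpy as np

def sum_rule_fusion(cv):
    """Combine the CVs of K base classifiers with the sum rule (2)
    and the maximum rule (3).

    cv : array of shape (K, n_classes), cv[k, w] = p_k(w | x).
    Returns the index of the winning class label.
    """
    scores = cv.sum(axis=0)        # Eq. (2): s_w(x) = sum_k p_k(w | x)
    return int(np.argmax(scores))  # Eq. (3): argmax over class labels


# Example with K = 3 classifiers and 2 class labels.
cv = np.array([[0.7, 0.3],
               [0.4, 0.6],
               [0.8, 0.2]])
print(sum_rule_fusion(cv))  # prints 0
```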

The decision template (DT) is another approach to building an MCS; it was proposed in [17]. In this MCS model one DT per class label is calculated from the training set. In the operation phase the similarity between each DT and the base classifiers' outputs for an object x is computed, and the class label of the closest DT is assigned to x. In this paper the algorithm based on DTs is labelled \(\varPsi _{DT}\) and is used as one of the reference classifiers.
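For reference, a minimal sketch of the decision-template approach under common assumptions: one DT per class computed as the mean decision profile of that class, and Euclidean distance as the similarity measure (the excerpt does not specify which measure \(\varPsi _{DT}\) uses).

```python
import numpy as np

def fit_decision_templates(dp_train, y_train, n_classes):
    """One DT per class label: the mean decision profile over the
    training objects of that class. dp_train has shape (N, K, n_classes)."""
    return np.stack([dp_train[y_train == c].mean(axis=0)
                     for c in range(n_classes)])

def predict_dt(dp_x, templates):
    """Assign the class whose DT is closest to DP(x), using the
    Euclidean distance between the K x n_classes matrices."""
    dists = np.linalg.norm(templates - dp_x, axis=(1, 2))
    return int(np.argmin(dists))
```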

3 Modification of Confidence Values Algorithm

3.1 Training Phase

The proposed CV modification algorithm uses DPs. A DP is a matrix containing the CVs returned by each base classifier, i.e.:

$$\begin{aligned} DP(x)= \left[ \begin{array}{ccc} p_1(0|x) & \dots & p_1(\varOmega |x) \\ \vdots & \ddots & \vdots \\ p_K(0|x) & \dots & p_K(\varOmega |x) \end{array} \right] . \end{aligned}$$
(4)

In the first step of the algorithm we remove the CVs that correspond to misclassifications on the training set. This set contains N labelled examples \(\{(x_1,\overline{\omega }_1),...,(x_N,\overline{\omega }_N)\}\), where \(\overline{\omega }_i\) is the true class label of the object described by the feature vector \(x_i\). The CVs are removed according to the formula:

$$\begin{aligned} p'_k(\omega |x)= \left\{ \begin{array}{lll} p_k(\omega |x), & \quad \text {if} & \quad I(\varPsi (x),\overline{\omega })=1, \\ 0, & \quad \text {if} & \quad I(\varPsi (x),\overline{\omega })=0, \end{array} \right. \end{aligned}$$
(5)

where \(I(\varPsi (x),\overline{\omega })\) is an indicator function having the value 1 in the case of the correct classification of the object described by feature vector x, i.e. when \(\varPsi (x)=\overline{\omega }\).

In the next step of our algorithm the decision scheme (DS) is calculated according to the formula:

$$\begin{aligned} DS(\beta )= \left[ \begin{array}{ccc} ds(\beta )_{10} & \dots & ds(\beta )_{1\varOmega } \\ \vdots & \ddots & \vdots \\ ds(\beta )_{K0} & \dots & ds(\beta )_{K\varOmega } \end{array} \right] , \end{aligned}$$
(6)

where

$$\begin{aligned} ds(\beta )_{k\omega } =\overline{ds}_{k\omega } + \beta \sqrt{\frac{\sum _{n=1}^{N}(p'_k(\omega _n|x_n)\, - \overline{ds}_{k\omega })^2}{N-1}} \end{aligned}$$
(7)

and

$$\begin{aligned} \overline{ds}_{k\omega } =\frac{\sum _{n=1}^{N} p'_k(\omega _n|x_n)}{N}. \end{aligned}$$
(8)

The parameter \(\beta \) in our algorithm determines how the DS elements are computed. For example, if \(\beta =0\), then \(ds(\beta )_{k\omega }\) is simply the mean of the corresponding CVs remaining after applying condition (5).
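A minimal sketch of the training phase, Eqs. (5)-(8), assuming the decision profiles of the training set are stored as an (N, K, C) array and that the indicator in (5) is evaluated from each base classifier's own prediction (our reading of the notation):

```python
import numpy as np

def decision_scheme(dp_train, y_train, beta):
    """Compute DS(beta) of Eq. (6) from the training decision profiles.

    dp_train : array (N, K, C) with dp_train[n, k, w] = p_k(w | x_n)
    y_train  : array (N,) of true class labels 0..C-1
    beta     : weight of the standard-deviation term in Eq. (7)
    """
    # Eq. (5): zero the CVs of training objects that a classifier
    # misclassifies (read here per base classifier -- an assumption).
    preds = dp_train.argmax(axis=2)               # shape (N, K)
    correct = preds == y_train[:, None]           # shape (N, K)
    dp_kept = dp_train * correct[:, :, None]      # p'_k(w | x_n)
    # Eq. (8): mean over the N training objects.
    ds_mean = dp_kept.mean(axis=0)                # shape (K, C)
    # Eq. (7): mean plus beta times the sample standard deviation (N - 1).
    ds_std = dp_kept.std(axis=0, ddof=1)          # shape (K, C)
    return ds_mean + beta * ds_std
```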

3.2 Operation Phase

During the operation phase the modification of the CVs is carried out using the \(DS(\beta )\) calculated in the training phase. For a new object x being recognized, the outputs of the base classifiers form DP(x) as in (4). The CVs from DP(x) are modified with the use of the \(DS(\beta )\) from (6) according to the formula:

$$\begin{aligned} {p'}_k(\omega |x)= \left\{ \begin{array}{lll} \overline{m}\cdot {p}_k(\omega |x) & \quad \text {if } & {p}_k(\omega |x) \ge ds(1)_{k\omega },\\ m\cdot {p}_k(\omega |x) & \quad \text {if } & ds(0)_{k\omega }< {p}_k(\omega |x) < ds(1)_{k\omega },\\ \underline{m}\cdot {p}_k(\omega |x) & \quad \text {if } & {p}_k(\omega |x) \le ds(0)_{k\omega }, \end{array} \right. \end{aligned}$$
(9)

where \(\overline{m}\), m and \(\underline{m}\) define how the original CVs are modified. After the modification in Eq. (9) the ensemble method operates on the modified CVs. The algorithm using the proposed method is denoted as \(\varPsi _{MCV}\). In the experiments the modified CVs are combined according to the sum method (3).
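The operation phase can be sketched as follows, assuming DS(0) and DS(1) have been computed, e.g. with the decision_scheme sketch from Sect. 3.1, and using as defaults the multiplier values reported in the experiments (an assumption made only for illustration):

```python
import numpy as np

def modify_cv(dp_x, ds0, ds1, m_hi=1.25, m_mid=1.0, m_lo=0.75):
    """Modify the CVs of one object according to Eq. (9).

    dp_x     : array (K, C) -- DP(x) for the object being recognized
    ds0, ds1 : arrays (K, C) -- DS(0) and DS(1) from the training phase
    m_hi, m_mid, m_lo : the multipliers m-bar, m and m-underline
    """
    return np.where(dp_x >= ds1, m_hi * dp_x,
                    np.where(dp_x <= ds0, m_lo * dp_x, m_mid * dp_x))

def psi_mcv(dp_x, ds0, ds1):
    """Final decision of Psi_MCV: the sum rule (3) on the modified CVs."""
    return int(np.argmax(modify_cv(dp_x, ds0, ds1).sum(axis=0)))
```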

4 Experimental Studies

In the experiments we use 13 data sets. Nine of them come from the Keel Project [1] and four come from the UCI Repository [9] (Blood, Breast Cancer Wisconsin, Indian Liver Patient and Mammographic Mass). The details of the data sets are given in Table 1. In the experiments 16 base classifiers from four different classification models were used. The first group of four base classifiers works according to the \(k\)-NN rule, the second group uses Support Vector Machine models, the next group uses Neural Network models and the last group uses decision tree algorithms. The base classifiers are labelled \(\varPsi _1,...,\varPsi _{16}\).
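A hedged sketch of how such a heterogeneous pool could be assembled with scikit-learn; the excerpt does not give the hyperparameters of the 16 base classifiers, so every setting below is purely illustrative.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

def build_pool():
    """Illustrative pool of 16 base classifiers drawn from four model
    families (k-NN, SVM, neural network, decision tree). All parameter
    values are guesses, not the settings used in the paper."""
    pool = []
    pool += [KNeighborsClassifier(n_neighbors=k) for k in (3, 5, 7, 9)]
    pool += [SVC(kernel=k, probability=True)   # probability=True so CVs are available
             for k in ('linear', 'poly', 'rbf', 'sigmoid')]
    pool += [MLPClassifier(hidden_layer_sizes=(h,), max_iter=500)
             for h in (5, 10, 20, 40)]
    pool += [DecisionTreeClassifier(max_depth=d) for d in (3, 5, 10, None)]
    return pool
```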

The main aim of the experiments was to compare the classification quality of the proposed CV modification algorithm \(\varPsi _{MCV}\) with the base classifiers \(\varPsi _1,...,\varPsi _{16}\) and with their ensembles without selection, \(\varPsi _{DT}\) and \(\varPsi _S\). The parameters of the \(\varPsi _{MCV}\) algorithm were set to \(\overline{m}=1.25\), \(m=1\) and \(\underline{m}=0.75\). No feature selection process [13, 18] was performed in the experiments, and the standard 10-fold cross-validation method was used.

Table 1. Description of data sets selected for the experiments
Table 2. Classification error and mean rank positions for the base classifiers (\(\varPsi _1,...,\varPsi _{16}\)), algorithms \(\varPsi _{S}\), \(\varPsi _{DT}\) and the proposed algorithm \(\varPsi _{MCV}\) produced by the Friedman test

The classification errors and the mean ranks obtained by the Friedman test for the classification algorithms used in the experiments are presented in Table 2. Considering only the mean ranks obtained by the Friedman test, the best result was achieved by the proposed algorithm, labelled \(\varPsi _{MCV}\). The results were also compared using the post-hoc Nemenyi test [20]. The critical difference (CD) for this test at \(p=0.05\) is \(CD=7.76\) for 19 classification methods and 13 data sets. The post-hoc Nemenyi test detects significant differences between the proposed algorithm \(\varPsi _{MCV}\) and four base classifiers: \(\varPsi _4\), \(\varPsi _6\), \(\varPsi _8\) and \(\varPsi _{14}\). The method labelled \(\varPsi _S\) is statistically better than three base classifiers: \(\varPsi _6\), \(\varPsi _8\) and \(\varPsi _{14}\). This suggests that the proposed algorithm improves on the ensemble of classifiers using the sum method. It should be noted, however, that the difference in average ranks is not large enough to indicate a statistically significant difference between the \(\varPsi _{MCV}\) and \(\varPsi _{S}\) algorithms.

5 Conclusion

In this paper we have proposed a method that uses information from the decision profiles and modifies the CVs received from the base classifiers. The aim of the experiments was to compare the proposed algorithm with all the base classifiers and with the ensemble classifiers based on the sum and decision template methods. The experiments were carried out on 13 benchmark data sets. The post-hoc Nemenyi test detects significant differences between the proposed algorithm \(\varPsi _{MCV}\) and four base classifiers, while the classification results of the algorithm \(\varPsi _S\) are statistically different from those of three base classifiers. The obtained results show an improvement in the classification quality of the proposed algorithm \(\varPsi _{MCV}\) with respect to all the base classifiers and the reference ensemble methods. Future work may involve applying the proposed method to various practical tasks [8, 10, 22] in which base classifiers are used. Additionally, the advantages of the proposed algorithm can be investigated further, including its ability to work in parallel and distributed environments.