An Ensemble of Enhanced Fuzzy Min Max Neural Networks for Data Classification

Abstract: An ensemble of Enhanced Fuzzy Min Max (EFMM) neural networks for data classification is proposed in this paper. The certified belief in strength (CBS) method is used to formulate the ensemble EFMM model, with the aim of improving the performance of individual EFMM networks. The CBS method measures the trustworthiness of each individual EFMM network based on its reputation and strength indicators. Trust is built from strong elements associated with the EFMM network, allowing the CBS method to improve the performance of the ensemble model. An auction procedure based on the first-price sealed-bid scheme is adopted for determining the winning EFMM network in undertaking classification tasks. The effectiveness of the ensemble model is demonstrated using a number of benchmark data sets. Compared with existing EFMM networks, the proposed ensemble model is able to improve classification accuracy rates in the empirical studies.


Introduction
An artificial neural network, or simply neural network, is a computational model that consists of an interconnected group of artificial neurons organized in a network structure, with the aim of emulating the capabilities of the biological neural system [1][2]. One of the most active research areas of neural networks is data classification. Many different neural classifiers have been developed and used for undertaking various classification problems [3][4][5][6][7][8][9]. In this study, the Enhanced Fuzzy Min-Max (EFMM) neural network [10] is selected as the backbone for developing an ensemble model to tackle data classification problems.
EFMM is a supervised model that entails a dynamic network structure with an online learning capability, and it has many salient properties [10]. It is designed to overcome certain drawbacks of the Fuzzy Min-Max (FMM) [3] learning algorithm, i.e., limitations of the hyperbox expansion rule, overlap test rule, and hyperbox contraction rule [10]. EFMM combines neural computing and fuzzy set theory into a common framework for tackling data classification tasks. It has been demonstrated that EFMM is able to produce good and accurate classification results, as highlighted in [10]. However, its learning algorithm can be unstable (as is the case for other neural networks with online learning capabilities), in that a change in the sequence of training samples could affect the network performance [11]. Therefore, it is useful to enhance the EFMM learning algorithm and improve its performance by using an ensemble approach.
To minimize classification errors, one useful approach is to deploy a group of classifiers for decision making. In other words, it is useful to combine the predictions (or select the best one), instead of using only a single classifier to make the final decision. An ensemble model is concerned with combining the decisions of a group of individual classifiers in such a way as to reach a more accurate decision. As reported in the literature, using multiple individual classifiers (or base classifiers) in an ensemble model allows more reliable and accurate predictions to be produced [12][13][14][15].
Nevertheless, designing an effective ensemble model is a challenging task. One of the potential problems is the trust measurement of the base classifiers, i.e., how to measure the trustworthiness of each classifier and select the most accurate decision from the different decisions. A variety of trust and reputation methods are available [16][17][18]. Accordingly, an attempt has been made in [19] to formulate an ensemble model using the certified belief in strength (CBS) method, which is based on reputation and strength. The CBS method has been shown to be useful in improving the accuracy rates of a number of base classifiers (the original FMM network [3]). In this study, the application of the ensemble model (MACS-CBS) in [19] is further evaluated using a group of EFMM classifiers, instead of the original FMM networks. The performance is evaluated using different benchmark data sets. The present paper is organized as follows. In Section 2, the EFMM neural network is introduced. The ensemble model based on CBS is explained in Section 3. Section 4 details the experimental study. Finally, conclusions and suggestions for further work are presented in Section 5.

The EFMM Neural Network
The EFMM network is a supervised model that integrates the neural and fuzzy computing paradigms into a unified framework. It uses hyperbox fuzzy sets to create and store knowledge (as hidden nodes) in its network structure. Each hidden node represents a hyperbox, and each hyperbox is defined by its minimum (min) and maximum (max) points in an n-dimensional data space. The fuzzy part of EFMM is formed by combining the hyperbox min-max points with the fuzzy membership function. The fuzzy membership function determines the degree to which an input sample belongs to an output class.
The EFMM structure consists of three layers, as shown in Figure 1. Firstly, FA is the input layer, in which the number of input nodes equals the number of input features. Secondly, FB is the hyperbox layer. Each FB node represents a hyperbox fuzzy set, which is created during the learning process. The connections between the FA and FB nodes are the minimum and maximum points, which are stored in two matrices (V and W), while the membership function is the FB transfer function [3,10]. Thirdly, FC is the output layer, in which the number of nodes equals the number of output classes.
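To make the structure concrete, a minimal sketch of the membership computation is given below. It uses the original FMM membership function [3], on which EFMM builds; the sensitivity parameter gamma and its default value are assumptions for illustration, not values taken from this paper.

```python
import numpy as np

def membership(a_h, v_j, w_j, gamma=4.0):
    """Degree of fit (0..1) of input sample a_h to the hyperbox with
    min point v_j and max point w_j, evaluated per dimension and then
    averaged. Samples inside the hyperbox obtain full membership (1.0).
    gamma is an assumed sensitivity parameter, as in FMM [3]."""
    # Penalty for exceeding the max point in each dimension
    above = np.maximum(0.0, 1.0 - np.maximum(0.0, gamma * np.minimum(1.0, a_h - w_j)))
    # Penalty for falling below the min point in each dimension
    below = np.maximum(0.0, 1.0 - np.maximum(0.0, gamma * np.minimum(1.0, v_j - a_h)))
    return float(np.mean((above + below) / 2.0))

# A sample contained in the hyperbox has full membership:
print(membership(np.array([0.4, 0.6]), np.array([0.3, 0.5]), np.array([0.5, 0.7])))  # 1.0
```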
Learning in EFMM requires a set of input data samples, A_h, h = 1, …, N, where N is the total number of data samples, along with their corresponding target classes. Based on the data samples, EFMM creates a number of hyperboxes incrementally. The learning algorithm of FMM comprises a series of expansion and contraction processes. In general, when a new training sample is provided, EFMM uses the membership value (between 0 and 1), which is a measure of the degree-of-fit of a sample with respect to a hyperbox [3,10], to find the closest hyperbox that matches the sample. When an input sample is contained in a hyperbox, the input sample is said to have full class membership with respect to the hyperbox. Otherwise, a new hyperbox is created if none of the existing hyperboxes can be expanded to include the new input sample. The hyperbox size is controlled by a user-defined parameter called the expansion coefficient (θ). The EFMM learning algorithm comprises a three-step process, viz., hyperbox expansion, hyperbox overlap test, and hyperbox contraction. The hyperbox expansion process is performed to include the input sample in a specific hyperbox, provided that the hyperbox size does not exceed the expansion coefficient, 0 ≤ θ ≤ 1. In this case, when hyperbox B_j expands to include a new input sample, A_h, the following constraint must be met in every dimension [10]:

max(w_ji, a_hi) − min(v_ji, a_hi) ≤ θ, for i = 1, …, n    (1)

where v_ji and w_ji are the min and max points of hyperbox B_j along dimension i. If the input sample does not belong to any hyperboxes (i.e., the constraint in (1) is violated for all the existing hyperboxes, even after the expansion process), a new hyperbox is created to absorb the input sample into the network. This incremental process allows new hyperboxes to be added without re-training.
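The expansion step can be sketched as follows, assuming the per-dimension form of the constraint in (1); the function names are illustrative only.

```python
import numpy as np

def can_expand(a_h, v_j, w_j, theta):
    """Constraint (1): after absorbing a_h, the hyperbox must not
    exceed the expansion coefficient theta in any dimension."""
    new_v = np.minimum(v_j, a_h)
    new_w = np.maximum(w_j, a_h)
    return bool(np.all(new_w - new_v <= theta))

def expand(a_h, v_j, w_j):
    """Adjust the min/max points so the hyperbox contains a_h."""
    return np.minimum(v_j, a_h), np.maximum(w_j, a_h)

# If no same-class hyperbox can expand, a new point hyperbox is created
# (v_new = w_new = a_h), which then grows with later samples.
```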
In the overlap test, EFMM determines whether there are any overlaps between hyperboxes belonging to different classes. The expansion process can lead to overlapping regions among the existing hyperboxes. Therefore, an overlap test is performed to check for overlapping regions between the expanded hyperbox and the existing ones that belong to other classes. An overlapping region exists when at least one of the nine cases explained in [10] is met. By checking each dimension of the input sample, an overlapping region is detected when δ_new < δ_old. Then, by setting Δ = i and δ_old = δ_new, the overlap test proceeds to the next dimension. The test stops when no more overlapping regions are detected, i.e., when δ_new = δ_old [10]. If hyperboxes from different classes overlap, a hyperbox contraction process is initiated to eliminate the overlapping regions. Note that overlapping regions caused by hyperboxes from the same class are allowed, as shown in Figure 2. During the contraction process, EFMM uses the twelve (12) cases explained in [10] to adjust the overlapped dimensions in each pair of overlapped hyperboxes. In other words, the overlapping regions are eliminated by adjusting the overlapped hyperboxes [10].
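The bookkeeping of the overlap test can be sketched as follows. For brevity, only the four overlap cases of the original FMM [3] are shown, rather than the nine refined cases of EFMM [10]; the returned dimension would then be adjusted by the contraction step.

```python
def smallest_overlap_dim(v_j, w_j, v_k, w_k):
    """Return (dimension with the smallest overlap, overlap size)
    between hyperboxes B_j and B_k, or (-1, None) if they do not
    overlap. Only the four FMM overlap cases are handled here."""
    delta_old, dim = 1.0, -1
    for i in range(len(v_j)):
        if v_j[i] < v_k[i] < w_j[i] < w_k[i]:
            delta_new = min(w_j[i] - v_k[i], delta_old)
        elif v_k[i] < v_j[i] < w_k[i] < w_j[i]:
            delta_new = min(w_k[i] - v_j[i], delta_old)
        elif v_j[i] <= v_k[i] <= w_k[i] <= w_j[i]:  # B_k inside B_j
            delta_new = min(w_k[i] - v_j[i], w_j[i] - v_k[i], delta_old)
        elif v_k[i] <= v_j[i] <= w_j[i] <= w_k[i]:  # B_j inside B_k
            delta_new = min(w_j[i] - v_k[i], w_k[i] - v_j[i], delta_old)
        else:
            return -1, None  # no overlap along dimension i: boxes are disjoint
        if delta_new < delta_old:  # smaller overlap found: record it and continue
            dim, delta_old = i, delta_new
    if dim == -1:
        return -1, None
    return dim, delta_old  # contraction then adjusts this dimension [10]
```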

An Ensemble Model Using the CBS Scheme
The proposed ensemble model consists of a manager and a layer of classifiers, as shown in Figure 3. The manager chooses the winning classifier, i.e., the one with the best trust score, and is responsible for controlling all operations in the ensemble model. The classifier layer consists of three modules. Each module is an EFMM classifier, and the CBS scheme used for trust measurement is the nucleus of each module. The learning procedure in each module is as follows [19]:
1. Training: Each EFMM classifier is trained with a different sequence of the training data samples. This leads to diverse knowledge bases with respect to different training data sequences.
2. Prediction: The prediction process produces the accuracy rate of each hyperbox. The hyperbox accuracy rate indicates the degree of the belief element, which represents the knowledge of each classifier. This element is updated during the test phase in the online classification stage:

HA_i = CP_i / (CP_i + ICP_i)    (2)

where HA is the hyperbox accuracy, CP is the number of correct predictions, ICP is the number of incorrect predictions, i is the hyperbox number, and the counts are accumulated over the j input samples classified by hyperbox i, j = 1, 2, …, n.
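As an illustration, the belief element in Equation (2) can be maintained per hyperbox as follows; the class and attribute names are hypothetical.

```python
class HyperboxBelief:
    """Tracks Equation (2) for one hyperbox:
    HA_i = CP_i / (CP_i + ICP_i), accumulated over the samples
    that the hyperbox has classified."""

    def __init__(self):
        self.cp = 0    # correct predictions by this hyperbox
        self.icp = 0   # incorrect predictions by this hyperbox

    def record(self, correct):
        if correct:
            self.cp += 1
        else:
            self.icp += 1

    def ha(self):
        total = self.cp + self.icp
        return self.cp / total if total > 0 else 0.0
```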
The CBS scheme is established based on the Bucket Brigade Algorithm (BBA) [20]. However, in the present CBS scheme, the bidding part of BBA is modified by incorporating the HA element (i.e., the belief degree) given in Equation (2). Each classifier participates in the auction by starting with a specific net worth (amount of money) defined by the manager. The net worth is called the strength (S). Each classifier then declares its maximum bid, which is proportional to its strength. The bid ratio (B) is strongly affected by the HA element, which represents the state of knowledge of the classifier, and is calculated based on the historical information of each classifier. This information is recorded during the prediction process, based on the accuracy rate of each hyperbox, i.e., Equation (2).
All participating classifiers are assumed to start bidding with an initial strength of 100 (i.e., S = 100), and all classifiers start their bids after receiving the test sample. It is found from many experimental trials that a suitable value for the bid coefficient (c) is 0.01. Each classifier places its bid in proportion to its strength as follows [20]:

B_k = c × S_k    (3)

where k is the classifier number, k = 1, 2, …, m. In this study, Equation (3) is used as the reward and penalty measure to update the strength (as embedded in Equation (5)). To compute the trust element, Equation (4) is a modified version of Equation (3), in which the bid is scaled by the belief degree:

CBS_k = c × S_k × HA    (4)

The classifier that makes a correct prediction with the largest CBS (the highest trust value) is chosen as the winner by the manager, using the first-price sealed-bid auction. Based on the final decision, the strength of each classifier is updated by adopting the classifier's bid defined in Equation (3). The updated value is positive (reward) if a classifier gives a correct prediction. It is negative (penalty) in the case of an incorrect prediction. The equation for updating the strength (either reward or penalty) is as follows [20]:

S_k(t+1) = S_k(t) + R_k(t) − P_k(t) − T_k(t)    (5)

where P represents a penalty, T is a tax, R is a reward, k is the classifier index, and t is the time step. Both the strength and HA values are updated in each time step, whereby the HA element is updated according to the correct and incorrect predictions. The final test accuracy rate of the ensemble model is computed as follows:

Accuracy = CPTS / (CPTS + ICPTS)    (6)

where CPTS and ICPTS are the numbers of correctly and incorrectly predicted test samples, respectively. Since the test accuracy rate in Equation (6) is updated online for each test sample, it also serves as a tie-breaker to determine the winner in the case where all the CBS values are equal, i.e., the classifier with the highest test accuracy rate is selected to be the final winner.
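Putting Equations (3)-(6) together, one online test step might be sketched as follows. The data layout, the zero tax, and the exact form of Equation (4) (the bid scaled by the HA belief element) are assumptions made for illustration.

```python
def auction_round(classifiers, predictions, true_label, c=0.01, tax=0.0):
    """One first-price sealed-bid round among EFMM classifiers.
    Each classifier is a dict with 'strength' and 'ha' (belief of the
    hyperbox that produced its prediction); 'predictions' holds each
    classifier's predicted class for the current test sample."""
    bids = [c * cl['strength'] for cl in classifiers]           # Equation (3)
    trust = [b * cl['ha'] for b, cl in zip(bids, classifiers)]  # Equation (4)
    winner = max(range(len(classifiers)), key=trust.__getitem__)
    for k, cl in enumerate(classifiers):                        # Equation (5)
        reward = bids[k] if predictions[k] == true_label else -bids[k]
        cl['strength'] += reward - tax
    return predictions[winner]

class EnsembleAccuracy:
    """Equation (6): online test accuracy, CPTS / (CPTS + ICPTS),
    also usable as the tie-breaker described above."""

    def __init__(self):
        self.cpts = 0
        self.icpts = 0

    def update(self, correct):
        if correct:
            self.cpts += 1
        else:
            self.icpts += 1

    def rate(self):
        total = self.cpts + self.icpts
        return self.cpts / total if total > 0 else 0.0
```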

Experimental Study
Two case studies were conducted to evaluate the effectiveness of the proposed ensemble model for undertaking data classification tasks. In the first case study, the Page Blocks benchmark problem was used to compare the performances of the individual and ensemble EFMM models. The Page Blocks data set was obtained from the University of California, Irvine (UCI) machine learning repository [21]. In the second case study, the circle-in-the-square problem was used to compare the proposed ensemble model with another model published in [22]. The overall results were obtained by using the bootstrap method [23].

Case Study I
This case study shows the performance comparison between the individual and ensemble EFMM models, using the Page Blocks data set. The data set was randomly divided into two subsets: 80% for training and 20% for testing. To find the CBS value in the ensemble model, the data set was instead divided into three subsets: training (60%), prediction (20%), and test (20%). A series of experiments was conducted using three different hyperbox sizes, i.e., a small size (θ = 0.05), a medium size (θ = 0.5), and a large size (θ = 0.9). Each experiment was repeated 10 times for each θ setting. Table 1 shows the mean classification results computed using the bootstrap method [23]. As shown in Table 1, the ensemble model outperforms the individual EFMM classifiers. This indicates that the CBS scheme is useful for combining the predictions from individual classifiers. The main reason is that the CBS scheme comprises two useful elements. The first is the hyperbox accuracy rate, which represents the knowledge of the classifier and reflects the degree of the belief element. The second is the strength (S). Both elements are useful indicators that help the ensemble model to make more accurate predictions than the individual EFMM classifiers.
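For reference, the 60/20/20 split described above can be sketched as follows, assuming the data set is held in NumPy arrays; everything apart from the percentages is illustrative.

```python
import numpy as np

def split_for_cbs(X, y, seed=0):
    """Randomly split a data set into training (60%), prediction (20%),
    and test (20%) subsets, as used for the ensemble experiments."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_tr = int(0.6 * len(X))
    n_pr = int(0.2 * len(X))
    tr, pr, te = idx[:n_tr], idx[n_tr:n_tr + n_pr], idx[n_tr + n_pr:]
    return (X[tr], y[tr]), (X[pr], y[pr]), (X[te], y[te])
```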

Case Study II
By following the procedure in [22], the aim of this circle-in-the-square problem was to examine the effect of changing the training set size on the performance of the ensemble model. The results were compared with those of the Fuzzy ARTMAP (FAM) network published in [22]. In accordance with [22], the training set size was varied from 100 to 100000 (randomly generated) samples, while 1000 new data samples were used for testing. To find the CBS value in the ensemble model, the training data samples were further divided into two subsets: training (80%) and prediction (20%), while the test samples (1000) remained the same. The expansion coefficient, or hyperbox size (θ), was set to a small value (0.05) in order to maximize the performance. Table 2 shows the mean accuracy rates computed using the bootstrap method [23]. Compared with FAM, both the individual and ensemble EFMM models achieved better test accuracy rates with 100, 1000, and 10000 training samples, at the expense of higher network complexity, as shown in Table 2. With 100000 samples, both models created complex network structures, which resulted in higher misclassification rates than FAM.

Conclusion
An ensemble EFMM model has been proposed for tackling data classification problems. The CBS scheme, which is devised based on the strength (money) and reputation (hyperbox accuracy) of each EFMM classifier, has been shown to be useful for formulating the ensemble model. Trust is built from meaningful elements that are linked with the EFMM classifiers, allowing the CBS scheme to improve the performance of the ensemble model. The first-price sealed-bid auction is adopted for the ensemble model to determine the winning classifier. The ensemble model has been evaluated using the Page Blocks and circle-in-the-square problems. The results have demonstrated the advantage of the ensemble model, which is able to yield good classification performances as compared with individual EFMM classifiers.
As the use of CBS with the EFMM classifier has shown promising results, further work will focus on improving the performance of the ensemble model by using different variants of FMM. The resulting ensemble models will be evaluated comprehensively using real-world classification problems in different domains.