Evolutionary deep belief networks with bootstrap sampling for imbalanced class datasets

ABSTRACT


Introduction
Class imbalance can harm decision-making by producing poor results through misclassification and fluctuating error rates [1]. Common remedies are data sampling and algorithm modeling [2]. Data sampling is widely used for imbalanced class problems because it acts on the data directly. The basic approaches are undersampling and oversampling. However, undersampling may discard data that are crucial for prediction [3], while oversampling can cause overfitting during learning [4]. Nevertheless, data sampling remains attractive because it can reduce the negative effects of class imbalance. The other plausible solution for an imbalanced class problem is algorithm modeling. Deep learning has shown promising results in many domains, especially those that require high-level abstraction and have complex data features, such as image processing, emotion detection, and handwriting recognition [5]-[7]. One example of a deep learning algorithm is the deep belief network (DBN). A DBN can learn from complex feature inputs in tasks such as emotion recognition [7] and acoustic modeling [8]. Therefore, it can learn the features of an imbalanced class dataset and classify it correctly. Despite its promising performance in various fields, a DBN is generally computationally expensive and cannot achieve competent results when learning from an inadequate amount of data [7], [8].
Class imbalance is a common problem in classification tasks. According to references [4] and [9]-[12], imbalanced class refers to the "disparity of data dispensation between the classes". The class with more training values is called the majority class, and the class with the fewest values or the most missing data values is called the minority class [13]. Minority classes are a realistic problem in real-world situations because, even for important datasets such as cancer detection [14] and bank fraud [15], the data instances are usually scarce. It can also be expensive if new data need labeling [16]. Unfortunately, most algorithms that show stable and promising performance on balanced data in classification tasks display conflicting outcomes when an imbalanced class dataset is used [17]. Prediction of the minority class is presumed to have a higher error rate than the majority class, and its test examples are often wrongly classified as well [1]. Imbalanced data distribution among the classes produces deficient classification models [11]. An algorithm that performs well on a balanced dataset will not perform as well on an imbalanced dataset [9], regardless of how good the model is. In a study by Yan et al. [10], an imbalanced multimedia dataset was used as the input for a CNN. The dataset is a TRECVID dataset, which means it is in video form. The outcome shows that the error rate fluctuates, whereas with a balanced dataset the error rate of the algorithm decreases steadily. A few methods are commonly used to tackle the challenges of imbalanced class datasets. The first is to use machine learning algorithms and model hybrids suited to the input types [2], [12]. Another is to preprocess the imbalanced dataset itself [2].
Bootstrap sampling iteratively draws small samples from an original sample [10], [18]. The method reuses its training samples, which makes it a suitable technique for avoiding data redundancy as well as data disposal. Megumi et al. [19] conducted a neuroscience experiment involving fMRI neurofeedback, in which bootstrap sampling was used to assess the difference in correlation between the neurofeedback and other networks. Bootstrap sampling is frequently adopted to improve the performance of deep learning algorithms on imbalanced class data [4], [16]. Yan et al. [10] used a convolutional neural network (CNN) to classify an imbalanced multimedia dataset. Bootstrap sampling was integrated with the algorithm to minimize its fluctuating error rate. The experiment yielded a higher F1-score than another framework proposed by the Tokyo Institute of Technology (TiTech). In another study, Berry et al. [16] used bootstrap sampling to improve both computation time and accuracy after training on imbalanced and unlabeled data with a deep belief network (DBN). The reported result is a 41% decrease in the error rate that needs human intervention compared with no bootstrapping. Sun et al. [20] predicted wind speed and wind power using a deep belief network and an optimized random forest. The amount of data in that experiment was inconsistent because some data were simply unavailable. Therefore, bootstrap sampling was employed to resample the training data and improve the performance of their model.
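As an illustration of the resampling idea described above, the following is a minimal sketch of classical bootstrap sampling (drawing with replacement) in plain Python. The function names and the toy values are assumptions made for this example and do not come from any of the cited studies.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) items from data uniformly at random, with replacement."""
    return [data[rng.randrange(len(data))] for _ in range(len(data))]


def bootstrap_means(data, n_resamples, seed=0):
    """Return the mean of each of n_resamples bootstrap resamples of data."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        sample = bootstrap_sample(data, rng)
        means.append(sum(sample) / len(sample))
    return means


if __name__ == "__main__":
    values = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # toy data, not from the paper
    means = bootstrap_means(values, n_resamples=1000)
    print("spread of resampled means:", min(means), max(means))
```

Because every resample reuses the original observations, no instance is discarded and no new instance is synthesized, which is the property the paragraph above relies on.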
A deep belief network (DBN) is made up of a stack of Restricted Boltzmann Machines (RBMs) for network pre-training and uses a backpropagation neural network (BPNN) as a fine-tuning step. The RBM architecture connects the hidden and visible layers bidirectionally. This two-way connection between the layers results in deeper extraction between the neurons, as the weights are connected exclusively. An RBM is probabilistic [21], which means that its units take the value 0 or 1 stochastically. It is a two-layer, bipartite, undirected graphical model with a set of binary hidden random variables h of dimension K and a set of binary or real-valued visible random variables v of dimension D. The symmetric connections between the two layers are represented by a weight matrix [22]. According to Zheng et al. [23], the output of a lower-level RBM becomes the input to the next higher-level RBM in a bottom-up manner, which allows the weights to be pre-trained within the DBN. This feature contributes to an increase in accuracy. There are two common types of RBM [7]: the Bernoulli RBM and the Gaussian RBM. A Bernoulli RBM has binary values for its hidden and visible layers, whereas a Gaussian RBM has real number values for its hidden and visible layers.
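The stochastic, bidirectional behaviour of a Bernoulli RBM can be sketched as one up-and-down sampling pass. This is a minimal illustration under the standard logistic conditional probabilities; the names `W`, `b`, `c` and the toy sizes (D = 3, K = 2) are assumptions for the example, not values from the paper.

```python
import math
import random


def sigmoid(x):
    """Logistic function used for the unit activation probabilities."""
    return 1.0 / (1.0 + math.exp(-x))


def sample_hidden(v, W, c, rng):
    """Sample binary hidden units given visible vector v.

    W[i][j] couples visible unit i with hidden unit j; c[j] is the hidden bias.
    Returns the activation probabilities and the sampled 0/1 states.
    """
    probs = [
        sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
        for j in range(len(c))
    ]
    states = [1 if rng.random() < p else 0 for p in probs]
    return probs, states


def sample_visible(h, W, b, rng):
    """Sample binary visible units given hidden vector h, using the same W."""
    probs = [
        sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
        for i in range(len(b))
    ]
    states = [1 if rng.random() < p else 0 for p in probs]
    return probs, states


if __name__ == "__main__":
    rng = random.Random(0)
    W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]  # D = 3 visible, K = 2 hidden
    b, c = [0.0, 0.0, 0.0], [0.0, 0.0]
    _, h = sample_hidden([1, 0, 1], W, c, rng)
    _, v_reconstructed = sample_visible(h, W, b, rng)
    print(h, v_reconstructed)
```

The same weight matrix is used in both directions, which reflects the symmetric connections between the two layers.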
A DBN is composed of simple learning modules, RBMs, stacked in a bottom-up way. The RBMs in a DBN are trained layer by layer in a greedy manner [24]. The generative pre-training of a DBN adds a higher level of feature abstraction of the input to the network. Neural network layers are exponentially dense [8], but the depth of a DBN allows low-level feature abstraction to be handled by the lower layers and high-level or nonlinear feature abstraction to be handled by the higher layers of the network. However, the number of layers makes a DBN computationally expensive and time-consuming. In an experiment by Le and Provost [7], training a DBN was computationally expensive because pre-training took 11 minutes per epoch and fine-tuning took 10 minutes per epoch. Feature extraction with a DBN for speech emotion recognition took 136 hours, which is longer than other methods on the same dataset [25]. Generative pre-training of weights in a DBN is essential to augment the possibilities of the input through the layers from below [8] and make the network more accurate, but it results in expensive computation. According to Hinton [26], this setback can be minimized by applying "contrastive divergence" on every layer. Another common problem with DBNs is parameter setting [27]: considerable effort is needed to find the combination of settings that gives the best performance. Although a DBN has been shown to learn from imbalanced class datasets better than a CNN, its training time is long [28]. A financial distress prediction study on a real-life dataset used a DBN hybrid [29]. The dataset is classified into two categories, distressed and non-distressed, for "Micro-business" and "Small and Medium Business" (SMB), and the class ratio is stated to be imbalanced. The RBM component of the DBN is used for pre-training on the data, and an SVM is employed for the classification phase. The result is 76.8% accuracy compared with 62.1% for an ANN. Berry et al. [16] used a DBN to handle imbalanced unlabeled data. The dataset consists of ultrasound images of the tongue taken while a subject is speaking. Imbalance in speech data is unavoidable because some images will look more similar than the rest, as stated in Zipf's law. Bootstrapping is incorporated into the DBN to reduce computation time. The method improved the accuracy and reduced the time taken to label the data. Kuang and He [30] attempted to classify fMRI datasets with a DBN to predict whether a patient has ADHD. The ADHD dataset is imbalanced, and its effect on the DBN is a low accuracy rate. Therefore, the dataset went through preprocessing, which increased the accuracy achieved by the DBN.
A genetic algorithm (GA) is a heuristic search algorithm modeled on biological evolution [31], [32] and introduced by John Holland in the 1970s. GA mimics the human genetic mutation and selection process [33]. Its basic unit is the chromosome: a chromosome contains multiple genes, and a collection of chromosomes is called a population [31]-[34]. The objective of GA is to ensure that each iteration has better chromosomes than the previous one. Therefore, a fitness function is used as a yardstick to verify that the process is successful. GA has various mapping techniques and fitness measurements [35]. The crossover and mutation operators create randomness in the population, which allows the heuristic to avoid local optima [36]. Liu et al. [34] state that it is an ideal algorithm for fields such as optimization and forecasting.
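The GA vocabulary above (genes, chromosomes, population, fitness, crossover, mutation) can be made concrete with a small generic sketch. The toy objective (maximize the number of 1-genes), the truncation selection, the elitism, and all parameter values are assumptions chosen for illustration; they are not taken from the cited works.

```python
import random


def fitness(chromosome):
    """Toy fitness: number of genes set to 1 (higher is better)."""
    return sum(chromosome)


def crossover(a, b, rng):
    """Single-point crossover: head of parent a joined to tail of parent b."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(chromosome, rate, rng):
    """Flip each gene independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chromosome]


def evolve(pop_size=20, n_genes=16, generations=40, mutation_rate=0.05, seed=0):
    """Run a simple GA and return the best chromosome seen."""
    rng = random.Random(seed)
    population = [
        [rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)
    ]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Keep the fitter half as parents (truncation selection).
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = [parents[0]]  # elitism: carry the current best forward
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), mutation_rate, rng))
        population = children
        best = max(population + [best], key=fitness)
    return best


if __name__ == "__main__":
    winner = evolve()
    print(winner, fitness(winner))
```

Crossover recombines existing solutions, while mutation injects new gene values, which is the source of randomness that helps the search leave local optima.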
In recent research, GA is commonly used to improve an algorithm or as part of a hybrid algorithm for prediction and classification. GA has been hybridized with Particle Swarm Optimization (PSO) to improve feature selection for the Indian Pines hyperspectral dataset [33]. Jamshidi et al. [35] used GA as part of an optimization for removing an element in the chemistry domain. GA has also been used to optimize an SVM in an application that uses a wavelet transform to forecast short-term wind speed [34]. In addition, GA has been used as an optimizer for CMP [32]. Elhoseny et al. [37] employed GA to balance energy consumption in the WSN domain, where the data are heterogeneous. GA has been applied to feature selection for credit risk assessment at a bank in Croatia [38]. Neath et al. [39] applied GA to obtain the ideal level of performance for a proportional-integral-derivative (PID) controller by tuning its parameters. Assodiky et al. [40] used GA for feature selection as part of an H2O Deep Learning model to classify ECG data and detect arrhythmia. Inanlo and Zadeh [41] classified social networks using a GA-based DBN; the network converged properly and was stable when classifying the social network dataset. For imbalanced data classification, GA is used efficiently to overcome the common problems through its selective feature. Deshmukh and Akarte [42] used GA to improve an SVM for an imbalanced medical data task. Haque et al. [36] used a heterogeneous Ensemble of Classifiers (EoC) to overcome the imbalanced data problem through generalization. The authors proposed a GA-based technique to choose the classifiers that build a good heterogeneous EoC and obtained better results than the base classifiers and other ensembles. Another way to deal with imbalanced datasets is the cost matrix method. Perry et al. [43] integrated GA to produce cost matrices that allow the algorithm to deal efficiently with different use cases of imbalanced data. GA is known to be a robust and good optimization algorithm [34]. Haque et al. [36] stated that GA is suitable for massive and elaborate tasks because, unlike other heuristics, it is less likely to get stuck in local optima. However, Asadi et al. [44] claim that GA is computationally expensive when an evaluation function needs to be executed many times. Therefore, the overall context of the datasets needs to be taken into account if the algorithm aims to be cost-sensitive.
In this paper, an optimized DBN is proposed that uses an evolutionary algorithm to control the negative effects of imbalanced class data on the performance of the algorithm. An evolutionary algorithm (EA) is incorporated to provide the optimum dropout number, learning rate, batch size, and iteration number of the BPNN used for fine-tuning in the DBN. Bootstrap sampling is also incorporated into the algorithm structure to minimize the bias of the training samples. These modifications improve the ability to predict outcomes accurately for imbalanced datasets.

The Proposed Method
DBN shows good results when dealing with inputs that have complex features. However, its results in predicting imbalanced class datasets are unstable. This paper proposes GA and bootstrap sampling as parts of DBN to minimize these shortcomings. Fig. 1 illustrates the modifications made to DBN.

Fig. 1. Flowchart of Evolutionary DBN with bootstrap sampling
In Evolutionary DBN with bootstrap sampling, the algorithm receives an imbalanced class dataset as input. The instances are split into training and testing sets using cross-validation. The test fraction is set to 0.2, which means 80% of the dataset is used for training and the remaining 20% is held out for testing. The maximum epoch is set to 100. In one epoch, the inputs are taken as neurons, and the weights and biases into the connected hidden layers are calculated. Each node holds initialized weights and applies the rectified linear unit (ReLU) activation function. The output of one neuron is an input to another neuron, which allows the network to learn. The input goes through a network of connected RBMs in the pre-training phase and a backpropagation neural network (BPNN) in the fine-tuning phase. Then, the weights are adjusted for the next epoch according to what was learned in the previous epoch. After the weights are adjusted, the network repeats the same calculation and keeps adjusting the weights until the maximum epoch is reached. The weights determine how the network makes decisions and predicts the output, as learned from the training data. In the algorithm, pre-training using RBM is set to 5.
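The 80/20 split described above can be sketched in a few lines. This is only an illustration of a shuffled hold-out split with a test fraction of 0.2; the function name `split_train_test` and the seed are assumptions, and the authors' actual tooling is not stated in the text.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle row indices and hold out `test_fraction` of the rows for testing."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test


if __name__ == "__main__":
    rows = list(range(100))  # stand-in for dataset instances
    train, test = split_train_test(rows)
    print(len(train), len(test))
```

With a very small minority class, a purely random split can leave few or no minority instances in the test set, so a split that preserves class proportions may be preferable in practice.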
The DBN classifier is assigned 2 hidden layers with 256 neurons per hidden layer.
A genetic algorithm (GA) is employed as the evolutionary part of Evolutionary DBN with bootstrap sampling. The main steps of GA are initializing the population, calculating the fitness of the individuals in the current population, and creating a new population whose individuals' fitness is calculated and compared with that of the previous population. Next, the individuals with the best fitness values are mutated and evolved for the next generation. The procedure is repeated until termination. In this algorithm, GA initializes a population of 5 DBN classifiers with randomized parameters. The parameters improved by GA are the BPNN iteration number, the learning rate of the network, the batch size, and the dropout number. The population undergoes an evolutionary process in which the fitness calculation is performed. The classifiers with the best and second-best fitness values are selected as the new generation. This new generation then goes through a mutation process in which GA randomly chooses a parameter of the best classifier and assigns it a random value within its range. In this case, the BPNN iteration number is between 100 and 300, the learning rate is between 0.01 and 0.1, the batch size is between 1 and 10, and the dropout is between 0.1 and 0.6. After this evolution process, the algorithm returns a GA-optimized DBN and performs bootstrap sampling.
Bootstrap sampling is applied after the evolutionary DBN classifier has optimized its parameter settings. The classifier first trains on the initial training split. Then, the dataset is reshuffled using the same ratio as the initial training data, and the classifier retrains. This process is repeated up to the fourth time, which is the optimal sampling number for Evolutionary DBN. Once the maximum epoch is reached, the testing set is used to test the performance of the algorithm. The algorithm predicts and classifies the output according to its learning performance.
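The evolutionary search and the repeated reshuffling described above can be sketched as follows. The parameter ranges and the population size of 5 are those quoted in the text; everything else is an assumption made for illustration: the number of generations, the way the population is refilled with mutated copies of the best individual, the `toy_fitness` stand-in (a real run would train a DBN and score it), and the reading of "reshuffled using the same ratio" as a fresh shuffle with an 80% training share.

```python
import random

# Hyperparameter ranges quoted in the text.
PARAM_RANGES = {
    "bpnn_iterations": (100, 300),
    "learning_rate": (0.01, 0.1),
    "batch_size": (1, 10),
    "dropout": (0.1, 0.6),
}
INTEGER_PARAMS = {"bpnn_iterations", "batch_size"}


def random_value(name, rng):
    """Draw a random value for one hyperparameter within its range."""
    low, high = PARAM_RANGES[name]
    if name in INTEGER_PARAMS:
        return rng.randint(low, high)
    return rng.uniform(low, high)


def random_individual(rng):
    """One individual = one full hyperparameter setting."""
    return {name: random_value(name, rng) for name in PARAM_RANGES}


def mutate(individual, rng):
    """Re-draw one randomly chosen hyperparameter within its range."""
    child = dict(individual)
    name = rng.choice(sorted(PARAM_RANGES))
    child[name] = random_value(name, rng)
    return child


def evolve(fitness, generations=10, pop_size=5, seed=0):
    """Keep the best and second-best individuals; refill with mutants of the best."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        best, second = ranked[0], ranked[1]
        mutants = [mutate(best, rng) for _ in range(pop_size - 2)]
        population = [best, second] + mutants
    return max(population, key=fitness)


def toy_fitness(individual):
    """Hypothetical stand-in for the score of a DBN trained with these settings."""
    return -abs(individual["learning_rate"] - 0.05) - abs(individual["dropout"] - 0.3)


def reshuffled_training_sets(rows, rounds=4, test_fraction=0.2, seed=0):
    """Yield `rounds` training subsets, each a fresh shuffle at the same ratio."""
    rng = random.Random(seed)
    n_train = len(rows) - round(len(rows) * test_fraction)
    for _ in range(rounds):
        shuffled = rows[:]
        rng.shuffle(shuffled)
        yield shuffled[:n_train]


if __name__ == "__main__":
    settings = evolve(toy_fitness)
    print(settings)
    for round_no, subset in enumerate(reshuffled_training_sets(list(range(100))), 1):
        print("retraining round", round_no, "on", len(subset), "instances")
```

Because the best individual is always carried over unchanged, the best fitness in the population never decreases from one generation to the next in this sketch.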
Performance metrics such as accuracy rate, weighted mean precision, weighted mean recall, and F1-score are computed to evaluate the overall implementation of Evolutionary DBN with bootstrap sampling. The accuracy rate is the number of correctly classified samples over the total number of samples. The formula for the accuracy rate is shown in (1).
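For reference, the standard definition of accuracy in terms of the confusion-matrix counts is given below; it is assumed here to be the form intended as (1).

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$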
Here, TP denotes true positives, TN true negatives, FP false positives, and FN false negatives. However, the accuracy rate alone is not enough to assess performance on imbalanced class datasets. Therefore, the weighted mean recall of the algorithm is taken into account to ensure there is no bias when recalling samples from the minority class [45]. The formula for the weighted mean recall is shown in (2).
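The weighting scheme is not spelled out in the text. Assuming the common support-weighted average over classes, (2) can be written as follows, where the number of classes $C$, the number of true instances $n_c$ of class $c$, the total number of instances $N$, and the per-class counts $TP_c$ and $FN_c$ are notation introduced for this sketch.

$$\text{Recall}_{w} = \sum_{c=1}^{C} \frac{n_c}{N} \cdot \frac{TP_c}{TP_c + FN_c} \tag{2}$$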
Weighted mean precision indicates the preciseness of the algorithm for each imbalanced class dataset. The formula is shown in (3).
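Under the same assumption of support-weighted averaging, with $n_c$ the number of true instances of class $c$, $N$ the total number of instances, and $TP_c$ and $FP_c$ the per-class counts (notation introduced for this sketch), (3) can be written as:

$$\text{Precision}_{w} = \sum_{c=1}^{C} \frac{n_c}{N} \cdot \frac{TP_c}{TP_c + FP_c} \tag{3}$$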
The F1-score is the harmonic mean of recall and precision. This metric is useful for balancing the bias of an algorithm against its preciseness when classifying instances into the correct category. The formula for the F1-score is shown in (4).
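The harmonic mean of precision and recall has the standard form below, assumed here to be the form intended as (4).

$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$

The four metrics can be computed together as in the following sketch. The support-weighted averaging and the choice to take the harmonic mean of the two weighted averages (rather than averaging per-class F1 values) are assumptions; the text does not state which variant was used.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy, support-weighted recall and precision, and their harmonic mean."""
    n = len(y_true)
    support = Counter(y_true)      # true instances per class
    predicted = Counter(y_pred)    # predicted instances per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    accuracy = sum(true_pos.values()) / n
    recall = sum(support[c] / n * (true_pos[c] / support[c]) for c in support)
    precision = sum(
        support[c] / n * (true_pos[c] / predicted[c] if predicted[c] else 0.0)
        for c in support
    )
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"accuracy": accuracy, "recall": recall, "precision": precision, "f1": f1}


if __name__ == "__main__":
    # Toy labels: a classifier that always predicts the majority class 0.
    print(weighted_metrics([0, 0, 0, 1], [0, 0, 0, 0]))
```

Under this weighting, the factors $n_c$ cancel in the recall sum, so the weighted mean recall coincides with the accuracy rate for single-label classification.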

Datasets
This section presents the imbalanced class datasets used in the experiment. The datasets were chosen based on the disparity in the number of instances between their classes and on their use in similar studies by Weiss and Provost [1], Zhang et al. [46], Boughorbel et al. [47], and Lopez et al. [48].
Table 1 shows the distribution of the imbalanced class datasets. The instances are the number of items in the dataset. The number of attributes includes the class. Missing values are removed from the datasets. The imbalance pairings of the binomial category datasets range over 15-85, 20-80, and 75-25. None of the distributions is balanced in the sense of a 50-50 or 40-60 pairing. Datasets with a large ratio gap are more exposed to biased prediction than datasets with a smaller gap. The multiclass datasets are divided into nominal and ordinal categories. A nominal dataset has a label with more than two classes. An ordinal dataset also has instances that can be classified into more than two classes, but the classes are ordered. The instances are sorted into each class according to the attributes they satisfy. Table 1 presents the distribution of the majority class, the minority class, and the other classes in each dataset. This is because the imbalanced class datasets in the multiclass category have varying numbers of classes, so the difference between the majority and minority classes might not be clear if the distribution were presented for every class.
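The majority, minority, and other-class shares reported in this kind of table can be derived from the label column alone. The following sketch shows one way to do so; the function name and the toy labels are assumptions and do not reproduce any dataset in Table 1.

```python
from collections import Counter


def class_distribution(labels):
    """Return the percentage share of the majority, minority, and remaining classes."""
    counts = Counter(labels).most_common()
    if len(counts) < 2:
        raise ValueError("at least two classes are required")
    total = len(labels)
    majority_count = counts[0][1]
    minority_count = counts[-1][1]
    other_count = total - majority_count - minority_count
    return {
        "majority": 100.0 * majority_count / total,
        "minority": 100.0 * minority_count / total,
        "other": 100.0 * other_count / total,
    }


if __name__ == "__main__":
    binomial = ["neg"] * 85 + ["pos"] * 15           # toy 85-15 pairing
    multiclass = ["a"] * 6 + ["b"] * 3 + ["c"] * 1   # toy three-class labels
    print(class_distribution(binomial))
    print(class_distribution(multiclass))
```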

Results and Discussion
This section presents the results of Evolutionary DBN with bootstrap sampling on imbalanced class datasets and compares them with other algorithms such as DBN and DNN. The performance metrics used to evaluate Evolutionary DBN with bootstrap sampling are accuracy rate, weighted mean recall, weighted mean precision, and F1-score. The results are recorded in the respective tables and analyzed.

Accuracy Rate
Table 2 shows the accuracy rate achieved by each algorithm on each imbalanced class dataset used in this experiment. The results are discussed by dataset category. In the binomial category, the proposed Evolutionary DBN with bootstrap sampling achieved the highest accuracy rate among the compared algorithms, with the exception of the SPECT dataset, where SVM also achieved a high accuracy score. SVM scores a high accuracy rate on this dataset because its attributes are binary, which suits the SVM hyperplane approach. Compared with the other algorithms (DNN, BPNN, and SVM), DBN scores the highest only on the SPECTF dataset. Therefore, Evolutionary DBN with bootstrap sampling improved the performance of DBN when predicting the output of imbalanced binomial datasets. DBN does best on SPECTF because this dataset has the most attributes, which suggests that DBN performs well only when it has many features to learn. Evolutionary DBN with bootstrap sampling overcomes this requirement and achieves the highest accuracy rate on all imbalanced class datasets in the binomial category.
In the nominal category, Evolutionary DBN with bootstrap sampling attained the highest accuracy rate for the Tumor, Ecoli, and Audiology datasets, which accounts for 3 out of 5 nominal datasets. For binary attributes in the nominal category, Evolutionary DBN with bootstrap sampling achieved a high accuracy rate for the Tumor dataset but, together with BPNN, only the second-highest for the Zoo dataset, where DBN scored the highest accuracy rate. For continuous numerical attributes in the nominal category, Evolutionary DBN with bootstrap sampling achieved a high accuracy rate for the Ecoli dataset but the second-lowest for the Yeast dataset. Ecoli has 8 attributes and Yeast has 9, with 336 and 1484 instances, respectively. The Ecoli dataset has 8 classes, compared with 9 classes in the Yeast dataset. The parallel comparison between the binary and continuous numerical attributes shows that the number of instances, the number of attributes, and the total number of classes of the imbalanced datasets are not the factors that influence the performance of Evolutionary DBN with bootstrap sampling. However, although the Zoo dataset has 7 classes, each class comprises different animals that share similar attributes but are not necessarily from the same species. For example, Class 2 consists of 20 bird-like species such as "chicken", "penguin", and "vulture". The shared attributes of these animals might be "2 legs" and "eggs", but a "penguin" is labeled "aquatic" whereas a "vulture" is labeled "airborne", and a "chicken" has neither label. They are all assigned to Class 2. This sort of structure in the dataset might prevent Evolutionary DBN with bootstrap sampling from achieving the 100% accuracy rate that DBN achieves. For continuous numerical attributes, it is important to note that the Yeast dataset, on which Evolutionary DBN with bootstrap sampling achieved the second-lowest accuracy, also yields very low accuracy rates for the other algorithms, with BPNN scoring the highest at 57.91%.
In the ordinal category, Evolutionary DBN with bootstrap sampling obtained the highest accuracy rate for 2 out of 3 imbalanced class datasets, namely the Contraceptive and Post-operative datasets. For the Dermatology dataset, the algorithm achieved the second-lowest accuracy and DBN the lowest, compared with DNN, BPNN, and SVM, which fall in the 91% to 95% accuracy range. Even though accuracy increases by about 10% over DBN, it can be concluded that the DBN structure is unable to learn from the attributes of this dataset. For the Contraceptive dataset, Evolutionary DBN with bootstrap sampling achieved 98% accuracy, compared with 47.46% for DBN and between 42% and 63% for the other algorithms, which is a large improvement. A similar result is seen for the Post-operative dataset, where Evolutionary DBN achieved 94.44% accuracy while the other algorithms, except DNN, achieved accuracy rates between 72% and 77%. Contraceptive and Post-operative both have 3 classes, with 10 and 9 attributes respectively and different numbers of instances. Nevertheless, Evolutionary DBN with bootstrap sampling manages to extract the features for the classes, learns well from the imbalanced class datasets, and shows improvement over DBN in all cases.

Weighted Mean Recall
Table 3 presents the weighted mean recall of the algorithms. The recall rate is useful for checking that an algorithm is not biased toward recalling only the majority class of an imbalanced class dataset but also trains and tests on instances from the minority class. An algorithm might have a high accuracy rate but a poor recall rate, which can mean that it only trains and tests on instances from the majority class. For binomial datasets, Evolutionary DBN with bootstrap sampling has the highest recall rate compared with the other deep learning and machine learning algorithms. In this category, its recall rate ranges from 0.95 to 1.0, which is very good compared with the recall rate of DBN, which ranges from 0.6 to 0.83. Among the other algorithms, the lowest recall rate in the binomial category is BPNN on the Parkinson dataset at 0.18, and the highest is SVM on the SPECT dataset at 1.0. This shows that Evolutionary DBN with bootstrap sampling manages to learn and recall instances for prediction from both datasets despite their differences in attribute types and number of attributes, as shown in Table 1.
For nominal datasets, Evolutionary DBN with bootstrap sampling achieved a perfect recall rate of 1.0 for the Tumor and Ecoli datasets, which have binary and continuous numerical attributes, respectively. Comparing the recall rate within each attribute category, Evolutionary DBN with bootstrap sampling has a high recall rate for binary attributes, scoring 1.0 and 0.95 on the respective datasets. However, in the continuous numerical attribute category, it has a recall rate of 1.0 for the Ecoli dataset but only 0.46 for the Yeast dataset. As mentioned in the accuracy rate analysis and according to Table 1, these two imbalanced class datasets are similar in attribute characteristics, data distribution, and number of classes. Although the difference in the number of instances between the two datasets is large, it is unlikely to be the main factor in the underperformance of Evolutionary DBN with bootstrap sampling, considering that the number of instances in the Yeast dataset is similar to that in the Contraceptive dataset. Among the other deep learning and machine learning algorithms, the highest recall rate on the Yeast dataset is achieved by BPNN at 0.58, while the rest have recall rates from 0.4 to 0.48. It is fair to conclude that the algorithms have difficulty learning from this particular imbalanced class dataset.
In conclusion, for the nominal category, with the exception of the Yeast dataset, Evolutionary DBN with bootstrap sampling scored a high recall rate, from 0.88 to 1.0, on the imbalanced class datasets. Although DBN achieves a recall rate of 1.0 for the Zoo dataset, its recall rate on the other datasets is fairly low compared with Evolutionary DBN with bootstrap sampling. Evolutionary DBN with bootstrap sampling thus improves the recall rate over DBN, which means the algorithm is less biased when using instances from both majority and minority classes in nominal imbalanced class datasets.
For ordinal datasets, Evolutionary DBN with bootstrap sampling has the highest recall rate for two imbalanced class datasets. As noted in the accuracy rate analysis, it exhibits a low recall rate for the Dermatology dataset, although this is an increase over DBN. This shows that the evolutionary and bootstrap sampling features of the algorithm improve the recall rate of DBN, even though the performance remains relatively low compared with the other deep learning and machine learning algorithms. The inverse holds for the other two imbalanced class datasets. For example, for the Contraceptive dataset, Evolutionary DBN with bootstrap sampling not only has the highest recall rate at 0.98, but its margin over the lowest recall rate, 0.33 for DNN, is much larger than the gap on the Dermatology dataset between the highest recall rate, 0.93 for DNN, and the 0.6 of Evolutionary DBN with bootstrap sampling. Based on the recall rates in the ordinal category, it can be concluded that Evolutionary DBN with bootstrap sampling manages to recall the instances for prediction and also minimizes the partiality towards minority classes in the datasets.

Weighted Mean Precision
Table 4 presents the weighted mean precision of the algorithms. Precision is a commonly used performance metric for the preciseness of an algorithm. It is the proportion of instances assigned to a class that actually belong to that class, so it measures how exact the positive predictions of an algorithm are. The closer its value is to 1, the more precise the algorithm is. Consistent with the accuracy rate and recall rate in Table 2 and Table 3, Evolutionary DBN with bootstrap sampling has the highest precision rate for imbalanced class datasets in the binomial category. However, the precision rate is lower than the recall rate in Table 3. This observation indicates that the algorithm produces more false positives (FP) than false negatives (FN). For example, in the Haberman dataset, "0" is the label when the patient survived 5 years or longer after an operation for breast cancer and "1" is the label if the patient died within 5 years. Since FP is higher, the number of Type I errors is higher: the number of patients predicted to survive for 5 years or longer who actually died within 5 years is higher than the number of Type II errors, where patients predicted to die within 5 years actually survived for 5 years or longer.
In the nominal category, the precision results for all the algorithms are also consistent with the accuracy rate and recall rate. As in the binomial category, the precision rate in the nominal category is slightly lower than the recall rate. The precision rate of Evolutionary DBN with bootstrap sampling is high for all the imbalanced class datasets except the Yeast dataset. The precision rates in this category range from 0.03 to 1.0. Evolutionary DBN with bootstrap sampling achieves a precision rate from 0.8 to 1.0 for 4 out of 5 imbalanced class datasets.
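The step in the binomial discussion from "precision below recall" to "more false positives than false negatives" can be checked for a single class with at least one true positive. This is a sketch for the per-class case only; with weighted averages over several classes, the equivalence applies to each class separately and need not carry over to the averages.

$$\frac{TP}{TP + FP} < \frac{TP}{TP + FN} \iff TP + FN < TP + FP \iff FN < FP \qquad (TP > 0)$$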
In the ordinal category, Evolutionary DBN with bootstrap sampling achieved the highest precision rate for 2 out of 3 imbalanced class datasets. However, for the Post-operative dataset, the precision rate is higher than the recall rate. To conclude, the precision rate of Evolutionary DBN with bootstrap sampling is high, scoring from 0.8 to 1.0, except for 2 imbalanced class datasets, one in the nominal and one in the ordinal category, which score 0.21 and 0.4 respectively. This shows that Evolutionary DBN with bootstrap sampling is precise for binomial imbalanced class datasets and performs well in the multiclass category except for two imbalanced class datasets, as shown in Table 4.

F1-score
The F1-score measures how well balanced the recall and precision values are. An algorithm can have a good recall rate but a low precision rate, or vice versa, which raises the question of whether the algorithm actually performs well. This is where the F1-score is useful, as it takes both the recall and precision rates into consideration and computes their harmonic mean. As with recall and precision, the closer the F1-score is to 1, the better the agreement between the recall and precision rates.
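As a minimal sketch of the harmonic mean described above, the following Python function computes the F1-score from a precision and a recall value. The function name and the input values are hypothetical and chosen only for illustration; they are not taken from the result tables.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical values: equal precision and recall versus a lopsided pair.
balanced = f1_score(0.9, 0.9)
lopsided = f1_score(0.5, 1.0)
arithmetic_mean = (0.5 + 1.0) / 2

print(balanced, lopsided, arithmetic_mean)
```

The harmonic mean is pulled toward the smaller of the two values, so the lopsided pair scores below its arithmetic mean. This is why a low precision rate can reduce the F1-score sharply even when the recall rate is high.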
In the binomial category, the F1-score achieved by Evolutionary DBN with bootstrap sampling ranges from 0.92 to 1.0. These scores are an improvement over the F1-score achieved by DBN, which ranges from 0.45 to 0.76. From Table 5, SVM scores between 0.64 and 1.0, which is quite stable compared to DNN and BPNN. As mentioned previously for the accuracy rate, the structure of SVM makes it easy to learn and predict from imbalanced class datasets in the binomial category. Considering that Evolutionary DBN with bootstrap sampling has a node structure similar to DBN, DNN, and BPNN, this is a large performance gain in this category. The F1-score indicates that Evolutionary DBN with bootstrap sampling has high accuracy as well as high recall and precision.
In the nominal category, except for the Yeast dataset, the F1-score achieved by Evolutionary DBN with bootstrap sampling ranges from 0.84 to 1.0. As can be seen from Table 2 and Table 3, the decrease in accuracy and recall rate between Evolutionary DBN with bootstrap sampling and DBN on the Yeast dataset seems small. However, because of the low precision of Evolutionary DBN with bootstrap sampling on this dataset, the F1-scores of the algorithm and DBN differ considerably, at 0.29 and 0.44 respectively. It could be assumed that Evolutionary DBN with bootstrap sampling cannot learn from this dataset. However, among the other deep learning and machine learning algorithms, the highest F1-score on the Yeast dataset is only 0.56, achieved by BPNN. This is very low compared to the other imbalanced class datasets in the nominal category, where the highest F1-scores range from 0.84 to 1.0. Evolutionary DBN with bootstrap sampling achieves all of these high F1-scores. Therefore, despite the previous assumption, it can also be concluded that the Yeast dataset has complex features that are difficult for the other deep learning and machine learning algorithms to learn from as well.
Nevertheless, for the rest of the imbalanced class datasets in the nominal category, the high F1-score of Evolutionary DBN with bootstrap sampling confirms its performance in accuracy, recall, and precision. In the ordinal category, Evolutionary DBN with bootstrap sampling has the highest F1-score for 2 out of 3 imbalanced class datasets. This is consistent with its accuracy, recall, and precision rates in Table 2, Table 3, and Table 4. In both datasets, Evolutionary DBN with bootstrap sampling shows an improvement over DBN. For the Dermatology dataset, the F1-score is affected by the low precision value in Table 4. Despite the improved accuracy and recall rates of Evolutionary DBN with bootstrap sampling compared to DBN, the result shows that the algorithm is not as precise as the other metrics suggest.

Conclusion
Based on the analysis of the performance metrics, Evolutionary DBN with bootstrap sampling outperforms DBN and the other deep learning and machine learning algorithms for imbalanced class datasets in the binomial category. However, for the multiclass categories, nominal and ordinal, the results are mixed. For example, in the nominal category, Evolutionary DBN with bootstrap sampling outperforms DBN and the other deep learning and machine learning algorithms for 3 out of 5 imbalanced class datasets. In the Zoo dataset, the result for Evolutionary DBN with bootstrap sampling dropped compared to DBN. Although Evolutionary DBN with bootstrap sampling still performs fairly well, it is likely that adding the evolutionary algorithm and bootstrap sampling to the DBN component made the weight calculations too complicated when learning from this specific dataset. In the ordinal category, Evolutionary DBN with bootstrap sampling performs well for 2 out of 3 imbalanced class datasets. In the Dermatology dataset, Evolutionary DBN with bootstrap sampling showed an increase in accuracy and recall compared with DBN, in contrast to its performance for the Zoo dataset in the nominal category. Despite the increase in both metrics, both Evolutionary DBN with bootstrap sampling and DBN still have the lowest performance on this dataset compared to the other deep learning and machine learning algorithms. It is likely that the structure of both algorithms is not suitable for this type of dataset, considering that DNN and BPNN, which have a similar structure and similar calculations, achieve the two highest performances for this particular dataset. It can be concluded that for multiclass imbalanced class datasets, Evolutionary DBN with bootstrap sampling performs fairly well, even if the results are not as solid as for binomial imbalanced class datasets. Another observation is that Evolutionary DBN with bootstrap sampling performs well on imbalanced class datasets that possess multivariate attribute types in all the categories.

Table 1. Details and distribution of Imbalanced Class Datasets

Table 2. Accuracy Rate of Algorithms

Table 3. Weighted Mean Recall of Algorithms

Table 4. Weighted Mean Precision of Algorithms