Dealing with Imbalanced Dataset Leveraging Boundary Samples Discovered by Support Vector Data Description

Imbalanced datasets (IDs), in which one class contains considerably fewer samples than the other(s), emerge in many real-world problems such as health care and disease diagnosis systems, anomaly detection, fraud detection, and stream-based malware detection. These datasets cause problems in classification, including under-training of the minority class(es), over-training of the majority class(es), and bias towards the majority class(es). They have therefore attracted the attention of many researchers, and several solutions have been proposed. The main aim of this study is to deal with IDs by resampling the borderline samples discovered by Support Vector Data Description (SVDD). There are two kinds of resampling: under-sampling (U-S) and over-sampling (O-S). The main drawback of O-S is that it may cause over-fitting; the main drawback of U-S is that it can cause significant information loss. To avoid these drawbacks, we focus on the samples that may be misclassified, namely the borderline data points that lie on the border(s) between the majority class(es) and the minority class(es). First, we find the borderline examples with SVDD; then, data resampling is applied to them. Next, the base classifier is trained on the newly created dataset. Finally, we compare the results of our method in terms of Area Under Curve (AUC) and F-measure.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Computers, Materials & Continua, Tech Science Press. Article. DOI: 10.32604/cmc.2021.012547


Introduction
Data mining is a sub-field of artificial intelligence [1-10]. It is widely applied to classification and clustering of data in real-world problems [11-20]. Different classifiers, built on different underlying assumptions and mechanisms, have been proposed in order to enhance classification accuracy [21-32]. One of the most challenging problems for classifiers is learning from an Imbalanced Dataset (ID). A dataset is considered imbalanced if it contains at least two classes where the number of data points in one class (the majority class) far exceeds the number of data points in the other class (the minority class). Ordinary supervised learning algorithms are weak at learning IDs because they are biased toward the majority class [33]. Ignoring the minority class cannot be tolerated in many problems, such as medical ones [34] and financial risk assessment [35]. To tackle the challenges of IDs, different methods have been proposed, which fall into two categories: (a) external approaches and (b) internal approaches. Methods of the first category try to balance the distribution of the class data points. Methods of the second category modify machine learning algorithms so that they can handle IDs. The current research takes an approach of the first type: it tries to boost the error-prone data points in the sub-sampling trials. To do this, we use an auxiliary set of boundary data points discovered by Support Vector Data Description (SVDD).
Base classifiers perform poorly when dealing with IDs; in particular, standard base classifiers poorly recognize the minority class samples. Learning from a given ID is therefore considered a great challenge. Several approaches have been established for dealing with class imbalance in order to improve generalization in classification. We can categorize them into 2 general classes [36]: (1) approaches that solve the problem at the algorithm level, and (2) approaches that solve it at the data level. Those in the first class adjust existing machine learning methods so that they learn better in the imbalanced situation. Those in the second class manipulate the training data (minority class(es) and/or majority class(es)) so as to make the dataset balanced. This is generally done through over-sampling (O-S), under-sampling (U-S), or a hybrid of them. O-S increases the minority class size, whereas U-S decreases the majority class size [36]. It is widely accepted that U-S is a better solution to the ID learning problem [37].
Nevertheless, most of these techniques neglect the effect of borderline samples on classification performance, even though these high-impact samples are the ones most exposed to misclassification. In this paper, a new framework is introduced to deal with the ID learning problem. The performance of our framework is evaluated and compared with other state-of-the-art systems. A number of experiments have been performed on benchmark datasets with different imbalance ratios. The results obtained from our framework, when compared to the state-of-the-art works, confirm its better performance for the different datasets and different base classifiers.
Many attempts have been made to alleviate the problem of class imbalance. The Synthetic Minority Over-sampling TEchnique (SMOTE) [38,39] is an O-S approach that deals with ID learning by synthesizing new samples of the minority class. Several variants have been proposed to overcome the drawbacks of SMOTE, such as: Borderline-SMOTE [39], which determines the boundary minority class samples by using neighbor information and then applies SMOTE to them; Safe-level-SMOTE [40], which synthesizes minority samples according to a safe level computed from the nearest minority neighbors; MWMOTE [41], which generates samples from weighted informative samples using a clustering approach; K-means SMOTE [42]; and so on. Han et al. [39] proposed the Borderline-SMOTE algorithm, later modified by He et al. [43], to improve the performance of SMOTE, which inevitably involves randomness. In Borderline-SMOTE, the numbers of majority class instances and of border instances neighboring the minority class are compared, and O-S is then applied to the border samples of the minority class; that is, the interpolation is carried out in the proper area. They found that Borderline-SMOTE performs better than SMOTE. Nevertheless, as SMOTE creates artificial instances with the minority class label and ignores the majority class instances while creating them, it is highly likely to cause class mixture and over-generalization [44]. In this paper, a new approach is proposed to address the ID problem. It is tested and assessed on different benchmarks and compared with many state-of-the-art approaches that have been introduced to deal with the ID learning problem.
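The interpolation step behind SMOTE can be illustrated with a minimal sketch. It is not the reference implementation of [38]; the function names, the Euclidean neighbor search and the default values of k and the seed are assumptions made for illustration only.

```python
import math
import random

def k_nearest(sample, others, k):
    """Return the k points of `others` closest to `sample` (Euclidean distance)."""
    return sorted(others, key=lambda p: math.dist(sample, p))[:k]

def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each placed on the segment between a
    minority point and one of its k nearest minority neighbours.
    Assumes at least two minority points are available."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = k_nearest(base, [p for p in minority if p is not base], k)
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Hypothetical two-dimensional minority class
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two minority points, it stays inside the region spanned by the minority class and never looks at the majority class, which is the source of the class-mixture risk noted above.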
In recent years, machine learning communities have paid much attention to imbalanced learning, and given the vast range of real-world problems in which it appears, this attention grows every day. For example, the analysis of high-resolution satellite images and healthcare recognition systems are two problems that involve imbalanced learning. A key point is that the minority class(es) is (are) the target class(es); because of the insufficient number of samples, it is (they are) hardly distinguishable from the majority class(es). For example, patients are hardly distinguishable from healthy individuals. The questions posed in the current study are: (a) "how is it possible to change a skewed class distribution into a balanced one?", and (b) "when is the proposed method superior to the previous methods for learning IDs?". The answers to these questions are provided in the following.
Nowadays, IDs make up a large part of real-world datasets. In IDs, the majority class(es) outnumber(s) the minority class(es); nevertheless, the correct classification of minority class samples is of high importance. For example, the detection of diabetic or Escherichia coli-infected patients can be considered an imbalanced learning problem. Diabetic patients belong to the minority class, which is more important than the majority class. For each new sample, there are 4 possibilities: (a) a diabetic patient is diagnosed as a diabetic patient, (b) a diabetic patient is diagnosed as a healthy person, (c) a healthy person is diagnosed as a healthy person, and (d) a healthy person is diagnosed as a diabetic patient. If a healthy person is diagnosed as diabetic, the consequence is not very severe (at least not fatal); but if a diabetic patient is diagnosed as healthy, the misclassification may threaten a human life.
The paper is organized into 5 sections. Section 1 includes topic and problem introduction. Section 2 is dedicated to definitions and literature. The presentation of the proposed method and its explanation are available in Section 3. Experimental results are presented in Section 4 in detail. Finally, Section 5 concludes the paper and presents future research directions.

Definitions
Imbalanced dataset: A dataset that has more data points in one or more of its classes than in the other class(es) is an ID. The more frequent class(es) is (are) called the majority class(es) and the other(s) the minority class(es). Fig. 1a shows an arbitrary ID with one minority class and one majority class.
The drawback in learning from IDs is that traditional classification algorithms are biased toward the majority classes (negative samples). Consequently, samples in the minority classes (positive samples) are more likely to be misclassified. Recently, numerous solutions have been proposed to deal with this problem. The following definitions cover the concepts needed to understand these methods.
Cost-sensitive learning techniques: This type of solution contains approaches at the data level, at the algorithmic level, or at both levels combined, which assign higher costs to the misclassification of examples of the positive class(es) (minority class(es)) than to those of the negative class(es) (majority class(es)). Most studies on the behavior of standard classifiers in ID domains have shown that the significant loss of performance is mainly due to the skewed class distribution, which is measured by the imbalance ratio, defined as the ratio of the number of instances in the majority class to the number of instances in the minority class [45].
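As a small illustration of the imbalance ratio defined above, the following sketch computes it from a list of class labels. The function name and the example labels are hypothetical.

```python
from collections import Counter

def imbalance_ratio(labels):
    """Ratio of the majority class size to the minority class size."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

# Hypothetical dataset with 90 negative (0) and 10 positive (1) labels
labels = [0] * 90 + [1] * 10
ratio = imbalance_ratio(labels)
```

A ratio of 1 corresponds to a perfectly balanced two-class dataset; larger values indicate a more skewed class distribution.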
Data sampling: The training instances are modified so as to produce a more or less balanced class distribution, which allows a basic classifier to perform as it does in standard classification. O-S and U-S techniques are applied to the training data distribution, and both can be used for learning from IDs. Keep in mind that changing the training data distribution biases the training, in the same way as imposing non-uniform misclassification costs. For example, if the training distribution is changed so that the ratio between the examples of the two classes goes from 1:1 to 2:1, errors on the two classes no longer weigh equally. Sampling is proposed for several reasons: (a) first and most important, cost-sensitive versions are not available for all learning algorithms, so a sampling-based approach is the only available option; (b) many training datasets are highly skewed and large, and the size of the training dataset has to be reduced for learning; and (c) there is often no precise cost defined for each misclassification.
Over-sampling: O-S is a process to extract a data superset from original set of minority class(es). It is a process of resampling or generating new examples from the existing ones in minority class(es).
Under-sampling: U-S is a process to extract a data subset from the original set of majority class(es). It is a process of eliminating some of the examples in the majority class(es).
Artificial O-S techniques: Artificial O-S techniques (like SMOTE) aim at increasing the number of data samples of the minority class(es) to deal with the effect of the low number of samples in the minority class(es) of an ID. In this method, a set of synthetic data samples of the minority class(es) is produced and added to the ID so that it becomes balanced. With the additional samples of the minority class(es), traditional base classifiers, such as decision trees, support vector machines and artificial neural networks, are able to enhance their decision-making.
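The two basic resampling operations defined above can be sketched as follows. This is a minimal illustration of random U-S and random O-S by duplication; the function names, the target sizes and the integer stand-ins for samples are assumptions, not part of the original method.

```python
import random

def random_under_sample(majority, target_size, seed=0):
    """Keep a random subset of the majority class (sampling without replacement)."""
    rng = random.Random(seed)
    return rng.sample(majority, target_size)

def random_over_sample(minority, target_size, seed=0):
    """Grow the minority class to target_size by duplicating random samples."""
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(target_size - len(minority))]
    return list(minority) + extra

majority = list(range(100))        # 100 hypothetical majority samples
minority = list(range(100, 110))   # 10 hypothetical minority samples
reduced = random_under_sample(majority, 30)
grown = random_over_sample(minority, 30)
```

The sketch makes the two drawbacks visible: `reduced` discards 70 majority samples (information loss), while `grown` contains only repeated copies of the same 10 minority samples (risk of over-fitting).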
Ensemble methods: Ensemble classifiers are models composed of multiple classifiers. They aim at enhancing the performance of single-classifier models by generating multiple classifiers and combining them into a new classifier that has the capacity of all the combined classifiers. The main idea is to build multiple classifiers from the original dataset and then to aggregate their predictions when an unknown example is presented. Ensemble methods for IDs combine ensemble learning algorithms with techniques similar to the ones employed by cost-sensitive methods. A complete categorization of ensemble methods for ID problems, including ensemble methods that have been specifically proposed for such problems, has recently been introduced [46]. In the following, the proposed method, which uses SVDD and is inspired by some of the mentioned methods, is introduced.

Related Work
Seiffert et al. [47] have proposed a combined method called RUSBoost to reduce classification errors. If the training dataset is an ID, building an efficient classifier may be challenging. Their paper studies the performance of RUSBoost in comparison to its components, RUS and AdaBoost, and indicates that RUSBoost outperforms both in terms of classification accuracy. Additionally, RUSBoost is compared to another member of the same family, SMOTEBoost, and the results are comparable. The study also shows the results for each basic learner with no sampling or boosting. It shows that RUSBoost is a fast and simple algorithm and a less complex alternative to SMOTEBoost. SMOTEBoost has two major drawbacks: it is complicated to implement and time-consuming. These drawbacks can be overcome by replacing SMOTE with RUS.
Hajizadeh et al. [48] have studied the nearest neighbor classifier with locally weighted distance (NNLWD). Their study aims at improving the performance of the nearest neighbor classifier on IDs without disturbing the original data distribution. The proposed approach performs well on the minority class(es) and acceptably on the majority class(es), and it classifies the samples of the different classes precisely. Each class is assigned a weight according to the class distribution, and the weighting that leads to better performance of the nearest neighbor method is based on G-Mean. In general, the study showed that O-S of the minority class(es) and U-S of the majority class(es) are useful in dealing with IDs. It also indicated that overuse of the two methods leads to complications, including loss of important information and over-fitting.
Weiss et al. [45] have compared cost-sensitive and sampling methods for dealing with IDs. The performance of a classification algorithm is evaluated on a two-class problem (a problem that has only two classes: true or false). In their study, the metric used to assess classifier performance, given the misclassification costs, is the total cost; it is the only metric used.
Chawla et al. [38] have studied the AdaBoost algorithm for solving ID problems. The synthetic minority O-S technique has been specifically designed for imbalanced learning problems, and in the mentioned study it has been incorporated into boosting, yielding SMOTEBoost. In contrast to standard boosting, which assigns equal weights to all misclassified examples, SMOTEBoost generates synthetic minority examples and thereby directly changes the updated weights, which finally compensates for the skewed class distribution. The synthetic minority examples are generated by operating in the feature space. After more synthetic minority examples have been generated, classification learning algorithms such as decision trees are applied. The study deals with two kinds of features, continuous and discrete. To find the nearest minority neighbors, Euclidean distance is used for continuous features and absolute-value distance is used for discrete features. Their proposed algorithm [38] successfully combines the benefits of boosting and SMOTE. It is summarized as: "While the boosting algorithm enhances the prediction accuracy of classifiers by focusing on difficult examples of all classes, SMOTE enhances the performance of the classifier on minority examples".
Liu et al. [49] have studied the usability of decision trees in imbalanced learning problems and have introduced a new decision tree that enhances classifier performance. The study started by analyzing the measure used in C4.5, which resulted in an explanation of why the resulting trees skew toward the majority class. To remove this bias, a measure named CCP has been introduced, which is the basis of CCPDT. To obtain statistically meaningful rules, a set of bottom-up and top-down methods using Fisher's test has been derived to prune the statistically meaningless branches. With their method, classifier performance improved on imbalanced datasets while remaining comparable on balanced ones. Their study indicated, geometrically and theoretically, how CCP relates to the class distribution. Accordingly, CCP is embedded in the decision tree as the splitting measure.
Chawla [50] has studied IDs, sampling alternatives and decision trees. In his study, a dataset is considered imbalanced if its classes are not represented equally. The study asks what the proper distribution of a dataset is, given the different possible distributions. Observations show that the normal data distribution is mostly the best distribution for the classifier learning algorithm. Additionally, imbalance leads to greater dispersion of the data in the feature space, so O-S and U-S may lose their usability. Accordingly, the study uses O-S and U-S along with synthetic minority sampling, and C4.5 is used with the 3 sampling methods. The experimental analysis aims at evaluating the effects of structure, estimation and sampling methods on the Area Under Curve (AUC).
SVDD [51] describes a dataset by a spherical boundary surrounding it. Similar to SVM, SVDD can use flexible kernel functions. Generally speaking, a data distribution description has many advantages. First, it helps eliminate irrelevant and poorly defined data. Second, it is useful for classifying datasets in which one class is well sampled and another is poorly sampled. Another advantage is the ability to compare datasets. Imagine that a model has been trained on a dataset after multiple expensive stages. If a new dataset arrives for a similar process, the two datasets can be compared. If the old and the new datasets are similar, training can be skipped; if they are different, training on the new dataset is required.
Another work [52] has proposed to divide the majority examples into x partitions, where N is the number of negative samples and P is the number of positive samples. The x partitions extracted from the negative (majority) class do not overlap. All samples of the minority class are added to each partition of the negative class, and an AdaBoost classifier is then trained on each resulting dataset. Finally, the results obtained for all x datasets are combined.
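The partitioning scheme just described can be sketched as below. Since the rule that derives x from N and P is not given here, the number of partitions is passed in as the hypothetical parameter n_parts; the function name and the shuffling step are also assumptions, and training the AdaBoost classifier on each subset is left out.

```python
import random

def split_balanced_subsets(negatives, positives, n_parts, seed=0):
    """Split the negative (majority) class into n_parts disjoint partitions
    and append every positive (minority) sample to each partition."""
    rng = random.Random(seed)
    shuffled = list(negatives)
    rng.shuffle(shuffled)
    # Striding by n_parts yields non-overlapping partitions of near-equal size
    partitions = [shuffled[i::n_parts] for i in range(n_parts)]
    return [part + list(positives) for part in partitions]

negatives = list(range(20))          # 20 hypothetical majority samples
positives = [100, 101, 102, 103]     # 4 hypothetical minority samples
subsets = split_balanced_subsets(negatives, positives, n_parts=5)
```

Each element of `subsets` is a roughly balanced training set; one classifier would be trained per subset and their predictions combined, for instance by voting.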
Balanced Random Forest [53], abbreviated as BRF, differs from Random Forest in that it uses balanced bootstrap samples. It also differs from under-sampling + random forest, which pre-processes the training dataset and then applies random forest.
ASYMBoost [54] is a cost-sensitive AdaBoost algorithm.
Liu et al. [55] have conducted a study on exploratory U-S for class-imbalance learning. U-S is a popular and efficient method for ID problems that trains on subsets of the majority class. The study proposes two methods to solve the ID problem. The first, known as EasyEnsemble, draws multiple majority subsets and trains a learner on each of them; their results are then combined. The second, known as BalanceCascade, trains the learners sequentially: the majority examples that are correctly classified at one stage are removed from the dataset before the next classification stage.
The Spider family of methods [56] has been proposed to address cost-sensitivity. To this end, majority-class cleaning stages are combined with minority-class U-S.
In 2015, some researchers proposed a new method to deal with ID problems [57]. Their method, KernelADASYN, is a kernel-based adaptive synthetic O-S approach for IDs. In this method, an adaptive synthetic over-sampling scheme is built for the minority classes. The adaptive data distribution is estimated by kernel density estimation and weighted by the degree of difficulty of the minority samples; that is, a probability density function (PDF) is estimated. Since then, numerous powerful classification methods have been proposed [58].
In [59], a new synthetic classification method, called ISEOMs, has been proposed for ID problems. In this method, SOM-based learning is modified by searching for the winner neuron based on an energy function and by minimizing the local error at the competitive learning stage. The method enhances classifier performance by extracting knowledge from the minority classes. Positive and negative examples of the training phase are used for the minority and majority classes, respectively. The positive SOM is built from the original minority class.
In [60], some researchers have proposed a new method to design a balanced classifier on imbalanced training data based on margin distribution theory. Recently, the Large margin Distribution Machine (LDM) has been put forward, and it has obtained superior classification performance compared with the Support Vector Machine (SVM) and many state-of-the-art methods. However, one deficiency of LDM is that on IDs it easily leads to a lower detection rate for the minority class than for the majority class, which contradicts the need for a high detection rate of the minority class in real applications. In the mentioned paper, the Cost-Sensitive Large margin Distribution Machine (CS-LDM) has been put forward to improve the detection rate of the minority class by introducing a cost-sensitive margin mean and a cost-sensitive penalty.
In [61], the performance of a novel method, Parallel Selective Sampling (PSS), has been assessed. It selects data from the majority class to reduce imbalance in large datasets. PSS was combined with Support Vector Machine (SVM) classification. PSS-SVM has shown excellent performance on synthetic datasets, much better than SVM. Moreover, on real datasets, PSS-SVM classifiers performed slightly better than SVM and RUSBoost classifiers, with reduced processing times. Their strategy was conceived and designed for parallel and distributed computing. In conclusion, PSS-SVM is a valuable alternative to SVM and RUSBoost for classification of huge and imbalanced data, due to its accurate statistical predictions and low computational complexity.
In [62] some researchers have proposed a feature learning method based on the autoencoder to learn a set of features with better classification capabilities of the minority and the majority classes to address the imbalanced classification problems. Two sets of features are learned by two stacked autoencoders with different activation functions to capture different characteristics of the data and they are combined to form the Dual Autoencoding Features. Samples are then classified in the new feature space learnt in this manner instead of the original input space.
In [63], the authors have described preprocessing, cost-sensitive learning and ensemble techniques, carrying out an experimental study to contrast these approaches in an intra and inter-family comparison. They have carried out a thorough discussion on the main issues related to using data intrinsic characteristics in this classification problem. This has helped them to improve the given models with respect to: the presence of small disjuncts, the lack of density in the training data, the overlapping between classes, the identification of noisy data, the significance of the borderline instances, and the dataset shift between the training and the test distributions. Finally, they have introduced several approaches and recommendations to address these problems in conjunction with ID, and they have shown some experimental examples on the behavior of the learning algorithms on data with such intrinsic characteristics.
A geometric structural ensemble (GSE) has been introduced [64]. GSE partitions instances of majority class and then eliminates useless instances through constructing a hypersphere using the Euclidean criterion. By repeating the mentioned task, the simple models will be created.

Proposed Method
According to the previous sections, classification algorithms that are well tuned for IDs outperform conventional classification methods. The current study introduces a new method for IDs that is based on both the O-S concept (as in methods such as SMOTE) and the U-S concept (as in methods such as RUS). It uses SVDD to find the borderline (or error-prone) data samples and then applies a hybrid O-S and U-S mechanism to them. SMOTE- and RUS-based resampling of the borderline samples, together with classifiers including RF, IBK and AdaBoost, is used in the proposed method. The method pursues its goal by focusing on the desired classification accuracy, and the final results improve significantly. Before the complete description of the proposed method, the three classification frameworks used in it are briefly reviewed.
Random Forest (RF) [53] is a random decision forest. It is an ensemble learning method for classification that builds a number of decision trees during its training phase; its output is the class label of the test instances. RF mitigates the over-fitting of a single decision tree to the training dataset.
AdaBoost [65] can be used in combination with other learning algorithms to enhance their performance. The outputs of the other simple learning algorithms are combined, with weights, into a powerful aggregated output. AdaBoost is called adaptive because subsequent weak learners focus on the instances misclassified by earlier ones. AdaBoost is sensitive to noise and irrelevant data.
IBK is a k-nearest-neighbors classifier based on a distance measure. The number of neighbors K (the default is K = 1) can be specified explicitly. When more than one neighbor is used, the predictions of the neighbors can be assigned different weights based on their distances from the test example. The algorithm offers two schemes for converting distance into weight. The number of training examples kept by the classifier can be limited.
Generally speaking, there is a data distribution description for each dataset. The data distribution description means the location of the dataset examples in the feature space based on the features of each example. The current study starts by dividing the data samples into healthy and unhealthy groups through a classification task. Healthy samples are those that are correctly classified and unhealthy ones are those that are misclassified. In different datasets, there are some misclassified samples located near or on the borderline between classes. This study aims at finding these borderline samples using SVDD.
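The distance-weighted voting described for IBK can be sketched as follows. The text does not name the two distance-to-weight schemes, so the two options used here, inverse distance (1/d) and similarity (1 - d, which presumes distances normalized to [0, 1]), are assumptions, as are the function and parameter names.

```python
import math
from collections import defaultdict

def knn_predict(train, query, k=1, weighting=None):
    """train is a list of (features, label) pairs.
    weighting: None (plain vote), 'inverse' (1/d) or 'similarity' (1 - d)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = defaultdict(float)
    for features, label in nearest:
        d = math.dist(features, query)
        if weighting == "inverse":
            weight = 1.0 / (d + 1e-9)      # small constant avoids division by zero
        elif weighting == "similarity":
            weight = max(0.0, 1.0 - d)     # assumes distances scaled to [0, 1]
        else:
            weight = 1.0
        votes[label] += weight
    return max(votes, key=votes.get)

# Hypothetical training set: two majority points (label 0), one minority point (label 1)
train = [((0.0, 0.0), 0), ((0.1, 0.0), 0), ((1.0, 1.0), 1)]
plain = knn_predict(train, (0.9, 0.9), k=3)
weighted = knn_predict(train, (0.9, 0.9), k=3, weighting="inverse")
```

With a plain vote the two distant majority points outvote the single nearby minority point, whereas inverse-distance weighting lets the close neighbor dominate, which is one reason distance weighting can matter on imbalanced data.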
After the borderline samples have been identified, they are resampled to obtain a novel balanced dataset. Finally, well-known classifiers are used to classify the novel balanced dataset. Keep in mind that the proposed method uses 80% of the data as training data and 20% as test data. This section introduces SVDD and the resampling methods along with our solution to ID classification. As mentioned before, the present method finds the borderline samples using SVDD. SVDD receives a dataset as input and determines the center of the description using the kernel matrix. The next step finds the boundary of the dataset, denoted by R. Then, the distance of each sample from the center is calculated. Samples near R are called borderline samples. Borderline samples are divided into two groups: positive samples (the samples with class value equal to 1) and negative samples (the samples with class value equal to 0). In an ID, negative samples outnumber positive ones. So, positive samples undergo O-S and negative ones undergo U-S to balance the dataset. The balanced dataset is a novel dataset, which is then classified. In SVDD, a constant named sigma is required as the width parameter of the radial basis function kernel. After different values were tried for sigma, the value 23 was found to be the best in this study. The proposed approach is summarized in Fig. 1b. SMOTE has been described in Section 2, and its pseudo code is presented in Fig. 3. RUS and SVDD are also shown in Fig. 3, together with the proposed algorithm, which is composed of these three methods.
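The borderline discovery step can be illustrated with a simplified sketch. It is not a full SVDD solver: instead of solving the SVDD optimization problem for the support vector weights, it assumes uniform weights, so the center is the kernel mean of the data, and it takes the farthest fraction of samples as the borderline set. The kernel parametrization exp(-||a - b||^2 / (2 sigma^2)), the function names and the `fraction` parameter are assumptions made for illustration.

```python
import math

def rbf(a, b, sigma):
    """Radial basis function kernel with width sigma (assumed parametrization)."""
    return math.exp(-math.dist(a, b) ** 2 / (2.0 * sigma ** 2))

def kernel_distances(samples, sigma):
    """Squared feature-space distance of each sample to the kernel mean,
    used here as a stand-in for the SVDD center."""
    n = len(samples)
    gram = [[rbf(a, b, sigma) for b in samples] for a in samples]
    mean_all = sum(map(sum, gram)) / (n * n)
    return [gram[i][i] - 2.0 * sum(gram[i]) / n + mean_all for i in range(n)]

def borderline_indices(samples, sigma, fraction=0.25):
    """Indices of the samples farthest from the center (closest to the boundary)."""
    d2 = kernel_distances(samples, sigma)
    n_border = max(1, round(fraction * len(samples)))
    order = sorted(range(len(samples)), key=lambda i: d2[i], reverse=True)
    return sorted(order[:n_border])

# Hypothetical data: a tight cluster plus one outlying point
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0), (0.0, -0.1), (3.0, 3.0)]
border = borderline_indices(points, sigma=1.0, fraction=0.2)
```

In the pipeline described above, the indices returned for positive samples would be passed to an O-S step and those for negative samples to a U-S step before the base classifier is trained; the value sigma=1.0 is chosen only for this toy data and is unrelated to the value reported in the study.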

Experimental Study
There are different methods to evaluate classification quality. The current study uses AUC, F-measure and G-mean. Classification accuracy ranges from 0 to 1 and reflects whether the data are accurately classified or not. Most classifiers express their uncertainty with estimated scores, so a threshold has to be defined in order to calculate accuracy. The usual threshold is 0.5. Assume that a classifier is able to provide correct answers for all questions, that a threshold of 0.7 leads to 100 correct answers for negative samples, and that a threshold of 0.9 results in 100 correct answers for positive samples. In the case of IDs, AUC, the area under the receiver operating characteristic curve, is a useful summary of accuracy because it does not depend on a single threshold. F-measure combines precision and recall, as in information retrieval. A greater value of F-measure indicates higher classification quality. The geometric mean (or G-mean) in mathematics is an effective way to find the central tendency of a set of values through the product of the values. G-mean is calculated as $\text{G-mean} = \sqrt{TP_{rate} \times TN_{rate}}$, where $TP_{rate}$ is the true positive rate and $TN_{rate}$ is the true negative rate. To allow a fair comparison between the results on different datasets, the time needed to run the various algorithms on the various datasets is summarized in a table. Among the various statistical tests, the paired k-hold-out t-test has been chosen [66]. In the t-test, the experimental t value is calculated and compared to the critical t value at a significance level of 0.05. If the experimental t is larger than the critical t, the difference is statistically significant.
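The evaluation measures above can be computed from confusion-matrix counts and classifier scores as in the following sketch. The F-measure shown is the common balanced form (harmonic mean of precision and recall), and the AUC is computed through its pairwise-ranking interpretation; both choices, as well as the function names and example numbers, are assumptions rather than the exact formulas used in the study.

```python
import math

def f_measure_and_g_mean(tp, fn, tn, fp):
    """F-measure and G-mean from confusion-matrix counts.
    Assumes at least one actual positive, one actual negative and one true positive."""
    tp_rate = tp / (tp + fn)            # recall, sensitivity
    tn_rate = tn / (tn + fp)            # specificity
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tp_rate / (precision + tp_rate)
    g_mean = math.sqrt(tp_rate * tn_rate)
    return f_measure, g_mean

def auc_from_scores(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly;
    ties count as half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical counts: 10 positives (8 found), 100 negatives (90 correctly rejected)
f_value, g_value = f_measure_and_g_mean(tp=8, fn=2, tn=90, fp=10)
auc_value = auc_from_scores([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1])
```

In this hypothetical example plain accuracy would be 98/110, yet the F-measure is much lower because precision on the minority class is poor, which is why accuracy alone is not used for IDs.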

Experiments and Analysis
This section evaluates the results of the proposed classification algorithm. The results of the proposed algorithm and the previous state-of-the-art algorithms are compared in terms of F-measure, G-mean and AUC. The experiments use some of the datasets frequently employed for ID problems, namely Pima, Abalone, Haberman, Housing, Phoneme, SatImage and Ionosphere, all of which have been used in previous studies. These datasets are taken from UCI [67] and their details are given in Tab. 1.
In the following, the proposed method is compared to the previous state-of-the-art methods. Fig. 4 provides the results of the proposed method on the evaluation measures in comparison to the other methods. The compared methods are Bagging [68], AdaBoost [65], SMOTE [39], Borderline-SMOTE [39], KernelADASYN [57], RF [53], BRF [53], Under-RF [53], Over-RF [53], Asym [55], Easy [56,69] and Cascade [56,69]. Split-balancing and cluster-balancing [70] are compared in three different classification models. The Borderline-SMOTE used in this paper is the variant its authors call "borsmote1", and the sampling is done so that both classes are equally balanced. According to Fig. 4, the proposed method is superior to the state-of-the-art methods on the Ionosphere, Abalone and Haberman benchmarks in terms of F-measure, G-mean and AUC. However, it is not superior to some of the state-of-the-art methods on the Housing, Pima, Phoneme and SatImage benchmarks in terms of AUC. Tab. 2 shows the results of 100 runs of the proposed algorithm on the datasets given in Tab. 1 in terms of F-measure, G-mean and AUC.
Tab. 2 summarizes the F-measure, G-mean and AUC obtained after 100 runs of the various algorithms; the SVDD-based method provides the best mean values. Tab. 3 reports the mean AUC obtained via split-balancing and cluster-balancing, with the results of each method based on RF, SMO and IBK as the base classifier.
According to Tab. 3, the proposed method provides the best performance for all of the base classifiers, namely RF, SMO and IBK. Tab. 4 presents the results of the t-test applied to the proposed method in comparison with the other methods on the various datasets. Let w be the number of methods that are significantly outperformed by a given method, and let l be the number of methods that significantly outperform that method. Each number in Tab. 4 for a method is its (w − l). Tab. 4 shows the relationship between methods and datasets; the most significant relationship is found between the proposed method and the datasets. Tab. 5 shows the time required to run the algorithm in comparison with the other methods, averaged over all datasets. According to Tab. 5, the proposed algorithm takes longer to run than the other state-of-the-art methods because it preprocesses the datasets many times. Tab. 5 also summarizes the mean values of F-measure, G-mean and AUC.
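The (w − l) score reported in Tab. 4 can be sketched in Python as follows. This is an illustrative reading of the procedure, not the authors' code: `results` is a hypothetical mapping from method name to the scores of its k hold-out runs on one dataset, the runs are assumed to be paired across methods, and the critical value `t_crit` is passed in by the caller (for example, about 2.262 for k = 10 runs, that is, 9 degrees of freedom, in a two-sided test at the 0.05 level).

```python
import math


def paired_t(a, b):
    """t statistic of the paired differences a[i] - b[i] over k hold-out runs."""
    d = [x - y for x, y in zip(a, b)]
    k = len(d)
    mean = sum(d) / k
    var = sum((v - mean) ** 2 for v in d) / (k - 1)  # sample variance
    if var == 0.0:
        # All differences are identical: no spread to test against.
        return 0.0 if mean == 0.0 else math.copysign(math.inf, mean)
    return mean / math.sqrt(var / k)


def win_minus_loss(results, t_crit):
    """Map each method to w - l.

    w counts the methods it significantly outperforms (t > t_crit);
    l counts the methods that significantly outperform it (t < -t_crit).
    """
    scores = {}
    for method, runs in results.items():
        w = l = 0
        for other, other_runs in results.items():
            if other == method:
                continue
            t = paired_t(runs, other_runs)
            if t > t_crit:
                w += 1
            elif t < -t_crit:
                l += 1
        scores[method] = w - l
    return scores
```

A method that significantly beats every competitor on a dataset receives the maximum score, the number of competitors, while a method with no significant differences receives 0.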
Tab. 6 shows the time required to run the proposed algorithm in comparison with other state-of-the-art methods on each dataset mentioned in Tab. 1.

Conclusions and Future Work
Data mining, a recently developed field, is frequently used in various scientific fields. One of the tasks in data mining is classification. Nowadays, an obstacle that classification algorithms face is IDs. Simple classification algorithms are not suitable if the dataset contains at least two classes, one with very many data samples (the majority class) and one with only a few samples (the minority class). Two common approaches widely used to tackle the ID problem are O-S and U-S. A shared disadvantage of all U-S methods is the elimination of useful samples. A shared drawback of O-S methods is that they can cause over-fitting.
The solution proposed in the current study for these problems is borderline resampling. To this end, the study focuses on the error-prone data samples (the samples that are highly likely to be misclassified). These samples are located on the borderline between classes. To find the error-prone data samples, Support Vector Data Description (SVDD) has been employed.
Therefore, the primary aim is to find these samples and apply O-S and U-S to them. Finally, the new dataset can be classified using various traditional classification methods. The results are compared to the previous ones to show that the current method is superior to the previous state-of-the-art ones. According to the experimental analysis section, the proposed algorithm provides better values in terms of F-measure, G-mean and AUC. For future studies, it is recommended to run the proposed algorithm using KNN. The advantages of this method are its simplicity, its efficacy and the cost-effectiveness of its learning process.
Funding Statement: This study is supported by grants to HAR and HP. HAR is supported by UNSW Scientia Program Fellowship and is a member of the UNSW Graduate School of Biomedical Engineering.

Conflicts of Interest:
The authors declare that they have no conflicts of interest.