International Journal of Computational Intelligence Systems

Volume 12, Issue 2, 2019, Pages 1412 - 1422

A-SMOTE: A New Preprocessing Approach for Highly Imbalanced Datasets by Improving SMOTE

Authors
Ahmed Saad Hussein [1, 2], Tianrui Li [1, *], Chubato Wondaferaw Yohannese [1], Kamal Bashir [1]
[1] School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China
[2] University of Information Technology and Communications, Baghdad 00964, Iraq
*Corresponding author. Email: trli@swjtu.edu.cn
Corresponding Author
Tianrui Li
Received 10 January 2019, Accepted 9 November 2019, Available Online 28 November 2019.
DOI
10.2991/ijcis.d.191114.002
Keywords
Imbalanced datasets; SMOTE; Machine learning; Oversampling; Undersampling
Abstract

Imbalanced learning is a challenging task for most standard machine learning algorithms. The Synthetic Minority Oversampling Technique (SMOTE) is a well-known preprocessing approach for handling imbalanced datasets, in which the minority class is oversampled by producing synthetic examples in feature space rather than data space. However, many recent works have shown that the imbalance ratio is not in itself the problem; the deterioration in model performance is caused by other factors linked to the distribution of the minority class samples. Blind oversampling by SMOTE leads to two major problems: noisy examples and borderline examples. Noisy examples are those from one class located in the safe zone of the other. Borderline examples are those located in the neighborhood of the class boundary. Both kinds of samples are associated with deteriorating performance of the resulting models. It is therefore critical to concentrate on the structure of the minority class data and to regulate the positioning of the newly introduced minority class samples so that classifiers perform better. Hence, this paper proposes an advanced SMOTE, denoted A-SMOTE, which adjusts the newly introduced minority class examples based on their distance to the original minority class samples. To achieve this objective, we first employ the SMOTE algorithm to introduce new samples to the minority class and then eliminate those examples that are closer to the majority class than to the minority class. We apply the proposed method to 44 datasets at various imbalance ratios. Ten widely used data sampling methods selected from the literature are employed for performance comparison. The C4.5 and Naive Bayes classifiers are used for experimental validation. The results confirm the advantage of the proposed method over the other methods on almost all the datasets and illustrate its suitability for data preprocessing in classification tasks.
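
The two-step idea described in the abstract (oversample with SMOTE, then discard synthetic examples that lie closer to the majority class than to the minority class) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not specify the exact distance criterion, so the nearest-neighbor Euclidean comparison used here is an assumption, and the function names `smote` and `filter_synthetic` and the parameters `n_new`, `k`, and `rng` are hypothetical. The sketch also assumes at least two minority samples.

```python
import numpy as np


def smote(X_min, n_new, k=5, rng=None):
    """Plain SMOTE: interpolate between a minority sample and one of its
    k nearest minority neighbors (Euclidean distance)."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    k = min(k, n - 1)  # cannot have more neighbors than other samples
    # Pairwise distances within the minority class; ignore self-distance.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    neighbors = np.argsort(d, axis=1)[:, :k]
    # Pick a random base sample and a random neighbor for each new point.
    base = rng.integers(0, n, size=n_new)
    nb = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])


def filter_synthetic(X_syn, X_min, X_maj):
    """Assumed filtering rule: keep a synthetic point only if its nearest
    original minority sample is at least as close as its nearest
    majority sample."""
    d_min = np.linalg.norm(
        X_syn[:, None, :] - X_min[None, :, :], axis=2).min(axis=1)
    d_maj = np.linalg.norm(
        X_syn[:, None, :] - X_maj[None, :, :], axis=2).min(axis=1)
    return X_syn[d_min <= d_maj]
```

A caller would generate candidates with `smote(X_min, n_new)` and pass them through `filter_synthetic` before appending the survivors to the training set. The paper's actual adjustment step may differ from this nearest-neighbor rule; the full text should be consulted for the precise definition.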

Copyright
© 2019 The Authors. Published by Atlantis Press SARL.
Open Access
This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).

Journal
International Journal of Computational Intelligence Systems
Volume-Issue
12 - 2
Pages
1412 - 1422
Publication Date
2019/11/28
ISSN (Online)
1875-6883
ISSN (Print)
1875-6891

Cite this article

TY  - JOUR
AU  - Ahmed Saad Hussein
AU  - Tianrui Li
AU  - Chubato Wondaferaw Yohannese
AU  - Kamal Bashir
PY  - 2019
DA  - 2019/11/28
TI  - A-SMOTE: A New Preprocessing Approach for Highly Imbalanced Datasets by Improving SMOTE
JO  - International Journal of Computational Intelligence Systems
SP  - 1412
EP  - 1422
VL  - 12
IS  - 2
SN  - 1875-6883
UR  - https://doi.org/10.2991/ijcis.d.191114.002
DO  - 10.2991/ijcis.d.191114.002
ID  - SaadHussein2019
ER  -