Resampling-Based Relabeling & Raking Algorithm to One-Class Classification
- Seunghwan Park
- Hae-Hwan Lee
- Jongho Im
Abstract
We consider the binary classification of imbalanced data. A dataset is
imbalanced if the class proportions are heavily skewed. Classifying
imbalanced data is often challenging, especially for high-dimensional
data, because unequal class sizes degrade classifier performance.
Undersampling the majority class or oversampling the minority class are
popular ways to construct balanced samples, which can improve
classification performance. However, many existing sampling methods
cannot easily be extended to high-dimensional data or to mixed data that
include categorical variables, because they often require approximating
the attribute distributions, which becomes another critical issue. In
this paper, we propose a new sampling strategy employing raking and
relabeling procedures, in which attribute values of the majority class
are used to impute values for the minority class when constructing
balanced samples. The proposed algorithms perform comparably to existing
popular methods but are more flexible with respect to data shape and the
number of attributes. The sampling algorithm is attractive in practice
because it does not require density estimation to generate synthetic
data for oversampling, and it handles mixed-type variables without
difficulty. In addition, the proposed sampling strategy is robust to the
choice of classifier, in the sense that classification performance is
not sensitive to which classifier is used.
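The abstract names raking but does not say how it is combined with relabeling. For reference only, raking itself (iterative proportional fitting) can be sketched as below: case weights are rescaled one categorical variable at a time until the weighted category totals match given target marginal totals. This is a minimal sketch under stated assumptions, not the authors' implementation; the function name `rake`, its parameters (`targets`, `max_iter`, `tol`), and the toy variables `sex` and `age` are hypothetical.

```python
from collections import defaultdict


def rake(records, targets, weights=None, max_iter=100, tol=1e-9):
    """Rake case weights by iterative proportional fitting (illustrative sketch).

    Hypothetical helper, not the paper's code.
    records: list of dicts mapping variable name -> category label.
    targets: dict mapping variable name -> {category label: target weighted total}.
    weights: optional starting weights (default: 1.0 for every record).
    Assumes all starting weights are positive and every category observed in
    `records` has an entry in the matching target margin.
    """
    w = [float(x) for x in weights] if weights is not None else [1.0] * len(records)
    for _ in range(max_iter):
        max_change = 0.0
        for var, margin in targets.items():
            # Current weighted total of each category of this variable.
            current = defaultdict(float)
            for rec, wi in zip(records, w):
                current[rec[var]] += wi
            # Rescale weights so this variable's weighted totals hit the targets.
            for i, rec in enumerate(records):
                new_w = w[i] * margin[rec[var]] / current[rec[var]]
                max_change = max(max_change, abs(new_w - w[i]))
                w[i] = new_w
        # Stop once a full pass over all variables barely changes any weight.
        if max_change < tol:
            break
    return w


# Toy usage (hypothetical variables and target totals).
records = [
    {"sex": "F", "age": "young"},
    {"sex": "F", "age": "old"},
    {"sex": "M", "age": "young"},
    {"sex": "M", "age": "old"},
]
targets = {"sex": {"F": 6, "M": 4}, "age": {"young": 4, "old": 6}}
print(rake(records, targets))
```

Because raking only matches one-dimensional marginal totals, it needs no estimate of the joint attribute density and works directly on categorical variables; the abstract does not state whether the authors rely on this property.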