Abstract
One of the most commonly occurring problems in machine learning is class imbalance, wherein one class dominates the class distribution in terms of frequency. Over time, many methods have been proposed to deal with class imbalance. These methods primarily rely on one, or a combination, of the following:
1. Sampling Procedure
2. Algorithms
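As a minimal illustration of the first option, the sketch below performs random oversampling of the minority class using only the Python standard library. The function name `random_oversample`, the toy data, and the fixed seed are assumptions made for illustration; the abstract does not state which sampling procedures the paper evaluates, so this should not be read as the authors' method.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        # Rows belonging to this class in the original data.
        rows = [x for x, lab in zip(X, y) if lab == label]
        # Draw the missing rows with replacement (none for the majority class).
        extra = rng.choices(rows, k=target - n)
        X_out.extend(extra)
        y_out.extend([label] * (target - n))
    return X_out, y_out


# Hypothetical toy data: 8 legitimate rows (label 0) and 2 fraud rows (label 1).
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2

X_bal, y_bal = random_oversample(X, y)
balanced_counts = Counter(y_bal)
print(balanced_counts)
```

Resampling of this kind is normally applied to the training split only, so that validation scores are still computed on the original class distribution.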
This paper analyzes the design options across both approaches, used in combination, and provides guidance on which algorithms suit a minority-class scenario and which sampling technique to pair them with to improve the accuracy of prediction. Although the results were obtained with the objective of optimizing the F1 score, the paper also analyzes how precision and recall behave for each algorithm under the various sampling techniques. The paper then explores a few loss functions for tree-based algorithms and how they change the validation measures. We use the open-source Credit Card Fraud dataset hosted on Kaggle [1]. We use the F1 score as the model evaluation criterion because it is generally suited to class imbalance problems and captures the trade-off between precision and recall [2]; we also report our observations on precision and recall. In addition, we explore a semi-supervised approach using autoencoders and compare its performance with that of other, more traditional machine learning methods. The dataset contains transactions made with credit cards by European cardholders over two days in September 2013, with 492 frauds out of 284,807 transactions. It is highly imbalanced: the positive class (frauds) accounts for only 0.172% of all transactions.
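The arithmetic behind this choice of metric can be sketched as follows. The counts 492 and 284,807 come from the dataset description above; the helper `precision_recall_f1` and the "always predict not-fraud" baseline are illustrative assumptions, not results from the paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts.
    Each ratio is defined as 0.0 when its denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Counts taken from the dataset description.
n_total = 284_807
n_fraud = 492
fraud_rate = n_fraud / n_total

# Hypothetical baseline: label every transaction as "not fraud".
# It is right on every legitimate transaction and wrong on every fraud.
baseline_accuracy = (n_total - n_fraud) / n_total
baseline = precision_recall_f1(tp=0, fp=0, fn=n_fraud)

print(f"fraud rate:        {fraud_rate:.5f}")
print(f"baseline accuracy: {baseline_accuracy:.5f}")
print(f"baseline F1:       {baseline[2]:.5f}")
```

A classifier that never flags fraud scores very high accuracy on this data while its F1 is zero, which is why accuracy alone is a poor guide for imbalanced problems.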
References
https://sebastianraschka.com/faq/docs/computing-the-f1-score.html
L. Rosasco, E. D. De Vito, A. Caponnetto, M. Piana, A. Verri, Are loss functions all the same? Neural Comput. 16(5), 1063–1076 (2004). CiteSeerX 10.1.1.109.6786. https://doi.org/10.1162/089976604773135104. PMID 15070510
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/smote
https://scikit-learn.org/stable/modules/model_evaluation.html#hinge-loss
https://scikit-learn.org/stable/modules/model_evaluation.html#mean-squared-error
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
https://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html
C. Wang, C. Deng, S. Wang, Leveraging weighted and focal losses for binary label-imbalanced classification with XGBoost
S. Bhattacharyya, S. Jha, K. Tharakunnel, J. Westland, Data mining for credit card fraud: a comparative study. 50(3), 602–613. Elsevier (2011)
R. Patidar, L. Sharma Int. J. Soft Comput. Eng. 1, NCAI2011 (2011)
R. M. Jamail Esmaily, Intrusion detection system based on multilayer perceptron neural networks and decision tree, in International Conference on Information and Knowledge Technology, 2015
S. Kamaruddin, V. Ravi, Credit card fraud detection using big data analytics: use of PSOAANN based one-class classification, in Proceedings of International Conference on Information Analysis, pp. 1–8, Aug 2016
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Tiwari, R., Sen, A., Dey, K. (2021). Design Thinking for Class Imbalance Problems Using Compound Techniques. In: Sharma, N., Chakrabarti, A., Balas, V.E., Martinovic, J. (eds) Data Management, Analytics and Innovation. Advances in Intelligent Systems and Computing, vol 1175. Springer, Singapore. https://doi.org/10.1007/978-981-15-5619-7_27
DOI: https://doi.org/10.1007/978-981-15-5619-7_27
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-5618-0
Online ISBN: 978-981-15-5619-7
eBook Packages: Intelligent Technologies and Robotics (R0)