Design Thinking for Class Imbalance Problems Using Compound Techniques

Tiwari, Rajneesh; Sen, Aritra; Dey, Kaushik

doi:10.1007/978-981-15-5619-7_27

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1175)

Abstract

One of the most commonly occurring phenomena in ML methods is class imbalance, wherein one class dominates the class distribution in terms of frequency. Over time, many methods have been proposed to deal with class imbalance. These methods primarily address it through one, or a combination, of the following:

1. Sampling Procedure
2. Algorithms

This paper analyzes the design options across both categories in combination and provides guidance on which algorithms suit a minority-class scenario, and which sampling technique to pair them with, to improve the accuracy of prediction. While the results were obtained with the objective of optimizing F1 score, the paper also analyzes the pattern of precision and recall values for each algorithm under various sampling techniques. The paper then explores a few loss functions for tree-based algorithms and how varying them affects the validation measures. We use the open-source Credit Card Fraud dataset hosted on Kaggle [1]. We use the F1 metric as the model evaluation criterion, as it is generally suited to class imbalance problems and captures the trade-off between precision and recall [2]; we also report our insights on precision and recall. We further explore a semi-supervised approach using autoencoders and compare its performance with other, more traditional machine learning methods. The dataset contains transactions made with credit cards in September 2013 by European cardholders. It covers transactions that occurred over 2 days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for only 0.172% of all transactions.
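The choice of F1 over plain accuracy can be illustrated with a little arithmetic on the dataset figures quoted above. The following is a minimal sketch, not the authors' code: the helper `precision_recall_f1` and the confusion counts of the "hypothetical" classifier are assumptions made up for illustration, and only the totals 492 and 284,807 come from the abstract.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts.

    Precision and recall are defined as 0.0 when their denominator is zero,
    which is a common convention (an assumption of this sketch).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Totals stated in the abstract.
N_TRANSACTIONS = 284_807
N_FRAUDS = 492

# Share of the positive (fraud) class: well under 1% of all transactions.
fraud_share = N_FRAUDS / N_TRANSACTIONS

# Trivial baseline: label every transaction "not fraud".
# It is almost always right, yet it never finds a single fraud.
baseline_accuracy = (N_TRANSACTIONS - N_FRAUDS) / N_TRANSACTIONS
baseline_scores = precision_recall_f1(tp=0, fp=0, fn=N_FRAUDS)

# Hypothetical classifier: counts invented for illustration only,
# NOT results reported in the paper.
hypothetical_scores = precision_recall_f1(tp=400, fp=100, fn=92)

if __name__ == "__main__":
    print(f"fraud share:       {fraud_share:.5%}")
    print(f"baseline accuracy: {baseline_accuracy:.5%}")
    print("baseline  P/R/F1:  %.3f / %.3f / %.3f" % baseline_scores)
    print("hypothet. P/R/F1:  %.3f / %.3f / %.3f" % hypothetical_scores)
```

Under these assumptions the all-negative baseline reaches very high accuracy while its F1 is zero, which is why a metric built from precision and recall is the more informative target on data this skewed.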

References

1. https://www.kaggle.com/mlg-ulb/creditcardfraud
2. https://sebastianraschka.com/faq/docs/computing-the-f1-score.html
3. L. Rosasco, E. D. De Vito, A. Caponnetto, M. Piana, A. Verri, Are loss functions all the same? Neural Comput. 16(5), 1063–1076 (2004). CiteSeerX 10.1.1.109.6786. https://doi.org/10.1162/089976604773135104. PMID 15070510
4. https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/smote
5. https://scikit-learn.org/stable/modules/tree.html#tree
6. https://scikit-learn.org/stable/modules/model_evaluation.html#hinge-loss
7. https://scikit-learn.org/stable/modules/model_evaluation.html#mean-squared-error
8. https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
9. https://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html
10. C. Wang, C. Deng, S. Wang, Leveraging weighted and focal losses for binary label-imbalanced classification with XGBoost
11. S. Bhattacharyya, S. Jha, K. Tharakunnel, J. Westland, Data mining for credit card fraud: a comparative study. 50(3), 602–613. Elsevier (2011)
12. R. Patidar, L. Sharma, Int. J. Soft Comput. Eng. 1, NCAI2011 (2011)
13. R. M. Jamail Esmaily, Intrusion detection system based on multilayer perceptron neural networks and decision tree, in International Conference on Information and Knowledge Technology, 2015
14. S. Kamaruddin, V. Ravi, Credit card fraud detection using big data analytics: use of PSOAANN based one-class classification, in Proceedings of International Conference on Information Analysis, pp. 1–8, Aug 2016

Author information

Authors and Affiliations

Ericsson, Kolkata, India
Rajneesh Tiwari, Aritra Sen & Kaushik Dey

Corresponding author

Correspondence to Aritra Sen.

Editor information

Editors and Affiliations

Society for Data Science, Pune, Maharashtra, India
Neha Sharma
A.K. Choudhury School of Information Technology, University of Calcutta, Kolkata, West Bengal, India
Amlan Chakrabarti
Department of Automatics and Applied Software, Faculty of Engineering, University of Arad, Arad, Romania
Valentina Emilia Balas
IT4Innovations, VSB-Technical University of Ostrava, Ostrava, Czech Republic
Jan Martinovic

About this paper

Cite this paper

Tiwari, R., Sen, A., Dey, K. (2021). Design Thinking for Class Imbalance Problems Using Compound Techniques. In: Sharma, N., Chakrabarti, A., Balas, V.E., Martinovic, J. (eds) Data Management, Analytics and Innovation. Advances in Intelligent Systems and Computing, vol 1175. Springer, Singapore. https://doi.org/10.1007/978-981-15-5619-7_27

DOI: https://doi.org/10.1007/978-981-15-5619-7_27
Published: 19 September 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-5618-0
Online ISBN: 978-981-15-5619-7
eBook Packages: Intelligent Technologies and Robotics (R0)
