Abstract
The predictive performance of machine learning models tends to deteriorate in the presence of class imbalance. Multiple strategies have been proposed to address this issue, and a popular one consists of oversampling the minority class. Classic approaches such as SMOTE rely on nearest neighbor search and linear interpolation, which can struggle on high-dimensional datasets with intricate data distributions. Generative Adversarial Networks (GANs) have been suggested as an alternative way to create synthetic minority examples because of their ability to model complex data distributions. However, most GAN-based oversampling methods ignore data uncertainty. In this paper, we propose a novel GAN-based oversampling method based on evidence theory. An auxiliary evidential classifier is incorporated into the GAN architecture to guide the training of the generative model. The objective is to push the GAN to generate minority objects at the borderline of the minority class, near difficult-to-classify objects. Through extensive analysis, we demonstrate that the proposed approach performs better than other popular methods.
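To make the classic baseline mentioned in the abstract concrete, the following is a minimal sketch of SMOTE-style oversampling: a nearest neighbor search among minority points followed by linear interpolation. It is not the evidential GAN method proposed in the paper. The function name `smote_like_samples`, the toy data, and the default `k=3` are illustrative assumptions only.

```python
import math
import random


def smote_like_samples(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbors (SMOTE-style).

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority points")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Nearest neighbor search: rank the other minority points
        # by Euclidean distance to the chosen base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        # Linear interpolation: a random point on the segment
        # from the base point to the selected neighbor.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Toy two-dimensional minority class (assumed data).
minority_points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like_samples(minority_points, n_new=5)
```

Because every synthetic point lies on a segment between two existing minority points, this kind of interpolation can only fill in the region already spanned by the minority class, which is one reason it may struggle with intricate, high-dimensional distributions.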
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Grina, F., Elouedi, Z., Lefevre, E. (2024). Evidential Generative Adversarial Networks for Handling Imbalanced Learning. In: Bouraoui, Z., Vesic, S. (eds) Symbolic and Quantitative Approaches to Reasoning with Uncertainty. ECSQARU 2023. Lecture Notes in Computer Science, vol 14294. Springer, Cham. https://doi.org/10.1007/978-3-031-45608-4_20
DOI: https://doi.org/10.1007/978-3-031-45608-4_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-45607-7
Online ISBN: 978-3-031-45608-4
eBook Packages: Computer Science, Computer Science (R0)