
Learning from Imbalanced Data Using an Evidential Undersampling-Based Ensemble

  • Conference paper
Scalable Uncertainty Management (SUM 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13562)


Abstract

In many real-world binary classification problems, one class is heavily underrepresented, consisting of far fewer observations than the other. This imbalance often produces a biased model with poor performance on the minority class. Various techniques, such as undersampling, have been proposed to address this issue, and ensemble methods have also proven to be an effective strategy for improving model performance under class imbalance. In this paper, we propose an evidential undersampling-based ensemble approach. To mitigate the loss of important data, our undersampling technique assigns soft evidential labels to each majority instance; these labels are then used to discard only unwanted observations, such as noisy and ambiguous examples. Finally, to improve the final results, the proposed undersampling approach is incorporated into an ensemble based on evidential classifier fusion. A comparative study against well-known ensemble methods shows that our method is effective according to the G-Mean and F-Score measures.
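The abstract evaluates the method with G-Mean and F-Score, the standard metrics for imbalanced binary classification. As a minimal illustration (not taken from the paper), the sketch below computes both from binary confusion-matrix counts, treating the minority class as positive; `gmean_fscore` and the example counts are hypothetical names and data chosen for this sketch.

```python
import math

def gmean_fscore(tp, fn, tn, fp):
    """Compute G-Mean and F-Score from binary confusion-matrix
    counts, with the minority class taken as positive."""
    sensitivity = tp / (tp + fn)   # minority-class recall
    specificity = tn / (tn + fp)   # majority-class recall
    # G-Mean balances accuracy on both classes: a classifier that
    # ignores the minority class scores 0 regardless of overall accuracy.
    g_mean = math.sqrt(sensitivity * specificity)
    precision = tp / (tp + fp)
    # F-Score: harmonic mean of precision and minority-class recall.
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return g_mean, f_score

# Example: 10 minority and 100 majority test instances.
g, f = gmean_fscore(tp=8, fn=2, tn=90, fp=10)
print(round(g, 3), round(f, 3))  # 0.849 0.571
```

Note that plain accuracy on the same counts would be 98/110 ≈ 0.89, masking the weaker minority-class performance that G-Mean and F-Score expose.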



Author information

Corresponding author

Correspondence to Fares Grina.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Grina, F., Elouedi, Z., Lefevre, E. (2022). Learning from Imbalanced Data Using an Evidential Undersampling-Based Ensemble. In: Dupin de Saint-Cyr, F., Öztürk-Escoffier, M., Potyka, N. (eds) Scalable Uncertainty Management. SUM 2022. Lecture Notes in Computer Science, vol. 13562. Springer, Cham. https://doi.org/10.1007/978-3-031-18843-5_16


  • DOI: https://doi.org/10.1007/978-3-031-18843-5_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-18842-8

  • Online ISBN: 978-3-031-18843-5

  • eBook Packages: Computer Science (R0)
