Abstract
In many real-world binary classification problems, one class is heavily underrepresented, containing far fewer observations than the other. This imbalance biases the learned model toward the majority class and yields poor performance on the minority class. Various techniques, such as undersampling, have been proposed to address this issue, and ensemble methods have also proven to be an effective strategy for improving performance under class imbalance. In this paper, we propose an evidential undersampling-based ensemble approach. To alleviate the risk of discarding important data, our undersampling technique assigns soft evidential labels to each majority instance; these labels are then used to discard only unwanted observations, such as noisy and ambiguous examples. Finally, to improve the results further, the proposed undersampling approach is incorporated into an ensemble based on evidential classifier fusion. A comparative study against well-known ensemble methods shows that our method is effective according to the G-Mean and F-Score measures.
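The undersampling idea described in the abstract can be illustrated with a minimal sketch. The code below is an assumption-laden simplification, not the authors' actual mass-assignment scheme: here, for each majority instance, evidence for the minority class, the majority class, and the ignorance set is derived from the class mix of its k nearest neighbours, and majority instances whose evidence marks them as noisy (strong support for the minority class) or ambiguous (high mass on ignorance) are discarded. The function name, thresholds, and label coding (0 = majority, 1 = minority) are all illustrative choices.

```python
import numpy as np

def evidential_undersample(X, y, k=5, noise_thr=0.5, amb_thr=0.5):
    """Sketch of soft evidential labelling for majority-class cleaning.

    For each majority instance, a simplified mass is placed on
    {minority} (evidence it is noise), on {majority}, and on the
    ignorance set {majority, minority} (evidence it is ambiguous),
    based on the labels of its k nearest neighbours.
    """
    maj, mino = 0, 1  # assumed label coding
    keep = []
    for i in np.where(y == maj)[0]:
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                      # exclude the point itself
        nn = np.argsort(d)[:k]
        p_min = np.mean(y[nn] == mino)     # neighbourhood support for minority
        agree = abs(2 * p_min - 1)         # 1 = unanimous neighbours, 0 = even split
        m_min = p_min * agree              # mass on {minority}: "noisy" evidence
        m_theta = 1 - agree                # mass on ignorance: "ambiguous" evidence
        if m_min < noise_thr and m_theta < amb_thr:
            keep.append(i)                 # retain only safe majority instances
    sel = np.concatenate([np.array(keep, dtype=int), np.where(y == mino)[0]])
    return X[sel], y[sel]
```

All minority instances are kept untouched; only majority instances in noisy or heavily overlapping regions are removed, which reflects the stated goal of losing as little useful data as possible.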
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Grina, F., Elouedi, Z., Lefevre, E. (2022). Learning from Imbalanced Data Using an Evidential Undersampling-Based Ensemble. In: Dupin de Saint-Cyr, F., Öztürk-Escoffier, M., Potyka, N. (eds.) Scalable Uncertainty Management. SUM 2022. Lecture Notes in Computer Science, vol. 13562. Springer, Cham. https://doi.org/10.1007/978-3-031-18843-5_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-18842-8
Online ISBN: 978-3-031-18843-5