Abstract
In many real-world binary classification problems, one class is heavily underrepresented, containing far fewer observations than the other. This imbalance biases the learned model toward the majority class and yields poor performance on the minority class. Various techniques, such as undersampling, have been proposed to address this issue, and ensemble methods have also proven to be an effective strategy for improving performance under class imbalance. In this paper, we propose an evidential undersampling-based ensemble approach. To alleviate the risk of discarding important data, our undersampling technique assigns soft evidential labels to each majority instance; these labels are then used to discard only unwanted observations, such as noisy and ambiguous examples. Finally, to improve the results further, the proposed undersampling approach is incorporated into an ensemble based on evidential classifier fusion. A comparative study against well-known ensemble methods shows that our method is effective according to the G-Mean and F-Score measures.
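The undersampling idea described in the abstract can be illustrated with a minimal sketch. The code below is an assumption-laden simplification, not the authors' actual mass-assignment scheme: here, for each majority instance, evidence for the minority class, the majority class, and the ignorance set is derived from the class mix of its k nearest neighbours, and majority instances whose evidence marks them as noisy (strong support for the minority class) or ambiguous (high mass on ignorance) are discarded. The function name, thresholds, and label coding (0 = majority, 1 = minority) are all illustrative choices.

```python
import numpy as np

def evidential_undersample(X, y, k=5, noise_thr=0.5, amb_thr=0.5):
    """Sketch of soft evidential labelling for majority-class cleaning.

    For each majority instance, a simplified mass is placed on
    {minority} (evidence it is noise), on {majority}, and on the
    ignorance set {majority, minority} (evidence it is ambiguous),
    based on the labels of its k nearest neighbours.
    """
    maj, mino = 0, 1  # assumed label coding
    keep = []
    for i in np.where(y == maj)[0]:
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                      # exclude the point itself
        nn = np.argsort(d)[:k]
        p_min = np.mean(y[nn] == mino)     # neighbourhood support for minority
        agree = abs(2 * p_min - 1)         # 1 = unanimous neighbours, 0 = even split
        m_min = p_min * agree              # mass on {minority}: "noisy" evidence
        m_theta = 1 - agree                # mass on ignorance: "ambiguous" evidence
        if m_min < noise_thr and m_theta < amb_thr:
            keep.append(i)                 # retain only safe majority instances
    sel = np.concatenate([np.array(keep, dtype=int), np.where(y == mino)[0]])
    return X[sel], y[sel]
```

All minority instances are kept untouched; only majority instances in noisy or heavily overlapping regions are removed, which reflects the stated goal of losing as little useful data as possible.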
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Grina, F., Elouedi, Z., Lefevre, E. (2022). Learning from Imbalanced Data Using an Evidential Undersampling-Based Ensemble. In: Dupin de Saint-Cyr, F., Öztürk-Escoffier, M., Potyka, N. (eds.) Scalable Uncertainty Management. SUM 2022. Lecture Notes in Computer Science, vol. 13562. Springer, Cham. https://doi.org/10.1007/978-3-031-18843-5_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-18842-8
Online ISBN: 978-3-031-18843-5