Genetic-Novelty Oversampling Technique for Imbalanced Data

Ait Addi, Hajar; Ezzahir, Redouane; Boukhlik, Nouhaila

doi:10.1007/978-3-031-28387-1_16

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 625)

Included in the following conference series:

International Conference On Big Data and Internet of Things

Abstract

Imbalanced data is an important topic that has vexed researchers working on classification problems in practice. A dataset is imbalanced if its classes are not approximately equally represented. The class with few samples is called the minority class, while the other classes form the majority class. Standard learning classifiers tend to misclassify minority samples because they assume that the class distribution is relatively balanced. However, in real-world applications, correctly predicting minority samples is more valuable than correctly classifying samples that belong to the majority class. In this paper, we propose GNOT, a novel oversampling strategy that combines genetic algorithm concepts with a novelty detection technique to generate synthetic samples that are consistent with the original distribution of the minority class while avoiding outliers. We tested GNOT on seven real-world imbalanced datasets. Our experimental analysis shows that GNOT can effectively improve the performance of classifiers in terms of G-mean and F1-measure.
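
The two evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes the usual binary definitions (G-mean as the geometric mean of sensitivity and specificity, F1 as the harmonic mean of precision and recall), with the minority class as the positive class. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fn, fp, tn), treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fn, fp, tn


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (minority recall) and specificity."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def f1_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the minority class."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data (assumed): 95 majority samples (label 0), 5 minority samples (label 1).
y_true = [0] * 95 + [1] * 5

# A classifier that always predicts the majority class.
always_majority = [0] * 100
print(accuracy(y_true, always_majority))    # 0.95
print(g_mean(y_true, always_majority))      # 0.0
print(f1_measure(y_true, always_majority))  # 0.0
```

On this toy data, a classifier that ignores the minority class entirely still reaches 95% accuracy, while G-mean and F1-measure both drop to zero. That gap is why these two measures, rather than accuracy, are the natural yardsticks for imbalanced problems.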

Author information

Authors and Affiliations

MAISI/ENSA, University of Ibn Zohr, Agadir, Morocco
Hajar Ait Addi, Redouane Ezzahir & Nouhaila Boukhlik

Corresponding author

Correspondence to Hajar Ait Addi.

Editor information

Editors and Affiliations

ENSIAS, Mohammed V University, Rabat, Morocco
Mohamed Lazaar
FST, Abdelmalek Essaâdi University, Tangier, Morocco
El Mokhtar En-Naimi
FST, Abdelmalek Essaâdi University, Tangier, Morocco
Abdelhamid Zouhair
ENSA, Abdelmalek Essaâdi University, Tetouan, Morocco
Mohammed Al Achhab
ENSA, Abdelmalek Essaâdi University, Tetouan, Morocco
Oussama Mahboub

About this paper

Cite this paper

Ait Addi, H., Ezzahir, R., Boukhlik, N. (2023). Genetic-Novelty Oversampling Technique for Imbalanced Data. In: Lazaar, M., En-Naimi, E.M., Zouhair, A., Al Achhab, M., Mahboub, O. (eds) Proceedings of the 6th International Conference on Big Data and Internet of Things. BDIoT 2022. Lecture Notes in Networks and Systems, vol 625. Springer, Cham. https://doi.org/10.1007/978-3-031-28387-1_16

DOI: https://doi.org/10.1007/978-3-031-28387-1_16
Published: 29 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28386-4
Online ISBN: 978-3-031-28387-1
eBook Packages: Intelligent Technologies and Robotics (R0)
