
Neural network with absent minority class samples and boundary shifting for imbalanced data classification

  • Original Article
Neural Computing and Applications

Abstract

Neural networks handling data imbalance rely heavily on resampling or reweighting strategies. However, existing resampling and reweighting approaches mainly focus on rebalancing known data and ignore the essence of the data imbalance problem: the minority class is insufficiently represented empirically because it contains too few samples. We therefore propose a new solution in which neural networks classify imbalanced data with the help of sampled absent minority class samples. Specifically, an improved Metropolis-Hastings (IMH) algorithm is developed to sample absent minority class samples by collecting the samples rejected during the approximation of the majority class. The sampled absent minority class samples are then provided to neural networks to address the data imbalance problem. For IMH, a line segment transition kernel and a class probability constraint are proposed to accelerate the sampling process and to reduce ambiguity in the class membership of the sampled minority class samples. For the neural networks, two boundary shifting strategies are provided to handle the different ways in which the sampled absent minority class samples can be used. In experiments, the proposed method is validated on 34 imbalanced datasets, and comparable AUC, G-MEAN, and MACC results are achieved. These results demonstrate the effectiveness of sampling absent minority class samples for neural networks solving the imbalanced data problem.
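
To make the sampling idea in the abstract concrete, the following is a minimal illustrative sketch, not the authors' IMH implementation: a plain Metropolis-Hastings random walk approximates the majority-class distribution, and the proposals it rejects are collected as candidate "absent" minority class samples. The Gaussian kernel density estimate of the majority class, the isotropic Gaussian random-walk proposal (standing in for the paper's line segment transition kernel), the logistic-regression probability threshold (standing in for the class probability constraint), and all function and parameter names are assumptions made only for illustration.

import numpy as np
from sklearn.neighbors import KernelDensity
from sklearn.linear_model import LogisticRegression


def sample_absent_minority(X_maj, X_min, n_steps=5000, step=0.5,
                           prob_threshold=0.5, seed=0):
    """Collect proposals rejected while a random-walk chain tracks the majority class."""
    rng = np.random.default_rng(seed)

    # Target density: a Gaussian KDE fitted on the majority class (a stand-in for the
    # paper's majority class approximation process).
    kde = KernelDensity(bandwidth=1.0).fit(X_maj)

    # Auxiliary classifier mimicking the class probability constraint: a rejected
    # proposal is kept only if it does not look like a majority point.
    X = np.vstack([X_maj, X_min])
    y = np.r_[np.zeros(len(X_maj)), np.ones(len(X_min))]   # 0 = majority, 1 = minority
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    maj_col = int(np.where(clf.classes_ == 0)[0][0])

    x = X_maj[rng.integers(len(X_maj))]       # start the chain at a random majority sample
    absent_minority = []

    for _ in range(n_steps):
        proposal = x + step * rng.standard_normal(x.shape)   # symmetric Gaussian kernel

        # Metropolis acceptance test; with a symmetric proposal the acceptance ratio
        # reduces to the density ratio under the majority KDE.
        log_alpha = kde.score_samples(proposal[None])[0] - kde.score_samples(x[None])[0]
        if np.log(rng.uniform()) < log_alpha:
            x = proposal                      # accepted: the chain stays on the majority class
        else:
            # Rejected by the majority approximation: a candidate "absent" minority
            # sample, filtered by the (hypothetical) class probability constraint.
            if clf.predict_proba(proposal[None])[0, maj_col] < prob_threshold:
                absent_minority.append(proposal)

    return np.array(absent_minority)

In the paper, the collected samples would then be handed to the neural network under one of the two boundary shifting strategies; this sketch only illustrates the rejection-collection step.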


Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

This work is supported by the Key Program of National Natural Science Fund of China (Grant No. 61836006), the National Key Research and Development Program of China (Grant No. 2019YFC1510705), and the Science and Technology Major Project of Sichuan province (Grant No. 2019ZDZX0006).

Author information


Corresponding author

Correspondence to Yongsheng Sang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Huang, Z.a., Sang, Y., Sun, Y. et al. Neural network with absent minority class samples and boundary shifting for imbalanced data classification. Neural Comput & Applic 35, 8937–8953 (2023). https://doi.org/10.1007/s00521-022-08135-y

