
Neural network with absent minority class samples and boundary shifting for imbalanced data classification

  • Original Article
Neural Computing and Applications

Abstract

Neural networks handling data imbalance rely heavily on resampling or reweighting strategies. However, existing resampling and reweighting approaches mainly focus on rebalancing known data and ignore the essence of the data imbalance problem: the minority class is insufficiently represented empirically because it contains too few samples. We therefore propose a new solution in which neural networks classify imbalanced data with the help of sampled absent minority class samples. Specifically, an improved Metropolis-Hastings (IMH) algorithm is developed to sample absent minority class samples by collecting the samples rejected during the approximation of the majority class. The sampled absent minority class samples are then provided to neural networks to address the data imbalance problem. For IMH, a line segment transition kernel and a class probability constraint are proposed to accelerate the sampling process and to reduce ambiguity in the class membership of the sampled minority class samples. For the neural networks, two boundary shifting strategies are provided to handle the different ways in which the sampled absent minority class samples can be used. In experiments, the proposed method is validated on 34 imbalanced datasets, and comparable AUC, G-MEAN, and MACC results are achieved. These results demonstrate the effectiveness of sampling absent minority class samples for neural networks solving the imbalanced data problem.
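
To make the sampling idea in the abstract concrete, the following is a minimal illustrative sketch, not the authors' IMH implementation: a plain Metropolis-Hastings random walk approximates the majority-class distribution, and the proposals it rejects are collected as candidate "absent" minority class samples. The Gaussian kernel density estimate of the majority class, the isotropic Gaussian random-walk proposal (standing in for the paper's line segment transition kernel), the logistic-regression probability threshold (standing in for the class probability constraint), and all function and parameter names are assumptions made only for illustration.

import numpy as np
from sklearn.neighbors import KernelDensity
from sklearn.linear_model import LogisticRegression


def sample_absent_minority(X_maj, X_min, n_steps=5000, step=0.5,
                           prob_threshold=0.5, seed=0):
    """Collect proposals rejected while a random-walk chain tracks the majority class."""
    rng = np.random.default_rng(seed)

    # Target density: a Gaussian KDE fitted on the majority class (a stand-in for the
    # paper's majority class approximation process).
    kde = KernelDensity(bandwidth=1.0).fit(X_maj)

    # Auxiliary classifier mimicking the class probability constraint: a rejected
    # proposal is kept only if it does not look like a majority point.
    X = np.vstack([X_maj, X_min])
    y = np.r_[np.zeros(len(X_maj)), np.ones(len(X_min))]   # 0 = majority, 1 = minority
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    maj_col = int(np.where(clf.classes_ == 0)[0][0])

    x = X_maj[rng.integers(len(X_maj))]       # start the chain at a random majority sample
    absent_minority = []

    for _ in range(n_steps):
        proposal = x + step * rng.standard_normal(x.shape)   # symmetric Gaussian kernel

        # Metropolis acceptance test; with a symmetric proposal the acceptance ratio
        # reduces to the density ratio under the majority KDE.
        log_alpha = kde.score_samples(proposal[None])[0] - kde.score_samples(x[None])[0]
        if np.log(rng.uniform()) < log_alpha:
            x = proposal                      # accepted: the chain stays on the majority class
        else:
            # Rejected by the majority approximation: a candidate "absent" minority
            # sample, filtered by the (hypothetical) class probability constraint.
            if clf.predict_proba(proposal[None])[0, maj_col] < prob_threshold:
                absent_minority.append(proposal)

    return np.array(absent_minority)

In the paper, the collected samples would then be handed to the neural network under one of the two boundary shifting strategies; this sketch only illustrates the rejection-collection step.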


Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

This work is supported by the Key Program of National Natural Science Fund of China (Grant No. 61836006), the National Key Research and Development Program of China (Grant No. 2019YFC1510705), and the Science and Technology Major Project of Sichuan province (Grant No. 2019ZDZX0006).

Author information


Corresponding author

Correspondence to Yongsheng Sang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Huang, Z.a., Sang, Y., Sun, Y. et al. Neural network with absent minority class samples and boundary shifting for imbalanced data classification. Neural Comput & Applic 35, 8937–8953 (2023). https://doi.org/10.1007/s00521-022-08135-y

