
ADA-INCVAE: Improved data generation using variational autoencoder for imbalanced classification

Published in: Applied Intelligence

Abstract

Increasing the number of minority samples through data generation can effectively improve a classifier's performance on the minority class in imbalanced problems. In this paper, we propose an effective data generation algorithm for minority samples called the Adaptive Increase dimension of Variational AutoEncoder (ADA-INCVAE). Complementing prior studies, we conduct a theoretical analysis from the perspective of multi-task learning to address posterior collapse in VAEs. Building on this analysis, we propose a novel training method that increases the dimension of the data to avoid posterior collapse. To restrict the range of synthetic data for different minority samples, an adaptive reconstruction loss weight is derived from the distance distribution of majority samples around each minority sample. In the data generation stage, the generation proportion of each sample point is determined by the local information of the minority class. Experimental results on 12 imbalanced datasets indicate that the algorithm helps classifiers effectively improve F1-measure and G-mean, verifying the effectiveness of the synthetic data generated by ADA-INCVAE.
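The idea of weighting minority samples by the distribution of majority samples around them can be sketched as follows. This is a minimal illustration only, not the paper's exact formulation: the function name `adaptive_weights`, the use of k-nearest neighbours, and the ADASYN-style normalisation into generation proportions are assumptions made for this sketch.

```python
import numpy as np

def adaptive_weights(minority, majority, k=5):
    """Hypothetical sketch: score each minority sample by the fraction of
    majority samples among its k nearest neighbours, then normalise the
    scores into per-sample generation proportions (ADASYN-style)."""
    X = np.vstack([minority, majority])
    labels = np.array([0] * len(minority) + [1] * len(majority))  # 1 = majority
    ratios = []
    for x in minority:
        d = np.linalg.norm(X - x, axis=1)
        nn = np.argsort(d)[1:k + 1]        # skip the point itself (distance 0)
        ratios.append(labels[nn].mean())   # fraction of majority neighbours
    r = np.asarray(ratios, dtype=float)
    if r.sum() == 0:                       # no majority neighbours anywhere:
        return np.full(len(minority), 1.0 / len(minority))  # uniform fallback
    return r / r.sum()                     # proportions summing to 1
```

Samples surrounded by many majority points receive larger weights and therefore a larger share of the synthetic data, which matches the abstract's intent of adapting generation to the local neighbourhood of each minority sample.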




Author information

Corresponding author

Correspondence to Kai Huang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Huang, K., Wang, X. ADA-INCVAE: Improved data generation using variational autoencoder for imbalanced classification. Appl Intell 52, 2838–2853 (2022). https://doi.org/10.1007/s10489-021-02566-1

