
ADA-INCVAE: Improved data generation using variational autoencoder for imbalanced classification

Published in: Applied Intelligence

Abstract

Increasing the number of minority samples through data generation can effectively improve a classifier's performance on the minority class in imbalanced problems. In this paper, we propose an effective data generation algorithm for minority samples called the Adaptive Increase dimension of Variational AutoEncoder (ADA-INCVAE). Complementing prior studies, we conduct a theoretical analysis from the perspective of multi-task learning to address posterior collapse in VAEs. Building on this analysis, we propose a novel training method that increases the dimension of the data to avoid posterior collapse. To restrict the range of synthetic data for different minority samples, an adaptive reconstruction loss weight is derived from the distance distribution of majority samples around each minority sample. In the data generation stage, the generation proportion of each sample point is determined by the local information of the minority class. Experimental results on 12 imbalanced datasets indicate that the algorithm helps classifiers effectively improve F1-measure and G-mean, verifying the effectiveness of the synthetic data generated by ADA-INCVAE.
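The idea of weighting minority samples by the distribution of majority samples around them can be sketched as follows. This is a minimal illustration only, not the paper's exact formulation: the function name `adaptive_weights`, the use of k-nearest neighbours, and the ADASYN-style normalisation into generation proportions are assumptions made for this sketch.

```python
import numpy as np

def adaptive_weights(minority, majority, k=5):
    """Hypothetical sketch: score each minority sample by the fraction of
    majority samples among its k nearest neighbours, then normalise the
    scores into per-sample generation proportions (ADASYN-style)."""
    X = np.vstack([minority, majority])
    labels = np.array([0] * len(minority) + [1] * len(majority))  # 1 = majority
    ratios = []
    for x in minority:
        d = np.linalg.norm(X - x, axis=1)
        nn = np.argsort(d)[1:k + 1]        # skip the point itself (distance 0)
        ratios.append(labels[nn].mean())   # fraction of majority neighbours
    r = np.asarray(ratios, dtype=float)
    if r.sum() == 0:                       # no majority neighbours anywhere:
        return np.full(len(minority), 1.0 / len(minority))  # uniform fallback
    return r / r.sum()                     # proportions summing to 1
```

Samples surrounded by many majority points receive larger weights and therefore a larger share of the synthetic data, which matches the abstract's intent of adapting generation to the local neighbourhood of each minority sample.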




Author information

Corresponding author

Correspondence to Kai Huang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Huang, K., Wang, X. ADA-INCVAE: Improved data generation using variational autoencoder for imbalanced classification. Appl Intell 52, 2838–2853 (2022). https://doi.org/10.1007/s10489-021-02566-1

