Abstract
Class imbalance frequently arises in image classification. Conventional generative adversarial networks (GANs) tend to produce samples from the majority class when trained on class-imbalanced datasets. The Balancing GAN with gradient penalty (BAGAN-GP) was proposed to address this issue, but its outputs may still be biased toward the majority classes when images from different classes are highly similar. In this study, we introduce the Pre-trained Gated Variational Autoencoder with Self-attention for Balancing Generative Adversarial Network (SGBGAN), an image augmentation technique for generating high-quality images. The method uses a Gated Variational Autoencoder with Self-attention (SA-GVAE) to initialize the GAN by transferring the pre-trained SA-GVAE weights to it. Experimental results on Fashion-MNIST, CIFAR-10, and a highly imbalanced medical image dataset show that SGBGAN outperforms other state-of-the-art methods. Results on the Fréchet inception distance (FID) and the structural similarity index measure (SSIM) show that our model overcomes the instability problems found in other GANs. On the Cells dataset in particular, the FID of a minority class improves by up to 23.09% compared with the latest BAGAN-GP, and the SSIM of a minority class improves by up to 10.81%. These results indicate that SGBGAN overcomes the class imbalance restriction and generates high-quality minority-class images.
Graphical abstract
The diagram gives an overview of the technical approach used in this paper. To address class imbalance in the dataset, a Gated Variational Autoencoder with Self-attention (SA-GVAE) is proposed. The SA-GVAE is used to initialize the Generative Adversarial Network (GAN) by transferring its pre-trained weights to the GAN. The result is the Pre-trained Gated Variational Autoencoder with Self-attention for Balancing GAN (SGBGAN), which serves as an image augmentation tool for generating high-quality images. Finally, the generated minority-class samples are used to restore class balance within the dataset.
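The final balancing step can be illustrated with a minimal sketch. It assumes that "restoring class balance" means topping up every class to the size of the largest class, which is one common convention and is not confirmed by the text above. The function name `samples_needed_per_class` and the toy labels are hypothetical; the generator that would actually produce the images is not shown.

```python
from collections import Counter


def samples_needed_per_class(labels):
    """Return how many synthetic images each class needs to match the largest class.

    Assumption: balance is reached when every class has as many samples
    as the current majority class.
    """
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


# Hypothetical toy labels: class 0 is the majority, classes 1 and 2 are minorities.
labels = [0] * 6 + [1] * 2 + [2] * 1
deficit = samples_needed_per_class(labels)
print(deficit)  # {0: 0, 1: 4, 2: 5}
```

In this sketch, a trained class-conditional generator would then be asked for `deficit[c]` images of each class `c`, and those images would be appended to the training set.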
Acknowledgements
The research is supported by the National Natural Science Foundation of China under Grant No. 62072468.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
About this article
Cite this article
Wan, Q., Guo, W. & Wang, Y. SGBGAN: minority class image generation for class-imbalanced datasets. Machine Vision and Applications 35, 22 (2024). https://doi.org/10.1007/s00138-023-01506-y
DOI: https://doi.org/10.1007/s00138-023-01506-y