A GAN-BERT Based Approach for Bengali Text Classification with a Few Labeled Examples

  • Conference paper

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 583)

Abstract

Conventional machine learning algorithms and transfer learning models perform well on text classification, but they require a large volume of annotated data. Because labeled data is scarce, a model that can learn from only a few labeled examples is needed, and GAN-BERT is a candidate solution. To classify Bengali text, we developed a GAN-BERT based model, which is an adapted version of BERT. We used two datasets for this purpose: a hate speech dataset and a fake news dataset. To understand how GAN-BERT and traditional BERT models behave on Bangla datasets, we experimented with both. With a small quantity of labeled data, GAN-BERT achieves a satisfactory result. We also show how accuracy increases as the number of training samples increases. Finally, we compare the traditional BERT based Bangla-BERT with our GAN-Bangla-BERT model to show how each responds to a small amount of labeled data.
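The abstract does not spell out the training objective, so the following is only a minimal sketch of the semi-supervised GAN losses that GAN-BERT style models are generally described as using: a discriminator with k real classes plus one extra "fake" class, and a generator trained with an adversarial term plus feature matching. The function names (discriminator_loss, generator_loss), the EPS constant, the placement of the fake class at the last index, and the toy inputs are all assumptions made for illustration; they are not taken from the paper, and a real model would compute these logits from BERT representations rather than hand-written lists.

```python
import math

EPS = 1e-8  # small constant to keep log() away from zero (assumption of this sketch)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discriminator_loss(labeled_logits, labels, unlabeled_logits, fake_logits):
    """Supervised + unsupervised discriminator loss.

    Every logits row has k+1 entries; the LAST index is the 'fake' class.
    """
    # Supervised term: cross-entropy on the true class of labeled examples.
    sup = 0.0
    for row, y in zip(labeled_logits, labels):
        sup -= math.log(softmax(row)[y] + EPS)
    sup /= max(len(labels), 1)

    # Unsupervised term 1: real examples (labeled or not) should not look fake.
    real_rows = labeled_logits + unlabeled_logits
    unsup_real = -sum(
        math.log(1.0 - softmax(row)[-1] + EPS) for row in real_rows
    ) / len(real_rows)

    # Unsupervised term 2: generated examples should be assigned to 'fake'.
    unsup_fake = -sum(
        math.log(softmax(row)[-1] + EPS) for row in fake_logits
    ) / len(fake_logits)

    return sup + unsup_real + unsup_fake


def generator_loss(fake_logits, real_features, fake_features):
    """Adversarial term plus feature matching between real and fake batches."""
    # The generator wants its samples NOT to be assigned to the 'fake' class.
    unsup = -sum(
        math.log(1.0 - softmax(row)[-1] + EPS) for row in fake_logits
    ) / len(fake_logits)

    # Feature matching: squared distance between mean hidden features.
    dim = len(real_features[0])
    mean_real = [sum(f[i] for f in real_features) / len(real_features) for i in range(dim)]
    mean_fake = [sum(f[i] for f in fake_features) / len(fake_features) for i in range(dim)]
    feature_matching = sum((a - b) ** 2 for a, b in zip(mean_real, mean_fake))

    return unsup + feature_matching


# Toy usage with k = 2 real classes (so 3 logits per row); values are made up.
d_loss = discriminator_loss(
    labeled_logits=[[5.0, 0.0, -5.0]],
    labels=[0],
    unlabeled_logits=[[0.0, 5.0, -5.0]],
    fake_logits=[[-5.0, -5.0, 5.0]],
)
g_loss = generator_loss(
    fake_logits=[[-5.0, -5.0, 5.0]],
    real_features=[[1.0, 0.0]],
    fake_features=[[0.0, 1.0]],
)
print(d_loss, g_loss)
```

Only the supervised term needs labels, so in this kind of setup the unlabeled and generated examples still contribute a training signal, which is the usual motivation for applying it when few labeled examples are available.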

Notes

  1. https://huggingface.co/sagorsarker/bangla-bert-base

  2. https://huggingface.co/monsoon-nlp/bangla-electra

Author information

Contributions

Raihan Tanvir and Md Tanvir Rouf Shawon contributed equally.

Corresponding author

Correspondence to Raihan Tanvir.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Tanvir, R., Shawon, M.T.R., Mehedi, M.H.K., Mahtab, M.M., Rasel, A.A. (2023). A GAN-BERT Based Approach for Bengali Text Classification with a Few Labeled Examples. In: Omatu, S., Mehmood, R., Sitek, P., Cicerone, S., Rodríguez, S. (eds) Distributed Computing and Artificial Intelligence, 19th International Conference. DCAI 2022. Lecture Notes in Networks and Systems, vol 583. Springer, Cham. https://doi.org/10.1007/978-3-031-20859-1_3
