A GAN-BERT Based Approach for Bengali Text Classification with a Few Labeled Examples

Tanvir, Raihan; Shawon, Md Tanvir Rouf; Mehedi, Md Humaion Kabir; Mahtab, Md Motahar; Rasel, Annajiat Alim

doi:10.1007/978-3-031-20859-1_3

Conference paper
First Online: 13 December 2022

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 583)

Abstract

Basic machine learning algorithms and transfer learning models work well for text classification, but they require a large volume of annotated data. Because labeled data is scarce, a model that can learn from few labeled examples is needed, and GAN-BERT is a candidate solution. To classify Bengali text, we have developed a GAN-BERT based model, which is an adapted version of BERT. We use two datasets for this purpose: a hate speech dataset and a fake news dataset. To understand how GAN-BERT and traditional BERT models behave on Bangla datasets, we have experimented with both. With a small quantity of labeled data, GAN-BERT yields a satisfactory result. We also show how accuracy increases as the number of training samples increases, and we compare the traditional BERT-based Bangla-BERT with our GAN-Bangla-BERT model to show how each responds to a small amount of labeled data.
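The abstract does not spell out the training objective, so the following is only a minimal sketch of the semi-supervised losses commonly used in GAN-BERT style models: a discriminator that scores k real classes plus one extra "fake" class, and a generator trained with an adversarial term plus feature matching. The function names, the array shapes, and the choice of NumPy in place of a deep learning framework are assumptions made for illustration; this is not the authors' implementation, and the BERT encoder that would produce the logits and features is omitted.

```python
import numpy as np

EPS = 1e-8  # numerical guard inside the logarithms (an assumed value)


def softmax(logits):
    """Row-wise softmax, shifted for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def discriminator_loss(real_logits, labels, label_mask, fake_logits):
    """Semi-supervised loss over k real classes plus one trailing 'fake' class.

    real_logits: (n, k+1) scores for real examples, labeled and unlabeled
    labels:      (n,) integer class ids in [0, k); any valid id where unlabeled
    label_mask:  (n,) True for examples that carry a label
    fake_logits: (m, k+1) scores for generator outputs
    """
    real_probs = softmax(real_logits)
    fake_probs = softmax(fake_logits)

    # Supervised term: cross-entropy over the k real classes, labeled rows only.
    class_probs = softmax(real_logits[:, :-1])
    picked = class_probs[np.arange(len(labels)), labels]
    n_labeled = max(int(label_mask.sum()), 1)
    supervised = -(np.log(picked + EPS) * label_mask).sum() / n_labeled

    # Unsupervised terms: real rows should not look fake, fake rows should.
    real_term = -np.log(1.0 - real_probs[:, -1] + EPS).mean()
    fake_term = -np.log(fake_probs[:, -1] + EPS).mean()
    return supervised + real_term + fake_term


def generator_loss(fake_logits, real_features, fake_features):
    """Adversarial term plus feature matching on mean hidden features."""
    fake_probs = softmax(fake_logits)
    adversarial = -np.log(1.0 - fake_probs[:, -1] + EPS).mean()
    gap = real_features.mean(axis=0) - fake_features.mean(axis=0)
    return adversarial + np.mean(gap ** 2)
```

Because unlabeled rows are masked out of the supervised term but still contribute to the real-versus-fake term, this formulation lets unlabeled text shape the discriminator, which is the property a few-label setting relies on.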

Author information

Authors and Affiliations

BRAC University, 66 Mohakhali, Dhaka, 1212, Bangladesh
Raihan Tanvir, Md Tanvir Rouf Shawon, Md Humaion Kabir Mehedi, Md Motahar Mahtab & Annajiat Alim Rasel

Contributions

Raihan Tanvir and Md Tanvir Rouf Shawon contributed equally.

Corresponding author

Correspondence to Raihan Tanvir.

Editor information

Editors and Affiliations

Hiroshima University, Hiroshima, Japan
Sigeru Omatu
King Abdulaziz University, Jeddah, Saudi Arabia
Rashid Mehmood
Kielce University of Technology, Kielce, Poland
Pawel Sitek
Palazzo Camponeschi, University of L'Aquila, L'Aquila, Italy
Serafino Cicerone
BISITE, Edificio I+D+i, University of Salamanca, Salamanca, Spain
Sara Rodríguez

About this paper

Cite this paper

Tanvir, R., Shawon, M.T.R., Mehedi, M.H.K., Mahtab, M.M., Rasel, A.A. (2023). A GAN-BERT Based Approach for Bengali Text Classification with a Few Labeled Examples. In: Omatu, S., Mehmood, R., Sitek, P., Cicerone, S., Rodríguez, S. (eds) Distributed Computing and Artificial Intelligence, 19th International Conference. DCAI 2022. Lecture Notes in Networks and Systems, vol 583. Springer, Cham. https://doi.org/10.1007/978-3-031-20859-1_3

DOI: https://doi.org/10.1007/978-3-031-20859-1_3
Published: 13 December 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20858-4
Online ISBN: 978-3-031-20859-1
eBook Packages: Intelligent Technologies and Robotics (R0)
