Supervised Machine Learning for Multi-label Classification of Bangla Articles

  • Conference paper
Cyber Security and Computer Science (ICONCS 2020)

Abstract

Multi-label text classification has recently become a key research topic within text classification. To the best of our knowledge, however, very little research has addressed multi-label classification of Bangla text, and suitable datasets for the task are scarce. Multi-label classification has many real-world applications. One is the automated labeling of articles on online news portals, so that readers can easily find other articles on similar topics by clicking on hyperlinks. We applied supervised multi-label classification techniques to Bangla news articles to generate tags automatically and predict related topics. We built a new dataset from scratch and applied several problem transformation methods for multi-label classification with naive Bayes, logistic regression, and SVM classifiers. We analyzed the performance of these algorithms on Bangla news articles in terms of precision, recall, F1-score, and Hamming loss. The dataset and the analysis of the results can be valuable for further research on multi-label classification of Bangla text. We have open-sourced the dataset and the source code of this work (http://bit.ly/34cSNCR).
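
To make the terms in the abstract concrete, the following is a minimal sketch of two common problem transformation methods (binary relevance and label powerset) and of the evaluation measures named above. It is not the authors' code, which is available at the link in the abstract. The abstract does not say which transformation methods were used, so the choice of these two is an assumption, as are the micro-averaging of precision, recall, and F1, the tag names, and the toy predictions.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

# Hypothetical tags and toy data, invented purely for illustration.
LABELS: List[str] = ["bangladesh", "cricket", "football"]
Y_TRUE: List[Set[str]] = [{"cricket", "bangladesh"}, {"football"},
                          {"cricket"}, {"cricket", "bangladesh"}]
Y_PRED: List[Set[str]] = [{"cricket"}, {"football", "bangladesh"},
                          {"cricket"}, {"cricket", "bangladesh"}]


def binary_relevance_targets(label_sets: Sequence[Set[str]],
                             labels: Sequence[str]) -> Dict[str, List[int]]:
    """Binary relevance: one independent 0/1 target vector per label."""
    return {lab: [1 if lab in s else 0 for s in label_sets] for lab in labels}


def label_powerset_targets(
        label_sets: Sequence[Set[str]]
) -> Tuple[List[int], Dict[FrozenSet[str], int]]:
    """Label powerset: each distinct label set becomes one multi-class id."""
    class_ids: Dict[FrozenSet[str], int] = {}
    targets: List[int] = []
    for s in label_sets:
        key = frozenset(s)
        if key not in class_ids:
            class_ids[key] = len(class_ids)  # ids in order of first appearance
        targets.append(class_ids[key])
    return targets, class_ids


def hamming_loss(true_sets: Sequence[Set[str]],
                 pred_sets: Sequence[Set[str]],
                 labels: Sequence[str]) -> float:
    """Fraction of (article, label) pairs that are predicted incorrectly."""
    wrong = sum(len(t ^ p) for t, p in zip(true_sets, pred_sets))
    return wrong / (len(true_sets) * len(labels))


def micro_prf(true_sets: Sequence[Set[str]],
              pred_sets: Sequence[Set[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over all label decisions."""
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    fp = sum(len(p - t) for t, p in zip(true_sets, pred_sets))
    fn = sum(len(t - p) for t, p in zip(true_sets, pred_sets))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    print(binary_relevance_targets(Y_TRUE, LABELS))
    print(label_powerset_targets(Y_TRUE)[0])
    print(hamming_loss(Y_TRUE, Y_PRED, LABELS))
    print(micro_prf(Y_TRUE, Y_PRED))
```

Under binary relevance, each per-label target vector would be passed to a separate binary classifier (for example naive Bayes, logistic regression, or an SVM). Under label powerset, a single multi-class classifier is trained on the class ids. Binary relevance ignores correlations between tags, while label powerset captures them at the cost of a class count that grows with the number of distinct tag combinations.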

Notes

  1. www.prothomalo.com/sports.

  2. https://jupyter.org/.

  3. https://scikit-learn.org/.

Author information

Corresponding author

Correspondence to Dip Bhakta.

Copyright information

© 2020 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Bhakta, D., Dash, A.A., Bari, M.F., Shatabda, S. (2020). Supervised Machine Learning for Multi-label Classification of Bangla Articles. In: Bhuiyan, T., Rahman, M.M., Ali, M.A. (eds) Cyber Security and Computer Science. ICONCS 2020. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 325. Springer, Cham. https://doi.org/10.1007/978-3-030-52856-0_38

  • DOI: https://doi.org/10.1007/978-3-030-52856-0_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-52855-3

  • Online ISBN: 978-3-030-52856-0

  • eBook Packages: Computer Science, Computer Science (R0)
