Abstract
Multi-label text classification has recently become a key research topic in text classification. To the best of our knowledge, however, very little research has addressed multi-label classification of Bangla text, and suitable datasets for the task are scarce. Multi-label classification has many real-world applications. One is the automated labeling of articles on online news portals, so that readers can follow hyperlinked tags to other articles on similar topics. We apply supervised multi-label classification techniques to Bangla news articles to automatically generate tags that predict related topics. We built a new dataset from scratch and applied several problem transformation methods for multi-label classification, using naive Bayes, logistic regression, and SVM as classifiers. We analyze the performance of these algorithms on Bangla news articles in terms of precision, recall, F1-score, and Hamming loss. The dataset and the analysis of the results can be valuable for further research on multi-label classification of Bangla text. We have open-sourced the dataset and the source code of this work (http://bit.ly/34cSNCR).
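The abstract names problem transformation methods and four evaluation measures without spelling them out. As a minimal sketch, the Python below illustrates two widely used transformations (binary relevance and label powerset, which are assumed here and may not be the exact set used in the paper) together with Hamming loss and micro-averaged precision, recall, and F1. The English tags and the predictions are hypothetical toy data, not drawn from the paper's dataset, and all function names are illustrative.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

# Hypothetical toy data: four articles with true and predicted tag sets.
LABELS = ["politics", "sports", "economy"]
Y_TRUE: List[Set[str]] = [{"politics", "economy"}, {"sports"}, {"economy"}, {"sports"}]
Y_PRED: List[Set[str]] = [{"politics"}, {"sports"}, {"economy", "sports"}, {"sports"}]


def binary_relevance(y: Sequence[Set[str]], labels: Sequence[str]) -> Dict[str, List[int]]:
    """Build one binary target vector per label (1 if the article has it)."""
    return {label: [int(label in tags) for tags in y] for label in labels}


def label_powerset(y: Sequence[Set[str]]) -> Tuple[List[int], Dict[FrozenSet[str], int]]:
    """Map each distinct label set to one class id, giving a multi-class target."""
    class_ids: Dict[FrozenSet[str], int] = {}
    targets = [class_ids.setdefault(frozenset(tags), len(class_ids)) for tags in y]
    return targets, class_ids


def hamming_loss(y_true: Sequence[Set[str]], y_pred: Sequence[Set[str]],
                 labels: Sequence[str]) -> float:
    """Fraction of (article, label) pairs where prediction and truth disagree."""
    wrong = sum(len(t ^ p) for t, p in zip(y_true, y_pred))
    return wrong / (len(y_true) * len(labels))


def micro_prf(y_true: Sequence[Set[str]], y_pred: Sequence[Set[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over all (article, label) pairs."""
    tp = sum(len(t & p) for t, p in zip(y_true, y_pred))
    fp = sum(len(p - t) for t, p in zip(y_true, y_pred))
    fn = sum(len(t - p) for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    print(binary_relevance(Y_TRUE, LABELS))
    print(label_powerset(Y_TRUE)[0])
    print(hamming_loss(Y_TRUE, Y_PRED, LABELS))
    print(micro_prf(Y_TRUE, Y_PRED))
```

Under binary relevance, each per-label vector would be fed to its own binary classifier (for instance naive Bayes, logistic regression, or SVM), whereas label powerset yields a single multi-class target. Lower Hamming loss is better, while higher precision, recall, and F1 are better.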
Copyright information
© 2020 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Bhakta, D., Dash, A.A., Bari, M.F., Shatabda, S. (2020). Supervised Machine Learning for Multi-label Classification of Bangla Articles. In: Bhuiyan, T., Rahman, M.M., Ali, M.A. (eds) Cyber Security and Computer Science. ICONCS 2020. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 325. Springer, Cham. https://doi.org/10.1007/978-3-030-52856-0_38
DOI: https://doi.org/10.1007/978-3-030-52856-0_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-52855-3
Online ISBN: 978-3-030-52856-0
eBook Packages: Computer Science, Computer Science (R0)