Bengali paper classification using ensemble machine learning algorithms
by Niaz Ashraf Khan; Emrul Hasan Zawad; Rashedur M. Rahman
International Journal of Knowledge Engineering and Soft Data Paradigms (IJKESDP), Vol. 7, No. 2, 2022

Abstract: Text classification is one of the most challenging problems in natural language processing (NLP). Language models are at the heart of NLP. The ability to represent texts as numbers has enabled many NLP tasks, for example, text categorisation, translation, and summarisation. Unfortunately, NLP for Bengali texts has not yet reached the state-of-the-art level of other languages such as English, mostly due to the scarcity of resources and the complexity of Bengali grammar. Consequently, little work has been done in this field. In this paper, we study one word embedding method, Word2vec, based on continuous bag of words (CBOW), combined with several ensemble machine learning algorithms: Adaptive Boosting (AdaBoost) classifiers, Light Gradient Boosting Machine (LightGBM), XGBoost, and random forest classifiers (RFC). The model is trained on a large corpus of Bengali newspapers containing 99,283,949 words and 8,284,804 sentences in 392,772 documents. In our experiment, the Word2vec CBOW model with the XGBoost algorithm performed considerably better than the other models, achieving 92.24% accuracy.
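
As an illustration of the CBOW idea named in the abstract, the following minimal sketch shows how a tokenised sentence can be turned into (context, target) training pairs, where the words around a position are used to predict the word at that position. It is not the authors' implementation: the function name cbow_pairs, the window size, and the sample sentence are assumptions made for illustration only, and a real Word2vec model would go on to train embeddings from such pairs.

```python
from typing import List, Tuple


def cbow_pairs(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """Build (context_words, target_word) pairs for one tokenised sentence.

    Hypothetical helper: CBOW predicts the target word from the words
    that fall within `window` positions on either side of it.
    """
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]      # up to `window` words before
        right = tokens[i + 1:i + 1 + window]     # up to `window` words after
        context = left + right
        if context:                              # skip one-word sentences
            pairs.append((context, target))
    return pairs


# Illustrative sentence only; not taken from the paper's corpus.
sentence = ["আমি", "বাংলা", "সংবাদপত্র", "পড়ি"]
pairs = cbow_pairs(sentence, window=1)
for context, target in pairs:
    print(context, "->", target)
```

With a window of 1, each word is paired with its immediate neighbours, so words at the sentence boundaries receive a single context word. The document-level features fed to the ensemble classifiers would be derived from the trained embeddings; how the authors do this is not stated in the abstract.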

Online publication date: Tue, 13-Dec-2022
