ABSTRACT
Word embedding is a technique for capturing relationships among words by mapping each word to a numeric vector. Considerable research has been carried out in this field for languages such as English, Hindi, and Bengali, but very little work is available for the Nepali language. In this work, word embedding using Word2Vec is implemented for Nepali news data. The methodology consists of two stages: dataset preparation and Word2Vec modelling. The Gensim package is used to implement the Word2Vec model, and its output shows the similarity between Nepali words. The work focuses on developing word embeddings for Nepali words collected by scraping the health section of Nepali news portals, and it has shown promising results.
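As a rough illustration of what "similarity between words" means once words are mapped to vectors, the sketch below ranks words by cosine similarity, the measure typically used for Word2Vec similarity queries. It is not the paper's code: the three-dimensional vectors, the chosen words, and the helper names `cosine_similarity` and `most_similar` are hypothetical, and a trained model would supply learned vectors with far more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical embeddings, hand-picked for illustration only.
# A real Word2Vec model learns these values from the corpus.
toy_vectors = {
    "अस्पताल": [0.9, 0.1, 0.2],  # "hospital"
    "डाक्टर": [0.8, 0.2, 0.3],   # "doctor"
    "खेल": [0.1, 0.9, 0.1],      # "sport"
}


def most_similar(word, vectors):
    """Rank every other word by cosine similarity to `word`, highest first."""
    query = vectors[word]
    scores = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = most_similar("अस्पताल", toy_vectors)
for other, score in ranking:
    print(other, round(score, 3))
```

With these toy values, the health-related word ranks above the unrelated one, which is the kind of behaviour a similarity query on a trained model is expected to show.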
Index Terms
- Word Embedding in Nepali Language using Word2Vec