DOI: 10.1145/3582768.3582799
research-article
Open Access

Word Embedding in Nepali Language using Word2Vec

Published: 27 June 2023

ABSTRACT

Word embedding is a technique for capturing the relationships among words by mapping words to numeric vectors. Considerable research has been carried out in this field for languages such as English, Hindi, and Bengali, but very little work is available for the Nepali language. In this work, word embedding using Word2Vec is implemented for Nepali news data. The methodology comprises dataset preparation and Word2Vec modelling. The Gensim package is used to implement the Word2Vec model, whose output shows the similarity between Nepali words. The work focuses on developing word embeddings for Nepali words collected by scraping the health sections of Nepali news portals, and it shows promising results.
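To make "mapping words to numbers" and "similarity between words" concrete, the following is a minimal, dependency-free sketch. It is not the paper's Gensim Word2Vec model: as a simpler stand-in, it represents each word by counts of its neighbouring words and compares words by cosine similarity. The toy corpus, the English glosses, the window size, and the function names `cooccurrence_vectors`, `cosine`, and `most_similar` are all assumptions made for illustration; the paper's scraped health-news dataset is not reproduced here.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus of tokenized sentences (approximate glosses in comments).
sentences = [
    ["डाक्टर", "बिरामी", "उपचार"],   # doctor, patient, treatment
    ["नर्स", "बिरामी", "उपचार"],     # nurse, patient, treatment
    ["डाक्टर", "अस्पताल", "काम"],   # doctor, hospital, work
    ["नर्स", "अस्पताल", "काम"],     # nurse, hospital, work
    ["किसान", "खेत", "काम"],        # farmer, field, work
]


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a sparse vector: counts of words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    # Return a plain dict so that looking up an unknown word raises KeyError.
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, topn=3):
    """Rank all other words by cosine similarity to `word`, highest first."""
    target = vectors[word]
    scores = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


vectors = cooccurrence_vectors(sentences)
for neighbour, score in most_similar("डाक्टर", vectors):
    print(neighbour, round(score, 3))
```

In this toy corpus, "डाक्टर" and "नर्स" occur in identical contexts, so they rank as each other's nearest neighbour. Word2Vec differs in that it learns dense, low-dimensional vectors by training a shallow neural network to predict context words, but the resulting vectors are queried in the same way, by cosine similarity.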


Published in

NLPIR '22: Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval
December 2022, 241 pages
ISBN: 9781450397629
DOI: 10.1145/3582768

Copyright © 2022 Owner/Author

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher: Association for Computing Machinery, New York, NY, United States

Qualifiers: research-article, Research, Refereed limited