Research article
DOI: 10.1145/3396452.3396460

Keywords Extraction Based on Word2Vec and TextRank

Published: 22 May 2020

ABSTRACT

To improve keyword extraction by enriching the semantic representation of documents, we propose a method that combines a document's internal semantic information with word representations pre-trained on a large external corpus. First, we use the deep learning tool Word2Vec to learn word representations from external documents and measure the similarity between words by cosine similarity, which captures the semantic relations between words in the external corpus. Then, in the word graph that TextRank builds for the target document, this word-to-word similarity replaces the transition probability matrix. At the same time, information from the target document's title and abstract is used to construct the semantic word graph for keyword extraction. The experiments use academic paper data from AMiner. The results show that our method outperforms TextRank: when five keywords are extracted, precision, recall, and F-score increase by 28.60%, 10.70%, and 12.90%, respectively, compared with TextRank alone.
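The abstract does not give the exact formulas, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. Word vectors are passed in as a plain dict (in practice they would come from a pre-trained Word2Vec model, for example loaded with gensim's `KeyedVectors`, which is not shown). Two words are linked when they co-occur within a sliding window, the edge weight is their cosine similarity with negative values clipped to zero, and each node's edge weights are normalized so they act as TextRank's transition probabilities in place of the uniform ones. The title/abstract weighting the authors describe is not modeled. The function names, the window size, and the toy vectors are hypothetical; the damping factor d = 0.85 is the usual TextRank default.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_similarity_graph(tokens, vectors, window=2):
    """Hypothetical helper: link words co-occurring within `window` tokens.

    Edge weight = cosine similarity of the two word vectors. Words without a
    vector are skipped; negative similarities are clipped to 0 (an assumption,
    so every weight is a valid non-negative transition mass).
    """
    neighbors = {w: {} for w in tokens if w in vectors}
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            u = tokens[j]
            if u == w or w not in vectors or u not in vectors:
                continue
            sim = max(cosine(vectors[w], vectors[u]), 0.0)
            if sim > 0.0:
                neighbors[w][u] = sim
                neighbors[u][w] = sim
    return neighbors


def similarity_textrank(tokens, vectors, window=2, d=0.85, max_iter=100, tol=1e-8):
    """Weighted TextRank where normalized similarities replace uniform transitions."""
    neighbors = build_similarity_graph(tokens, vectors, window)
    if not neighbors:
        return []
    # Total similarity mass leaving each node (used to normalize its edges).
    strength = {n: sum(nbrs.values()) for n, nbrs in neighbors.items()}
    score = {n: 1.0 for n in neighbors}
    for _ in range(max_iter):
        # WS(i) = (1 - d) + d * sum_j [w_ji / sum_k w_jk] * WS(j)
        new = {
            n: (1 - d) + d * sum(wt / strength[m] * score[m] for m, wt in nbrs.items())
            for n, nbrs in neighbors.items()
        }
        delta = max(abs(new[n] - score[n]) for n in neighbors)
        score = new
        if delta < tol:
            break
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical 3-d vectors standing in for pre-trained Word2Vec embeddings.
vectors = {
    "keyword": [1.0, 0.2, 0.0],
    "extraction": [0.9, 0.3, 0.1],
    "graph": [0.2, 1.0, 0.1],
    "ranking": [0.3, 0.9, 0.2],
}
tokens = ["keyword", "extraction", "graph", "ranking", "keyword", "graph"]
ranked = similarity_textrank(tokens, vectors)
top_k = [word for word, _ in ranked[:5]]  # five keywords, as in the evaluation
```

Because each node's outgoing weights are normalized to sum to 1, the scores keep a constant total (the number of connected words), as in unweighted TextRank; only how that mass is distributed changes, favoring words that are semantically close to many of their neighbors.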

          Published in

          ACM Other conferences
          ICBDE '20: Proceedings of the 2020 3rd International Conference on Big Data and Education
          April 2020
          85 pages
          ISBN:9781450374989
          DOI:10.1145/3396452

          Copyright © 2020 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Qualifiers

          • research-article
          • Research
          • Refereed limited
