ABSTRACT
To improve keyword extraction by enriching the semantic representation of documents, we propose a method that combines a document's internal semantic information with word representations pre-trained on a large collection of external documents. First, we use the Word2Vec tool to learn word vectors from the external documents and measure the similarity between words by the cosine of the angle between their vectors, which gives the semantic relations between words in the external documents. Then, this word-to-word similarity replaces the transition probability matrix in TextRank over the word graph of the target document. In addition, information from the title and the abstract of the target document is used to construct the word semantic graph for keyword extraction. The experiments use academic paper data from AMiner as the data set. The results show that our method outperforms the TextRank algorithm: when five keywords are extracted, precision, recall and F-score increase by 28.60%, 10.70% and 12.90% respectively compared with TextRank alone.
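The core idea of the abstract, ranking words with TextRank while deriving transition probabilities from embedding similarity, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not give the exact formula, so the sketch assumes that edges link words co-occurring within a small window, that edge weights are cosine similarities clipped at zero, and that the damping factor is 0.85. The function names, the `window` parameter and the toy vectors are hypothetical, and the title and abstract weighting mentioned above is omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_textrank(tokens, embeddings, window=2, damping=0.85,
                        iters=100, tol=1e-8):
    """Rank words with a TextRank-style walk weighted by embedding similarity.

    Assumption: `embeddings` maps each word to a vector, e.g. exported from a
    pre-trained Word2Vec model. Words without a vector are ignored.
    """
    kept = [t for t in tokens if t in embeddings]
    vocab = sorted(set(kept))
    n = len(vocab)
    if n == 0:
        return []

    # Edge weight = cosine similarity (clipped at 0) for co-occurring words.
    weights = {w: {} for w in vocab}
    for i, w in enumerate(kept):
        for v in kept[i + 1:i + 1 + window]:
            if v != w:
                s = max(cosine(embeddings[w], embeddings[v]), 0.0)
                weights[w][v] = s
                weights[v][w] = s

    # Normalising each word's outgoing weights yields transition probabilities.
    out_sum = {w: sum(weights[w].values()) for w in vocab}
    scores = {w: 1.0 / n for w in vocab}
    for _ in range(iters):
        # Words with no outgoing weight spread their score uniformly.
        dangling = sum(scores[w] for w in vocab if out_sum[w] == 0.0)
        new = {}
        for v in vocab:
            incoming = sum(scores[w] * weights[w][v] / out_sum[w]
                           for w in weights[v] if out_sum[w] > 0.0)
            new[v] = (1.0 - damping) / n + damping * (incoming + dangling / n)
        delta = sum(abs(new[w] - scores[w]) for w in vocab)
        scores = new
        if delta < tol:
            break
    return sorted(scores.items(), key=lambda pair: -pair[1])


if __name__ == "__main__":
    # Toy vectors for illustration only; real vectors would come from Word2Vec.
    toy_vectors = {
        "keyword": [1.0, 0.2, 0.0],
        "extraction": [0.8, 0.4, 0.1],
        "graph": [0.1, 1.0, 0.3],
        "ranking": [0.2, 0.9, 0.5],
    }
    text = ["keyword", "extraction", "graph", "keyword", "ranking", "graph"]
    for word, score in similarity_textrank(text, toy_vectors)[:5]:
        print(f"{word}\t{score:.4f}")
```

The scores form a probability distribution over the candidate words, so taking the first five entries of the returned list corresponds to the five-keyword setting reported in the abstract.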
Index Terms
- Keywords Extraction Based on Word2Vec and TextRank