ABSTRACT
In this paper, we present a multi-featured supervised automatic keyword extraction system. We extracted salient semantic features that describe candidate keyphrases and used them to train a Random Forest classifier. The system achieved a precision of 58.3% and outperformed two top-performing systems when benchmarked on a crowdsourced dataset. Furthermore, our approach achieved its best precision and F-measure scores, 32.7 and 25.5 respectively, on the SemEval keyphrase extraction challenge dataset. The paper describes the approaches used as well as the results obtained.
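For readers unfamiliar with the reported metrics, the following is a minimal sketch of how precision, recall and F-measure can be computed for one document's extracted keyphrases against a gold-standard set. It is not the authors' evaluation script: the function name `precision_recall_f1`, the example phrases, and the matching rule (exact match after lowercasing and trimming, with no stemming) are all assumptions made for illustration.

```python
def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall and F-measure for one document.

    Assumption: phrases match if they are equal after lowercasing and
    trimming whitespace. Real benchmarks may also stem the phrases.
    """
    pred = {p.strip().lower() for p in predicted}
    ref = {g.strip().lower() for g in gold}
    hits = len(pred & ref)  # phrases that are both predicted and correct

    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    # F-measure is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical example data, not taken from the paper.
predicted = ["Random Forest", "keyphrase extraction", "dataset", "semantic features"]
gold = ["keyphrase extraction", "random forest", "supervised learning"]

p, r, f = precision_recall_f1(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f} f-measure={f:.3f}")
```

Two of the four predicted phrases appear in the gold set, so precision is 2/4 and recall is 2/3. Scores such as those in the abstract are typically reported on a 0 to 100 scale and aggregated over all documents in the test set.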
- A Supervised KeyPhrase Extraction System