DOI:10.1145/2993318.2993323
research-article

A Supervised KeyPhrase Extraction System

Published: 12 September 2016

ABSTRACT

In this paper, we present a multi-featured supervised automatic keyword extraction system. We extracted salient semantic features that are descriptive of candidate keyphrases and used a Random Forest classifier for training. The system achieved a precision of 58.3% and was shown to outperform two top-performing systems when benchmarked on a crowdsourced dataset. Furthermore, our approach achieved a personal best Precision and F-measure of 32.7 and 25.5, respectively, on the SemEval keyphrase extraction challenge dataset. The paper describes the approaches used as well as the results obtained.
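The Precision and F-measure figures reported above compare the keyphrases a system extracts with a set of reference (gold) keyphrases. The sketch below shows how such scores are commonly computed for a single document. The function name `evaluate_keyphrases`, the exact-match rule (lowercased, whitespace-trimmed strings), and the sample phrases are all assumptions made for illustration; the abstract does not state the matching protocol the authors used.

```python
def evaluate_keyphrases(predicted, gold):
    """Return (precision, recall, f1) for one document.

    Matching is exact after lowercasing and trimming whitespace. This rule
    is an assumption for illustration, not the paper's stated protocol.
    """
    pred = {p.strip().lower() for p in predicted}
    ref = {g.strip().lower() for g in gold}
    correct = len(pred & ref)  # phrases that appear in both sets

    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(ref) if ref else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1


# Hypothetical data, invented for illustration only.
predicted = ["random forest", "keyphrase extraction",
             "neural network", "Semantic Features"]
gold = ["keyphrase extraction", "semantic features",
        "supervised learning", "random forest", "crowdsourcing"]

p, r, f = evaluate_keyphrases(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
# prints: precision=0.750 recall=0.600 f1=0.667
```

Here three of the four predicted phrases match the gold set of five, giving a precision of 3/4 and a recall of 3/5. The F-measure is their harmonic mean, so it is pulled toward the lower of the two values.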

      • Published in
        SEMANTiCS 2016: Proceedings of the 12th International Conference on Semantic Systems
        September 2016
        207 pages
        ISBN:9781450347525
        DOI:10.1145/2993318

        Copyright © 2016 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article
        • Research
        • Refereed limited

        Acceptance Rates

        SEMANTiCS 2016 Paper Acceptance Rate: 18 of 85 submissions, 21%
        Overall Acceptance Rate: 40 of 182 submissions, 22%
