A Supervised KeyPhrase Extraction System

Authors:
Adebayo Kolawole John

Interdisciplinary Centre for Security, Reliability and Trust, University of Luxembourg, Luxembourg

Luigi Di Caro

Department of Computer Science, University of Torino, Torino, Italy

Guido Boella

Department of Computer Science, University of Torino, Torino, Italy


SEMANTiCS 2016: Proceedings of the 12th International Conference on Semantic Systems, September 2016, Pages 57–62
https://doi.org/10.1145/2993318.2993323

Published: 12 September 2016

ABSTRACT

In this paper, we present a multi-featured supervised automatic keyword extraction system. We extracted salient semantic features that are descriptive of candidate keyphrases and trained a Random Forest classifier on them. The system achieved a precision of 58.3% and was shown to outperform two top-performing systems when benchmarked on a crowdsourced dataset. Furthermore, our approach achieved its best precision and F-measure scores of 32.7 and 25.5, respectively, on the SemEval keyphrase extraction challenge dataset. The paper describes the approaches used as well as the results obtained.
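The precision and F-measure figures reported above are set-based scores over extracted keyphrases. The following is a minimal sketch of how such scores could be computed for a single document, assuming exact matching after lowercasing. The function name `evaluate_keyphrases` and the example phrases are hypothetical; the abstract does not state the exact matching or normalisation protocol the authors used, so this should not be read as their evaluation script.

```python
def evaluate_keyphrases(predicted, gold):
    """Exact-match precision, recall and F-measure for one document.

    Assumption: phrases match if they are equal after lowercasing and
    trimming whitespace. Published evaluations often also apply stemming.
    """
    pred = {p.strip().lower() for p in predicted}
    ref = {g.strip().lower() for g in gold}

    # Number of predicted phrases that also appear in the gold set.
    correct = len(pred & ref)

    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(ref) if ref else 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure


# Hypothetical example data, not taken from the paper.
predicted = ["random forest", "keyphrase extraction",
             "semantic features", "dataset"]
gold = ["Keyphrase extraction", "Random Forest", "supervised learning"]

p, r, f = evaluate_keyphrases(predicted, gold)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # prints: P=0.500 R=0.667 F=0.571
```

In this example two of the four predicted phrases are correct (precision 0.5) and two of the three gold phrases are recovered (recall 0.667), which gives an F-measure of 4/7, about 0.571.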

A Supervised KeyPhrase Extraction System
1. Computing methodologies
  1. Artificial intelligence
2. Information systems
  1. Information retrieval

Published in
SEMANTiCS 2016: Proceedings of the 12th International Conference on Semantic Systems
September 2016
207 pages
ISBN: 9781450347525
DOI: 10.1145/2993318
Editors:
Anna Fensel, STI Innsbruck, University of Innsbruck, Austria
Amrapali Zaveri, Stanford University, USA
Sebastian Hellmann, AKSW/KILT, Institute for Applied Informatics (InfAI), Leipzig, Germany
Tassilo Pellegrini, University of Applied Sciences St. Poelten, Austria
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Keywords
Random Forest
keyphrase
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
SEMANTiCS 2016 Paper Acceptance Rate: 18 of 85 submissions, 21%
Overall Acceptance Rate: 40 of 182 submissions, 22%