MFSRank: An Unsupervised Method to Extract Keyphrases Using Semantic Information

López, Roque Enrique; Barreda, Dennis; Tejada, Javier; Cuadros, Ernesto

doi:10.1007/978-3-642-25324-9_29

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7094)

Included in the following conference series:

Mexican International Conference on Artificial Intelligence

Abstract

This paper presents an unsupervised graph-based method for extracting keyphrases using semantic information. The proposed method has two stages. In the first, we extract Maximal Frequent Sequences (MFS) and use them as the nodes of a graph. The weight of the connection between two nodes is set according to common statistical information and semantic relatedness. In the second stage, we rank the MFS with the traditional PageRank algorithm, extended with ConceptNet: this external resource adds an extra weight value between two MFS. The experimental results are competitive with traditional approaches in this area. MFSRank outperforms the baseline for the top 5 keyphrases in precision, recall, and F-score.
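
The ranking stage described in the abstract can be illustrated with a minimal sketch of PageRank over a weighted, undirected graph of candidate phrases. This is not the authors' implementation: the `combine` mixing rule, the `alpha` parameter, the toy phrases, and all numeric weights are assumptions made for illustration only, since the abstract does not state how the statistical and ConceptNet-based weights are merged.

```python
from collections import defaultdict


def weighted_pagerank(edges, damping=0.85, iterations=50):
    """Rank nodes of an undirected weighted graph with a PageRank-style update.

    `edges` maps a pair (u, v) to a non-negative weight.
    """
    # Build a symmetric adjacency map: neighbours[u][v] = weight.
    neighbours = defaultdict(dict)
    for (u, v), w in edges.items():
        neighbours[u][v] = w
        neighbours[v][u] = w

    nodes = list(neighbours)
    # Total edge weight per node, used to normalise each node's contributions.
    out_weight = {u: sum(neighbours[u].values()) for u in nodes}
    scores = {u: 1.0 / len(nodes) for u in nodes}

    for _ in range(iterations):
        new_scores = {}
        for u in nodes:
            # Each neighbour passes on a share of its score proportional
            # to the weight of the edge it shares with u.
            incoming = sum(
                scores[v] * w / out_weight[v]
                for v, w in neighbours[u].items()
                if out_weight[v] > 0
            )
            new_scores[u] = (1 - damping) / len(nodes) + damping * incoming
        scores = new_scores
    return scores


def combine(statistical, semantic, alpha=0.5):
    # Hypothetical mixing rule (an assumption, not taken from the paper):
    # a convex combination of the statistical and semantic signals.
    return alpha * statistical + (1 - alpha) * semantic


# Toy candidate phrases with made-up (statistical, semantic) weights.
edges = {
    ("keyphrase extraction", "graph ranking"): combine(0.8, 0.6),
    ("keyphrase extraction", "semantic relatedness"): combine(0.5, 0.9),
    ("graph ranking", "semantic relatedness"): combine(0.2, 0.3),
    ("graph ranking", "experimental results"): combine(0.1, 0.1),
}

scores = weighted_pagerank(edges)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[:2])  # the two highest-ranked candidate phrases
```

In this sketch, a phrase that is strongly connected to other well-connected phrases accumulates a higher score, which is the intuition behind selecting the top-ranked MFS as keyphrases.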

Author information

Authors and Affiliations

School of System Engineering, San Agustin National University, Perú
Roque Enrique López
School of Computer Science, San Pablo Catholic University, Perú
Dennis Barreda, Javier Tejada & Ernesto Cuadros

Editor information

Editors and Affiliations

Mexican Petroleum Institute (IMP), Eje Central Lazaro Cardenas Norte, 152, Col. San Bartolo Atepehuacan, CP 07730, Mexico DF, Mexico
Ildar Batyrshin
National Polytechnic Institute (IPN), Center for Computing Research (CIC), Av. Juan Dios Bátiz, s/n, Col. Nueva Industrial Vallejo, CP 07738, Mexico D.F., Mexico
Grigori Sidorov

About this paper

Cite this paper

López, R.E., Barreda, D., Tejada, J., Cuadros, E. (2011). MFSRank: An Unsupervised Method to Extract Keyphrases Using Semantic Information. In: Batyrshin, I., Sidorov, G. (eds) Advances in Artificial Intelligence. MICAI 2011. Lecture Notes in Computer Science, vol 7094. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25324-9_29

DOI: https://doi.org/10.1007/978-3-642-25324-9_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25323-2
Online ISBN: 978-3-642-25324-9
eBook Packages: Computer Science, Computer Science (R0)