Entity Extraction within Plain-Text Collections WISE 2013 Challenge - T1: Entity Linking Track

Abreu, Carolina; Costa, Flávio; Santos, Laécio; Monteiro, Lucas; de Oliveira, Luiz Fernando Peres; Lustosa, Patrícia; Weigang, Li

doi:10.1007/978-3-642-41230-1_42

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8180)

Abstract

The WISE 2013 conference proposed a challenge (T1 Track) in which teams must label entities within plain texts based on the Wikilinks dataset, which comprises 40 million mentions of 3 million existing entities. This paper describes a straightforward two-fold unsupervised strategy to extract and tag entities, aiming to achieve accurate results in the identification of proper nouns and concrete concepts, regardless of the domain. The proposed solution is based on a pipeline of text processing modules that includes a lexical parser. The solution labelled 8824 texts, and the results achieved satisfactory precision.
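
The abstract does not describe the individual modules, so the following is only a minimal sketch of a generic two-stage "extract, then tag" pipeline, not the authors' implementation. It assumes a capitalization regex as a crude stand-in for the lexical parser, and a tiny hypothetical mention index (MENTION_INDEX) in place of the Wikilinks mention data; all names in the block are illustrative.

```python
import re
from collections import Counter

# Hypothetical mention index: surface form -> counts of linked entities.
# A real index would be built from the Wikilinks mentions.
MENTION_INDEX = {
    "barack obama": Counter({"Barack_Obama": 120}),
    "paris": Counter({"Paris": 90, "Paris_Hilton": 10}),
}

# Stage 1 (assumption): runs of capitalized words approximate proper-noun
# phrases. A lexical parser or POS tagger would replace this regex.
CANDIDATE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def tag_entities(text, index=MENTION_INDEX):
    """Return (offset, surface form, entity) for each known mention."""
    tagged = []
    for run in CANDIDATE.finditer(text):
        words = list(re.finditer(r"\S+", run.group()))
        i = 0
        while i < len(words):
            # Stage 2: try the longest sub-span starting at word i first.
            for j in range(len(words), i, -1):
                start, end = words[i].start(), words[j - 1].end()
                surface = run.group()[start:end]
                key = " ".join(surface.lower().split())
                counts = index.get(key)
                if counts:
                    # Pick the entity most often linked to this surface form.
                    entity = counts.most_common(1)[0][0]
                    tagged.append((run.start() + start, surface, entity))
                    i = j
                    break
            else:
                i += 1  # no known mention starts at this word
    return tagged


if __name__ == "__main__":
    print(tag_entities("Yesterday Barack Obama visited Paris."))
```

Choosing the most frequently linked entity is a simple unsupervised baseline; it ignores context, so an ambiguous surface form always resolves to its most common target.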

Author information

Authors and Affiliations

University of Brasilia, Brasilia, Brazil
Carolina Abreu, Flávio Costa, Laécio Santos, Lucas Monteiro, Luiz Fernando Peres de Oliveira, Patrícia Lustosa & Li Weigang

Editor information

Editors and Affiliations

The University of New South Wales, Sydney, NSW, Australia
Xuemin Lin
Aristotle University of Thessaloniki, Thessaloniki, Greece
Yannis Manolopoulos
AT&T Labs-Research, Florham Park, NJ, USA
Divesh Srivastava
Victoria University, Melbourne, Australia
Guangyan Huang

About this paper

Cite this paper

Abreu, C. et al. (2013). Entity Extraction within Plain-Text Collections WISE 2013 Challenge - T1: Entity Linking Track. In: Lin, X., Manolopoulos, Y., Srivastava, D., Huang, G. (eds) Web Information Systems Engineering – WISE 2013. WISE 2013. Lecture Notes in Computer Science, vol 8180. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41230-1_42

DOI: https://doi.org/10.1007/978-3-642-41230-1_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41229-5
Online ISBN: 978-3-642-41230-1
eBook Packages: Computer Science, Computer Science (R0)
