Entity Extraction within Plain-Text Collections WISE 2013 Challenge - T1: Entity Linking Track

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8180)

Abstract

The WISE 2013 conference proposed a challenge (T1 Track) in which teams must label entities within plain texts based on the Wikilinks dataset, which comprises 40 million mentions over 3 million existing entities. This paper describes a straightforward two-fold unsupervised strategy to extract and tag entities, aiming to achieve accurate results in the identification of proper nouns and concrete concepts, regardless of the domain. The proposed solution is based on a pipeline of text processing modules that includes a lexical parser. The solution labelled 8824 texts, and the results achieved satisfactory precision.
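The abstract does not describe the individual modules, so the following is only a minimal sketch of what a two-step "extract, then tag" pipeline could look like, not the authors' implementation. It assumes that runs of capitalised words are an acceptable stand-in for the proper-noun phrases a lexical parser would produce, and that the Wikilinks data has been reduced to a lookup from mention text to Wikipedia URL. The names `MENTION_INDEX`, `extract_candidates`, and `tag_entities` are hypothetical, and the two index entries are illustrative only.

```python
import re

# Hypothetical stand-in for a mention -> entity index derived from Wikilinks.
# A real index would hold millions of entries; these two are illustrative.
MENTION_INDEX = {
    "barack obama": "http://en.wikipedia.org/wiki/Barack_Obama",
    "hawaii": "http://en.wikipedia.org/wiki/Hawaii",
}

# Assumption: a run of capitalised words approximates a proper-noun phrase.
# A lexical parser with part-of-speech tags would replace this regex.
CANDIDATE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def extract_candidates(text):
    """Step 1: return (offset, mention) pairs for candidate proper nouns."""
    return [(m.start(), m.group()) for m in CANDIDATE.finditer(text)]


def tag_entities(text, index=MENTION_INDEX):
    """Step 2: keep only the candidates that the index can link to an entity."""
    tagged = []
    for offset, mention in extract_candidates(text):
        url = index.get(mention.lower())
        if url is not None:
            tagged.append((offset, mention, url))
    return tagged


if __name__ == "__main__":
    for offset, mention, url in tag_entities("Barack Obama was born in Hawaii."):
        print(offset, mention, url)
```

Both steps are unsupervised in the sense that neither requires training: the first relies on surface form only, and the second on a dictionary lookup. Candidates with no index entry, such as a capitalised sentence-initial word, are simply dropped in this sketch.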

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Abreu, C. et al. (2013). Entity Extraction within Plain-Text Collections WISE 2013 Challenge - T1: Entity Linking Track. In: Lin, X., Manolopoulos, Y., Srivastava, D., Huang, G. (eds) Web Information Systems Engineering – WISE 2013. WISE 2013. Lecture Notes in Computer Science, vol 8180. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41230-1_42

  • DOI: https://doi.org/10.1007/978-3-642-41230-1_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41229-5

  • Online ISBN: 978-3-642-41230-1

  • eBook Packages: Computer Science, Computer Science (R0)