Named Entity Recognition Through Corpus Transformation and System Combination

  • Conference paper
Advances in Natural Language Processing (EsTAL 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3230)

Abstract

In this paper we investigate how to combine different taggers to improve their performance on the named entity recognition task. The main resources used in our experiments are the publicly available taggers TnT and TBL and a corpus of Spanish texts in which named entity occurrences are annotated with BIO tags. We define three transformations that yield three additional versions of the training corpus. The transformations change either the words or the tags, and all three improve on the results that TnT and TBL obtain when trained on the original version of the corpus. With the four versions of the corpus and the two taggers, we obtain eight different models that can be combined using several techniques. Our experiments show that combining them with machine learning techniques improves performance considerably. We improve on the baselines for TnT (F_{β=1} of 85.25) and TBL (F_{β=1} of 87.45), reaching 90.90 in the best of our experiments.
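As a rough illustration of what combining several models over BIO-tagged text can look like, the sketch below applies a plain per-token plurality vote and computes the F_{β} measure from precision and recall. It is not the authors' method: the abstract reports that machine learning techniques were used for the combination, and voting is swapped in here only as the simplest baseline. The function names, the bare `B`/`I`/`O` tag set, and the three model outputs are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine tag sequences from several models by per-token plurality vote.

    `predictions` is a list of tag sequences, one per model, all aligned
    to the same tokens. This is an assumed baseline, not the paper's combiner.
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(seq) != length for seq in predictions):
        raise ValueError("all tag sequences must have the same length")
    combined = []
    for token_tags in zip(*predictions):
        # most_common orders equal counts by first occurrence,
        # so models listed earlier win ties.
        combined.append(Counter(token_tags).most_common(1)[0][0])
    return combined


def f_beta(precision, recall, beta=1.0):
    """F-measure; with beta = 1 it is the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical outputs of three models for the tokens: Juan vive en Sevilla
model_a = ["B", "O", "O", "B"]
model_b = ["B", "O", "O", "O"]
model_c = ["O", "O", "O", "B"]

print(majority_vote([model_a, model_b, model_c]))
```

Voting token by token can produce ill-formed sequences, such as an `I` tag with no preceding `B`, so a practical combiner would need a repair step or a learned model that sees the context.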

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Troyano, J.A., Carrillo, V., Enríquez, F., Galán, F.J. (2004). Named Entity Recognition Through Corpus Transformation and System Combination. In: Vicedo, J.L., Martínez-Barco, P., Muñoz, R., Saiz Noeda, M. (eds) Advances in Natural Language Processing. EsTAL 2004. Lecture Notes in Computer Science, vol 3230. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30228-5_23

  • DOI: https://doi.org/10.1007/978-3-540-30228-5_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23498-2

  • Online ISBN: 978-3-540-30228-5

  • eBook Packages: Springer Book Archive
