Abstract
In this paper we investigate how to combine different taggers to improve their performance on the named entity recognition task. The main resources used in our experiments are the publicly available taggers TnT and TBL and a corpus of Spanish texts in which named entity occurrences are tagged with BIO tags. We have defined three transformations that yield three additional versions of the training corpus. The transformations change either the words or the tags, and each of the three improves on the results that TnT and TBL obtain when trained on the original version of the corpus. With the four versions of the corpus and the two taggers, we obtain eight different models that can be combined with several techniques. Our experiments show that combining these models with machine learning techniques improves performance considerably: we raise the baselines of TnT ($F_{\beta=1} = 85.25$) and TBL ($F_{\beta=1} = 87.45$) to $F_{\beta=1} = 90.90$ in the best of our experiments.
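The abstract says only that the eight models can be combined with several techniques and that machine learning combiners gave the best results. As a rough illustration, the Python sketch below combines several BIO-tagged outputs by simple per-token majority voting. Voting is a basic combination baseline and is not the authors' best method. The sketch also scores each output with the entity-level $F_{\beta=1}$ measure, the harmonic mean $2PR/(P+R)$ of precision $P$ and recall $R$. It assumes exact-span matching in the spirit of CoNLL-style evaluation. The function names, the tie-breaking rule, and the toy tag sequences are all hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple

# Hypothetical entity representation: (first token, one past last token, type).
Span = Tuple[int, int, str]


def majority_vote(predictions: List[List[str]]) -> List[str]:
    """Combine per-token BIO tags from several models by simple majority.

    Ties go to the tag proposed by the earliest model in `predictions`.
    """
    combined = []
    for token_tags in zip(*predictions):
        counts = Counter(token_tags)
        top = max(counts.values())
        combined.append(next(t for t in token_tags if counts[t] == top))
    return combined


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Decode BIO tags into entity spans (lenient: a stray I-X opens a new entity)."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        continues = tag.startswith("I-") and tag[2:] == etype
        if not continues:
            if start is not None:
                spans.add((start, i, etype))
            start, etype = None, None
            if tag != "O":  # B-X, or an I-X that does not continue the open entity
                start, etype = i, tag[2:]
    return spans


def f_beta1(gold: List[str], pred: List[str]) -> float:
    """Entity-level F(beta=1): harmonic mean of precision and recall on exact span matches."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    correct = len(gold_spans & pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Toy example with three hypothetical models (not data from the paper).
gold = ["B-PER", "I-PER", "O", "B-LOC", "O"]
model_a = ["B-PER", "O", "O", "B-LOC", "O"]
model_b = ["B-PER", "I-PER", "O", "B-ORG", "O"]
model_c = ["B-PER", "I-PER", "O", "B-LOC", "B-MISC"]

# prints: a 0.5 / b 0.5 / c 0.8 / vote 1.0 (one per line)
for name, pred in [("a", model_a), ("b", model_b), ("c", model_c)]:
    print(name, round(f_beta1(gold, pred), 3))
print("vote", round(f_beta1(gold, majority_vote([model_a, model_b, model_c])), 3))
```

In this toy case each model makes a different mistake, so the vote recovers the gold sequence. A learned combiner would replace `majority_vote` with a classifier that takes the individual models' tags for a token as input features. That variant is not shown here.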
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Troyano, J.A., Carrillo, V., Enríquez, F., Galán, F.J. (2004). Named Entity Recognition Through Corpus Transformation and System Combination. In: Vicedo, J.L., Martínez-Barco, P., Muñoz, R., Saiz Noeda, M. (eds) Advances in Natural Language Processing. EsTAL 2004. Lecture Notes in Computer Science, vol 3230. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30228-5_23
DOI: https://doi.org/10.1007/978-3-540-30228-5_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23498-2
Online ISBN: 978-3-540-30228-5
eBook Packages: Springer Book Archive