Abstract
With the rapid growth of information available on the Internet, finding relevant information quickly on the Web has become increasingly difficult. Named Entity Recognition (NER), a key technique in web information processing tools such as information retrieval and information extraction, has received increasing attention. In this paper we address the problem of Chinese NER using a hybrid-statistical model. The study concentrates on entity names (personal names, location names and organization names), temporal expressions (dates and times) and number expressions. The method has three characteristics: first, NER and Part-of-Speech tagging are integrated into a unified framework; second, it combines a Hidden Markov Model (HMM) with a Maximum Entropy Model (MEM) by invoking the MEM as a sub-model within the Viterbi algorithm; third, the Part-of-Speech information of the context is used in the MEM. Experiments show that the hybrid-statistical model achieves favorable results on Chinese NER, with F1 values ranging from 74% to 92% across all kinds of named entities on open-test data.
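The idea of invoking a sub-model inside Viterbi decoding can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the MEM scores enter the HMM, so the code below simply assumes that a normalized log-linear scorer (standing in for the MEM) supplies the per-position score in place of an HMM emission probability. The names `viterbi`, `make_loglinear_scorer`, the feature templates, the two-tag set and all weights are hypothetical and chosen only for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Feature = Tuple[str, str, str]


def viterbi(
    words: Sequence[str],
    tags: Sequence[str],
    log_start: Dict[str, float],
    log_trans: Dict[str, Dict[str, float]],
    log_emit: Callable[[Sequence[str], int, str], float],
) -> List[str]:
    """Return the highest-scoring tag sequence.

    log_emit is a pluggable sub-model: it is called once per (position, tag)
    and may look at the whole sentence, not just the current word.
    """
    if not words:
        return []
    # best[i][t]: best log score of any tag path ending in tag t at position i
    best = [{t: log_start[t] + log_emit(words, 0, t) for t in tags}]
    back: List[Dict[str, str]] = []
    for i in range(1, len(words)):
        scores, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[-1][p] + log_trans[p][t])
            scores[t] = best[-1][prev] + log_trans[prev][t] + log_emit(words, i, t)
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)
    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def make_loglinear_scorer(
    weights: Dict[Feature, float], tags: Sequence[str]
) -> Callable[[Sequence[str], int, str], float]:
    """Build a toy maximum-entropy-style scorer: log P(tag | context)."""

    def features(words: Sequence[str], i: int, tag: str) -> List[Feature]:
        prev_word = words[i - 1] if i > 0 else "<s>"
        # Hypothetical templates: current word and previous word, paired with the tag.
        return [("w", words[i], tag), ("prev_w", prev_word, tag)]

    def log_emit(words: Sequence[str], i: int, tag: str) -> float:
        raw = {t: sum(weights.get(f, 0.0) for f in features(words, i, t)) for t in tags}
        log_z = math.log(sum(math.exp(v) for v in raw.values()))  # normalize over tags
        return raw[tag] - log_z

    return log_emit


if __name__ == "__main__":
    tags = ["PER", "O"]
    uniform = math.log(0.5)
    weights = {
        ("w", "Zhang", "PER"): 2.0,
        ("w", "said", "O"): 2.0,
        ("prev_w", "said", "O"): 1.0,
    }
    scorer = make_loglinear_scorer(weights, tags)
    print(viterbi(
        ["Zhang", "said", "hello"],
        tags,
        {t: uniform for t in tags},
        {p: {t: uniform for t in tags} for p in tags},
        scorer,
    ))
```

Passing the scorer as a callable keeps the dynamic program unchanged while letting the sub-model condition on arbitrary context; in a setting like the one the abstract describes, the feature templates would presumably also draw on the Part-of-Speech tags of neighbouring words, which this toy example omits.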
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, X., Wang, T., Tang, J., Zhou, H., Chen, H. (2005). Chinese Named Entity Recognition with a Hybrid-Statistical Model. In: Zhang, Y., Tanaka, K., Yu, J.X., Wang, S., Li, M. (eds) Web Technologies Research and Development - APWeb 2005. APWeb 2005. Lecture Notes in Computer Science, vol 3399. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31849-1_86
DOI: https://doi.org/10.1007/978-3-540-31849-1_86
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25207-8
Online ISBN: 978-3-540-31849-1
eBook Packages: Computer Science (R0)