ABSTRACT
Word identification has been an important and active issue in Chinese natural language processing. In this paper, a new mechanism, based on the concept of sublanguage, is proposed for identifying unknown words, especially personal names, in Chinese newspapers. The proposed mechanism includes title-driven name recognition, adaptive dynamic word formation, and identification of 2-character and 3-character Chinese names without titles. We present experimental results for two corpora and compare them with those of NTHU's statistics-based system, the only system we know of that has attacked the same problem. The results show significant improvements over word identification (WI) systems that lack name identification capability.
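As a rough illustration of what title-driven name recognition could look like, the sketch below scans a sentence for a known title and checks whether the 2 or 3 characters immediately before it begin with a known surname. This is not the paper's algorithm: the `SURNAMES` and `TITLES` lexicons, the function name `find_titled_names`, and the assumption that the title directly follows the name are all hypothetical choices made for the example.

```python
# Hypothetical toy lexicons; the paper's actual surname and title lists are not given here.
SURNAMES = {"李", "王", "陳", "張"}
TITLES = ["總統", "教授", "先生"]


def find_titled_names(text):
    """Return (name, title) pairs where a 2- or 3-character candidate
    beginning with a known surname directly precedes a known title."""
    found = []
    for title in TITLES:
        start = text.find(title)
        while start != -1:
            # Prefer the longer (3-character) candidate, then fall back to 2.
            for length in (3, 2):
                begin = start - length
                if begin >= 0 and text[begin] in SURNAMES:
                    found.append((text[begin:start], title))
                    break
            start = text.find(title, start + 1)
    return found


if __name__ == "__main__":
    print(find_titled_names("昨天李登輝總統接見王明教授"))
```

Preferring the 3-character candidate is only one possible tie-breaking rule. A real system would also need evidence for names that appear without a title, which the abstract lists as a separate component.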
REFERENCES
- ACCC. The Status and Progress of Chinese Language Processing Technology. Association for Common Chinese Code, International, Beijing, China, 1991.
- J.-S. Chang, S.-D. Chen, Y. Chen, J. S. Liu, and S.-J. Ker. A Multiple-corpus Approach to Identification of Chinese Surname-names. In Proceedings of Natural Language Processing Pacific Rim Symposium, pages 87--91, 1991.
- J.-S. Chang, C.-D. Chen, and S.-D. Chang. Chinese word segmentation through constraint satisfaction and statistical optimization. In Proc. of ROCLING IV, pages 147--165, 1991.
- {see pdf for reference}
- C. K. Fan and W. H. Tsai. Automatic word identification in Chinese sentences by the relaxation technique. In Proc. of National Computer Symposium, pages 423--431, Taipei, Taiwan, 1987.
- R. Grishman and R. Kittredge, editors. Analyzing Language in Restricted Domains: Sublanguage Description and Processing. Lawrence Erlbaum Associates, Hillsdale, NJ, 1986.
- R. Kittredge and J. Lehrberger, editors. Sublanguage: Studies of Language in Restricted Domains. Walter de Gruyter, Berlin, 1982.
- N. Liang. On the automatic segmentation of Chinese words and related theory. In Proc. of the 1987 International Conference on Chinese Information Processing, pages 454--459, Beijing, 1987.
- R. Sproat and C. Shih. A statistical method for finding word boundaries in Chinese text. Computer Processing of Chinese & Oriental Languages, 4(4):336--351, March 1990.
- M. Tomita. Efficient Parsing for Natural Language. Kluwer Academic Publishers, 1986.
- L.-J. Wang, T. Pei, W.-C. Li, and L.-C. Huang. A parsing method for identifying words in Mandarin Chinese. In Proc. of IJCAI-91, pages 1018--1023, 1991.
- C.-L. Yeh and H.-J. Lee. Unification-based word identification for Mandarin Chinese sentences. In Proc. of 1988 ICCPCOL, pages 27--32, Toronto, Canada, 1988.
- Recognizing unregistered names for Mandarin word identification