Recognizing unregistered names for Mandarin word identification

Authors:
Liang-Jyh Wang, Wei-Chuan Li, Chao-Huang Chang
Industrial Technology Research Institute (ITRI), Hsinchu, Taiwan, R.O.C.

COLING '92: Proceedings of the 14th conference on Computational linguistics - Volume 4, August 1992, pages 1239–1243. https://doi.org/10.3115/992424.992473

Published: 23 August 1992

ABSTRACT

Word identification (WI) has been an important and active issue in Chinese natural language processing. In this paper, a new mechanism based on the concept of sublanguage is proposed for identifying unknown words, especially personal names, in Chinese newspapers. The proposed mechanism includes title-driven name recognition, adaptive dynamic word formation, and identification of 2-character and 3-character Chinese names without titles. We present experimental results for two corpora and compare them with the results of NTHU's statistics-based system, the only system we know of that has attacked the same problem. The experimental results show significant improvements over WI systems without the name identification capability.
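
The abstract names title-driven name recognition without giving its rules, so the following is only a minimal sketch of one possible reading, not the authors' algorithm. It assumes that a candidate name is a 2- or 3-character string that begins with a known surname and ends immediately before a title word. The surname set, the title list, and the function name `find_titled_names` are all hypothetical and chosen for illustration.

```python
# Hypothetical miniature lexicons; the paper's actual surname and
# title lists are not given in the abstract.
SURNAMES = {"王", "李", "張", "陳"}
TITLES = ("先生", "教授", "總統")


def find_titled_names(text, surnames=SURNAMES, titles=TITLES):
    """Return sorted (start, name, title) tuples for candidate names.

    A candidate is a 3- or 2-character string that starts with a known
    surname and ends right before an occurrence of a title word.
    """
    found = []
    for title in titles:
        pos = text.find(title)
        while pos != -1:
            # Prefer the longer 3-character name, then fall back to 2.
            for length in (3, 2):
                start = pos - length
                if start >= 0 and text[start] in surnames:
                    found.append((start, text[start:pos], title))
                    break
            pos = text.find(title, pos + 1)
    return sorted(found)


if __name__ == "__main__":
    for start, name, title in find_titled_names("昨天王大明先生和李華教授開會"):
        print(start, name, title)
```

A real system would also have to handle names that appear without any title and resolve conflicts with dictionary words, which the abstract lists as separate components of the mechanism.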

References

ACCC. The Status and Progress of Chinese Language Processing Technology. Association for Common Chinese Code, International, Beijing, China, 1991.
J.-S. Chang, S.-D. Chen, Y. Chen, J. S. Liu, and S.-J. Ker. A Multiple-corpus Approach to Identification of Chinese Surname-names. In Proceedings of Natural Language Processing Pacific Rim Symposium, pages 87--91, 1991.
J.-S. Chang, C.-D. Chen and S.-D. Chang. Chinese word segmentation through constraint satisfaction and statistical optimization. In Proc. of ROCLING IV, pages 147--165, 1991.
{see pdf for reference}
C. K. Fan and W. H. Tsai. Automatic word identification in Chinese sentences by the relaxation technique. In Proc. of National Computer Symposium, pages 423--431, Taipei, Taiwan, 1987.
R. Grishman and R. Kittredge, editors. Analyzing Language in Restricted Domains: Sublanguage Description and Processing. Lawrence Erlbaum Associates, Hillsdale, NJ, 1986.
R. Kittredge and J. Lehrberger, editors. Sublanguage: Studies of language in restricted domains. Walter de Gruyter, Berlin, 1982.
N. Liang. On the automatic segmentation of Chinese words and related theory. In Proc. of the 1987 International Conference on Chinese information processing, pages 454--459, Beijing, 1987.
R. Sproat and C. Shih. A statistic method for finding word boundaries in Chinese text. Computer Processing of Chinese & Oriental Languages, 4(4):336--351, March, 1990.
M. Tomita. Efficient Parsing for Natural Language. Kluwer Academic Publishers, 1986.
L.-J. Wang, T. Pei, W.-C. Li, and L.-C. Huang. A parsing method for identifying words in Mandarin Chinese. In Proc. of IJCAI-91, pages 1018--1023, 1991.
C.-L. Yeh and H.-J. Lee. Unification-based word identification for Mandarin Chinese sentences. Proc. of 1988 ICCPCOL, pages 27--32, Toronto, Canada, 1988.

Published in
COLING '92: Proceedings of the 14th conference on Computational linguistics - Volume 4
August 1992
243 pages
Program Chair: Antonio Zampolli, Pisa
Publisher: Association for Computational Linguistics, United States
