DOI: 10.3115/992424.992473

Recognizing unregistered names for Mandarin word identification

Published: 23 August 1992

ABSTRACT

Word identification has been an important and active issue in Chinese natural language processing. In this paper, a new mechanism, based on the concept of sublanguage, is proposed for identifying unknown words, especially personal names, in Chinese newspapers. The proposed mechanism includes title-driven name recognition, adaptive dynamic word formation, and identification of 2-character and 3-character Chinese names without titles. We show experimental results for two corpora and compare them with the results of NTHU's statistics-based system, the only other system we know of that has attacked the same problem. The experimental results show significant improvements over word identification (WI) systems without the name identification capability.

Published in

COLING '92: Proceedings of the 14th conference on Computational linguistics - Volume 4
August 1992
243 pages

Publisher: Association for Computational Linguistics, United States
